From the Document Web to the Agent Web
A Whitepaper of the Open Agent Protocol
Version: 1.0
Status: Public Working Draft
Date: May 2026
Authors: OAP Web Integration Working Group
Abstract
The web that exists today was designed for human readers. Its dominant primitives are styled documents, ranked search results, embedded video, and form based interactions that culminate in a manual click. This design choice was correct in 1991 and remained correct for three decades. It is no longer sufficient. A growing share of all web traffic is now generated by autonomous software agents that read on behalf of human principals, and the rate of growth is accelerating. These agents do not see pages. They infer pages. They scrape, they parse, they hallucinate around missing structure, and they operate within a substrate that was never designed for them. This is wasteful, fragile, and dangerous.
This whitepaper argues that the next major version of the web is the Agent Web. The Agent Web is not a replacement for the document web. It is a parallel canonical layer, served from the same origins, that exposes content, services, identity, and commerce in a form that an agent can fetch, validate, and act on without rendering pixels or clicking buttons. We describe the architectural shifts the Agent Web requires, the role the Open Agent Protocol plays in formalizing those shifts, and the migration path that allows the existing web to adopt the new layer incrementally.
1. Introduction
The history of the web can be told as a sequence of three audiences. The first web served researchers exchanging linked documents. The second web served consumers who shopped, banked, read news, and watched video through a browser. The third web is now beginning, and it serves a population that does not have eyes. The members of this population are large language model agents, automated procurement systems, scheduling assistants, research swarms, and the delegation chains that those systems spawn. The third web is the Agent Web, and it has its own requirements.
A useful way to see how different the third audience is from the first two is to consider the asymmetries of attention, perception, and trust between humans and agents. A human reader holds at most a small number of tabs in working memory and reads them sequentially. An agent can issue ten thousand requests per second across a thousand origins in parallel. A human reader perceives a page as styled text and remembered brand. An agent perceives a page as a sequence of bytes that may or may not validate against a known schema. A human reader trusts a familiar brand intuitively and forms judgments slowly through accumulated experience. An agent has no intuition and no accumulated experience of its own. Its trust must be derived from cryptographic evidence on each interaction.
These asymmetries are not pathological. They are properties of a different kind of consumer. The Agent Web is the substrate in which those properties are first class.
2. The Document Web in Detail
To see why the Agent Web is necessary, it is useful to make explicit how the present web encodes its content and how that encoding fails when the consumer is an agent rather than a human.
The present web is a graph of styled HTML documents linked by URLs. Search engines crawl this graph, build an inverted index, and rank documents according to a combination of link topology, click behavior, content quality signals, and revenue alignment. A user who needs a piece of information types a query into the search engine, scans the ten ranked results, and selects one to read. The selected document is rendered by a browser into a layout that combines text, images, video, and interactive widgets. Where the user wants to act, the document offers a form, a button, or a redirect into a third party flow. Identity is maintained through cookies issued at login. Payment is initiated by the user clicking a button that opens a checkout page hosted by a payment processor.
This system has well known strengths. It is open, it is decentralized at the level of publishing, and it has a low barrier to entry for new participants. It also has well known weaknesses, and three of them are decisive for agents.
The first weakness is that the structure of information is hidden inside its presentation. A restaurant menu is a list of products with prices and descriptions, but it is published as a styled HTML fragment whose class names, font sizes, and layout choices vary from one restaurant to the next. To extract a comparable list from twenty restaurants, an agent must scrape twenty distinct presentations, infer the schema of each, normalize the prices into a common currency and unit, and tolerate frequent breakage when any restaurant redesigns its site. The same problem afflicts product catalogs, event listings, real estate offerings, medical provider directories, and almost every other class of structured information that the web hosts. Schema.org and similar microdata initiatives address part of this problem, but adoption is partial and their semantics are weakly enforced.
The second weakness is that discovery is centralized and optimized for advertising. A general purpose search engine returns ranked links sized for a human reader's attention. It does not return the structured information an agent needs to make a decision, such as a quote, a service level objective, a reputation score, or a callable endpoint. An agent that wants this information must either pay for access to a specialized data provider, scrape the search engine in violation of its terms, or build a private index by crawling each candidate origin directly. None of these paths scales when the population of agents and the population of origins both grow into the millions.
The third weakness is that trust is implicit. A human user trusts a brand by recognition. An agent has no recognition. To act safely on behalf of a principal, an agent requires cryptographic evidence of the counterparty's identity, of the counterparty's claimed capabilities, and of the counterparty's commitments under stated policies. The present web provides almost none of this evidence by default. TLS authenticates the transport but not the publisher. OAuth scopes can be inspected at runtime but are not exposed in advance. There is no standardized way for an origin to declare in machine readable form what it can do, what it charges, what its policy commitments are, or what reputation it carries. Each agent implementation must reinvent this layer for each origin it integrates with.
The combined effect of these three weaknesses is that the present web treats agent traffic as a degenerate case of human traffic. Agents are confronted with cookie banners, captchas, anti automation rules, and rate limits calibrated to a single human user. The friction this creates is not incidental. It is the rational response of a substrate that was never designed for the consumer that is now arriving in volume.
3. The Agent Web in Concept
The Agent Web is what the present web becomes when the structure, discovery, and trust layers are made first class for agents while the human surface continues to exist for human readers. Three commitments distinguish the Agent Web from the present web.
The first commitment is canonical structure. Every unit of information that an origin publishes has a canonical machine readable representation served at a stable URL, signed by the publisher, and validated against a schema drawn from a published vocabulary. The styled HTML page that a human visitor sees is a derived projection of this canonical record. Where the projection and the record disagree, the record is authoritative.
The second commitment is enumerable capability. Every origin publishes a manifest that enumerates the actions it offers, the data classes it serves, the schemas that govern its requests and responses, the prices it charges, the policies it commits to, and the identities it accepts as authentication. An agent that fetches the manifest can determine in constant time whether the origin is a candidate for a given task, what it will cost, and under what conditions it will deliver. Discovery becomes a matter of fetching manifests, not of parsing search results.
The third commitment is cryptographic accountability. Every claim that an origin makes about itself, every transaction it executes, and every action it performs on behalf of an agent generates a signed receipt. Receipts compose into audit trails that allow disputes to be resolved by inspection rather than by negotiation. Reputation accrues against signed performance records issued by agents that have transacted with the origin, weighted by stake and aggregated across the delegation tree to resist Sybil inflation. Trust is no longer implicit. It is computed.
These three commitments do not require the abandonment of any existing web technology. HTTP remains the transport. TLS remains the transport security. URLs remain the addresses. HTML remains the human surface. JSON remains the wire format for canonical records. What changes is the addition of a small number of well known documents at well known paths, and the discipline of treating those documents as the canonical truth of an origin.
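To make the scale of the addition concrete, the following Python sketch fetches an origin's manifest from the well known path used later in this paper and checks whether a desired action is offered. The field names inside the manifest ("actions", "name", "price") are illustrative assumptions rather than the normative RFC 0012 schema.

    import requests  # any HTTP client will do; requests is used for brevity

    def find_action(origin: str, action_name: str):
        """Fetch an origin's Capability Manifest and look up one action."""
        url = f"https://{origin}/.well-known/oap/manifest.json"
        manifest = requests.get(url, timeout=10).json()
        for action in manifest.get("actions", []):
            if action.get("name") == action_name:
                return action  # carries schema, price, and policy references
        return None

    match = find_action("restaurant.example", "menu.list")
    if match is not None:
        print("offered at", match.get("price"))

An agent that runs this fragment against a new origin learns in one round trip what would otherwise require scraping and inference.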
4. The Anatomy of an Agent Web Origin
A conforming origin in the Agent Web exposes the following layers.
At the identity layer the origin is identified by a Decentralized Identifier resolvable through W3C DID Core methods. The identifier is bound to one or more cryptographic keys that the origin uses to sign its manifest, its knowledge nodes, its receipts, and any reputation records it issues. Where the origin operates under a legal entity, a Verifiable Credential issued by a recognized authority binds the DID to the entity. Pseudonymous origins are permitted but are flagged as such for client policy.
At the manifest layer the origin publishes a Capability Manifest at a well known path that enumerates its actions, data classes, schemas, prices, policies, accepted payment instruments, supported negotiation channels, and conformance levels. The manifest is the single document that an agent fetches first when it encounters a new origin. From the manifest alone the agent can determine whether further interaction is worthwhile.
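A minimal manifest, shown here as a Python literal for concreteness, might carry the following shape. Every field name is an assumption made for illustration; the normative vocabulary is defined in RFC 0012.

    # A minimal Capability Manifest. Field names are illustrative, not normative.
    manifest = {
        "id": "did:web:restaurant.example",       # the origin's DID
        "conformance": "W1",
        "actions": [{
            "name": "menu.list",
            "request_schema": "https://restaurant.example/schemas/menu-req.json",
            "response_schema": "https://restaurant.example/schemas/menu-res.json",
            "price": {"model": "fixed", "amount": "0.00", "currency": "USD"},
        }],
        "data_classes": ["menu_item"],
        "payment_instruments": ["wallet:oap"],    # hypothetical instrument name
        "policy": "https://restaurant.example/.well-known/oap/policy.json",
    }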
At the content layer the origin publishes typed Knowledge Nodes for each unit of information it serves. A product is a Knowledge Node. An event is a Knowledge Node. A person, in a context where the person has consented to be discoverable, is a Knowledge Node. Each node carries a stable identifier, a type, a signed payload, and a reference to the human projection where one exists. Modality assets such as images, audio, and video are accompanied by Asset Descriptors that declare their content, provenance under the C2PA Content Credentials specification, and license terms in machine readable form.
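As an illustration of the anatomy of a single node, a product might be published as follows. Field names are assumptions made for this sketch; the canonical entity schemas are the subject of RFC 0005.

    # A typed Knowledge Node for one product. Illustrative fields only.
    knowledge_node = {
        "id": "https://restaurant.example/oap/nodes/margherita",
        "type": "Product",
        "payload": {
            "name": "Margherita Pizza",
            "price": {"amount": "12.50", "currency": "USD"},
            "description": "Tomato, mozzarella, basil.",
        },
        "projection": "https://restaurant.example/menu#margherita",  # human page
        "signature": "<publisher signature over the canonical payload>",
    }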
At the invocation layer the origin accepts requests against the actions enumerated in its manifest. Each request travels in the standard OAP Request Envelope, signed by the calling agent under its DID, and is answered with a Response Envelope and a Receipt. Errors are structured. Idempotency is supported. Streaming is available where the action is long lived.
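A sketch of the signing step on the agent's side follows, using Ed25519 as one plausible key type. The envelope fields shown are assumptions; only the requirement that each request be signed under the calling agent's DID comes from the protocol.

    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    agent_key = Ed25519PrivateKey.generate()  # demo key; a real agent loads the
                                              # key bound to its DID document
    envelope = {
        "from": "did:example:agent-1",        # illustrative DIDs and fields
        "to": "did:web:restaurant.example",
        "action": "menu.list",
        "body": {},
        "idempotency_key": "b6f3c2e0",        # supports safe retries
    }

    # Sign a canonical JSON encoding so both sides hash identical bytes.
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()
    signed_request = {"envelope": envelope,
                      "signature": agent_key.sign(canonical).hex()}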
At the commercial layer the origin accepts payment for paid actions through one or more of the instruments enumerated in its Wallet document. Pricing is expressed inline in the manifest as fixed amounts, tiered schedules, streaming rates, or quote endpoints. Each paid invocation produces a Wallet Statement entry on the agent's side and a corresponding Receipt on the origin's side. There is no checkout page in the human sense. There is a signed exchange of quote, acceptance, execution, and receipt.
At the policy layer the origin publishes a Policy document that states its commitments on data retention, deletion, breach notification, content moderation, and any jurisdiction specific obligations it accepts. Where applicable, the Policy document includes references to externally signed compliance attestations.
At the accountability layer the origin participates in the Reputation system defined by RFC 0009, exposes its self issued reputation record at a well known path, and accepts externally signed performance records that aggregate against its DID. Where the origin operates Sub Agents on behalf of customers, it MUST aggregate their behavior under the rules of RFC 0011 to resist Sybil inflation.
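As one concrete reading of "weighted by stake and aggregated across the delegation tree", the sketch below computes a stake weighted score over performance records that have already been collapsed onto their root principals. The weighting scheme is an assumption; the normative rules are those of RFC 0009 and RFC 0011.

    def aggregate_reputation(records):
        """Stake weighted mean over verified performance records.

        Each record is assumed to carry a 'score' in [0, 1] and the 'stake'
        of its issuing root principal. Sub agent records are assumed to be
        collapsed onto their roots already, per RFC 0011.
        """
        total_stake = sum(r["stake"] for r in records)
        if total_stake == 0:
            return None  # no staked evidence; treat the origin as unrated
        return sum(r["score"] * r["stake"] for r in records) / total_stake

    print(aggregate_reputation([
        {"score": 0.96, "stake": 50.0},
        {"score": 0.40, "stake": 1.0},  # a low stake record moves the score little
    ]))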
At the federation layer the origin MAY publish a Discovery document that lists its own Knowledge Nodes and references the manifests of other origins it federates with. Federation is voluntary, transitive only by explicit signature, and audited.
At the traffic control layer the origin publishes an agents.txt document that supersedes the legacy robots.txt mechanism for the regulation of agent traffic. Directives are addressed to agent identities by DID prefix and may specify allow lists, deny lists, rate ceilings, schedule windows, and required conformance levels.
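An agents.txt document might read as follows. The directive syntax is purely illustrative; the normative grammar belongs to RFC 0012.

    # agents.txt (illustrative syntax, not the normative grammar)
    agent: did:example:*                  # default class: any agent identity
    rate-ceiling: 10/second
    required-conformance: A1              # hypothetical level name

    agent: did:example:trusted-registry   # a specific allow listed identity
    allow: /oap/*
    rate-ceiling: 1000/second
    schedule: 00:00-06:00Z                # bulk traffic window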
These layers do not require new transport, new cryptographic primitives, or new browser support. They require only that origins publish a small number of documents at agreed paths and that they sign what they publish.
5. Search, Discovery, and the End of the Ranked List
The most visible consequence of the Agent Web for end users is the disappearance of the ranked list as the primary surface of search. A search engine that operates over Agent Web origins does not return ten ranked links. It returns a set of capability matches. Each match is an origin that has declared the ability to perform the requested task, accompanied by the origin's quote for the task, its declared service level objective, its current reputation record, and the policy commitments it accepts. The user, or more often the user's agent, selects from this set under a policy that may favor lowest price, fastest delivery, highest reputation, strongest privacy commitment, or any combination expressible as a constraint.
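Selection under such a policy is a small computation rather than a ranking problem. The sketch below filters capability matches on hard constraints and picks the cheapest survivor; all field names are illustrative assumptions.

    def select(matches, max_price, min_reputation):
        """Choose the cheapest capability match satisfying hard constraints.

        Each match is assumed to carry a verified quote, a reputation score,
        and policy commitments; the field names are illustrative.
        """
        eligible = [
            m for m in matches
            if m["quote"] <= max_price
            and m["reputation"] >= min_reputation
            and m["policy"].get("no_retention", False)  # privacy constraint
        ]
        return min(eligible, key=lambda m: m["quote"], default=None)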
This shift has three structural implications.
The first implication is that the economics of search separate from the economics of attention. The present search engine sells attention to advertisers because attention is the scarce good in a human consumer market. In an agent consumer market the scarce good is trustworthy capability metadata. The economic model that supports the Agent Web is therefore a market for verified discovery rather than a market for ranked attention. Origins pay to have their capabilities included in registries with strong audit guarantees. Registries compete on the quality of their audits rather than on the volume of clicks they deliver.
The second implication is that monopoly position in search becomes harder to defend. A ranked list is a centralized artifact whose ranking algorithm is a defensible asset. A set of capability matches is a derived view over signed manifests. Multiple registries can return the same matches, agents can verify each underlying manifest independently, and a new registry that offers better audit guarantees can compete on the substance of its work rather than on the volume of training data it has accumulated. This does not eliminate network effects, but it changes which network effects matter.
The third implication is that the role of the brand changes. In the present web the brand is the primary signal of trust. In the Agent Web the primary signal of trust is the reputation record, which is composed of signed performance records issued by agents that have actually transacted with the origin. Brand still matters at the level of the human user who must authorize the agent's policy, but it ceases to be the determining input at the level of the agent's decision.
6. Identity and the Agent Population
An agent population is a directed graph in which principals delegate authority to agents, agents delegate authority to sub agents, and sub agents may delegate further. Each node in this graph carries a Decentralized Identifier and a key pair. Each edge carries a Delegation Token that constrains the scope, duration, budget, and permitted actions of the delegated authority. This structure is described in RFC 0004 and is the substrate on which the Agent Web's identity layer rests.
The interaction between identity and the Agent Web is governed by two principles.
The first principle is proof at every hop. An agent that requests an action from an origin presents not only its own DID and proof of control but also the chain of delegation tokens that links its DID back to a root principal. The origin verifies the chain, evaluates the constraints at each hop, and either grants the request or rejects it with a structured error. There is no implicit trust in any hop of the chain. Each hop is verified or the request fails.
The second principle is aggregation at the root. For the purposes of rate limiting, quota enforcement, and reputation accounting, every action performed anywhere in the delegation tree is attributed to the root principal. An agent cannot escape a rate ceiling by spawning sub agents. A sub agent cannot inflate the reputation of a tool by issuing performance records, because those records are aggregated under the root principal and weighted by sibling decay according to RFC 0011. The Agent Web's identity layer is designed so that the unit of accountability is the principal, not the agent that happens to be on the wire at the moment of the request.
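The two principles combine into a single verification routine: walk the chain hop by hop, fail on the first invalid hop, and return the root principal as the unit of accountability. The token fields here are illustrative; the normative structure is RFC 0004's.

    from cryptography.exceptions import InvalidSignature

    def verify_chain(tokens, resolve_key):
        """Verify a delegation chain; return the root principal's DID.

        'tokens' is ordered root first. Each token is assumed to carry the
        issuer DID, subject DID, signed bytes, and signature; constraint
        evaluation (scope, budget, expiry) is elided for brevity.
        """
        for i, token in enumerate(tokens):
            issuer_key = resolve_key(token["issuer"])  # DID resolution
            try:
                issuer_key.verify(token["signature"], token["signed_bytes"])
            except InvalidSignature:
                raise PermissionError(f"hop {i} failed signature verification")
            if i > 0 and tokens[i - 1]["subject"] != token["issuer"]:
                raise PermissionError(f"hop {i} does not chain to its parent")
        # Rate limits, quotas, and reputation all key on this return value.
        return tokens[0]["issuer"]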
These two principles together make it possible for the Agent Web to support populations of agents that are six or seven orders of magnitude larger than the population of human users without collapsing into spam, fraud, or denial of service.
7. Modality, Media, and the End of Opaque Pixels
A central failure of the present web from an agent's perspective is that images, audio files, and video streams are opaque without recourse to expensive machine learning inference. An agent that wants to know what is in a photograph either runs a vision model over the pixels at meaningful cost, or relies on the alt text and surrounding HTML which are inconsistent and frequently absent. Both options are wasteful when the publisher of the photograph already knows what is in it.
The Agent Web requires that every modality asset be accompanied by an Asset Descriptor at a sibling URL. The descriptor declares the asset's content type, byte length, cryptographic digest, semantic annotations describing the depicted or contained subject matter, provenance information following the C2PA Content Credentials specification, and license terms expressed in machine readable form. An agent that retrieves the descriptor first does not need to invoke a vision or audio model to make a decision about whether to retrieve the asset itself. It can determine in constant time whether the asset is relevant to its task, whether its license permits use, and what fee the publisher charges for that use.
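The consuming agent's decision procedure then becomes a constant time check against the descriptor before any asset bytes move. Field names below are illustrative assumptions.

    import hashlib
    import requests

    def fetch_if_licensed(descriptor_url: str, intended_use: str):
        """Fetch an asset only if its descriptor permits the intended use,
        then verify the bytes against the declared digest."""
        d = requests.get(descriptor_url, timeout=10).json()
        if intended_use not in d.get("license", {}).get("permitted_uses", []):
            return None  # decided without fetching pixels or running a model
        blob = requests.get(d["asset_url"], timeout=30).content
        if hashlib.sha256(blob).hexdigest() != d["sha256"]:
            raise ValueError("asset bytes do not match the signed descriptor")
        return blob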
This shift has implications beyond efficiency. It also creates a substrate in which the provenance of media can be cryptographically anchored at the moment of capture, propagated through editing, and verified at the moment of consumption. In an environment where generative models can produce arbitrary realistic media, the ability to verify that a particular image was captured by a particular device at a particular time and was edited through a known sequence of operations becomes essential. The Agent Web does not invent these mechanisms, but it requires their use as a condition of conformance.
8. Commerce and the End of the Checkout Page
In the present web a paid action is initiated by a human clicking a button that opens a checkout page hosted by a payment processor. The page collects card details or directs the user to a third party authentication flow. Where the payer is an agent rather than a human, this flow does not work. There is no human at the keyboard. There is no card. There is no reason for the agent to traverse a human user interface to deliver value that has already been authorized by its principal.
The Agent Web replaces the checkout page with a signed exchange of quote, acceptance, execution, and receipt. The agent fetches a quote endpoint listed in the origin's manifest and receives a signed offer that includes the price, the validity window, and the constraints under which the offer is binding. The agent accepts the offer by signing it under its own DID and presenting the signed acceptance to the origin's invocation endpoint. The origin executes the action, debits the agent's wallet under the rules of the agreed payment instrument, and returns a Receipt that the agent records on its Wallet Statement. The same flow supports streaming payments where the action is metered over time.
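The whole exchange is four signed messages. The sketch below shows the agent's side; the endpoint paths and field names are illustrative assumptions, not the normative OAP flow.

    import json
    import requests
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    agent_key = Ed25519PrivateKey.generate()  # demo key bound to the agent's DID

    def pay_and_invoke(origin: str, action: str, body: dict):
        """Quote, acceptance, execution, receipt, from the agent's side."""
        base = f"https://{origin}/oap"
        # 1. Quote: a signed offer carrying price and validity window.
        quote = requests.post(f"{base}/quote", json={"action": action},
                              timeout=10).json()
        # 2. Acceptance: the agent signs the offer under its own key.
        offer_bytes = json.dumps(quote["offer"], sort_keys=True).encode()
        acceptance = agent_key.sign(offer_bytes).hex()
        # 3. Execution: present the signed acceptance with the request body.
        result = requests.post(
            f"{base}/invoke",
            json={"action": action, "body": body, "acceptance": acceptance},
            timeout=30,
        ).json()
        # 4. Receipt: recorded on the agent's Wallet Statement.
        return result["response"], result["receipt"]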
This mechanism does not eliminate human checkout. A human user who prefers to pay through a familiar checkout page may continue to do so, and origins may continue to publish those pages. What it does eliminate is the assumption that agent payment must traverse a human flow. It also creates a substrate in which payment frequency can rise by orders of magnitude without proportional increase in cost, because each payment is a signed message rather than a card network transaction.
9. From Shipped Products to Shipped Primitives
Before coding agents reached their current level of competence, every user of a piece of software received the same interface. The provider determined the shape of the experience and the user accepted that shape with marginal customization, because the marginal cost of producing a personalized version exceeded what any individual user could spend. This constraint was not a property of software; it was a property of the labor market for software engineers. With coding agents capable of acting as forward deployed engineers on behalf of ordinary users, that labor market constraint dissolves. The marginal cost of a personalized version approaches the cost of the underlying capability calls, and the provider that continues to ship a finished good competes against a market in which every user can have exactly what they want.
The economic implication is that software companies will increasingly ship shared primitives with the explicit expectation that users will radically customize the final product themselves. The provider's role shifts from designing a single canonical experience to maintaining a set of well behaved capabilities that any user, with the assistance of a coding agent, can compose into a personal experience. This is not a small adjustment of existing practice. It changes the unit of value from the finished application to the underlying primitive, and it changes the locus of design from the provider's product team to the user's own coding agent.
The Agent Web is the substrate that makes this transition tractable. A Capability Manifest is already a declaration of primitives in machine readable form. RFC 0015 makes the shipping of primitives normative, by requiring that every Action be small enough to compose, self contained enough to be understood by a coding agent without external documentation, and substitutable enough that one provider's primitive can be replaced by another's without rewriting the rest of the user's composition. The Composition Manifest defined in that RFC records which primitives a user has assembled, in what order, and with what bindings, and it is owned and signed by the user rather than by any provider. The User Customization Receipt records every change the user's coding agent makes to the composition, with the natural language intent that prompted the change, so that the history of how a personal version of a product evolved is auditable rather than tacit.
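As an illustration, a Composition Manifest might record an assembly like the following. The field names are assumptions made for this sketch; RFC 0015's normative schema governs the real document.

    # A user owned Composition Manifest, with illustrative fields only.
    composition = {
        "owner": "did:example:user-7",
        "primitives": [
            {"provider": "did:web:mail.example", "action": "inbox.search"},
            {"provider": "did:web:summarize.example", "action": "text.summarize"},
        ],
        "bindings": [
            # The first primitive's output feeds the second's input.
            {"from": "inbox.search.results", "to": "text.summarize.input"},
        ],
        "signature": "<user signature over the canonical encoding>",
    }

Because the manifest is owned and signed by the user, substituting one provider's primitive for another's is an edit to a single entry rather than a migration.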
The structural consequence for the Agent Web is that two distinct surfaces coexist for every origin. The first is the provider's own canonical surface, hosted as a Surface in the sense of section 4 of this paper, available to users who want the experience the provider designed. The second is the set of personal compositions produced by users who chose to assemble the underlying primitives themselves, executing against the same provider infrastructure but rendering through any user agent or coding agent the user prefers. The provider does not need to choose between these audiences. Both invoke the same Actions, both produce the same Receipts, and both pay through the same commerce models defined in RFC 0013 and generalized in RFC 0014. The difference is which party owns the composition.
The competitive implication for software providers is that the quality of their primitives becomes more important than the quality of their canonical surface, because the canonical surface is increasingly only one of many surfaces through which their primitives are reached. Providers who attempt to preserve relevance by withholding capabilities from the manifest and exposing them only through their hosted surface will find that users route around them through composition with substitutable primitives from other providers, in the same way that walled gardens on the Document Web are gradually routed around by federated and open alternatives. The provider that wins in the long run is the one whose primitives are the smallest, the most reliable, the most cleanly substitutable, and the most aggressively documented for coding agents.
The user implication is that the unit of digital ownership changes. A user who composes their own version of a product owns the composition, in the form of a signed Composition Manifest stored in their own audit log. They can move that composition between providers by substituting individual primitives, they can share it with another user, and they can inherit composition idioms from a community library in the same way that programmers inherit idioms from a standard library. This shifts power toward the user in a way that previous platform shifts have promised but not delivered, because the asset being transferred is not data alone but the structure of the user's interaction with their own software. The asset is portable because the underlying primitives are interchangeable.
This chapter therefore describes both an economic prediction and a protocol commitment. The prediction is that the dominant mode of software distribution will shift from finished applications to composable primitives within the next decade. The commitment is that the Open Agent Protocol will treat that shift as a first class concern, with normative machinery for primitive declaration, composition, substitution, and customization receipts, rather than leaving these patterns to emerge as ad hoc conventions among the largest providers.
10. Migration
Adoption of the Agent Web does not require coordinated cutover. It admits incremental migration in which any origin can publish its Capability Manifest in isolation and gain immediate value from the agents that begin to interact with it through the new layer. The following migration path is recommended.
The first step is to publish a minimal manifest at /.well-known/oap/manifest.json that declares the origin's identity, its currently exposed actions, and the schemas that govern those actions. This step is sufficient to qualify the origin at the W1 conformance level defined in RFC 0012. It allows agents to discover the origin's capabilities without scraping its human surface.
The second step is to publish Knowledge Nodes for the units of information the origin already exposes through its human surface. A retailer publishes its product catalog as typed Knowledge Nodes. A media publisher publishes its articles as typed Knowledge Nodes. A service provider publishes its service catalog and pricing as typed Knowledge Nodes. The origin reaches the W2 conformance level when at least one Knowledge Node exists per data class declared in the manifest.
The third step is to bind the origin's DID to a Verifiable Credential issued by a recognized authority, to participate in at least one public registry, and to publish Asset Descriptors for all modality assets. The origin reaches the W3 conformance level and becomes eligible for inclusion in conformance directories maintained by the OAP.
At each step the origin's existing human surface continues to function unchanged. The Agent Web layer is additive. There is no point at which the origin must choose between human visitors and agent visitors. The two surfaces coexist, with the Agent Web layer providing the canonical truth of which the human surface is a projection.
11. Open Questions
Several questions remain open and are tracked as discussion items in the Working Group on Web Integration.
The first question concerns the canonical vocabulary for Knowledge Nodes. Three candidates are under active consideration. The first is direct adoption of schema.org with OAP specific extensions, which has the advantage of immediate familiarity and the disadvantage of inheriting weak semantic enforcement. The second is a federation of independent vocabularies with cross references through linked data, which has the advantage of decentralization and the disadvantage of fragmentation. The third is a single canonical graph maintained collectively by the protocol's stewards, which has the advantage of consistency and the disadvantage of governance burden. The current draft of RFC 0012 permits all three approaches.
The second question concerns the right balance between human readable and machine readable search engines. Today the same engine serves both populations. In a fully realized Agent Web the two functions may separate, with one engine optimized for human ranked attention and another for machine capability matching. Whether the same operator should serve both functions, and how to prevent the discovery layer from collapsing back into the attention economy, is an open economic and regulatory question.
The third question concerns the interaction between the Agent Web and intellectual property law. A Knowledge Node that exposes a publisher's content in canonical form is easier for agents to ingest, summarize, and reuse than the same content rendered in a styled page. The machine readable license terms in the Agent Web's Modality Asset Descriptors and Knowledge Node payloads are intended to make publisher consent explicit and enforceable, but the underlying legal questions about training data, derivative works, and aggregation rights are unresolved and will require both technical and legal work.
The fourth question concerns governance of the well known paths defined in RFC 0012. Reservation of paths under /.well-known/ historically passes through IETF processes. The OAP is engaged with the IETF on a Provisional Registration for the paths in this specification, but the timeline of formal registration may lag the timeline of voluntary adoption.
12. Conclusion
The web has been overdue for an upgrade. The arrival of autonomous agents at scale forces the upgrade. The choice the web's publishers face is not whether agents will read their content, since agents are already reading it through scraping and inference. The choice is whether agents will read the canonical truth of an origin, signed by the publisher, or whether they will read whatever they can reconstruct from rendered pages that were never designed for them.
The Agent Web is the canonical alternative. It does not require that the document web be abandoned. It requires that origins publish a small number of well known documents, sign what they publish, and accept the discipline that the canonical record is authoritative where it disagrees with the rendered page. In return, origins receive a substrate in which agent traffic is structured rather than parasitic, in which discovery is verifiable rather than ranked, in which payment is signed rather than clicked, and in which trust is computed rather than assumed.
The Open Agent Protocol publishes the Agent Web specification under permissive licensing because the value of the Agent Web is proportional to the breadth of its adoption. We invite publishers, registries, agent implementers, and standards bodies to participate in its development and to contribute the operational experience that will refine it into the substrate the next phase of the web requires.
References
OAP-CORE-1.0. The Open Agent Protocol Core Specification.
RFC 0004: Sub Agent Delegation.
RFC 0005: Canonical Entity Schemas.
RFC 0007: Privacy Preserving Projections.
RFC 0009: Reputation and Performance Records.
RFC 0011: Sybil Resistance and Sub Agent Anti Abuse.
RFC 0012: The Agent Native Web.
Related whitepapers: Accountability in the Agent Economy, The Economics of the Agent Economy, Confidentiality and Compliance Context, Interoperability Versus Platforms.
W3C Decentralized Identifiers (DIDs) v1.0. World Wide Web Consortium, 2022.
W3C Verifiable Credentials Data Model v2.0. World Wide Web Consortium, 2025.
C2PA Technical Specification. Coalition for Content Provenance and Authenticity, 2024.
IETF RFC 8615: Well Known Uniform Resource Identifiers. Internet Engineering Task Force, 2019.
License
This whitepaper is published under the Creative Commons Attribution 4.0 International License. The accompanying schemas and reference implementations are published under the Apache License 2.0.