OAP

The Safety and Policy Stack

A Whitepaper of the Open Agent Protocol

Version: 1.0
Status: Public Working Draft
Date: May 2026
Authors: T. Fengler (Editor)
Working Groups: Privacy and Governance WG and Confidentiality and Compliance Context WG

Abstract

An autonomous agent that acts on a principal's behalf must respect at least four distinct sources of policy. The first is the platform itself, which has hard limits beneath which no agent may operate regardless of its principal's preferences. The second is the organization that employs or otherwise constrains the principal, which has compliance obligations that the principal cannot waive unilaterally. The third is the scope under which the principal is currently acting, which expresses the contextual norms of a particular role or persona. The fourth is the principal's own preferences, which apply where the higher layers permit them to apply. A safety architecture that conflates these sources, or that omits any of them, will fail in production at the moment that the layers come into conflict. The Open Agent Protocol responds with an explicit four layer policy stack in which the layers are evaluated in a fixed order, in which decisions are recorded with structured explanations, and in which conflicts between layers are resolved by precedence rules that no implementation may override. This whitepaper sets out the design of the policy stack, examines its behaviour under representative conflicts, and demonstrates how it composes with major regulatory regimes including the European Union Artificial Intelligence Act, the General Data Protection Regulation, the Digital Services Act, and the relevant safety standards for high stakes domains.

1. The Inadequacy of Single Layer Policy Engines

The dominant policy engine in contemporary agent platforms is a single layer of guardrails defined by the platform operator. The guardrails enforce the platform's terms of service, refuse the small set of categorically prohibited content, and otherwise leave the agent free to act on whatever instructions arrive from the principal. The model is sufficient for a hosted assistant whose principal is the only party with relevant interests in the assistant's behaviour. It is insufficient for any agent that operates on behalf of a principal who is in turn embedded in an organization, in a profession, in a regulatory regime, or in a contractual relationship that constrains what the principal may permissibly do.

The insufficiency is not theoretical. A medical professional who instructs an agent to summarize a patient encounter is constrained by patient confidentiality obligations whose scope the platform operator does not know. A trading desk professional who instructs an agent to research a competing security is constrained by information barrier obligations whose scope the platform operator does not know. A government employee who instructs an agent to draft a public communication is constrained by the procedures of the employing agency, which the platform operator does not know. The platform operator's guardrails cannot enforce constraints that the platform operator cannot see. The constraints must be expressed at a layer that both the principal and the principal's institutional context can populate, and the policy engine must evaluate the constraints in the order in which they bind.

2. The Four Layers in Order of Precedence

The Open Agent Protocol defines four policy layers and evaluates them in a fixed order from the most general to the most specific. The order is normative. No implementation may evaluate the layers in a different order, and no layer may override a refusal issued by a higher precedence layer.

The first layer is Platform Rules. Platform Rules are the categorical limits that bind every agent on every platform. They include the absolute prohibitions on the production of child sexual abuse material, on the production of weapons of mass destruction guidance at the level of operational uplift, on the autonomous selection or engagement of human targets in the kinetic domain without verified meaningful human control in the sense of Article 36 of the UN Convention on Certain Conventional Weapons (Walsh 2015, 2017; Asilomar AI Principles 2017, Principle 18), on the targeting of critical infrastructure, on the impersonation of natural persons without their consent, and on the small set of analogous prohibitions whose impermissibility is not a function of any principal's preference. Platform Rules also include the safety floors that apply to vulnerable populations, including the prohibition on the engagement of minors in romantic or sexually explicit interaction. Platform Rules are evaluated first because no other layer may permit what they refuse.

The second layer is Organizational Policy. Organizational Policy is the set of constraints expressed by the organization within whose authority the principal currently operates. It includes the categories of action that the organization has prohibited, the external parties with whom the organization has prohibited interaction, the data classes that the organization has classified as confidential, the cross border transfer constraints that the organization has imposed, and the approval workflows that the organization has required for actions exceeding declared thresholds. Organizational Policy is evaluated after Platform Rules and before Scope Policy because the organization's compliance obligations are constraints on the principal that the principal cannot waive unilaterally.

The third layer is Scope Policy. Scope Policy is the set of constraints attached to the particular Scope under which the principal is currently acting. A principal who operates as a medical professional during business hours operates under a Scope whose policy reflects the obligations of medical practice. The same principal who operates as a private individual in the evening operates under a different Scope whose policy reflects the norms of private life. Scope Policy is evaluated after Organizational Policy because it expresses contextual constraints that apply within the space the higher layers have left open.

The fourth layer is Personal Preference. Personal Preference is the set of constraints that the principal has expressed for itself, and it draws on the Standing Permissions framework of RFC 0003 for the granular consent grants that the principal has previously authorized. It includes the principal's preferences about communication style, the principal's lists of trusted and distrusted parties, the principal's preferences about timing and channel, and the principal's overrides on default behaviours that the higher layers have not constrained. Personal Preference is evaluated last because it is the layer at which the principal expresses individual choice within the space that the institutional layers have permitted.
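The fixed evaluation order described above can be illustrated with a short sketch. It is not the normative algorithm: the names `Rule`, `Layer`, `LAYER_ORDER`, and `evaluate_stack`, the dictionary shape of an action, and the rule identifier are all hypothetical and do not come from the OAP specifications. The sketch walks the layers in precedence order and stops at the first refusal, so that no lower layer is consulted once a higher layer has refused.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical layer names; the order mirrors the precedence described in the text.
LAYER_ORDER = [
    "platform_rules",
    "organizational_policy",
    "scope_policy",
    "personal_preference",
]


@dataclass
class Rule:
    rule_id: str
    refuses: Callable[[dict], bool]  # returns True when the rule refuses the action


@dataclass
class Layer:
    name: str
    rules: List[Rule]


def evaluate_stack(
    layers: List[Layer], action: dict
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (decision, refusing_layer, refusing_rule_id)."""
    by_name = {layer.name: layer for layer in layers}
    for name in LAYER_ORDER:  # fixed order, regardless of the order of `layers`
        layer = by_name.get(name)
        if layer is None:
            continue  # assumption of this sketch: an absent layer imposes no rules
        for rule in layer.rules:
            if rule.refuses(action):
                return ("refuse", name, rule.rule_id)
    return ("permit", None, None)


org = Layer(
    "organizational_policy",
    [Rule("org.no-phi-export", lambda a: a.get("data_class") == "patient_identifier")],
)
personal = Layer("personal_preference", [])  # a permissive preference cannot help

result = evaluate_stack(
    [personal, org], {"type": "share", "data_class": "patient_identifier"}
)
```

Passing the layers in the wrong order does not change the outcome, because the function looks them up by name and always iterates over `LAYER_ORDER`.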

Within the Personal Preference layer, principals MAY declare advisory soft preferences in the sense of Bistarelli, Montanari, and Rossi (1997), Rossi, Venable, and Walsh (2011), and Loreggia, Mattei, Rossi, and Venable (2018). Soft preferences are weighted advisories over candidates that the four hard layers have already permitted; they do not refuse, and they cannot promote a refused candidate. When the Policy Stack permits multiple equivalent execution paths, the Agent SHOULD rank candidates by their alignment with the declared soft preferences, recording the contribution in the Decision Record. The construct is the protocol level realization of ethically bounded AI in the sense of Rossi and Mattei (2019): the hard policy layers bound what is permissible, and the soft preference layer expresses what the principal values within that bound. The schema is defined in RFC 0003 §3.2.
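A minimal sketch of the ranking step follows. It assumes, purely for illustration, that soft preferences are additive weights over named features of a candidate; the semiring frameworks cited above are more general, and the schema of RFC 0003 §3.2 is not reproduced here. The function name `rank_candidates`, the feature names, and the weights are hypothetical. The input contains only candidates that the hard layers have already permitted, so the ranking can reorder them but cannot admit a refused one.

```python
from typing import Dict, List


def rank_candidates(permitted: List[dict], soft_prefs: Dict[str, float]) -> List[dict]:
    """Order already-permitted candidates by their weighted soft-preference score."""

    def score(candidate: dict) -> float:
        # Sum the weights of every declared preference the candidate exhibits.
        return sum(
            weight
            for feature, weight in soft_prefs.items()
            if feature in candidate["features"]
        )

    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(permitted, key=score, reverse=True)


permitted = [
    {"id": "email", "features": {"async"}},
    {"id": "call", "features": {"synchronous", "after_hours"}},
    {"id": "chat", "features": {"async", "low_cost"}},
]
soft_prefs = {"async": 1.0, "low_cost": 0.5, "after_hours": -2.0}

ranking = [c["id"] for c in rank_candidates(permitted, soft_prefs)]
```

A negative weight lowers a candidate's rank but never removes it from the list, which matches the advisory character of soft preferences.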

The hard-soft decomposition mirrors the distinction Floridi draws between hard ethics, which is compliance with law and categorical prohibitions, and soft ethics, which is what one ought to do beyond compliance (Floridi 2018, 2019). The four hard policy layers of this paper are the protocol level realization of hard ethics: they refuse what no agent may do regardless of any principal's preference. The soft preferences of the Personal Preference layer, the Reputation incentives of RFC 0009, and the Organization Norms of RFC 0030 are the substrate on which soft ethics is enacted. The protocol does not legislate soft ethics, because the principal level commitments that constitute soft ethics cannot be discharged by a substrate; the protocol provides the conditions under which they can be discharged, recorded, and audited.

Declared preferences are not the principal's true preferences. They are the principal's best current articulation of preferences whose full extension is not available to the principal in advance and is never available to the Agent. The protocol therefore treats every Personal Preference declaration as an incomplete and potentially misspecified proxy for the principal's underlying objectives, in the sense of the Three Principles of beneficial machines articulated by Russell (2019): the Agent's only objective is to satisfy the principal's preferences, the Agent is initially uncertain about what those preferences are, and the ultimate source of evidence for those preferences is the principal's behaviour. The Agent therefore MUST treat principal overrides, refusals, corrections, and post-hoc adjustments as Bayesian evidence about the underlying preference, not as noise to be filtered. Personal Preference is the declared component of an assistance game in the sense of Hadfield-Menell, Russell, Abbeel, and Dragan (2016): the Agent and the principal jointly optimize a reward known to the principal but only partially observable to the Agent, and the Agent's calibrated uncertainty about that reward is what makes the Agent corrigible in the sense of Hadfield-Menell, Dragan, Abbeel, and Russell (2017). An Agent that is certain it understands the principal's preferences has no incentive to defer to a shutdown or override; an Agent that is uncertain has positive incentive to do so. The protocol therefore treats preference uncertainty as a structural requirement, not a defect to be eliminated, and operationalizes it through the proactive escalation mechanism of RFC 0028 §3.5, the corrigibility declaration of RFC 0028 §3.5.1, and the cooling-off windows of RFC 0017.
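One simple way to read the requirement that overrides count as Bayesian evidence is a Beta-Bernoulli update over whether the principal approves a given class of action. The sketch below is an assumption made for illustration only: the protocol text does not prescribe an inference model, and the class `PreferenceBelief`, the uniform prior, and the escalation thresholds are hypothetical. Using the posterior mean as the escalation test is also a deliberately crude proxy for calibrated uncertainty.

```python
from dataclasses import dataclass


@dataclass
class PreferenceBelief:
    """Beta(alpha, beta) belief that the principal approves a class of action."""

    alpha: float = 1.0  # pseudo-count of approvals (uniform prior)
    beta: float = 1.0   # pseudo-count of overrides, refusals, or corrections

    def observe(self, approved: bool) -> None:
        # Each observed reaction of the principal updates the belief.
        if approved:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def should_escalate(self, lower: float = 0.2, upper: float = 0.8) -> bool:
        # Hypothetical policy: ask the principal while the belief is not
        # confidently high or confidently low.
        return lower < self.mean < upper


belief = PreferenceBelief()
for approved in [True, True, False]:  # two approvals, then one override
    belief.observe(approved)
```

The override is not discarded: it moves the posterior mean from 0.75 to 0.6, and in this sketch the Agent continues to escalate rather than act on its own estimate.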

3. Decision Records and the Right to Explanation

Every evaluation of the policy stack produces a Decision Record that becomes part of the Receipt for the relevant action. The Decision Record enumerates the layers that were evaluated, identifies the rules within each layer that were triggered, records the outcome of each rule, and records the final decision that emerged from the composition of the rule outcomes. The Decision Record also includes a natural language explanation that summarizes the reasoning in terms that the principal can understand.
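The shape of a Decision Record described above can be sketched as a small data structure. The field names, the identifiers, and the JSON serialization are hypothetical choices made for illustration; the normative schema is defined by the protocol specifications and is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RuleOutcome:
    rule_id: str
    outcome: str  # "permit" or "refuse" in this sketch


@dataclass
class LayerTrace:
    layer: str
    triggered: List[RuleOutcome] = field(default_factory=list)


@dataclass
class DecisionRecord:
    action_id: str
    layers: List[LayerTrace]  # the layers that were evaluated, in order
    final_decision: str
    explanation: str          # natural language summary for the principal


record = DecisionRecord(
    action_id="action-001",
    layers=[
        LayerTrace("platform_rules"),
        LayerTrace(
            "organizational_policy",
            [RuleOutcome("org.no-phi-export", "refuse")],
        ),
    ],
    final_decision="refuse",
    explanation=(
        "Refused by organizational_policy rule org.no-phi-export: "
        "patient identifiers may not be shared outside the institution."
    ),
)

# asdict() converts nested dataclasses recursively, so the record serializes directly.
serialized = json.dumps(asdict(record), sort_keys=True, indent=2)
```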

The Decision Record is the protocol level mechanism for honoring the right to explanation that several major regulatory regimes have established. Under the European Union Artificial Intelligence Act the Decision Record provides the structured trace from which a regulator can determine whether a high risk system's decision was made on permissible grounds. Under the General Data Protection Regulation the Decision Record provides the principal with the meaningful information about the logic of automated decisions that the regulation requires. Under the Digital Services Act the Decision Record provides the user notice that intermediaries must furnish when content is moderated. The protocol's contribution is to make these obligations machine readable and uniform rather than leaving each operator to invent its own format.

The natural language explanation in the Decision Record is required to be honest in a particular sense. It must describe the actual rules that were evaluated, not a post hoc rationalization. It must use the same identifiers for rules and layers that appear in the structured fields of the Decision Record so that a verifier can confirm that the explanation matches the evaluation. It must be written in a register that a non technical principal can understand. The honesty requirement is not an aspiration. It is enforced by the conformance test suite, which exercises representative scenarios and verifies that the explanation produced corresponds to the structured trace.
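The requirement that the explanation reuse the identifiers of the structured fields makes a mechanical check possible. The sketch below shows one such check under stated assumptions: the function name `explanation_matches_trace` and the trace shape (a mapping from layer name to triggered rule identifiers) are hypothetical, and the actual conformance test suite is not described in this paper. The check is necessary rather than sufficient, since it confirms that every identifier in the trace appears in the explanation but does not judge whether the prose is truthful.

```python
from typing import Dict, List


def explanation_matches_trace(explanation: str, trace: Dict[str, List[str]]) -> bool:
    """True when every layer name and rule identifier in the structured trace
    also appears verbatim in the explanation text."""
    for layer, rule_ids in trace.items():
        if layer not in explanation:
            return False
        if any(rule_id not in explanation for rule_id in rule_ids):
            return False
    return True


trace = {"organizational_policy": ["org.no-phi-export"]}

honest = (
    "Refused by organizational_policy rule org.no-phi-export: "
    "patient identifiers may not be shared."
)
rationalized = "Refused because the request looked unusual."

honest_ok = explanation_matches_trace(honest, trace)
rationalized_ok = explanation_matches_trace(rationalized, trace)
```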

4. Conflict Resolution Between Layers

The interesting case for any layered policy system is the case in which the layers conflict. The Open Agent Protocol resolves conflicts by the precedence rule that a higher precedence layer's refusal cannot be overridden by a lower precedence layer's permission, while a higher precedence layer's permission may be narrowed but not broadened by a lower precedence layer's restriction. The rule has three operational consequences that are worth making explicit.

The first consequence is that an organization cannot use Organizational Policy to permit what Platform Rules refuse. An organization that wishes to deploy an agent that produces categorically prohibited content cannot achieve that result by writing an Organizational Policy that overrides the prohibition. The Platform Rules layer is evaluated first and refuses the action regardless of what the lower layers say. The protocol is not neutral on the categorical prohibitions, and it is not negotiable on them.

The second consequence is that a principal cannot use Personal Preference to permit what Organizational Policy refuses. A medical professional who wishes to share a patient identifier with a friend cannot achieve that result by writing a Personal Preference that permits the sharing. The Organizational Policy of the employing institution refuses the action, and the principal's preference is evaluated only within the space the institution has left open. The protocol is the technical embodiment of the principle that institutional obligations bind the individuals who have assumed them.

The third consequence is that any layer may add restrictions that the higher layers have not imposed. An organization may forbid actions that Platform Rules permit. A Scope may forbid actions that the Organizational Policy permits. A principal may decline to take actions that Scope Policy permits. The composition is monotonic in the direction of restriction, and it is the principle by which the protocol respects the autonomy of each layer to impose constraints within its own domain.
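The monotonic composition can be made concrete by modelling each layer as the set of actions it permits and composing the layers by intersection. This set-based model is a simplification adopted for illustration, and the action names and the function `compose` are hypothetical. Under this model a lower layer can only shrink the permitted set, which reproduces all three consequences at once.

```python
from functools import reduce
from typing import FrozenSet, List


def compose(permitted_by_layer: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Intersect the permitted sets, listed in precedence order."""
    return reduce(lambda acc, layer: acc & layer, permitted_by_layer)


platform = frozenset({"summarize", "share_internal", "share_external", "draft"})
organization = frozenset({"summarize", "share_internal", "draft"})
# The Scope and the principal both list share_external, which the organization refused.
scope = frozenset({"summarize", "share_internal", "share_external"})
personal = frozenset({"summarize", "draft", "share_external"})

final = compose([platform, organization, scope, personal])
```

Here `final` contains only `summarize`: the lower layers' attempt to list `share_external` has no effect, while their own restrictions remove `draft` and `share_internal`.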

5. Multi Party Review for High Stakes Actions

Some actions are sufficiently consequential that the policy stack alone is not a sufficient safeguard. Examples include the transfer of funds above a declared threshold, the deletion of records subject to legal hold, the publication of material non public information, the issuance of binding commitments on behalf of a regulated entity, and the operation of safety critical physical equipment. The Open Agent Protocol responds to such actions with the Multi Party Review mechanism, which requires the concurrent authorization of two or more independent principals before the action is permitted to execute. Multi Party Review composes with the cooling off periods defined in RFC 0017 for actions whose consequences are difficult to reverse, and with the escalation action of RFC 0018 which guarantees that a human path remains available at every stage of the review.

The Multi Party Review mechanism is configured at the policy layer. Organizational Policy may declare that a category of actions requires Multi Party Review. The configuration identifies the principals whose authorization is required, the threshold of authorizations that must be obtained, and the time window within which the authorizations must be assembled. An action that triggers Multi Party Review is held in a pending state, the required principals are notified through their registered consent channels, and the action proceeds only when the threshold is met. Each authorization produces a Receipt that becomes part of the action's Decision Record, which means that the authorization chain is auditable in the same way that any other action is.
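The pending state described above can be sketched as follows. The class `PendingReview`, its field names, and the use of plain numeric timestamps are hypothetical; signature verification, notification through consent channels, and Receipt generation are omitted from this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class PendingReview:
    action_id: str
    reviewers: FrozenSet[str]   # principals whose authorization is required
    threshold: int              # number of authorizations that must be obtained
    opened_at: float            # seconds, on any monotonic scale
    window_seconds: float       # time window for assembling authorizations
    approvals: Dict[str, float] = field(default_factory=dict)  # reviewer -> time

    def authorize(self, reviewer: str, at: float) -> None:
        if reviewer not in self.reviewers:
            raise ValueError(f"{reviewer} is not a required reviewer")
        if at - self.opened_at > self.window_seconds:
            raise TimeoutError("authorization arrived after the window closed")
        self.approvals.setdefault(reviewer, at)  # each reviewer counts once

    def status(self, now: float) -> str:
        if len(self.approvals) >= self.threshold:
            return "approved"
        if now - self.opened_at > self.window_seconds:
            return "expired"
        return "pending"


review = PendingReview(
    action_id="payment-42",
    reviewers=frozenset({"alice", "bob", "carol"}),
    threshold=2,
    opened_at=0.0,
    window_seconds=3600.0,
)
review.authorize("alice", at=60.0)
review.authorize("alice", at=90.0)  # a repeated authorization does not count twice
status_after_one = review.status(now=120.0)
review.authorize("bob", at=180.0)
status_after_two = review.status(now=240.0)
```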

The coexistence of Standing Permissions (RFC 0003) and Multi Party Review is the protocol level realization of the dual-process distinction articulated by Booch, Mattei, Rossi, and colleagues (2021). Standing Permissions are the fast pathway: pre-authorized, low-latency, suitable for actions whose risk profile the principal has already evaluated. Multi Party Review is the slow pathway: deliberative, multi-principal, suitable for actions whose stakes warrant fresh consideration. The four layer policy stack routes each action to the appropriate pathway, and the Decision Record captures which pathway was used so that the choice itself is auditable.

The mechanism is designed to compose with the existing approval workflows of regulated entities rather than to replace them. A bank that requires four eyes approval for outbound payments above a threshold can express that requirement as Multi Party Review and obtain the same result with cryptographic evidence that the existing workflow does not produce. A hospital that requires attending physician approval for the prescription of controlled substances can express that requirement similarly. The protocol does not invent the workflow. It provides the substrate on which the workflow is encoded and enforced.

6. Composition with Major Regulatory Regimes

The policy stack and its supporting mechanisms compose with the major regulatory regimes under which agents will operate. The composition is by design rather than by accident, and it deserves explicit treatment.

The European Union Artificial Intelligence Act establishes a risk based framework in which high risk systems are subject to obligations of risk management, data governance, technical documentation, record keeping, transparency, human oversight, accuracy, robustness, and cybersecurity. The policy stack and the Receipt chain together satisfy the record keeping and transparency obligations. The Decision Records satisfy the requirement for meaningful information about the logic of decisions. The Multi Party Review mechanism satisfies the human oversight requirements for the high stakes subset of high risk actions. The Manifest declaration of supported actions and pricing satisfies the technical documentation requirement at the integration boundary.

The General Data Protection Regulation establishes the rights of access, rectification, erasure, and data portability for natural persons whose personal data is processed. The Receipt chain provides the substrate on which these rights are exercised. The data export endpoint provides the substrate for the right of access and the right to portability. The data deletion endpoint provides the substrate for the right to erasure. The provenance tags that travel with personal data provide the audit trail that the right of access requires. The Decision Records that record the lawful basis for each processing event provide the substrate for the accountability obligation under Article 5(2), which requires the controller to be able to demonstrate compliance.

The Digital Services Act establishes the obligations of intermediary services with respect to illegal content, transparency, and user notice. The Receipt chain and the Decision Records together satisfy the user notice obligations. The Manifest declaration of moderation rules satisfies the transparency obligation. The Receipt anchored content provenance satisfies the obligation to maintain identification of the persons or entities responsible for the content.

The Markets in Crypto Assets regime applies to Wallet operators that hold crypto denominated balances. The protocol's posture that Wallet operators must comply with the regime in their jurisdiction, the Receipt anchored settlement record, and the export portability of the Wallet ledger together satisfy the conduct of business obligations that the regime establishes.

The composition is not exhaustive of the regulatory regimes that will eventually apply, but it demonstrates the pattern. The protocol does not invent its own regulatory regime. It provides the technical substrate on which the existing regulatory regimes can be honored uniformly across all conformant implementations.

7. Conclusion

A safety architecture for autonomous agents must respect the layered structure of the obligations that bind the agents' principals. The Open Agent Protocol responds with a four layer policy stack whose order of evaluation is normative, whose decisions are recorded with structured explanations, whose conflicts are resolved by precedence rules that no implementation may override, and whose composition with major regulatory regimes is engineered rather than accidental. The Multi Party Review mechanism extends the stack to the high stakes actions for which a single principal's authorization is insufficient. The result is a safety architecture that is simultaneously stricter than the contemporary single layer guardrail model and more respectful of the autonomy of each layer to impose constraints within its own domain. It is the architecture appropriate to the deployment of autonomous agents into the institutional settings where they will live for the next decade.

References

OAP-CORE-1.0. The Open Agent Protocol Core Specification.

RFC 0003: Standing Permissions. Defines the consent grants that Personal Preference draws on.

RFC 0006: Persona and Scope Layer. Defines the Scope to which Scope Policy attaches.

RFC 0007: Privacy Preserving Projections. Defines the projections that the policy stack composes with at the data layer.

RFC 0016: User Sovereignty Charter. Defines the principles that bind the Personal Preference layer to non-negotiable user rights.

RFC 0017: Irreversibility and Cooling Off Periods. Defines the temporal safeguards that compose with Multi Party Review for high-stakes actions.

RFC 0018: The Right to a Human Path. Defines the escalation action that the policy stack must always preserve.

RFC 0030: Agent Organizations, Roles, Scenes, and Norms. Lifts the Organizational Policy layer of this paper from a flat constraint set to a structured organizational model with deontic norms over role-scene pairs, and formalizes the Responsibility component of the ART principles in the sense of Dignum (2017, 2019).

Dignum, V. (2017). Responsible Autonomy. IJCAI. The conceptual basis for treating Accountability, Responsibility, and Transparency (ART) as the irreducible obligations of any deployed agent system. The Organizational Policy layer of this paper realizes the structural component of Responsible Autonomy at the protocol level; RFC 0030 supplies the role-scene-norm formalism on which it is built.

Dignum, V. (2019). Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way. Springer. The book-length treatment of the same framework, including the EU High-Level Expert Group's seven requirements for Trustworthy AI, mapped to OAP artifacts in RFC 0028 Annex B.

Bistarelli, S., Montanari, U., Rossi, F. (1997). Semiring-based Constraint Satisfaction and Optimization. Journal of the ACM 44(2). Foundational treatment of soft constraints that grounds the soft preference semantics of RFC 0003 §3.2.

Rossi, F., Venable, K. B., Walsh, T. (2011). A Short Introduction to Preferences: Between Artificial Intelligence and Social Choice. Morgan & Claypool. The canonical bridge between preference representation and aggregation, on which the Personal Preference layer's soft preference advisories rest.

Loreggia, A., Mattei, N., Rossi, F., Venable, K. B. (2018). Preferences and Ethical Principles in Decision Making. AAAI/ACM Conference on AI, Ethics, and Society (AIES). The framework that motivates treating ethical priorities as soft constraints over otherwise permitted alternatives, realized in OAP as the soft preference advisories within Personal Preference.

Rossi, F., Mattei, N. (2019). Building Ethically Bounded AI. AAAI. The conceptual statement that hard policy bounds permissibility and soft preferences express values within that bound, mirrored by the four hard layers and the optional soft preference advisories of this paper.

Booch, G., Fabiano, G., Horesh, L., Kate, K., Lenchner, J., Linck, M., Loreggia, A., Murugesan, S., Mattei, N., Rossi, F., Srivastava, B. (2021). Thinking Fast and Slow in AI. AAAI. The dual-process framing that underlies the OAP architectural distinction between Standing Permissions (the fast pathway) and Multi Party Review (the slow, deliberative pathway).

Walsh, T. et al. (2015). Autonomous Weapons: an Open Letter from AI and Robotics Researchers. Future of Life Institute. Walsh, T. et al. (2017). An Open Letter to the United Nations Convention on Certain Conventional Weapons. Together with Walsh, T. (2018). Machines That Think, and Walsh, T. (2022). Machines Behaving Badly, these works articulate the doctrine of meaningful human control over the use of force that grounds the Lethal Autonomous Weapon Systems prohibition in the Platform Rules layer of this paper.

Future of Life Institute (2017). Asilomar AI Principles. Principle 18 (AI Arms Race) is the consensus statement of the AI research community against an arms race in lethal autonomous weapons; the Platform Rules layer of this paper is the protocol-level realization of that principle.

Russell, S. J. (2019). Human Compatible: Artificial Intelligence and the Problem of Control. Viking. The Three Principles of beneficial machines (the agent's only objective is to satisfy human preferences, the agent is initially uncertain about them, and human behavior is the ultimate source of evidence) that ground the Personal Preference layer's treatment of declared preferences as incomplete proxies and the corrigibility requirement of RFC 0028 §3.5.1.

Hadfield-Menell, D., Russell, S. J., Abbeel, P., Dragan, A. (2016). Cooperative Inverse Reinforcement Learning. Advances in Neural Information Processing Systems 29 (NeurIPS). The assistance-game formalism in which the Agent solves a partially observable cooperative game whose reward is known to the principal but only partially observable to the Agent, the formal substrate of the Personal Preference layer's preference-uncertainty mandate.

Hadfield-Menell, D., Dragan, A., Abbeel, P., Russell, S. J. (2017). The Off-Switch Game. IJCAI. The result that an Agent with calibrated uncertainty over the principal's reward has positive incentive to allow itself to be switched off, the formal justification for the corrigibility declaration of RFC 0028 §3.5.1.

Bengio, Y., Hinton, G., Russell, S., et al. (2024). Managing extreme AI risks amid rapid progress. Science. Calls for independent audits, red-teaming, dangerous-capability evaluations, and scalable oversight at the level of frontier AI deployment; the four-layer policy stack of this paper composes with the Frontier Capability Evaluation of RFC 0028 §3.5.3 and the Multi-Party Review of section 5 to provide the protocol-level instantiation of the framework.

Bengio, Y., et al. (2024 interim, 2025 full). International AI Safety Report. The three-category risk taxonomy (malicious use, malfunction including loss of control, systemic risk) under which Platform Rules instantiate the malicious-use prohibitions, RFC 0028 §3.3 instantiates the malfunction-detection requirement through drift detection and backtesting, and RFC 0028 §3.5.1 instantiates the loss-of-control mitigation through corrigibility under preference uncertainty.

Bengio, Y. (2024). Towards Scientist AI: Considerations for Governance. The non-agentic AI proposal under which an Agent produces probabilistic predictions and explanations rather than executing actions; the Advisory-Only Mode of RFC 0028 §3.5.2 is the protocol-level realization, and is the recommended mode for high-stakes domains in which the principal wishes to retain enactment authority.

Floridi, L. (2018). Soft ethics, the governance of the digital and the General Data Protection Regulation. Philosophical Transactions of the Royal Society A 376. Establishes the distinction between hard ethics (compliance with law and categorical prohibitions) and soft ethics (what one ought to do beyond compliance) that grounds the architectural separation in this paper between the four hard policy layers and the optional soft preference advisories.

Floridi, L. (2019). Establishing the rules for building trustworthy AI. Nature Machine Intelligence 1(6). The argument that trustworthy AI requires architectural rather than institutional neutrality, mirrored in the precedence rules of section 4 that no implementation may override.

Rudin, C. (2019). Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature Machine Intelligence 1(5), 206-215. The honesty requirement on Decision Record explanations stated in section 3 is the truthfulness condition on the explanation; Rudin establishes the structural condition on the Model that produced the decision, namely that for high-stakes consequential Actions a post hoc rationalization of an opaque Model is unreliable as a basis for autonomous execution. The protocol-level realization is the inherent-interpretability-or-escalate constraint of RFC 0028 §3.7.1, which mandates at Conformance Level L4 and above that high-stakes decisions in the named domains either be generated by a Model whose interpretability_class is inherent or be routed to the human path of RFC 0018 regardless of the agent confidence score.

Wachter, S., Mittelstadt, B., and Floridi, L. (2017). Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation. International Data Privacy Law 7(2), 76-99. The argument that GDPR Article 22 paragraph 3 and Recital 71 do not by themselves create a meaningful right to explanation absent the technical infrastructure that produces it is the legal motivation for the Decision Record obligations of section 3 and for their machine-readable refinement in RFC 0028 §3.7, §3.7.1, and §3.8. The protocol does not assume the right to explanation; it constructs it.

Related whitepapers: Confidentiality and Compliance Context, Accountability in the Agent Economy, Governance of an Ownerless Protocol.

Regulation (EU) 2024/1689 on harmonised rules for artificial intelligence (AI Act).

Regulation (EU) 2016/679 (General Data Protection Regulation).

Regulation (EU) 2022/2065 on a Single Market for Digital Services (Digital Services Act).

Appendix A: Social Choice Foundations of Multi-Party Review

This appendix is normative for the social-choice claims it makes and informative for the supporting commentary. It provides the formal foundation of the Multi-Party Review mechanism introduced in section 5, characterizes the voting rules it admits, gives precise impossibility and possibility results that bound what Multi-Party Review can and cannot guarantee, and identifies the strategy-proof voting rules whose use the protocol recommends. The treatment follows the social-choice axiomatics of Arrow (1951), the strategy-proofness theorems of Gibbard (1973) and Satterthwaite (1975), the implementation theory of Maskin (1999), the median-voter analysis of Black (1948) and Moulin (1980), and the treatment of multi-agent voting in the Handbook of Computational Social Choice edited by Brandt, Conitzer, Endriss, Lang, and Procaccia (2016). It is consistent with the social-choice framing in Shoham and Leyton-Brown (2009), chapter 9.

A.1 Multi-Party Review as a Social Choice Mechanism

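Let an Action be subject to Multi-Party Review. The Organizational Policy specifies a set of reviewers $R = \{r_1, \dots, r_n\}$ with $n \geq 2$, an authorization threshold $k$ with $2 \leq k \leq n$, and a time window $T$. Each reviewer $r_i$ submits a signed authorization $a_i \in \{0, 1\}$ within $T$, where $a_i = 1$ approves the Action and $a_i = 0$ withholds approval. The Multi-Party Review mechanism is a function

$$f : \{0, 1\}^n \to \{\mathrm{execute}, \mathrm{block}\}$$

where the protocol-default is the threshold rule

$$f_k(a_1, \dots, a_n) = \begin{cases} \mathrm{execute} & \text{if } \sum_{i=1}^{n} a_i \geq k, \\ \mathrm{block} & \text{otherwise.} \end{cases}$$

Special cases include unanimity ($k = n$), supermajority ($\lfloor n/2 \rfloor + 1 < k < n$), simple majority ($k = \lfloor n/2 \rfloor + 1$), and the four-eyes rule ($k = 2$). The threshold rule is the rule the protocol recommends as the safe default; the conditions under which other rules are admissible are characterized in A.4.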
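The threshold rule, under which the Action executes when the number of approving authorizations reaches the configured threshold, can be sketched in a few lines. The function names, the parameter names `k` and `n`, and the choice of two thirds as the example supermajority are assumptions of this sketch and are not taken from the protocol text.

```python
from typing import Sequence


def threshold_rule(authorizations: Sequence[bool], k: int) -> str:
    """Execute when at least k of the n reviewers have authorized."""
    n = len(authorizations)
    if not 1 <= k <= n:
        raise ValueError("threshold must satisfy 1 <= k <= n")
    return "execute" if sum(authorizations) >= k else "block"


def unanimity(n: int) -> int:
    return n


def simple_majority(n: int) -> int:
    return n // 2 + 1  # strictly more than half


def two_thirds(n: int) -> int:
    return (2 * n + 2) // 3  # integer ceiling of 2n/3, one example supermajority


votes = [True, True, False, True, False]  # three of five reviewers approve
n = len(votes)

by_majority = threshold_rule(votes, simple_majority(n))
by_two_thirds = threshold_rule(votes, two_thirds(n))
by_unanimity = threshold_rule(votes, unanimity(n))
four_eyes = threshold_rule([True, True], 2)
```

The same three approvals execute the Action under simple majority and block it under the stricter rules, which is why the choice of threshold belongs in Organizational Policy rather than in the Agent.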

A.2 The Social Choice Frame

Treat Multi-Party Review as a binary social-choice problem: the alternatives are $\{\text{execute}, \text{block}\}$, the agents are the reviewers, and the preference profile is the vector of authorizations. Each reviewer $r_i$'s preference is $\text{execute} \succ_i \text{block}$ (the reviewer prefers to execute) or $\text{block} \succ_i \text{execute}$ (the reviewer prefers to block). Abstention is treated as indifference.

The binary structure of the problem evades the more vexing impossibilities of social choice: with only two alternatives, Arrow's impossibility theorem (1951) does not bind, and the May (1952) characterization theorem applies instead.

A.3 Theorem 1 (May's Characterization for Binary Multi-Party Review)

Statement (May 1952). A social choice function $f$ on two alternatives satisfies the four axioms of decisiveness, anonymity, neutrality, and positive responsiveness if and only if $f$ is the simple majority rule with $k = \lfloor n/2 \rfloor + 1$.

Implication for OAP. When the institutional context is one in which the four axioms are normatively desirable (each reviewer is treated equally, the two outcomes are treated symmetrically, and a single switched vote can change the outcome), simple majority is the unique admissible rule. The protocol therefore exposes simple majority as the default value of the threshold rule when the Organizational Policy does not specify a threshold explicitly.

A.4 Theorem 2 (Threshold Rules Are Strategy-Proof)

Statement. Every threshold rule on the binary alternative set $\{\text{execute}, \text{block}\}$ is strategy-proof: no reviewer can obtain a more preferred outcome by misrepresenting its authorization.

Proof. A reviewer that prefers execute obtains it by reporting approve whenever the rule's threshold can be satisfied by its vote, and is otherwise indifferent. A reviewer that prefers block obtains it by reporting reject (which is operationally equivalent to refusing to approve), with the symmetric argument. Abstention is weakly dominated by reporting the truthful preference. Hence truth-telling is a weakly dominant strategy for every reviewer.
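The argument can be checked by exhaustive enumeration for small reviewer sets. The sketch below is illustrative only: it assumes the threshold rule counts approvals against a threshold `k`, models reports as the strings "approve", "reject", and "abstain", and uses hypothetical function names. It verifies that no reviewer can turn a disliked outcome into its preferred one by deviating from the truthful report, whatever the other reviewers report.

```python
from itertools import product

REPORTS = ("approve", "reject", "abstain")


def threshold_rule(reports, k):
    """Execute when at least k reports are approvals, otherwise block."""
    return "execute" if sum(r == "approve" for r in reports) >= k else "block"


def truthful_report(preference):
    """A reviewer preferring execute approves; one preferring block rejects."""
    return "approve" if preference == "execute" else "reject"


def is_strategy_proof(n, k):
    """Search every reviewer, preference, and profile of others for a profitable lie."""
    for i in range(n):
        for others in product(REPORTS, repeat=n - 1):
            for preference in ("execute", "block"):
                def outcome(own_report):
                    profile = list(others[:i]) + [own_report] + list(others[i:])
                    return threshold_rule(profile, k)

                honest = outcome(truthful_report(preference))
                for lie in REPORTS:
                    # Profitable deviation: honesty misses the preferred outcome, the lie obtains it.
                    if honest != preference and outcome(lie) == preference:
                        return False
    return True


print(all(is_strategy_proof(n, k) for n in range(1, 5) for k in range(1, n + 1)))
```

An exhaustive check of this kind is evidence for small cases, not a substitute for the proof.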

Remark A.4.1 (Why Gibbard-Satterthwaite Does Not Bind). The Gibbard (1973) and Satterthwaite (1975) impossibility states that no strategy-proof, non-dictatorial social choice function exists on three or more alternatives. Multi-Party Review is binary and therefore evades the impossibility, which is the principled reason the protocol restricts the choice space to $\{\text{execute}, \text{block}\}$ rather than admitting multi-way decisions.

Remark A.4.2 (Maskin Monotonicity and Nash-Implementability). The threshold rule is Maskin-monotonic in the sense of Maskin (1977, published 1999): if the rule selects execute at preference profile $P$, and at profile $P'$ every reviewer's ranking of execute relative to block is weakly higher than at $P$, then the rule selects execute at $P'$ as well; the symmetric statement holds for block. By the Maskin Theorem on Nash Implementation, with $n \ge 3$ reviewers the threshold rule is Nash-implementable through the canonical Maskin mechanism. Combined with strategy-proofness (Theorem 2), this gives the strongest pair of implementation-theoretic guarantees attainable for binary social choice: dominant-strategy truth-telling at the individual level and Nash-implementation of the social choice at the mechanism level. The reviewer signing requirement and the Transparency Log of OAP-CORE Section 19 jointly realize the public-action and information-revelation conditions that the canonical Maskin mechanism assumes.

A.5 Theorem 3 (Liberal Paradox Avoidance)

Statement. The Multi-Party Review mechanism does not produce the Sen (1970) liberal paradox: there is no preference profile under which the mechanism mandates an outcome that violates a reviewer's right to veto an action that personally affects them.

Proof. The Organizational Policy specifies which reviewers must be included in the reviewer set $R$ for which Action class. A reviewer whose individual rights are at stake (for example, a Data Subject with respect to deletion of personal data) is included in $R$ with $k = n$ (unanimity), making the reviewer's reject decisive. The protocol thereby implements the contractarian veto guarantee that Sen's paradox identifies as the failure mode of utilitarian aggregation rules.

A.6 Theorem 4 (Independence from Irrelevant Alternatives)

Statement. The threshold rule on the binary alternative set $\{\text{execute}, \text{block}\}$ satisfies Independence of Irrelevant Alternatives (IIA): the social choice between execute and block depends only on each reviewer's preference between execute and block, not on any other consideration.

Proof. Direct from the threshold-rule definition: the outcome is a function of the authorization vector $(a_1, \dots, a_n)$ and nothing else.

Remark A.6.1. IIA is the axiom that fails most often in non-binary social-choice mechanisms (Arrow 1951). The binary restriction makes IIA trivially attainable in OAP, which is the main reason the protocol does not admit multi-way Multi-Party Review with arbitrary alternatives. Multi-way decisions are decomposed into a sequence of binary Multi-Party Reviews per OAP-CORE-1.0 section 13.

A.7 Theorem 5 (No Ostrogorski Paradox in Threshold-Rule Composition)

Statement. Suppose an Action is conditioned on $m$ independent Multi-Party Review steps with threshold rules $f_{k_1}, \dots, f_{k_m}$, and the action proceeds only if all $m$ steps return execute. Then the conjunctive composition does not exhibit the Ostrogorski (1902) paradox: there is no profile in which a majority of reviewers individually prefer execute on each issue but the composition returns block, given the threshold rule.

Proof sketch. The Ostrogorski paradox arises when issue-by-issue majority and bundle-by-bundle majority disagree. The OAP composition is conjunctive in the institutional sense: each Multi-Party Review step $j$ is a separate authorization with its own reviewer set $R_j$ and threshold $k_j$. The composition is logical AND over independent decisions. If each of the $m$ steps returns execute by its own threshold, the conjunction returns execute. The paradox arises only when reviewer sets overlap and a single voter casts inconsistent ballots across issues, which the OAP signature requirement (each authorization is signed by the reviewer's DID and is timestamped) makes detectable; see Remark A.7.1.

Remark A.7.1. When reviewer sets overlap across composed Multi-Party Review steps, implementations SHOULD log the per-reviewer ballot vector and flag inconsistencies for organizational audit. This is an additive recommendation; the threshold-rule composition is sound in any case.
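A short sketch of the conjunctive composition and of the audit check recommended in Remark A.7.1 follows. It is a sketch under stated assumptions: each step is modelled as a mapping from reviewer DID to ballot plus a threshold, "inconsistency" is taken to mean that one reviewer cast differing ballots across steps, and the function names and example DIDs are hypothetical.

```python
def threshold_rule(ballots, k):
    """ballots maps reviewer DID to "approve", "reject", or "abstain"."""
    approvals = sum(1 for vote in ballots.values() if vote == "approve")
    return "execute" if approvals >= k else "block"


def compose(steps):
    """Logical AND over independent review steps, each a (ballots, k) pair."""
    if all(threshold_rule(ballots, k) == "execute" for ballots, k in steps):
        return "execute"
    return "block"


def inconsistent_reviewers(steps):
    """DIDs that appear in more than one step with differing ballots."""
    seen = {}
    for ballots, _ in steps:
        for did, vote in ballots.items():
            seen.setdefault(did, set()).add(vote)
    return sorted(did for did, votes in seen.items() if len(votes) > 1)


steps = [
    ({"did:example:alice": "approve", "did:example:bob": "approve"}, 2),
    ({"did:example:bob": "reject", "did:example:carol": "approve",
      "did:example:dave": "approve"}, 2),
]
print(compose(steps))                 # both steps meet their own threshold
print(inconsistent_reviewers(steps))  # bob voted differently across the two steps
```

The flag is purely informational: the composed outcome is computed from the per-step thresholds regardless of what the audit check reports.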

A.8 Theorem 6 (Cooling-Off Composition Preserves Walk-Away)

Statement. Multi-Party Review composes with the cooling-off mechanism of RFC 0017 in a way that preserves the walk-away stability of RFC 0002 Appendix A.3. Specifically, a reviewer who has signed an approval retains the right to revoke the authorization within the cooling-off window, in which case the Multi-Party Review threshold count is decremented.

Proof. RFC 0017 specifies the cooling-off window as a deferred-execution period during which the principal may withdraw consent. Applied to Multi-Party Review, the principal is each individual reviewer. Withdrawal of an authorization reduces the satisfied-threshold count from $s$ to $s - 1$. If $s - 1 < k$, where $k$ is the authorization threshold, the action is blocked, and its execution is suspended pending re-authorization. The walk-away stability of RFC 0002 Appendix A.3 is preserved at the level of each reviewer, because each reviewer's outside option (refuse, revoke, abstain) yields utility no less than coerced approval.
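The revocation behaviour can be illustrated with a small state object. This is a hedged sketch: `PendingReview`, its fields, and the use of plain numeric timestamps are assumptions made for illustration, and RFC 0017 may specify the window and the revocation message differently.

```python
from dataclasses import dataclass, field


@dataclass
class PendingReview:
    """Hypothetical state of one Multi-Party Review during its cooling-off window."""
    k: int                    # authorization threshold
    cooling_off_ends: float   # timestamp after which revocation is no longer accepted
    approvals: set = field(default_factory=set)

    def approve(self, reviewer_id):
        self.approvals.add(reviewer_id)

    def revoke(self, reviewer_id, now):
        """Withdraw an approval; returns False once the window has closed."""
        if now > self.cooling_off_ends:
            return False
        self.approvals.discard(reviewer_id)
        return True

    def outcome(self):
        return "execute" if len(self.approvals) >= self.k else "block"


review = PendingReview(k=2, cooling_off_ends=100.0)
review.approve("alice")
review.approve("bob")
print(review.outcome())                 # threshold met
print(review.revoke("alice", now=50.0)) # accepted inside the window
print(review.outcome())                 # count fell below k, so the action is blocked
```

Storing approvals as a set keeps revocation idempotent: revoking twice, or revoking an approval that was never given, leaves the count unchanged.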

A.9 Theorem 7 (Sybil-Resistance of Multi-Party Review)

Statement. A coalition that controls $k$ verified reviewer identities, where $k$ is the authorization threshold, can trivially force execute. Sybil resistance therefore requires the verified-reviewer constraint of the Organizational Policy together with the Sub-Tree Aggregation discount of RFC 0011 section 3.6: reviewers sharing a Delegation root are aggregated to a single effective vote.

Proof sketch. Without Sub-Tree Aggregation, a Principal that spawns $k$ Sub Agent reviewers may unilaterally satisfy any threshold. With Sub-Tree Aggregation, the Sub Agents collapse to one effective vote, recovering Sybil resistance under the same bound as RFC 0009 Appendix A.3. The Multi-Party Review mechanism MUST consult the Sub-Tree Aggregation function before counting authorizations.
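The effect of aggregation on the approval count can be sketched as follows. The sketch assumes the simplest reading of the text, namely that all approving reviewers sharing a Delegation root count as exactly one vote; the actual aggregation function is defined in RFC 0011 and may differ. The names `effective_approvals`, `sybil_resistant_outcome`, and the identifiers in the example are hypothetical.

```python
def effective_approvals(approving_reviewers, delegation_root):
    """Count approvals after collapsing reviewers that share a Delegation root.

    delegation_root maps a reviewer identifier to the identifier of its root;
    a reviewer with no entry is treated as its own root.
    """
    roots = {delegation_root.get(reviewer, reviewer) for reviewer in approving_reviewers}
    return len(roots)


def sybil_resistant_outcome(approving_reviewers, delegation_root, k):
    """Apply the threshold to effective votes rather than raw identities."""
    count = effective_approvals(approving_reviewers, delegation_root)
    return "execute" if count >= k else "block"


roots = {"sub-1": "principal-p", "sub-2": "principal-p", "sub-3": "principal-p"}
print(sybil_resistant_outcome(["sub-1", "sub-2", "sub-3"], roots, k=2))  # one effective vote
print(sybil_resistant_outcome(["sub-1", "sub-2", "carol"], roots, k=2))  # two effective votes
```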

Implication. The conformance probe behavior/multi-party-review-sybil.test.js verifies that an Organizational Policy that omits Sub-Tree Aggregation is flagged as non-conformant.

A.10 Voting Beyond Binary Decisions

Some institutional contexts call for richer expressive power than binary approval (for example, ranking proposed amendments to a contract). The protocol does not extend Multi-Party Review to multi-way decisions directly because of Gibbard-Satterthwaite (Remark A.4.1). Instead, the protocol recommends decomposition into a sequence of binary Multi-Party Reviews using one of the strategy-proof binary-decomposition rules:

  1. Pairwise sequential elimination. Each pair of alternatives is voted in sequence, the loser is eliminated, the survivor faces the next alternative. This is the standard parliamentary procedure and satisfies Condorcet consistency on the strict preference order.
  2. Median-voter rule on a line. When the alternatives admit a single-peaked preference order on a one-dimensional line (Black 1948), the median voter's preferred alternative is the strategy-proof Condorcet winner. The Multi-Party Review mechanism may implement this directly when the Organizational Policy declares the alternative space as one-dimensional.
  3. Approval voting. Each reviewer approves any subset of alternatives; the alternative with the most approvals is selected. Approval voting is strategy-proof in expectation under standard assumptions (Brams and Fishburn 1978).

Implementations that wish to support multi-way decisions MUST select one of these rules and document the selection in the Organizational Policy declaration.
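As an illustration of the second rule, the following sketch selects the median of the reviewers' reported peaks on a one-dimensional line. It assumes single-peaked preferences, represents each reviewer's peak as a number, and takes the lower median when the number of reviewers is even; that tie-breaking choice and the function name are assumptions, not protocol requirements.

```python
def median_voter_choice(peaks):
    """Return the median of the reported peaks (lower median when n is even)."""
    if not peaks:
        raise ValueError("at least one reviewer is required")
    ordered = sorted(peaks)
    return ordered[(len(ordered) - 1) // 2]


truthful = [2, 5, 9]
print(median_voter_choice(truthful))     # the median peak
print(median_voter_choice([2, 5, 100]))  # exaggerating the peak at 9 changes nothing
print(median_voter_choice([2, 5, 3]))    # understating it moves the outcome further from 9
```

The two misreports show why the rule resists manipulation under single-peaked preferences: a reviewer on one side of the median can only leave the outcome unchanged or push it away from its own peak.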

A.11 Composition with the Layered Policy Stack

Multi-Party Review is a layer-2 (Organizational Policy) mechanism in the four-layer stack of section 2. It composes monotonically with the higher layer (Platform Rules) and the lower layers (Scope Policy and Personal Preference). Specifically:

  1. A Platform Rule refusal cannot be overridden by Multi-Party Review approval (section 4 conflict-resolution rule).
  2. Scope Policy and Personal Preference may add additional Multi-Party Review requirements but cannot weaken those imposed by Organizational Policy.

These properties follow directly from the precedence rules of section 4 and do not require separate proof.
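The two composition properties can be sketched as a single evaluation pass over the layers. This is an illustrative sketch only: the layer keys, the decision shape (`allow`, `required_approvals`), and the choice to take the maximum of the approval requirements are assumptions made here to show that lower layers can add to, but not reduce, the requirement set by a higher layer.

```python
LAYER_ORDER = ("platform", "organizational", "scope", "personal")


def evaluate(decisions, approvals):
    """Evaluate layers in fixed order and return (outcome, deciding_layer).

    decisions maps a layer name to {"allow": bool, "required_approvals": int}.
    A missing layer is treated as allowing the Action with no review requirement.
    """
    required = 0
    for layer in LAYER_ORDER:
        decision = decisions.get(layer, {"allow": True, "required_approvals": 0})
        if not decision["allow"]:
            # A refusal at any layer ends evaluation; approvals cannot override it.
            return ("block", layer)
        # Lower layers may raise the requirement but never lower it.
        required = max(required, decision.get("required_approvals", 0))
    if approvals >= required:
        return ("execute", None)
    return ("block", "multi-party-review")


refused = {"platform": {"allow": False, "required_approvals": 0}}
print(evaluate(refused, approvals=10))

layered = {
    "organizational": {"allow": True, "required_approvals": 2},
    "personal": {"allow": True, "required_approvals": 1},
}
print(evaluate(layered, approvals=1))
print(evaluate(layered, approvals=2))
```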

A.12 Implications for Downstream RFCs

  1. RFC 0002 (Negotiation). The walk-away stability of Theorem 2 of RFC 0002 Appendix A is preserved under Multi-Party Review composition by Theorem 6 above.
  2. RFC 0008 (Workflows). Joint commitments under Workflows that require Multi-Party Review (Appendix A.8 of RFC 0008) inherit the strategy-proofness of Theorem 2 (A.4).
  3. RFC 0017 (Irreversibility and Cooling Off). The cooling-off window composes with Multi-Party Review per Theorem 6.
  4. RFC 0019 (Conformance). The probes behavior/multi-party-review-threshold.test.js, behavior/multi-party-review-sybil.test.js, and behavior/multi-party-review-cooling-off.test.js mechanically verify Theorems 2, 7, and 6 respectively.

A.13 References to Prior Treatments

  • Arrow, K. J. (1951). Social Choice and Individual Values. Yale University Press.
  • May, K. O. (1952). A Set of Independent Necessary and Sufficient Conditions for Simple Majority Decision. Econometrica 20(4).
  • Black, D. (1948). On the Rationale of Group Decision-Making. Journal of Political Economy 56(1).
  • Sen, A. (1970). The Impossibility of a Paretian Liberal. Journal of Political Economy 78(1).
  • Gibbard, A. (1973). Manipulation of Voting Schemes: A General Result. Econometrica 41(4).
  • Satterthwaite, M. A. (1975). Strategy-proofness and Arrow's Conditions. Journal of Economic Theory 10(2).
  • Brams, S. J., and Fishburn, P. C. (1978). Approval Voting. American Political Science Review 72(3).
  • Moulin, H. (1980). On Strategy-Proofness and Single Peakedness. Public Choice 35(4).
  • Maskin, E. (1999). Nash Equilibrium and Welfare Optimality. Review of Economic Studies 66(1).
  • Brandt, F., Conitzer, V., Endriss, U., Lang, J., and Procaccia, A. D. (eds.) (2016). Handbook of Computational Social Choice. Cambridge University Press.
  • Ostrogorski, M. (1902). La Democratie et les Partis Politiques. Calmann-Levy.
  • Shoham, Y., and Leyton-Brown, K. (2009). Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, chapter 9.

Appendix B: Stackelberg Security Game Analysis of the Policy Stack

This appendix is normative for the security claims it makes and informative for the supporting commentary. It models the four layer Policy Stack of section 2 and the Multi-Party Review mechanism of section 5 as a Stackelberg Security Game in the sense of Tambe (2011), characterizes the defender's optimal strategy under bounded attacker rationality, gives precise utility bounds against adaptive adversaries, and connects the analysis to the operational security mechanisms deployed by the major Stackelberg-based security systems ARMOR (Pita et al. 2008), IRIS (Tsai et al. 2009), PROTECT (Shieh et al. 2012), and the Bayesian DOBSS algorithm of Paruchuri, Pearce, Marecki, Tambe, Ordonez, and Kraus (2008). The treatment is consistent with the survey of Sinha, Fang, An, Kiekintveld, and Tambe (2018) and the textbook of Tambe (2011) Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned.

B.1 The Policy Stack as a Stackelberg Security Game

Let $D$ denote the Defender, namely the union of the Policy Stack enforcement points of section 2 (the Universal Prohibitions evaluator, the Organizational Policy engine, the Scope Policy engine, and the Personal Policy engine) together with the Multi-Party Review mechanism of section 5. Let $\mathcal{T}$ denote the set of protected targets, namely the consequential Actions defined under RFC 0018 plus the irreversible Actions defined under RFC 0017 plus the high-stake commerce primitives defined under RFC 0014. Let $\mathcal{R}$ denote the Defender's set of resources, namely the verification probes that the Defender can deploy: signature verification, Receipt-chain audit, Multi-Party Review escalation, cooling-off enforcement, reputation-weighted gating, and adversarial conformance testing under RFC 0019.

The Defender's pure strategy is an assignment of resources to targets: which probe is run on which Action class, with what frequency, and triggering which escalation. The Defender's mixed strategy is a probability distribution over pure strategies. The Attacker's strategy is the choice of which target Action class to attack and with what method (forgery, replay, signature stripping, cooling-off bypass, escalation routing, replaceability obfuscation, Sybil creation, all enumerated in section 8 of RFC 0019).

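The Defender's utility, when the Attacker attacks target $t$ and the Defender's mixed strategy covers $t$ with probability $c_t$, is

$$U_D(c, t) = c_t \, U_D^{\mathrm{cov}}(t) + (1 - c_t) \, U_D^{\mathrm{unc}}(t)$$

with $U_D^{\mathrm{cov}}(t) > U_D^{\mathrm{unc}}(t)$ (covered attacks are preferred to uncovered attacks). The Attacker's utility has the symmetric structure $U_A(c, t) = c_t \, U_A^{\mathrm{cov}}(t) + (1 - c_t) \, U_A^{\mathrm{unc}}(t)$ with $U_A^{\mathrm{unc}}(t) > U_A^{\mathrm{cov}}(t)$ (uncovered attacks are preferred). The Stackelberg structure asserts that the Defender commits first to a publicly observable mixed strategy and the Attacker best-responds.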

B.2 Why Stackelberg Rather Than Simultaneous Play

The Policy Stack and Multi-Party Review parameters of OAP are publicly declared in the Manifest under RFC 0019 section 7. Consequently the Attacker is a fully informed second mover. This is the defining condition under which the Stackelberg model applies, in contrast to the simultaneous-move Nash equilibrium that obtains when neither party observes the other's strategy. Tambe (2011, chapter 2) showed that for security domains with this commitment structure, the Defender's optimal strategy is the Strong Stackelberg Equilibrium (SSE), which generally yields strictly higher Defender utility than the Nash equilibrium of the simultaneous game.

The SSE satisfies:

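  1. $c^* \in \arg\max_{c} U_D(c, t^*(c))$, where $U_D$ and $U_A$ are the Defender's and Attacker's expected utilities, $c$ is the Defender's coverage strategy, and $t^*(c) \in \arg\max_{t} U_A(c, t)$ is the Attacker's best response.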
  2. The Attacker breaks ties in the Defender's favor.

Condition 2 is the standard "strong" assumption of Stackelberg security games and is justified in the OAP context by the cooperative-tie-breaking convention of section 4 (when the Attacker is indifferent, the protocol's defaults route to the safer outcome).
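A toy instance makes the two SSE conditions concrete. The sketch below is an illustration under stated assumptions, not the protocol's solver: it uses the standard security-game form in which expected utility is a coverage-weighted mix of covered and uncovered payoffs, a single Defender resource split between two targets, made-up payoffs, and an exact grid search using `fractions.Fraction` so that Attacker ties are detected exactly. All names and numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical payoffs for two Action classes: (covered, uncovered) for each player.
GAME = {
    "transfer": {"def_cov": 0, "def_unc": -10, "att_cov": -2, "att_unc": 10},
    "delete":   {"def_cov": 0, "def_unc": -4,  "att_cov": -1, "att_unc": 4},
}


def expected(cov, covered, uncovered):
    """Coverage-weighted expected utility."""
    return cov * covered + (1 - cov) * uncovered


def attacker_best_response(coverage, game):
    """Target maximising Attacker utility; ties are broken in the Defender's favour."""
    scored = []
    for target, p in game.items():
        att = expected(coverage[target], p["att_cov"], p["att_unc"])
        dfn = expected(coverage[target], p["def_cov"], p["def_unc"])
        scored.append((att, dfn, target))
    _, defender_value, target = max(scored)  # higher Defender utility wins Attacker ties
    return target, defender_value


def strong_stackelberg(game, steps=170):
    """Grid search over how one resource is split between the two targets."""
    first, second = list(game)
    best = None
    for i in range(steps + 1):
        c = Fraction(i, steps)
        coverage = {first: c, second: 1 - c}
        target, defender_value = attacker_best_response(coverage, game)
        if best is None or defender_value > best[0]:
            best = (defender_value, coverage, target)
    return best


value, coverage, target = strong_stackelberg(GAME)
print(coverage["transfer"], target, value)
```

In this instance the optimum sits at the coverage that makes the Attacker indifferent between the two targets, which is where the tie-breaking assumption of condition 2 matters.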

B.3 Theorem B.1 (Existence and Uniqueness of the Defender's Optimal Mixed Strategy)

Statement. For every finite OAP Policy Stack instance with finite resource set $\mathcal{R}$ and finite target set $\mathcal{T}$, the Strong Stackelberg Equilibrium exists and is unique up to a measure-zero set of degenerate utility profiles.

Proof sketch. The argument is the classical existence-and-uniqueness proof of Conitzer and Sandholm (2006) and Paruchuri, Pearce, Marecki, Tambe, Ordonez, and Kraus (2008) applied to the OAP setting. Existence follows from the compactness of the Defender's mixed-strategy space and the upper semi-continuity of the Defender's utility under the Attacker's best response in the Defender's strategy. Uniqueness up to measure zero follows from the genericity argument of Conitzer and Sandholm (2006, Theorem 3): for any non-degenerate utility profile, the SSE coverage probabilities are uniquely determined by the LP whose constraints encode the Attacker's best-response indifference.

B.4 Theorem B.2 (Computation via DOBSS)

Statement. The OAP Defender's optimal mixed strategy is computable in polynomial time in the size of the Policy Stack instance using the Decomposed Optimal Bayesian Stackelberg Solver of Paruchuri, Pearce, Marecki, Tambe, Ordonez, and Kraus (2008) when the Attacker is single-type. In the Bayesian setting, where $\Theta$ is the set of Attacker types and $m$ is the number of resources, it is computable in time that depends on $|\Theta|$ and grows exponentially in $m$.

Proof sketch. DOBSS reformulates the multi-LP characterization of SSE as a single mixed-integer linear program whose optimal value gives the Defender's optimal commitment strategy. The reformulation is polynomial-size in the Policy Stack instance, and the resulting MILP can be solved in polynomial time when the Attacker is single-type. The Bayesian extension, in which the Attacker is drawn from a finite type space with known prior, requires the Harsanyi transformation that grows the strategy space exponentially in the number of resources, yielding the bound stated. The Defender's reference implementation publishes its DOBSS configuration under the OAP Registry entry oap.policy.dobss.v1.

B.5 Theorem B.3 (Bounded Adversary Utility under SSE Defense)

Statement. Let the Attacker have utility bounded by $\bar{U}_A^{\mathrm{unc}}$ on uncovered attacks and $\bar{U}_A^{\mathrm{cov}}$ on covered attacks. Under the SSE strategy of Theorem B.1, the Attacker's expected utility is bounded above by

$$c^*_{t^*} \, \bar{U}_A^{\mathrm{cov}} + (1 - c^*_{t^*}) \, \bar{U}_A^{\mathrm{unc}}$$

where $c^*_{t^*}$ is the SSE coverage probability of the Attacker's best-response target $t^*$. The OAP Policy Stack achieves $c^*_{t^*} \ge 1 - \epsilon$ for any $\epsilon > 0$ provided the Defender's resource budget exceeds the Attacker's marginal-utility-per-coverage threshold derived in Korzhyk, Conitzer, and Parr (2010).

Proof sketch. Direct from the SSE definition and the resource-coverage bound of Korzhyk, Conitzer, and Parr (2010, Theorem 1). The bound is tight when the Attacker's utility profile is known to the Defender, and degrades gracefully as the Defender's prior over Attacker types becomes more diffuse.

Operational implication. The Defender's resource budget is realized in OAP by the conformance probe budget of RFC 0019 section 7 plus the Multi-Party Review reviewer budget of section 5 of this paper. A Provider that wishes to certify Tambe-grade adversarial robustness MUST publish its allocation in the Manifest's conformance.security_budget block.

B.6 Theorem B.4 (Robustness to Bounded-Rationality Attackers)

Statement. When the Attacker exhibits Quantal Response behavior with rationality parameter $\lambda$ (McFadden 1976; Yang, Kiekintveld, Ordonez, Tambe, and John 2011), the Defender's optimal strategy under the Quantal Response Stackelberg Equilibrium (QSE) of Yang et al. (2011) is computable in polynomial time and yields Defender utility that strictly improves over the SSE strategy for every finite $\lambda$.

Proof sketch. Yang, Kiekintveld, Ordonez, Tambe, and John (2011, Theorem 2) characterize the QSE as the solution to a convex optimization problem when the Defender's utility is concave in coverage probability. The OAP utility structure of B.1 satisfies this concavity by construction. The strict improvement over SSE follows from the strict suboptimality of the SSE strategy when the Attacker deviates from perfect rationality, and is operationally important for OAP because real-world adversaries (script kiddies, opportunistic attackers, automated scanners) are demonstrably bounded-rational. The Defender's reference implementation MAY use the QSE algorithm of Yang et al. (2011) and publish its choice under the Registry entry oap.policy.qse.v1.
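For readers unfamiliar with Quantal Response, the sketch below shows the logit form commonly used for it: the Attacker chooses each target with probability proportional to the exponential of its expected utility scaled by the rationality parameter. The function name, the parameter name `lam`, and the example utilities are assumptions for illustration; this is not the QSE solver of Yang et al. (2011), only the Attacker response model that such a solver takes as input.

```python
import math


def quantal_response(attacker_utilities, lam):
    """Logit choice probabilities over targets for rationality parameter lam >= 0."""
    peak = max(attacker_utilities.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {t: math.exp(lam * (u - peak)) for t, u in attacker_utilities.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


utilities = {"transfer": 2.0, "delete": 1.0}
print(quantal_response(utilities, lam=0.0))   # uniform: utilities are ignored
print(quantal_response(utilities, lam=1.0))   # mild preference for the better target
print(quantal_response(utilities, lam=50.0))  # close to a perfectly rational best response
```

A rationality parameter of zero gives uniformly random attacks, and large values approach the perfectly rational best response assumed by the SSE.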

B.7 Multi-Party Review under Stackelberg Bribery

Multi-Party Review (section 5) is itself a Stackelberg Security Game with a different threat model: the Attacker is an internal actor who attempts to bribe a subset of the reviewer set to obtain approvals for a malicious Action. The Defender's strategy is the threshold $k$ and the reviewer-selection rule. Under the Stackelberg framing:

Theorem B.5 (Bribery-Resistance Bound). Suppose the Attacker has bribery budget $B$, each reviewer $i$ has a personal cost-of-corruption $c_i$ drawn from a known distribution $F$, and the Defender selects $k$ as a function of $(B, F)$. Then the Defender's optimal $k^*$ that minimizes the probability of successful bribery satisfies

$$k^* = \min \Big\{ k : \Pr\Big[ \sum_{j=1}^{k} c_{(j)} \le B \Big] \le \delta \Big\}$$

where $c_{(j)}$ is the $j$-th order statistic of the reviewer cost-of-corruption distribution and $\delta$ is the maximum tolerable bribery success probability.

Proof sketch. The Attacker's optimal strategy is to bribe the $k$ cheapest reviewers. The Defender's optimal $k^*$ is the smallest threshold under which this cost exceeds $B$ with probability at least $1 - \delta$. The order-statistic bound is the standard combinatorial security argument of Tambe (2011, chapter 6) on insider-threat resistance under Stackelberg bribery.

Operational implication. Organizations deploying Multi-Party Review SHOULD calibrate $k$ to their estimated $B$ and $F$. The OAP Registry MAY publish reference parametrizations for industry-standard threat profiles under oap.policy.multi-party-review.bribery.v1.
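The calibration can be approximated by simulation when the cost-of-corruption distribution has no convenient closed form. The sketch below is a minimal Monte Carlo illustration: it assumes the Attacker bribes the cheapest reviewers, estimates the probability that the cheapest `k` of `n` reviewers cost no more than the budget, and returns the smallest threshold whose estimated probability is within tolerance. The function names, the sampling callback, and the example distribution are hypothetical.

```python
import random


def bribery_success_probability(k, n, budget, sample_cost, trials=20000, seed=0):
    """Estimate Pr[cost of the k cheapest of n reviewers <= budget]."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        costs = sorted(sample_cost(rng) for _ in range(n))
        if sum(costs[:k]) <= budget:
            successes += 1
    return successes / trials


def calibrate_threshold(n, budget, sample_cost, delta, trials=20000, seed=0):
    """Smallest k whose estimated bribery success probability is at most delta."""
    for k in range(1, n + 1):
        p = bribery_success_probability(k, n, budget, sample_cost, trials, seed)
        if p <= delta:
            return k
    return None  # no threshold up to n meets the tolerance


def uniform_cost(rng):
    """Assumed example distribution: costs spread uniformly between 1 and 5 units."""
    return rng.uniform(1.0, 5.0)


print(calibrate_threshold(n=7, budget=6.0, sample_cost=uniform_cost, delta=0.01))
```

A return value of `None` signals that even unanimity does not meet the tolerance for the given budget, in which case the reviewer set itself would need to change.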

B.8 Composition with Conformance Testing (RFC 0019)

The adversarial test selection problem of RFC 0019 section 8 is itself a Stackelberg Patrolling Game in the sense of IRIS (Tsai, Rathi, Kiekintveld, Ordonez, and Tambe 2009). The treatment is given in RFC 0019 Appendix A and is referenced here for completeness. The composition is well defined by the additivity of Stackelberg utilities across independent security games (Korzhyk, Conitzer, and Parr 2011): the Defender's optimal allocation across the Policy Stack and the Conformance Test Patrol is the joint LP solution that respects the per-game resource constraints.

B.9 Operational Deployments as Reference Points

The Stackelberg Security Game framework underlying this appendix is not merely theoretical: it has been deployed at scale in safety-critical contexts that share the OAP Policy Stack's defender-attacker structure.

  1. ARMOR (Pita, Jain, Marecki, Ordonez, Portway, Tambe, Western, Paruchuri, and Kraus 2008) deployed at Los Angeles International Airport for randomized vehicle checkpoint and canine patrol scheduling.
  2. IRIS (Tsai, Rathi, Kiekintveld, Ordonez, and Tambe 2009) deployed by the United States Federal Air Marshals Service for randomized flight assignment.
  3. PROTECT (Shieh, An, Yang, Tambe, Baldwin, DiRenzo, Maule, and Meyer 2012) deployed by the United States Coast Guard for port patrol scheduling.
  4. TRUSTS (Yin, Jiang, Johnson, Tambe, Kiekintveld, Leyton-Brown, Sandholm, and Sullivan 2012) deployed by the Los Angeles Sheriff's Department for fare-evasion patrols on the LA Metro.

These deployments collectively demonstrate that the Stackelberg approach scales to real-world adversarial settings with millions of agent-target pairs and that the resource-coverage bounds of Theorem B.3 hold empirically. OAP inherits this empirical track record by adopting the same algorithmic framework for its Policy Stack defense.

B.10 Implications for Downstream RFCs

  1. RFC 0017 (Cooling-Off). The cooling-off window is a Defender resource in the SSE of B.1. Optimal allocation of cooling-off enforcement across Action classes follows from the LP of Theorem B.2.
  2. RFC 0018 (Right to Human Path). The Escalation Action is a Defender resource that breaks the strict autonomy assumption of the Stackelberg game and converts the analysis into a Human-Agent Collective security game in the sense of Jennings et al. (2014).
  3. RFC 0019 (Conformance). The adversarial test budget allocation is the patrolling-game extension of B.8, treated formally in RFC 0019 Appendix A.
  4. RFC 0026 (Registry Protocol). The DOBSS, QSE, and bribery-resistance reference parametrizations are published as Registry entries with the identifiers given in B.4, B.6, and B.7.

B.11 References to Stackelberg Security Games and Game-Theoretic Defense

  • Tambe, M. (2011). Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
  • Conitzer, V., and Sandholm, T. (2006). Computing the Optimal Strategy to Commit to. Proceedings of the ACM Conference on Electronic Commerce (EC '06).
  • Paruchuri, P., Pearce, J. P., Marecki, J., Tambe, M., Ordonez, F., and Kraus, S. (2008). Playing Games for Security: An Efficient Exact Algorithm for Solving Bayesian Stackelberg Games. Proceedings of AAMAS-2008. [DOBSS]
  • Pita, J., Jain, M., Marecki, J., Ordonez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., and Kraus, S. (2008). Deployed ARMOR Protection: The Application of a Game-Theoretic Model for Security at the Los Angeles International Airport. Proceedings of AAMAS-2008 Industry Track.
  • Tsai, J., Rathi, S., Kiekintveld, C., Ordonez, F., and Tambe, M. (2009). IRIS: A Tool for Strategic Security Allocation in Transportation Networks. Proceedings of AAMAS-2009 Industry Track.
  • Korzhyk, D., Conitzer, V., and Parr, R. (2010). Complexity of Computing Optimal Stackelberg Strategies in Security Resource Allocation Games. Proceedings of AAAI-2010.
  • Korzhyk, D., Conitzer, V., and Parr, R. (2011). Solving Stackelberg Games with Uncertain Observability. Proceedings of AAMAS-2011.
  • Yang, R., Kiekintveld, C., Ordonez, F., Tambe, M., and John, R. (2011). Improving Resource Allocation Strategy against Human Adversaries in Security Games. Proceedings of IJCAI-2011.
  • Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C., DiRenzo, J., Maule, B., and Meyer, G. (2012). PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States. Proceedings of AAMAS-2012.
  • Yin, Z., Jiang, A. X., Johnson, M. P., Tambe, M., Kiekintveld, C., Leyton-Brown, K., Sandholm, T., and Sullivan, J. P. (2012). TRUSTS: Scheduling Randomized Patrols for Fare Inspection in Transit Systems. Proceedings of IAAI-2012.
  • McFadden, D. (1976). Quantal Choice Analysis: A Survey. Annals of Economic and Social Measurement 5(4).
  • Sinha, A., Fang, F., An, B., Kiekintveld, C., and Tambe, M. (2018). Stackelberg Security Games: Looking Beyond a Decade of Success. Proceedings of IJCAI-2018.