Policy vs Playbook

Policy vs playbook refers to the distinction between high-level access rules (policy) and detailed execution procedures (playbook) for agents. Policies define what an agent can or cannot do, establishing boundaries and constraints. Playbooks define how an agent should accomplish specific tasks, providing step-by-step procedures for execution.

Why It Matters

Separation of Concerns

The policy-playbook distinction creates a clean architectural separation between governance and implementation. Policies remain stable and reflect organizational requirements, while playbooks evolve with operational needs. This separation allows security teams to manage policies independently of engineering teams who build playbooks.

Governance vs Execution

Policies serve as governance mechanisms that ensure compliance, security, and risk management across all agent operations. They establish non-negotiable boundaries that apply universally. Playbooks, conversely, focus on operational efficiency and task completion. A robust policy layer prevents playbooks from executing harmful actions, even when playbooks contain bugs or unexpected logic.

Auditing Clarity

Separating policies from playbooks simplifies compliance auditing. When a policy violation occurs, investigators can identify which policy was breached without reading through execution logic. Policy logs show what was prevented and why; playbook logs show which steps were executed. This dual-layer logging supports both security oversight and operational debugging.
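As a rough sketch of dual-layer logging, the snippet below keeps policy decisions and playbook steps as two record types in one stream. The field names (`layer`, `policy_id`, `step_id`) and the helper functions are illustrative assumptions, not a standard schema.

```python
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


def policy_record(policy_id, action, allowed, reason):
    """Policy layer: what was decided and why (hypothetical schema)."""
    return {"layer": "policy", "time": _now(), "policy_id": policy_id,
            "action": action, "allowed": allowed, "reason": reason}


def playbook_record(playbook, step_id, status):
    """Playbook layer: which step ran and how it ended (hypothetical schema)."""
    return {"layer": "playbook", "time": _now(), "playbook": playbook,
            "step_id": step_id, "status": status}


def violations(records):
    """Audit view: denied policy decisions only, with no execution detail."""
    return [r for r in records if r["layer"] == "policy" and not r["allowed"]]


log = [
    playbook_record("process_customer_refund", "validate_request", "ok"),
    policy_record("financial_transaction", "call_payment_api", False,
                  "refund exceeds per-transaction limit"),
]
```

An auditor would call `violations(log)` and see only the denied decision, while an engineer debugging a run would filter on `layer == "playbook"` instead.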

Concrete Examples

Data Protection

Policy: "No personally identifiable information (PII) may be transmitted to external APIs outside the approved vendor list."

Playbook: "When processing customer support tickets: (1) Extract ticket content, (2) Identify PII using NER model, (3) Redact SSN, email, and phone numbers, (4) Check API endpoint against approved vendor list, (5) If approved, transmit redacted content, (6) Log transmission with request ID."

The policy establishes the absolute constraint. The playbook implements the specific steps to comply with that constraint while accomplishing the business objective.
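A minimal sketch of playbook steps 3 to 5 might look like the following. It swaps the NER model for simple regular expressions, which is an assumption made to keep the example self-contained; the patterns only cover common US-style formats, and the function names are hypothetical. The vendor hostnames are the ones used in the Rego example later in this article.

```python
import re
from urllib.parse import urlparse

APPROVED_VENDORS = {"api.salesforce.com", "api.zendesk.com"}

# Illustrative patterns only; a production system would use a proper PII detector.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Step 3: replace each detected PII value with a placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def is_approved(endpoint):
    """Step 4: compare the exact hostname against the approved vendor list."""
    return urlparse(endpoint).hostname in APPROVED_VENDORS


def prepare_ticket(text, endpoint):
    """Steps 3-5: return the redacted payload to send, or None if blocked."""
    if not is_approved(endpoint):
        return None
    return redact(text)
```

Matching on the parsed hostname rather than a string prefix avoids accepting look-alike hosts such as `api.zendesk.com.example.net`.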

Authorization and Access Control

Policy: "Agents operating in production environments must authenticate with read-only service accounts by default. Write operations require human approval."

Playbook: "For database query requests: (1) Authenticate using read-only service account, (2) Parse user query for intent, (3) Generate SQL SELECT statement, (4) Execute query with 10-second timeout, (5) Return results. For database modification requests: (1) Generate proposed SQL statement, (2) Request human approval via Slack, (3) Wait for approval with 5-minute timeout, (4) If approved, execute with write-enabled credentials; if denied or timed out, abort without executing, (5) Log modification."

The policy prevents unauthorized writes. The playbook implements the workflow that respects this boundary while handling both read and write scenarios.
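The branching between the read and write paths can be sketched as below. The `request_approval` callback stands in for the Slack approval step and is a hypothetical interface; detecting reads by a `SELECT` prefix is a deliberate simplification that a real system would replace with proper SQL parsing.

```python
def choose_credentials(sql, request_approval):
    """Return the credential set a statement may run under, or None if blocked.

    request_approval(sql) is assumed to return True on approval, False on
    refusal, and None if the approval window timed out.
    """
    is_read = sql.lstrip().upper().startswith("SELECT")
    if is_read:
        return "read_only"
    # Writes need an explicit yes; a refusal or a timeout both block execution.
    if request_approval(sql) is True:
        return "write_enabled"
    return None
```

Treating a timeout the same as a refusal keeps the default outcome on the safe side of the policy.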

Financial Transactions

Policy: "Individual transaction amounts must be < $1,000. Daily aggregate transactions per customer must be < $5,000."

Playbook: "For payment processing: (1) Validate transaction amount < $1,000, (2) Query daily transaction total for customer, (3) Calculate new aggregate with current transaction, (4) Validate aggregate < $5,000, (5) If both checks pass, initiate payment via payment gateway; otherwise reject the transaction, (6) Update customer transaction log, (7) Send confirmation."

The policy establishes spending limits. The playbook implements the checks and processing sequence to enforce those limits.
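Steps 1 to 4 of this playbook can be sketched as a single check. The function name and the returned reason strings are illustrative assumptions; `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.

```python
from decimal import Decimal

PER_TRANSACTION_LIMIT = Decimal("1000")  # policy: amount must be < $1,000
DAILY_AGGREGATE_LIMIT = Decimal("5000")  # policy: daily total must be < $5,000


def check_payment(amount, todays_amounts):
    """Return (ok, reason) for a proposed payment given today's earlier amounts."""
    amount = Decimal(str(amount))
    if amount >= PER_TRANSACTION_LIMIT:
        return False, "per-transaction limit"
    earlier = sum((Decimal(str(a)) for a in todays_amounts), Decimal("0"))
    if earlier + amount >= DAILY_AGGREGATE_LIMIT:
        return False, "daily aggregate limit"
    return True, "within limits"
```

Both comparisons use `>=` because the policy states strict upper bounds: a transaction of exactly $1,000, or one that brings the day's total to exactly $5,000, is rejected.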

Common Pitfalls

Conflating the Two

Teams frequently embed policy logic directly within playbooks, scattering governance rules across execution code. Every policy update then requires a playbook change, which increases deployment risk, and it becomes hard to tell which code enforces business rules and which implements operational logic. Maintain policies in a dedicated policy engine that playbooks query before execution.

Policy in Code

Hard-coding policies as conditional statements in playbooks makes them hard to maintain. When regulations change or business requirements evolve, developers must hunt through playbook code to update scattered policy checks. Use a declarative policy format (OPA's Rego, Cedar, or structured JSON policy documents) that policy owners can review and update without redeploying playbook code.

Playbook Rigidity

Playbooks designed as rigid scripts that cannot adapt to context leave agents unable to handle edge cases. Policies should be strict, but playbooks benefit from conditional logic and error handling. A playbook that fails completely when one step encounters an error is less valuable than one that retries, attempts recovery, or escalates appropriately while still respecting policy constraints.
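One way to picture this is a small wrapper that retries operational failures and escalates when retries run out, but never retries a policy denial. This is a sketch under assumptions: `PolicyViolationError` is defined locally as a stand-in, and the `escalate` callback is a hypothetical hook for handing off to a human or a fallback procedure.

```python
import time


class PolicyViolationError(Exception):
    """Stand-in for the error a policy engine raises on denial."""


def run_with_recovery(step, retries=2, delay_seconds=0.0, escalate=None):
    """Call step(); retry operational failures, never policy denials."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return step()
        except PolicyViolationError:
            raise  # a denial is final: retrying must not be a way around policy
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(delay_seconds)
    if escalate is not None:
        return escalate(last_error)  # e.g. hand off to a human operator
    raise last_error
```

The ordering of the `except` clauses carries the design choice: flexibility applies to how a step is carried out, not to whether a denied step may run.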

Policy Proliferation

Creating overly granular policies that specify execution details defeats the purpose of separation. Policies like "use TLS 1.3 for HTTPS connections" conflate policy with implementation. The policy should state "external communications must be encrypted," while the playbook specifies TLS 1.3 as the implementation method.
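The split can be illustrated with Python's standard `ssl` module. The policy-level check below only asks whether the channel is encrypted, while the playbook-level helper picks TLS 1.3 as the concrete mechanism. Both function names are hypothetical, and checking the URL scheme is a simplified stand-in for a real policy evaluation.

```python
import ssl
from urllib.parse import urlparse


def satisfies_encryption_policy(url):
    """Policy-level check: is the channel encrypted at all?"""
    return urlparse(url).scheme == "https"


def make_tls_context():
    """Playbook-level choice: require TLS 1.3 as the specific mechanism."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

If the organization later accepts another protocol version, only `make_tls_context` changes; the policy statement and its check stay as they are.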

Implementation

Policy Declaration Formats

Modern policy engines use declarative formats that separate policy from code:

Open Policy Agent (OPA) with Rego:

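package agent.data_protection

import rego.v1

# Hosts that are allowed to receive PII
approved_vendors := {"api.salesforce.com", "api.zendesk.com"}

# Illustrative set of payload keys treated as PII
pii_fields := {"ssn", "email", "phone"}

deny_external_api contains msg if {
    input.action == "api_call"
    not is_approved_vendor(input.endpoint)
    contains_pii(input.payload)
    msg := sprintf("Blocked external API call to %v with PII", [input.endpoint])
}

# Compare the exact host, so "api.zendesk.com.example.net" is not accepted
is_approved_vendor(endpoint) if {
    host := split(trim_prefix(endpoint, "https://"), "/")[0]
    host in approved_vendors
}

contains_pii(payload) if {
    some key, _ in payload
    key in pii_fields
}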

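Cedar (developed by AWS):

// Agents may read any resource that is not classified as PII.
permit(
    principal,
    action == Action::"readData",
    resource
) when {
    principal.role == "agent" &&
    resource.classification != "PII"
};

// Writes are forbidden unless the request context records human approval.
// The `has` guard matters: a policy that errors on a missing attribute is
// skipped, which would silently disable this forbid.
forbid(
    principal,
    action == Action::"writeData",
    resource
) unless {
    context has has_human_approval &&
    context.has_human_approval
};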

JSON document for declarative policies (an illustrative custom format, not JSON Schema):

{
  "policies": [
    {
      "id": "financial-transaction-limit",
      "rules": [
        {
          "resource": "transaction",
          "constraints": {
            "amount_max": 1000,
            "daily_aggregate_max": 5000
          },
          "enforcement": "deny"
        }
      ]
    }
  ]
}
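To show how such a document could drive decisions, here is a small interpreter sketch. It assumes that `amount_max` and `daily_aggregate_max` are exclusive upper bounds (matching the "<" wording of the financial policy above) and that requests matching no rule are allowed; both are assumptions about a format this article only outlines, and the `evaluate` signature is hypothetical.

```python
import json

POLICY_JSON = """
{"policies": [{"id": "financial-transaction-limit", "rules": [{
    "resource": "transaction",
    "constraints": {"amount_max": 1000, "daily_aggregate_max": 5000},
    "enforcement": "deny"}]}]}
"""


def evaluate(document, resource, amount, daily_total):
    """Return (decision, policy_id) for a request against the policy document."""
    for policy in document["policies"]:
        for rule in policy["rules"]:
            if rule["resource"] != resource:
                continue
            limits = rule["constraints"]
            over_single = amount >= limits["amount_max"]
            over_daily = daily_total + amount >= limits["daily_aggregate_max"]
            if over_single or over_daily:
                return rule["enforcement"], policy["id"]
    return "allow", None


document = json.loads(POLICY_JSON)
```

Changing a limit then means editing the JSON document; `evaluate` and the playbooks that call it stay untouched.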

Playbook Execution Engines

Playbooks require execution engines that interpret procedures while enforcing policy checkpoints:

YAML-based Playbook:

playbook:
  name: "process_customer_refund"
  version: "1.2.0"
  steps:
    - id: "validate_request"
      action: "validate_schema"
      params:
        schema: "refund_request_v1"
      policy_check: "input_validation"

    - id: "check_eligibility"
      action: "query_database"
      params:
        query: "SELECT * FROM orders WHERE id = :order_id"
        bindings:
          order_id: "{{order_id}}"
      policy_check: "data_access"

    - id: "process_refund"
      action: "call_payment_api"
      params:
        endpoint: "{{payment_gateway}}/refund"
        amount: "{{refund_amount}}"
      policy_check: "financial_transaction"
      requires_approval: true

    - id: "notify_customer"
      action: "send_email"
      params:
        template: "refund_confirmation"
      policy_check: "external_communication"
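Before an engine runs a playbook like this, it can help to verify that every step actually names a policy checkpoint. The validator below is a sketch that operates on the parsed playbook as a plain dictionary (the contents of the `playbook` key after YAML parsing); the set of required keys is an assumption drawn from the example above.

```python
REQUIRED_STEP_KEYS = {"id", "action", "policy_check"}


def validate_playbook(playbook):
    """Return a list of problems; an empty list means the structure looks sound."""
    problems = []
    seen_ids = set()
    for index, step in enumerate(playbook.get("steps", [])):
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            problems.append(f"step {index}: missing {sorted(missing)}")
        step_id = step.get("id")
        if step_id is not None:
            if step_id in seen_ids:
                problems.append(f"step {index}: duplicate id {step_id!r}")
            seen_ids.add(step_id)
    return problems
```

Rejecting a step without `policy_check` at load time prevents a playbook from quietly bypassing the policy layer.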

Python-based Execution Engine:

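class PolicyViolationError(Exception):
    """Raised when the policy engine denies a playbook step."""


class PlaybookExecutor:
    def __init__(self, policy_engine, actions=None):
        self.policy_engine = policy_engine
        # Maps action names (e.g. "send_email") to the callables that perform them
        self.actions = actions or {}

    def execute(self, playbook, context):
        """Run the steps in order; the first policy denial aborts the run."""
        return [self.execute_step(step, context) for step in playbook['steps']]

    def execute_step(self, step, context):
        params = step.get('params', {})

        # Policy check before execution
        policy_decision = self.policy_engine.evaluate(
            action=step['action'],
            resource=params,
            context=context
        )

        if not policy_decision.allowed:
            raise PolicyViolationError(
                f"Policy {policy_decision.violated_policy} "
                f"denied action: {policy_decision.reason}"
            )

        # Execute step
        result = self.execute_action(step['action'], params)

        # Record the step and its result in the audit trail
        self.policy_engine.audit_log(step, result, context)

        return result

    def execute_action(self, action, params):
        """Dispatch to the registered handler for this action."""
        return self.actions[action](**params)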

Validation and Testing

Implement continuous validation to ensure policies and playbooks work correctly together:

Policy Testing:

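def test_pii_external_api_policy():
    # OPAEngine is assumed to be a thin project wrapper that loads a Rego file
    # and returns a decision object exposing .allowed and .denied
    policy_engine = OPAEngine("policies/data_protection.rego")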

    # Should deny: PII to unapproved vendor
    decision = policy_engine.evaluate({
        "action": "api_call",
        "endpoint": "https://untrusted-api.com",
        "payload": {"ssn": "123-45-6789"}
    })
    assert decision.denied

    # Should allow: non-PII to unapproved vendor
    decision = policy_engine.evaluate({
        "action": "api_call",
        "endpoint": "https://untrusted-api.com",
        "payload": {"product_id": "ABC123"}
    })
    assert decision.allowed

Playbook Testing:

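import pytest


def test_refund_playbook_respects_policies():
    # MockPolicyEngine and load_playbook are project test helpers (not shown)
    mock_policy = MockPolicyEngine()
    executor = PlaybookExecutor(mock_policy)

    # Stub out side effects and record which actions actually ran
    executed = []
    executor.execute_action = lambda action, params: executed.append(action)

    playbook = load_playbook("process_customer_refund.yaml")

    # Simulate policy denial on excessive refund
    mock_policy.set_rule("financial_transaction", deny=True)

    with pytest.raises(PolicyViolationError):
        for step in playbook["steps"]:
            executor.execute_step(step, {"refund_amount": 10000})

    # The denied payment call and every later step must not have run
    assert "call_payment_api" not in executed
    assert "send_email" not in executed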

Key Metrics to Track

Policy Compliance Rate

Measures the percentage of agent actions that comply with policies on first attempt. Calculate as:

Policy Compliance Rate = (Allowed Actions / Total Actions Attempted) × 100

As a rule of thumb, compliance rates above 95% indicate well-designed playbooks that naturally align with policies, while rates below 90% suggest either overly restrictive policies or playbooks that frequently attempt disallowed actions.
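The formula translates directly into code. In this sketch each attempted action is represented by a boolean (True when the policy engine allowed it), which is an assumed input shape rather than a prescribed log format.

```python
def policy_compliance_rate(decisions):
    """decisions: iterable of booleans, True when the action was allowed."""
    decisions = list(decisions)
    if not decisions:
        return None  # undefined when nothing was attempted
    return 100 * sum(decisions) / len(decisions)
```

For example, 19 allowed actions out of 20 attempts gives a rate of 95.0.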

Playbook Success Rate

Tracks completed playbook executions versus failed attempts:

Playbook Success Rate = (Successful Completions / Total Executions) × 100

Success rates < 80% indicate brittle playbooks with insufficient error handling. Monitor success rates per playbook to identify which procedures need refinement.
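Per-playbook monitoring can be sketched by grouping execution records by playbook name. The `(playbook_name, succeeded)` pair format and the helper names are assumptions for illustration; the 80% default threshold is the one quoted above.

```python
from collections import defaultdict


def success_rate_by_playbook(executions):
    """executions: iterable of (playbook_name, succeeded) pairs."""
    totals = defaultdict(lambda: [0, 0])  # name -> [successes, runs]
    for name, succeeded in executions:
        totals[name][0] += int(succeeded)
        totals[name][1] += 1
    return {name: 100 * ok / runs for name, (ok, runs) in totals.items()}


def needs_refinement(rates, threshold=80):
    """Names of playbooks whose success rate falls below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)
```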

Policy Violations

Count and categorize policy violations by severity and type:

  • Critical violations: Actions that would cause security breaches or regulatory non-compliance (target: 0)
  • High violations: Actions blocked that posed significant risk (monitor trend)
  • Medium violations: Actions blocked due to procedural policies (acceptable if < 5% of total actions)

Track violation trends over time. Increasing violations may indicate policy-playbook misalignment or evolving attack patterns. Decreasing violations suggest playbooks are adapting to policy requirements.
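Counting violations by severity, and checking them against the targets listed above, could look like the following sketch. The `(severity, policy_id)` record shape and the keys of the returned summary are hypothetical.

```python
from collections import Counter


def summarize_violations(violations, total_actions):
    """violations: iterable of (severity, policy_id) pairs."""
    counts = Counter(severity for severity, _ in violations)
    medium_share = 100 * counts["medium"] / total_actions if total_actions else 0.0
    return {
        "counts": dict(counts),
        "critical_target_met": counts["critical"] == 0,  # target: 0
        "medium_within_budget": medium_share < 5,        # < 5% of total actions
    }
```

Running the same summary over successive time windows gives the trend that the paragraph above recommends tracking.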

Policy Coverage

Percentage of agent capabilities covered by explicit policies:

Policy Coverage = (Actions with Policy Rules / Total Possible Actions) × 100

Insufficient coverage (< 70%) creates security gaps. Complete coverage (100%) may indicate policy over-engineering.
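Coverage is a set computation, and listing the uncovered actions is usually more useful than the percentage alone. The sketch below assumes both inputs are collections of action names; the function name is illustrative.

```python
def policy_coverage(all_actions, actions_with_rules):
    """Return (coverage_percent, uncovered_actions)."""
    all_actions = set(all_actions)
    covered = all_actions & set(actions_with_rules)
    uncovered = sorted(all_actions - covered)
    rate = 100 * len(covered) / len(all_actions) if all_actions else None
    return rate, uncovered
```

Applied to the four actions in the refund playbook with rules defined for three of them, this reports 75.0 and names the one action that still lacks a rule.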

Policy Evaluation Latency

Time required to evaluate policies for each agent action. Track p50, p95, and p99 latencies:

  • p95 < 50ms: Excellent (minimal impact on agent responsiveness)
  • p95 < 200ms: Acceptable (noticeable but manageable)
  • p95 > 500ms: Poor (significant performance bottleneck)

High latencies suggest policy engine optimization needs or overly complex policy logic.
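The percentiles can be computed from raw latency samples with the nearest-rank method, sketched below. Other percentile definitions interpolate between samples and give slightly different values; nearest-rank is chosen here only because it is simple and always returns an observed sample.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the p50, p95, and p99 policy evaluation latencies in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```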

Related Concepts

The policy-playbook distinction connects to several related architectural patterns in agent systems:

  • Guardrails - Runtime safety mechanisms that enforce policies during agent execution
  • Allow/deny lists - Simple policy implementation pattern using explicit permission and restriction lists
  • Observability - Monitoring and logging systems that track policy compliance and playbook execution
  • Policy engine - Dedicated systems for defining, managing, and enforcing policies separately from application logic

These concepts form a comprehensive governance and execution framework for production agent systems.