Proof of Action

Proof of action refers to evidence artifacts such as screenshots, logs, or traces that document actions taken by an agent for verification and audit purposes. These artifacts create a verifiable chain of evidence that demonstrates what an autonomous agent did, when it did it, and what the system state was at each step.

In the context of computer-use agents and agentic UI systems, proof of action serves as the foundation for accountability, enabling organizations to reconstruct agent behavior, validate outcomes, and maintain compliance with regulatory requirements.

Why It Matters

Proof of action is critical for three primary reasons that directly impact production agent deployments:

Dispute Resolution

When an agent performs an incorrect action or produces an unexpected outcome, proof of action artifacts enable teams to determine exactly what happened. A customer's claim that an agent made unauthorized changes can be confirmed or refuted with timestamped screenshots and action logs showing the precise sequence of events. Without these artifacts, disputes become impossible to resolve definitively, eroding customer trust and creating potential liability.

Compliance Audits

Regulated industries—including finance, healthcare, and government—require detailed audit trails demonstrating that systems behave according to established policies. Proof of action provides auditors with concrete evidence that agents operated within approved parameters. For example, a financial services agent must prove it verified customer identity before executing transactions, with artifacts showing each verification step and the decision rationale.

Debugging Failures

When agents fail or produce incorrect results, engineers need to understand the failure mode to prevent recurrence. Proof of action artifacts capture the agent's perception of the environment (screenshots), its decision-making process (logs), and the system state (traces) at the moment of failure. This evidence is essential for root cause analysis—without it, engineers are left guessing about what went wrong.

Concrete Examples

Proof of action manifests in several distinct artifact types, each serving specific verification needs:

Screenshot Capture at Key Steps

Visual snapshots of the application state at critical decision points provide evidence of what the agent "saw" when taking actions. For example:

  • Before-and-after screenshots when an agent modifies a form field, showing the original value and the new value
  • Decision point screenshots capturing the UI state when the agent chose between multiple options
  • Error state screenshots documenting what the agent encountered when a task failed
  • Confirmation screen screenshots proving the agent verified an action before committing it

A customer service agent that processes refunds might capture screenshots before clicking the "Approve Refund" button, showing the refund amount, customer account details, and reason code that informed the decision.
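
As a concrete illustration, the following minimal sketch wraps a critical click with before-and-after captures using Playwright's Python API; the URL, selector, and output paths are hypothetical placeholders:

# Capture the UI state around a critical, hard-to-reverse action.
from playwright.sync_api import sync_playwright

def approve_refund_with_evidence(url: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)

        # Evidence of what the agent "saw" before committing the action.
        page.screenshot(path="evidence/refund_before.png", full_page=True)

        # The critical action itself.
        page.click("button#approve-refund")

        # Confirmation evidence of the resulting state.
        page.screenshot(path="evidence/refund_after.png", full_page=True)
        browser.close()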

Action Logs with Timestamps

Structured log entries documenting each agent action create a sequential record of behavior. Comprehensive action logs include:

  • Precise timestamps in UTC with millisecond precision
  • Action type identifiers (e.g., "click", "type", "navigate", "api_call")
  • Target elements with DOM selectors or coordinates
  • Input parameters such as text entered or options selected
  • Correlation IDs linking actions to specific user requests or workflows

Example log entry:

{
  "timestamp": "2025-10-23T14:32:18.742Z",
  "action_type": "click",
  "target": "button#submit-order",
  "correlation_id": "req_abc123",
  "agent_id": "agent_xyz789",
  "session_id": "sess_456def"
}
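
A minimal sketch of a logger that emits entries in this shape; the stdout sink and helper names are illustrative, and a production system would write to durable storage:

import json
import sys
from datetime import datetime, timezone

def utc_now_ms() -> str:
    # UTC timestamp with millisecond precision, e.g. "2025-10-23T14:32:18.742Z"
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")

def log_action(action_type: str, target: str, correlation_id: str,
               agent_id: str, session_id: str) -> None:
    entry = {
        "timestamp": utc_now_ms(),
        "action_type": action_type,
        "target": target,
        "correlation_id": correlation_id,
        "agent_id": agent_id,
        "session_id": session_id,
    }
    sys.stdout.write(json.dumps(entry) + "\n")  # one JSON object per line

log_action("click", "button#submit-order", "req_abc123", "agent_xyz789", "sess_456def")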

State Snapshots

Point-in-time captures of system state provide context around agent actions. State snapshots typically include:

  • DOM snapshots preserving the complete HTML structure at action time
  • Variable state showing the agent's internal decision variables
  • API response payloads documenting external data that influenced decisions
  • Browser storage state capturing cookies, localStorage, and sessionStorage contents

These snapshots enable replay and analysis of exactly what information the agent had available when making decisions.
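
A minimal sketch of point-in-time state capture with Playwright's Python API follows. It saves the rendered DOM plus the context's cookies and localStorage; sessionStorage would need a separate page.evaluate() call, and the URL and paths are placeholders:

import json
import os
from playwright.sync_api import sync_playwright

os.makedirs("evidence", exist_ok=True)

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")  # placeholder URL

    # Complete HTML structure at action time, for later replay and analysis.
    with open("evidence/dom_snapshot.html", "w") as f:
        f.write(page.content())

    # Cookies and localStorage for every origin the context has visited.
    with open("evidence/storage_state.json", "w") as f:
        json.dump(context.storage_state(), f, indent=2)

    browser.close()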

Common Pitfalls

Organizations implementing proof of action systems frequently encounter these failure modes:

Missing PII Redaction

Screenshot and log artifacts often inadvertently capture personally identifiable information (PII), protected health information (PHI), or payment card data. Storing these artifacts without redaction creates compliance violations and security risks.

Common PII exposure points include:

  • Social Security numbers visible in form fields during screenshot capture
  • Credit card numbers in API logs
  • Patient names and medical record numbers in healthcare agent screenshots
  • Email addresses and phone numbers in customer service interactions

Teams must implement automatic redaction before artifact storage, using techniques like DOM-aware masking that redacts sensitive fields while preserving the overall UI structure for debugging purposes.
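
A minimal sketch of pattern-based redaction applied before storage; the patterns cover a few common PII shapes and would complement, not replace, DOM-aware masking of known-sensitive fields:

import re

REDACTION_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN REDACTED]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD REDACTED]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL REDACTED]"),
]

def redact(text: str) -> str:
    # Apply each pattern in order; SSNs are masked before the broader
    # card-number pattern can partially match them.
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Customer 123-45-6789 paid with 4111 1111 1111 1111, jane@example.com"))
# -> Customer [SSN REDACTED] paid with [CARD REDACTED], [EMAIL REDACTED]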

Excessive Storage Costs

Capturing comprehensive proof of action generates substantial data volumes. A single agent session might produce:

  • 50-200 screenshots at 200KB each (10-40MB per session)
  • 1,000-10,000 log entries at 500 bytes each (0.5-5MB per session)
  • 10-50 state snapshots at 1MB each (10-50MB per session)

At scale, this quickly becomes prohibitively expensive. An organization running 10,000 agent sessions daily could generate roughly 205GB-950GB of artifacts per day, or 75-347TB annually.

Mitigation strategies include implementing tiered retention (keeping recent artifacts in hot storage, archiving older artifacts to cold storage), selective capture (only capturing artifacts for failed sessions or random samples), and compression (using efficient formats like WebP for screenshots).
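
A minimal compression sketch, assuming the Pillow package; converting PNG captures to WebP commonly shrinks UI screenshots severalfold, though exact savings vary by content:

from PIL import Image

def compress_screenshot(png_path: str, webp_path: str, quality: int = 80) -> None:
    # Re-encode a lossless PNG capture as lossy WebP before archival.
    Image.open(png_path).save(webp_path, format="WEBP", quality=quality)

compress_screenshot("evidence/refund_before.png", "evidence/refund_before.webp")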

Incomplete Evidence Chains

Proof of action loses value when artifacts have gaps in coverage. An incomplete evidence chain occurs when:

  • Screenshots are captured only at task completion, missing intermediate decision points
  • Logs capture agent actions but not the environmental triggers that caused them
  • State snapshots exclude external API calls that influenced agent behavior
  • Correlation IDs don't properly link artifacts across distributed system components

An agent that completes a task might appear correct based on the final screenshot alone, even though intermediate steps contained critical errors that happened to self-correct. Complete evidence chains require capturing artifacts at all decision boundaries, not just final outcomes.
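
One way to close a common gap, correlation IDs that fail to link artifacts across components, is to stamp every artifact from a shared context variable, as in this minimal sketch (the helper names are illustrative, not a specific library's API):

import uuid
from contextvars import ContextVar

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="unset")

def start_request() -> None:
    # Set once at the request boundary; visible to everything downstream.
    correlation_id.set(f"req_{uuid.uuid4().hex[:8]}")

def emit_artifact(kind: str, payload: str) -> dict:
    # Every artifact carries the active request's ID, keeping the
    # evidence chain linkable across distributed components.
    return {"correlation_id": correlation_id.get(), "kind": kind, "payload": payload}

start_request()
print(emit_artifact("screenshot", "evidence/step1.png"))
print(emit_artifact("log", "clicked button#submit-order"))  # same correlation_id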

Implementation

Implementing robust proof of action requires addressing three core capabilities:

Artifact Collection Strategies

Synchronous vs. Asynchronous Capture

Synchronous capture blocks agent execution until artifacts are collected, guaranteeing complete evidence but adding latency to agent actions. Asynchronous capture allows the agent to continue while artifacts are collected in parallel, improving performance but risking data loss if the capture process fails.

Best practice: Use synchronous capture for critical decision points (e.g., before executing a financial transaction) and asynchronous capture for informational artifacts (e.g., routine navigation steps).
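
A minimal sketch contrasting the two modes; persist() is a placeholder for a real storage write, and queued-but-unwritten artifacts are lost if the process dies, which is exactly the trade-off described above:

import queue
import threading

artifact_queue: "queue.Queue[bytes]" = queue.Queue()

def persist(artifact: bytes) -> None:
    pass  # placeholder: write to object storage

def capture_sync(artifact: bytes) -> None:
    persist(artifact)  # agent blocks; evidence is durable before proceeding

def capture_async(artifact: bytes) -> None:
    artifact_queue.put(artifact)  # agent continues; background writer persists later

def writer_loop() -> None:
    while True:
        persist(artifact_queue.get())
        artifact_queue.task_done()

threading.Thread(target=writer_loop, daemon=True).start()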

Sampling Approaches

Full capture collects artifacts for every agent action, providing complete evidence but maximizing storage costs. Sampling strategies, several of which are combined in the sketch after this list, include:

  • Random sampling: Capture N% of all sessions
  • Failure-triggered: Capture only for sessions that end in errors
  • Threshold-based: Capture when specific conditions are met (e.g., high-value transactions)
  • Stratified sampling: Ensure representative coverage across agent types, users, and workflows
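
A minimal capture-policy sketch combining failure-triggered, threshold-based, and random sampling; the rate, threshold, and session fields are illustrative:

import random

SAMPLE_RATE = 0.05          # random sampling: capture 5% of routine sessions
VALUE_THRESHOLD = 1_000.00  # threshold-based: always capture high-value transactions

def should_capture(failed: bool, transaction_value: float) -> bool:
    if failed:
        return True                       # failure-triggered capture
    if transaction_value >= VALUE_THRESHOLD:
        return True                       # threshold-based capture
    return random.random() < SAMPLE_RATE  # random sampling of the rest

print(should_capture(failed=False, transaction_value=25.00))    # usually False
print(should_capture(failed=True, transaction_value=25.00))     # True
print(should_capture(failed=False, transaction_value=5_000.0))  # True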

Artifact Format Selection

Screenshots can be stored as PNG (lossless, larger), JPEG (lossy, smaller), or WebP (lossy or lossless, with strong compression). Logs can use JSON (human-readable), Protocol Buffers (compact, schema-defined), or MessagePack (compact, schemaless). State snapshots benefit from compression formats like gzip or brotli.
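
A minimal size comparison for the log entry shown earlier, assuming the msgpack package (pip install msgpack):

import json
import msgpack

entry = {
    "timestamp": "2025-10-23T14:32:18.742Z",
    "action_type": "click",
    "target": "button#submit-order",
    "correlation_id": "req_abc123",
    "agent_id": "agent_xyz789",
    "session_id": "sess_456def",
}

as_json = json.dumps(entry).encode()
as_msgpack = msgpack.packb(entry)
print(len(as_json), len(as_msgpack))  # msgpack is noticeably smaller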

Storage and Retention

Tiered Storage Architecture

  • Hot tier (0-30 days): Recent artifacts stored in fast-access databases or object storage for immediate debugging and investigation
  • Warm tier (30-365 days): Older artifacts moved to cheaper storage classes with slower retrieval times
  • Cold tier (1-7 years): Compliance-required artifacts archived to deep storage with significant retrieval latency
  • Deletion (>7 years): Artifacts purged according to data retention policies

Data Lifecycle Management

Implement automatic lifecycle policies that transition artifacts between tiers based on age and access patterns. Configure deletion rules aligned with legal and regulatory retention requirements.
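
A minimal sketch of such a policy using boto3 against S3; the bucket name and prefix are hypothetical, and the retention periods should follow your actual legal requirements:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="agent-artifacts",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "proof-of-action-tiering",
            "Filter": {"Prefix": "artifacts/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},    # warm tier
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # cold tier
            ],
            "Expiration": {"Days": 2555},  # purge after ~7 years
        }]
    },
)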

Encryption and Access Controls

Encrypt artifacts at rest using AES-256 or equivalent. Implement role-based access controls ensuring only authorized personnel can retrieve artifacts. Maintain separate audit logs tracking who accessed proof of action artifacts and when.
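
A minimal sketch of AES-256-GCM encryption at rest, assuming the cryptography package; real deployments would fetch keys from a KMS rather than generating them in process:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, retrieve from a KMS
aesgcm = AESGCM(key)

def encrypt_artifact(plaintext: bytes) -> bytes:
    # A unique nonce per encryption, prepended so decryption can recover it.
    nonce = os.urandom(12)
    return nonce + aesgcm.encrypt(nonce, plaintext, None)

def decrypt_artifact(blob: bytes) -> bytes:
    return aesgcm.decrypt(blob[:12], blob[12:], None)

assert decrypt_artifact(encrypt_artifact(b"screenshot bytes")) == b"screenshot bytes"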

Retrieval Systems

Indexing Strategies

Enable fast artifact lookup by indexing on the following dimensions (a minimal schema sketch follows the list):

  • Timestamp ranges (find all artifacts from a specific time period)
  • Session IDs (retrieve all artifacts for a particular agent session)
  • Agent IDs (review behavior patterns for specific agents)
  • Correlation IDs (trace artifacts across distributed system components)
  • Error codes (find all sessions that encountered specific failures)
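
A minimal index-schema sketch using SQLite; a production deployment would use a shared database, but the indexed columns are the point:

import sqlite3

conn = sqlite3.connect("artifact_index.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS artifacts (
    artifact_id    TEXT PRIMARY KEY,
    captured_at    TEXT NOT NULL,   -- ISO-8601 UTC timestamp
    session_id     TEXT NOT NULL,
    agent_id       TEXT NOT NULL,
    correlation_id TEXT NOT NULL,
    error_code     TEXT,            -- NULL for successful actions
    storage_uri    TEXT NOT NULL    -- where the artifact body lives
);
CREATE INDEX IF NOT EXISTS idx_captured_at    ON artifacts (captured_at);
CREATE INDEX IF NOT EXISTS idx_session_id     ON artifacts (session_id);
CREATE INDEX IF NOT EXISTS idx_agent_id       ON artifacts (agent_id);
CREATE INDEX IF NOT EXISTS idx_correlation_id ON artifacts (correlation_id);
CREATE INDEX IF NOT EXISTS idx_error_code     ON artifacts (error_code);
""")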

Query Interfaces

Provide both programmatic APIs and human-friendly UIs for artifact retrieval. Engineers debugging issues need quick access to relevant screenshots and logs without writing complex queries.

Replay Capabilities

Advanced implementations enable session replay, reconstructing agent behavior by sequencing screenshots and logs chronologically. This allows investigators to "watch" what the agent did as if observing it in real time.
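
A minimal replay-ordering sketch that merges heterogeneous artifacts into one chronological stream; the artifact dicts reuse the timestamp field from the log example:

def replay(session_artifacts: list[dict]) -> None:
    # Sort every artifact for a session by capture time and step through.
    for artifact in sorted(session_artifacts, key=lambda a: a["timestamp"]):
        print(artifact["timestamp"], artifact["kind"], artifact.get("summary", ""))

replay([
    {"timestamp": "2025-10-23T14:32:18.742Z", "kind": "log", "summary": "click button#submit-order"},
    {"timestamp": "2025-10-23T14:32:18.101Z", "kind": "screenshot", "summary": "evidence/step1.png"},
])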

Key Metrics

Organizations should monitor these metrics to evaluate proof of action effectiveness:

Artifact Capture Rate

Definition: Percentage of agent actions for which proof of action artifacts were successfully collected and stored.

Target: > 99.9% for critical paths, > 95% for routine actions

Measurement: (successful_artifact_collections / total_agent_actions) × 100

Low capture rates indicate collection failures, potentially leaving critical actions undocumented. Monitor capture failures by artifact type to identify systematic issues (e.g., screenshot capture failing at higher rates than log capture).

Storage Costs

Definition: Total cost of storing proof of action artifacts across all storage tiers.

Target: < 5% of total infrastructure costs for agent systems

Measurement: Sum of monthly storage costs for hot, warm, and cold tiers plus data transfer costs

Track cost per agent session to understand efficiency: total_monthly_storage_cost / total_monthly_sessions. Optimize by adjusting retention policies, sampling rates, and compression strategies.

Retrieval Time

Definition: Time elapsed between requesting artifacts and receiving usable results.

Target: < 2 seconds for hot tier, < 30 seconds for warm tier, < 5 minutes for cold tier

Measurement: P50, P95, and P99 latencies for artifact retrieval operations

Slow retrieval times hinder debugging and incident response. Monitor retrieval latency by storage tier and artifact type to identify performance bottlenecks.

Artifact Completeness

Definition: Percentage of debugging sessions where available artifacts provided sufficient information to determine root cause.

Target: > 90%

Measurement: Survey engineers after incident investigations to assess whether proof of action artifacts were sufficient for root cause analysis

Low completeness indicates gaps in artifact coverage—missing screenshots at key decision points, insufficient log detail, or inadequate state snapshots.

Related Concepts

  • Audit log: Structured record of system events and user actions that complements proof of action artifacts with security-focused event tracking
  • Session replay: Capability to reconstruct and visualize agent behavior by sequencing proof of action artifacts chronologically
  • Observability: Broader practice of understanding system behavior through metrics, logs, and traces, of which proof of action is a specialized application
  • Screenshots: Visual artifacts capturing application state, a primary proof of action artifact type for computer-use agents
  • Proof of action artifact: An individual evidence item (a screenshot, log entry, or state snapshot) within the proof of action system