Proof-of-action artifact

A proof-of-action artifact is a concrete evidence object that documents and verifies autonomous agent actions within computer-use systems. These artifacts include screenshots, video recordings, DOM snapshots, API response logs, and state captures that provide immutable, timestamped records of what an agent actually did during task execution.

Unlike simple text logs that describe actions, proof-of-action artifacts preserve the actual visual, structural, or data state at the moment of interaction, enabling precise reconstruction and verification of agent behavior.

Why it matters

Proof-of-action artifacts serve as the foundation for trustworthy autonomous systems across three critical dimensions:

Dispute resolution and accountability

When an agent completes a financial transaction, books a travel reservation, or modifies production data, artifacts provide direct evidence of what occurred. If a user claims "the agent booked the wrong flight" or "deleted the wrong records," screenshot artifacts showing the exact UI state and interaction sequence let reviewers settle the dispute from the record rather than from recollection. This is particularly important for high-stakes operations where agent errors can have significant financial or operational consequences.

For example, in a customer service scenario where an agent processes a refund, artifacts documenting the original order details, the refund form inputs, and the final confirmation screen provide objective evidence that the agent executed the refund correctly—or reveal exactly where the process went wrong if the outcome was incorrect.

Regulatory compliance and auditability

Industries like healthcare, finance, and legal services require detailed audit trails for compliance with regulations such as HIPAA, SOX, GDPR, and SEC rules. When stored with integrity controls, proof-of-action artifacts provide tamper-evident records that help satisfy regulatory requirements for demonstrating who (or what) performed an action, when it occurred, and what the system state was before and after.

Auditors can review artifacts to verify compliance without needing to reproduce complex agent behaviors. For example, a financial trading agent may need to maintain artifacts showing that it followed mandate restrictions, observed trading windows, and received proper authorization. These artifacts might include API request payloads, timestamped authorization tokens, exchange response confirmations, and account balance snapshots before and after each transaction, which together form a chain of evidence for regulatory review.

Debugging and system improvement

When agents fail or behave unexpectedly, artifacts enable engineers to see exactly what the agent saw and understand why it made specific decisions. A screenshot artifact revealing that a button was obscured by a modal dialog explains why the agent couldn't complete a task. DOM snapshots showing unexpected HTML structure identify website changes that broke agent workflows. This observability can substantially reduce mean time to resolution for agent issues.

Artifacts also expose edge cases and failure modes that are difficult to reproduce deliberately. An agent that occasionally fails to complete checkout might reveal through artifacts that a specific combination of payment method, shipping option, and promotional code triggers a race condition in the target system—an issue that would be nearly impossible to diagnose from text logs alone.

Concrete examples

Screenshot artifacts

Visual captures of the agent's screen or browser viewport at critical interaction points:

{
  "artifact_id": "art_1a2b3c4d",
  "type": "screenshot",
  "timestamp": "2024-01-15T14:32:18.427Z",
  "action_id": "act_xyz789",
  "action_type": "click",
  "target_element": "button#confirm-purchase",
  "image_url": "s3://artifacts/sessions/sess_abc/art_1a2b3c4d.png",
  "dimensions": {
    "width": 1920,
    "height": 1080
  },
  "annotations": {
    "click_coordinates": [850, 420],
    "bounding_boxes": {
      "target_element": [800, 400, 900, 450]
    }
  },
  "metadata": {
    "page_url": "https://shop.example.com/checkout",
    "viewport_scroll_position": [0, 340]
  }
}

Screenshot artifacts capture the complete visual context, including overlays, popups, tooltips, and transient UI states that wouldn't be preserved in DOM snapshots alone. A customer service agent processing a refund might capture screenshots of the customer account page showing order history, the specific order detail page, the refund form with populated fields, the confirmation dialog, and the final success state showing the updated account balance—each screenshot providing visual proof of correct operation.

API response logs

Complete request-response pairs that document data exchanges between agents and external systems:

{
  "artifact_id": "art_5e6f7g8h",
  "type": "api_response",
  "timestamp": "2024-01-15T14:32:19.112Z",
  "action_id": "act_xyz790",
  "request": {
    "method": "POST",
    "url": "https://api.stripe.com/v1/payment_intents",
    "headers": {
      "authorization": "[REDACTED]",
      "content-type": "application/x-www-form-urlencoded"
    },
    "body": {
      "amount": 4999,
      "currency": "usd",
      "payment_method": "pm_1234567890"
    }
  },
  "response": {
    "status": 200,
    "headers": {
      "content-type": "application/json",
      "request-id": "req_abc123xyz"
    },
    "body": {
      "id": "pi_abcdefghijklmnop",
      "status": "succeeded",
      "amount": 4999,
      "created": 1705329139
    },
    "duration_ms": 342
  }
}

These artifacts prove exactly what data the agent sent and received, enabling verification of correct API usage and debugging of integration issues. A travel booking agent coordinating flight, hotel, and car rental reservations would maintain separate artifact chains for each service, enabling verification that the agent received correct availability information and that the booking confirmations match the intended reservations.

API response logs include full HTTP request details (headers, method, URL, query parameters, and body payload), complete responses (status code, headers, and body), timing information showing request initiation, first byte, and completion times, and retry attempts with any error responses received before success.

DOM snapshots

Complete HTML structure captures that preserve the exact state of web pages during interaction:

{
  "artifact_id": "art_9i0j1k2l",
  "type": "dom_snapshot",
  "timestamp": "2024-01-15T14:32:17.891Z",
  "action_id": "act_xyz788",
  "page_url": "https://shop.example.com/cart",
  "html_content_url": "s3://artifacts/sessions/sess_abc/art_9i0j1k2l.html",
  "computed_styles_url": "s3://artifacts/sessions/sess_abc/art_9i0j1k2l_styles.json",
  "dom_stats": {
    "total_elements": 1847,
    "interactive_elements": 23,
    "form_fields": 0
  },
  "extracted_data": {
    "cart_items": 3,
    "total_price": "$49.99",
    "shipping_method": "Standard"
  }
}

DOM snapshots enable replay and analysis of agent behavior even when the original web page has changed or is no longer accessible. When debugging why an agent failed to locate a "Submit" button, a DOM snapshot reveals whether the button existed, whether it was hidden by CSS, whether it was disabled, or whether its label was different than expected.

These snapshots include full HTML structure with all elements, attributes, and text content, computed styles and layout information for elements the agent considers, form field values and selection states, dynamic content loaded via JavaScript, and shadow DOM contents for web components.

Video recordings

Continuous screen recordings that capture the complete sequence of agent interactions:

{
  "artifact_id": "art_3m4n5o6p",
  "type": "video_recording",
  "timestamp_start": "2024-01-15T14:31:45.000Z",
  "timestamp_end": "2024-01-15T14:33:22.000Z",
  "duration_seconds": 97,
  "session_id": "sess_abc",
  "video_url": "s3://artifacts/sessions/sess_abc/recording.mp4",
  "format": "mp4",
  "codec": "h264",
  "resolution": "1920x1080",
  "frame_rate": 30,
  "action_markers": [
    {
      "action_id": "act_xyz788",
      "timestamp_offset": 12.4,
      "description": "Navigate to cart page"
    },
    {
      "action_id": "act_xyz789",
      "timestamp_offset": 33.8,
      "description": "Click confirm purchase"
    }
  ]
}

Video artifacts provide the most comprehensive record but come with significantly higher storage costs and processing requirements. Session replays excel at exposing timing-dependent issues like race conditions, animation states that confuse computer vision models, or transient error messages that appear briefly before the agent proceeds.

State snapshots

Structured data captures of application state, database records, or system configuration at transaction boundaries:

{
  "artifact_id": "art_7q8r9s0t",
  "type": "state_snapshot",
  "timestamp": "2024-01-15T14:32:20.551Z",
  "action_id": "act_xyz791",
  "state_type": "database_record",
  "entity_type": "order",
  "entity_id": "order_12345",
  "snapshot_timing": "after_action",
  "state_data": {
    "order_id": "order_12345",
    "status": "confirmed",
    "total_amount": 4999,
    "payment_status": "completed",
    "updated_at": "2024-01-15T14:32:20.551Z"
  }
}

State snapshots document before and after states for any modified records, complete object graphs for entities involved in multi-step operations, configuration values active at the time of execution, and session state including authentication status and user context. A document processing agent that extracts data from invoices and updates an accounting system would snapshot the original document metadata, the extracted fields with confidence scores, the database records before modification, and the final committed state.

Common pitfalls

Missing contextual information

Recording artifacts without sufficient metadata renders them difficult to interpret or useless for debugging. A screenshot showing a generic error message without the accompanying HTTP request, agent reasoning chain, or task context cannot explain why the error occurred.

Avoid: Saving raw screenshots with only timestamps.

Instead: Bundle artifacts with comprehensive context including the agent's goal, current task step, page URL, relevant DOM selectors, previous actions, and decision-making rationale. Link artifacts to their parent session, workflow, and user request.

Always capture temporal context through timestamps, sequence numbers, and session identifiers. Include causal context by linking artifacts to the decision-making process that produced the action. An artifact showing an agent clicked a button should reference the perception data that identified the button and the plan step that required clicking it.

Excessive storage costs

Capturing high-resolution screenshots or video for every agent action generates massive storage costs that scale poorly. A simple form-filling task might generate 50+ screenshots, consuming tens of megabytes for a workflow that could be represented in kilobytes with selective artifact capture.

Naive artifact collection strategies can generate terabytes of data. Storing full-page screenshots at 2MB each for every action in a session that performs 500 operations generates a gigabyte per session. Multiply across thousands of concurrent agents and storage costs become prohibitive.
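The arithmetic above can be checked with a short sketch. The function name and the 1,000-session multiplier are illustrative assumptions, and sizes use decimal units (1 GB = 1,000 MB).

```typescript
// Hypothetical helper: raw screenshot storage for one session, in bytes.
const BYTES_PER_MB = 1e6;
const BYTES_PER_GB = 1e9;
const BYTES_PER_TB = 1e12;

function sessionScreenshotBytes(actionsPerSession: number, mbPerScreenshot: number): number {
  return actionsPerSession * mbPerScreenshot * BYTES_PER_MB;
}

// 500 actions at 2 MB each, as in the text
const perSessionBytes = sessionScreenshotBytes(500, 2);

// Assumed fleet size for illustration: 1,000 sessions
const fleetBytes = perSessionBytes * 1000;

console.log(perSessionBytes / BYTES_PER_GB, "GB per session");
console.log(fleetBytes / BYTES_PER_TB, "TB per 1,000 sessions");
```

At this rate a thousand sessions already reach a terabyte of uncompressed screenshots, which is why selective capture matters.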

Avoid: Blanket policies like "screenshot every action" or "record all sessions."

Instead: Implement intelligent artifact strategies that capture heavily at critical points (authentication, payments, data modifications) and lightly during routine navigation. Use compression, downsampling, and retention policies that delete low-value artifacts after fixed periods while preserving high-value audit artifacts indefinitely.

Implement intelligent artifact retention policies:

  • Retain all artifacts for recent sessions (e.g., last 7 days) for debugging active issues
  • Downsample older sessions, keeping only artifacts at critical decision points
  • Compress artifacts aggressively—screenshots compress well, JSON logs benefit from deduplication
  • Archive artifacts to cold storage after initial retention periods
  • Delete artifacts for successful sessions after compliance retention requirements are met
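One way to read the bullets above is as a single decision function per session. This is a minimal sketch, not a prescribed policy: the type and function names are hypothetical, the 7-day window comes from the first bullet, and the other thresholds are placeholder assumptions.

```typescript
type RetentionAction = "keep_all" | "keep_critical_only" | "archive" | "delete";

interface SessionAge {
  ageDays: number;
  succeeded: boolean;
}

interface RetentionPolicy {
  fullRetentionDays: number;       // keep every artifact (e.g. 7)
  archiveAfterDays: number;        // assumed threshold for moving to cold storage
  complianceRetentionDays: number; // assumed regulatory minimum
}

function retentionAction(s: SessionAge, p: RetentionPolicy): RetentionAction {
  if (s.ageDays <= p.fullRetentionDays) return "keep_all";
  // Only successful sessions are deleted once compliance retention has passed
  if (s.succeeded && s.ageDays > p.complianceRetentionDays) return "delete";
  if (s.ageDays > p.archiveAfterDays) return "archive";
  // Past the debugging window: downsample to critical decision points
  return "keep_critical_only";
}

const examplePolicy: RetentionPolicy = {
  fullRetentionDays: 7,
  archiveAfterDays: 90,
  complianceRetentionDays: 2555,
};

console.log(retentionAction({ ageDays: 30, succeeded: true }, examplePolicy));
```

In this sketch failed sessions are never deleted automatically; whether that is appropriate depends on the applicable retention rules.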

PII and sensitive data exposure

Artifacts frequently capture sensitive information including passwords, credit card numbers, personal health data, social security numbers, and confidential business information displayed in UI or API responses. Storing this data without proper redaction creates significant security and compliance risks.

Avoid: Storing raw screenshots and API logs that contain sensitive fields.

Instead: Implement automated redaction that masks sensitive fields before artifact storage. Use pattern matching, element tagging, and API schema definitions to identify and redact PII. For example, blur form fields tagged as type="password", mask credit card numbers matching regex patterns, and redact API fields marked as sensitive in schema definitions. Store redaction metadata separately for authorized access when needed.

Implement PII detection and redaction pipelines that process artifacts before storage:

  • Scan screenshots for credit card numbers, SSNs, and other sensitive patterns
  • Redact form field values containing passwords or security tokens
  • Mask email addresses and phone numbers in DOM snapshots
  • Replace actual customer names with pseudonymized identifiers
  • Encrypt artifacts at rest and restrict access through role-based permissions
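A minimal sketch of the text-level part of such a pipeline, assuming the text has already been extracted (for screenshots this would need an OCR step that is not shown). The patterns are deliberately simple illustrations rather than production-grade detectors, and the function names and placeholder strings are arbitrary.

```typescript
// Luhn checksum: filters out digit runs that cannot be card numbers.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// 13-19 digits, optionally separated by single spaces or hyphens
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const EMAIL = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;

function redactText(text: string): string {
  return text
    .replace(CARD_CANDIDATE, (m) =>
      passesLuhn(m.replace(/[ -]/g, "")) ? "[REDACTED_CARD]" : m
    )
    .replace(SSN, "[REDACTED_SSN]")
    .replace(EMAIL, "[REDACTED_EMAIL]");
}

console.log(redactText("Card 4242 4242 4242 4242, SSN 123-45-6789, mail jane@example.com"));
```

The Luhn check reduces false positives such as order numbers that happen to be 16 digits long. Regex-only detection will still miss many PII forms, so schema-based and element-based tagging remain necessary.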

Incomplete failure artifacts

The most valuable artifacts are those captured during failures, yet failures often disrupt artifact collection itself. An agent that crashes due to memory exhaustion may fail to persist the very artifacts that would explain the root cause.

Avoid: Buffering artifacts in memory and flushing only at session end.

Instead: Implement resilient artifact collection:

  • Flush artifacts to storage incrementally rather than buffering entire sessions
  • Use separate processes or threads for artifact collection to isolate from agent failures
  • Maintain heartbeat monitoring that triggers artifact dumps if agents become unresponsive
  • Capture environment state (memory usage, CPU, open connections) at regular intervals
  • Implement emergency artifact capture in exception handlers before agent termination
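The first bullet, incremental flushing, can be sketched as follows. `ArtifactSink` and `IncrementalCollector` are hypothetical names introduced for illustration; a real sink might wrap an object-store client.

```typescript
// Hypothetical storage abstraction.
interface ArtifactSink {
  write(key: string, body: string): Promise<void>;
}

class IncrementalCollector {
  private readonly sessionId: string;
  private readonly sink: ArtifactSink;
  private inFlight: Promise<void>[] = [];
  private seq = 0;
  failedKeys: string[] = [];

  constructor(sessionId: string, sink: ArtifactSink) {
    this.sessionId = sessionId;
    this.sink = sink;
  }

  // Starts the write immediately instead of buffering until session end.
  capture(kind: string, payload: unknown): string {
    const key = `${this.sessionId}/${String(this.seq++).padStart(6, "0")}-${kind}.json`;
    const write = this.sink
      .write(key, JSON.stringify(payload))
      .catch(() => {
        this.failedKeys.push(key); // remember failures so they can be retried
      });
    this.inFlight.push(write);
    return key;
  }

  // Waits for outstanding writes, e.g. from an exception handler before exit.
  async drain(): Promise<void> {
    await Promise.all(this.inFlight);
    this.inFlight = [];
  }
}
```

Because each artifact is handed to the sink as soon as it is captured, a crash loses at most the writes still in flight rather than the whole session. Running the collector in a separate process, as the second bullet suggests, would strengthen this further but is outside the scope of the sketch.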

Artifact tampering vulnerabilities

Storing artifacts in mutable storage systems or without integrity verification allows bad actors to modify evidence after the fact, undermining the trustworthiness of the entire proof-of-action system.

Avoid: Storing artifacts in standard file systems or databases without tamper detection.

Instead: Use content-addressable storage where artifact IDs are cryptographic hashes of content, making tampering detectable. Implement write-once storage policies. Sign artifacts with cryptographic keys and verify signatures during retrieval. Consider blockchain or distributed ledger integration for high-stakes compliance scenarios.
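A minimal sketch of content addressing plus a signature check, using Node's built-in `crypto` module. The function and type names are hypothetical. HMAC with a shared key is used here only as a simple stand-in for signing; an asymmetric signature scheme would be needed if third parties must verify artifacts without being able to forge them.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

interface SealedArtifact {
  id: string;        // SHA-256 of the content, hex-encoded
  signature: string; // HMAC-SHA256 over the id, hex-encoded
  content: Buffer;
}

function sealArtifact(content: Buffer, signingKey: string): SealedArtifact {
  const id = createHash("sha256").update(content).digest("hex");
  const signature = createHmac("sha256", signingKey).update(id).digest("hex");
  return { id, signature, content };
}

function verifyArtifact(a: SealedArtifact, signingKey: string): boolean {
  // Any change to the content changes its hash, so the id no longer matches
  const actualId = createHash("sha256").update(a.content).digest("hex");
  if (actualId !== a.id) return false;

  const expected = createHmac("sha256", signingKey).update(a.id).digest();
  const given = Buffer.from(a.signature, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const sealed = sealArtifact(Buffer.from("hello"), "demo-key");
console.log(sealed.id, verifyArtifact(sealed, "demo-key"));
```

Verification fails if either the content or the signature has been altered, which is the property the retrieval path needs.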

Insufficient artifact granularity

Capturing only end-of-workflow artifacts misses intermediate failures and makes debugging difficult. If an agent completes a 20-step checkout flow and only the final "order placed" screenshot is saved, diagnosing why the wrong shipping address was selected becomes nearly impossible.

Avoid: Saving artifacts only at workflow completion or failure.

Instead: Capture artifacts at state transitions, decision points, and action boundaries throughout workflows. Balance storage costs against debugging value by using tiered strategies—lightweight artifacts (DOM snapshots, API logs) at every step, heavyweight artifacts (screenshots, video) at critical points.

Artifact-observation coupling

The act of capturing artifacts can alter agent behavior, particularly for timing-sensitive operations. Taking screenshots introduces latency, serializing DOM snapshots allocates memory, and writing artifacts to disk consumes I/O bandwidth that affects overall system performance.

Avoid: Capturing artifacts synchronously in the agent's critical path.

Instead: Design artifact collection to minimize observation effects:

  • Capture artifacts asynchronously after actions complete
  • Use copy-on-write techniques to snapshot state without blocking the agent
  • Sample artifacts at lower rates during performance-critical operations
  • Disable heavyweight artifacts (like video recordings) by default in production unless compliance requires them, and enable them for targeted debugging

Implementation

Artifact type selection

Choose artifact types based on specific verification and debugging needs. Not all actions require the same artifact evidence. Match artifact types to action criticality and debugging value:

Screenshots: Best for visual verification, UI state capture, and user-facing evidence. Required when agent success depends on visual rendering (OCR tasks, layout verification, image-based workflows). Moderate storage cost, high interpretability.

High-value artifacts for irreversible operations like financial transactions, data deletion, or permission changes: capture a full screenshot, DOM snapshot, API request/response, and before/after state, with cryptographic signatures proving artifact authenticity and chain-of-custody metadata linking artifacts to authorization decisions.

DOM snapshots: Optimal for web automation debugging, element selection verification, and replay scenarios. Lower storage cost than screenshots, enables programmatic analysis and selective rendering. Use compressed JSON or MessagePack formats.

API logs: Essential for integration debugging, data flow verification, and distributed system tracing. Minimal storage cost, high debugging value. Implement structured logging with correlation IDs linking requests across services.

Video recordings: Required for compliance scenarios demanding complete interaction records, complex debugging, or user training. Highest storage cost, use selectively for high-value sessions or when explicitly required.

Session replays capture time-synchronized combinations of screenshots, DOM snapshots, network activity, and input events that enable complete reconstruction of agent sessions. Include mouse movements and click events with pixel coordinates, keyboard inputs with timing (sensitive values redacted), scroll events and viewport changes, network requests correlated with UI interactions, and console logs and JavaScript errors.

State snapshots: Application-specific state captures (database records, memory dumps, configuration files) at action boundaries. Enable precise replay and rollback. Size varies by application.

Medium-value artifacts for standard operations like navigation, form filling, or data retrieval:

  • Targeted screenshots of relevant UI regions
  • API responses without full request details
  • State snapshots of modified entities only

Low-value artifacts for routine operations like scrolling, hovering, or reading:

  • Log entries with timestamps and action descriptions
  • Sampled screenshots at key transition points
  • Aggregated metrics rather than individual artifacts
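The three value tiers in this section could be encoded as a lookup from action type to capture plan. This is a sketch under assumptions: the action-type names are hypothetical, the plan fields are invented for illustration, and treating unlisted actions as medium-value is a choice that a risk-averse deployment might reverse.

```typescript
type Criticality = "high" | "medium" | "low";

interface CapturePlan {
  screenshot: "full" | "region" | "sampled";
  domSnapshot: boolean;
  apiLog: "request_and_response" | "response_only" | "none";
  stateSnapshot: "before_and_after" | "modified_only" | "none";
  sign: boolean;
}

// Hypothetical action-type names; substitute your own action taxonomy.
const HIGH_VALUE_ACTIONS = ["payment", "delete_data", "change_permission"];
const LOW_VALUE_ACTIONS = ["scroll", "hover", "read"];

function classify(actionType: string): Criticality {
  if (HIGH_VALUE_ACTIONS.indexOf(actionType) !== -1) return "high";
  if (LOW_VALUE_ACTIONS.indexOf(actionType) !== -1) return "low";
  return "medium"; // navigation, form filling, data retrieval, and anything unlisted
}

function capturePlan(level: Criticality): CapturePlan {
  switch (level) {
    case "high":
      return { screenshot: "full", domSnapshot: true, apiLog: "request_and_response", stateSnapshot: "before_and_after", sign: true };
    case "medium":
      return { screenshot: "region", domSnapshot: false, apiLog: "response_only", stateSnapshot: "modified_only", sign: false };
    case "low":
      return { screenshot: "sampled", domSnapshot: false, apiLog: "none", stateSnapshot: "none", sign: false };
  }
}

console.log(capturePlan(classify("payment")));
```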

Storage strategy

Implement a multi-tier storage architecture optimized for access patterns and cost:

// Per-tier storage configuration
const artifactStorageStrategy = {
  // Hot storage: Frequently accessed artifacts from recent sessions
  // SSD-backed object storage, 7-30 day retention, < 100ms retrieval
  hot: {
    location: "s3://artifacts-hot/",
    retention_days: 30,
    access_tier: "frequent",
    encryption: "AES-256-GCM"
  },

  // Warm storage: Audit artifacts and debugging reference
  // Standard object storage, 90-365 day retention, < 1s retrieval
  warm: {
    location: "s3://artifacts-warm/",
    retention_days: 365,
    access_tier: "infrequent",
    encryption: "AES-256-GCM",
    compression: "zstd"
  },

  // Cold storage: Long-term compliance and legal hold
  // Glacier-class storage, multi-year retention, minutes-hours retrieval
  cold: {
    location: "s3://artifacts-cold/",
    retention_days: 2555, // 7 years
    access_tier: "archive",
    encryption: "AES-256-GCM",
    compression: "zstd",
    integrity_verification: "sha256"
  }
} as const;

// Automatic lifecycle transitions (thresholds in days)
const lifecyclePolicy = {
  rules: [
    {
      // Move screenshots and video recordings to warm after 30 days
      artifact_types: ["screenshot", "video_recording"],
      transition_to_warm: 30,
      transition_to_cold: 365
    },
    {
      // Keep API logs and DOM snapshots hot longer (they are small, so hot storage stays cheap)
      artifact_types: ["api_response", "dom_snapshot"],
      transition_to_warm: 90,
      transition_to_cold: 730
    }
  ]
};
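To show how rules of this shape might be evaluated, here is a hedged sketch of a tier resolver. `tierFor` is a hypothetical function, the sample rules are a trimmed copy of the policy above, and keeping unmatched artifact types in the hot tier is an assumption.

```typescript
type Tier = "hot" | "warm" | "cold";

interface LifecycleRule {
  artifact_types: string[];
  transition_to_warm: number; // age in days
  transition_to_cold: number; // age in days
}

function tierFor(artifactType: string, ageDays: number, rules: LifecycleRule[]): Tier {
  const rule = rules.find((r) => r.artifact_types.indexOf(artifactType) !== -1);
  if (!rule) return "hot"; // assumption: types without a rule never transition
  if (ageDays >= rule.transition_to_cold) return "cold";
  if (ageDays >= rule.transition_to_warm) return "warm";
  return "hot";
}

const sampleRules: LifecycleRule[] = [
  { artifact_types: ["screenshot"], transition_to_warm: 30, transition_to_cold: 365 },
  { artifact_types: ["api_response", "dom_snapshot"], transition_to_warm: 90, transition_to_cold: 730 },
];

console.log(tierFor("screenshot", 45, sampleRules));   // a 45-day-old screenshot
console.log(tierFor("api_response", 45, sampleRules)); // a 45-day-old API log
```

In practice the object store's own lifecycle rules would usually perform these transitions; a function like this is mainly useful for predicting where an artifact should be when querying it.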

Hot tier (SSD-backed object storage, <100ms retrieval):

  • Recent sessions (last 24-48 hours)
  • Any sessions flagged for active debugging
  • Critical artifacts for compliance (financial transactions, healthcare actions)
  • Enables real-time debugging and immediate incident response

Warm tier (standard object storage, <1s retrieval):

  • Sessions from last 30-90 days
  • Serves compliance reporting and retrospective analysis
  • Supports customer service investigations of recent issues

Cold tier (archival storage, minutes-to-hours retrieval):

  • Sessions beyond 90 days but within regulatory retention periods
  • Rare access, primarily for legal discovery or long-term pattern analysis
  • Significant cost savings (often around 90% less than hot storage, depending on provider and retrieval fees)

Offline/tape tier (multi-hour retrieval):

  • Sessions beyond compliance retention but kept for historical purposes
  • Accessed only for extraordinary circumstances

Partitioning strategy: Organize artifacts by session, date, and user to enable efficient bulk operations and cost-effective deletion. Use structure like s3://artifacts/year=2024/month=01/day=15/session=abc/.
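A small sketch of building that key prefix. `partitionPrefix` is a hypothetical helper; it follows the example path, which partitions by date and session only, and uses UTC so that keys do not depend on the server's local time zone.

```typescript
function partitionPrefix(bucket: string, capturedAt: Date, sessionId: string): string {
  const yyyy = capturedAt.getUTCFullYear();
  const mm = String(capturedAt.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(capturedAt.getUTCDate()).padStart(2, "0");
  return `s3://${bucket}/year=${yyyy}/month=${mm}/day=${dd}/session=${sessionId}/`;
}

const prefix = partitionPrefix("artifacts", new Date("2024-01-15T14:32:18.427Z"), "abc");
console.log(prefix); // s3://artifacts/year=2024/month=01/day=15/session=abc/
```

With date components leading the key, deleting or archiving a whole day of artifacts becomes a single prefix operation.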

Compression: Apply lossless compression to JSON artifacts (API logs, DOM snapshots) achieving 5-10x size reduction. Use lossy compression for screenshots when visual quality degradation is acceptable.

Implement automatic lifecycle policies that transition artifacts between tiers based on age and access patterns. Use object storage versioning to protect against accidental deletion.

Retrieval and query systems

Raw artifact storage is not enough on its own: teams need fast ways to locate relevant artifacts among millions of stored objects. Design the query layer around common retrieval scenarios:

interface ArtifactQueryAPI {
  // Find artifacts by session
  getSessionArtifacts(sessionId: string, options?: {
    types?: ArtifactType[],
    startTime?: Date,
    endTime?: Date,
    limit?: number
  }): Promise<Artifact[]>;

  // Find artifacts by action
  getActionArtifacts(actionId: string): Promise<Artifact[]>;

  // Find artifacts by search criteria
  searchArtifacts(query: {
    userId?: string,
    agentId?: string,
    timeRange: [Date, Date],
    actionTypes?: string[],
    pageUrls?: string[],
    hasErrors?: boolean
  }): Promise<Artifact[]>;

  // Retrieve artifact with integrity verification
  getArtifact(artifactId: string, options?: {
    verifyIntegrity?: boolean,
    includeMetadata?: boolean
  }): Promise<{
    artifact: Artifact,
    content: Buffer | string,
    contentHash: string,
    verified: boolean
  }>;
}

Common query patterns include:

Time-based queries: "Show me all artifacts from agent-7 between 14:00 and 14:30 UTC on 2024-11-15"

Action-based queries: "Find all artifacts where agents clicked 'Submit Payment' buttons"

Outcome-based queries: "Retrieve artifacts from sessions that ended in errors"

Entity-based queries: "Get all artifacts related to order ID 12345 across any agent sessions"

Implement these query patterns through:

  • Structured metadata indexes (DynamoDB, PostgreSQL) mapping session IDs, timestamps, action types, and entities to artifact storage locations
  • Full-text search (Elasticsearch) over artifact content for finding specific UI text, error messages, or API responses
  • Graph databases linking artifacts to decision traces, showing which perceptions led to which actions
  • Time-series databases aggregating artifact metrics for capacity planning and anomaly detection

Indexing strategy: Maintain searchable metadata in a fast database (PostgreSQL, DynamoDB) with references to actual artifact storage. Index on session_id, user_id, timestamp, action_type, and page_url for common query patterns.

Caching layer: Cache recently accessed artifacts in Redis or CDN to reduce retrieval latency and storage API costs for frequently reviewed sessions.

Artifact capture implementation

Integrate artifact generation directly into the agent action loop:

// AgentAction, ExecutionContext, ActionResult, ArtifactCollector, and the
// shouldCapture*/isStateChangingAction/instrumentedExecute helpers are
// assumed to be defined elsewhere in the agent codebase.
async function executeAgentAction(
  action: AgentAction,
  context: ExecutionContext
): Promise<ActionResult> {
  const artifactCollector = new ArtifactCollector(context.sessionId);

  try {
    // Pre-action artifacts: record what the agent saw before acting
    if (shouldCaptureBefore(action)) {
      await artifactCollector.captureScreenshot({
        label: "before_action",
        actionId: action.id,
        annotations: {
          target_element: action.targetElement
        }
      });

      await artifactCollector.captureDOMSnapshot({
        actionId: action.id,
        extractSelectors: [action.targetElement]
      });
    }

    // Execute action with instrumentation
    const result = await instrumentedExecute(action, {
      onApiRequest: (req) => artifactCollector.captureAPIRequest(req),
      onApiResponse: (res) => artifactCollector.captureAPIResponse(res),
      onError: (err) => artifactCollector.captureError(err)
    });

    // Post-action artifacts
    if (shouldCaptureAfter(action, result)) {
      await artifactCollector.captureScreenshot({
        label: "after_action",
        actionId: action.id,
        success: result.success
      });

      if (isStateChangingAction(action)) {
        await artifactCollector.captureStateSnapshot({
          actionId: action.id,
          stateType: "application",
          state: await context.getApplicationState()
        });
      }
    }

    return result;
  } finally {
    // Start the upload without awaiting it so the agent is not blocked, and
    // do it in `finally` so artifacts from a failed action are still persisted
    void artifactCollector.flush().catch((err) => {
      console.error("artifact upload failed", err);
    });
  }
}

This pattern ensures artifacts are consistently captured while minimizing performance impact through asynchronous upload and intelligent filtering.

Artifact correlation and reconstruction

Individual artifacts gain value when correlated into coherent narratives. Implement correlation systems that:

Link artifacts temporally: Order artifacts by precise timestamps, handling clock skew and distributed system timing challenges

Link artifacts causally: Connect perception artifacts (screenshots analyzed) to decision artifacts (plan steps selected) to action artifacts (API calls executed) to outcome artifacts (success/failure states)

Link artifacts across agents: When multiple agents coordinate on a task, correlate their artifact streams to reconstruct the complete interaction

Reconstruct sessions: Provide tools that take an artifact set and generate human-readable reports or interactive visualizations showing the agent's complete journey

Session reconstruction tools should generate:

  • Timeline views showing artifacts in chronological order with action annotations
  • Side-by-side comparisons of state before/after critical operations
  • Network diagrams showing API call dependencies and response times
  • Annotated screenshots highlighting which UI elements the agent detected and interacted with
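A minimal sketch of the timeline view: sort artifacts by timestamp and group them by action. The type and function names are hypothetical, the field names follow the JSON examples earlier in this entry, and the sketch assumes timestamps are already comparable, so it does not address clock skew.

```typescript
interface ArtifactRef {
  artifact_id: string;
  action_id: string;
  type: string;
  timestamp: string; // ISO 8601
}

interface TimelineEntry {
  action_id: string;
  startedAt: string;
  artifacts: ArtifactRef[];
}

function buildTimeline(artifacts: ArtifactRef[]): TimelineEntry[] {
  // Copy before sorting so the caller's array is not mutated
  const sorted = artifacts
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));

  // Map preserves insertion order, so entries stay in order of first artifact
  const byAction = new Map<string, TimelineEntry>();
  sorted.forEach((a) => {
    let entry = byAction.get(a.action_id);
    if (!entry) {
      entry = { action_id: a.action_id, startedAt: a.timestamp, artifacts: [] };
      byAction.set(a.action_id, entry);
    }
    entry.artifacts.push(a);
  });
  return Array.from(byAction.values());
}

const timeline = buildTimeline([
  { artifact_id: "art_5e6f7g8h", action_id: "act_xyz790", type: "api_response", timestamp: "2024-01-15T14:32:19.112Z" },
  { artifact_id: "art_9i0j1k2l", action_id: "act_xyz788", type: "dom_snapshot", timestamp: "2024-01-15T14:32:17.891Z" },
  { artifact_id: "art_7q8r9s0t", action_id: "act_xyz791", type: "state_snapshot", timestamp: "2024-01-15T14:32:20.551Z" },
  { artifact_id: "art_1a2b3c4d", action_id: "act_xyz789", type: "screenshot", timestamp: "2024-01-15T14:32:18.427Z" },
]);
console.log(timeline.map((e) => e.action_id).join(" -> "));
```

A report generator or visualization would consume this structure; causal links between perception, decision, and action artifacts would need additional fields not modeled here.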

Key metrics

Artifact completeness

Measures the percentage of agent actions with required artifacts successfully captured and stored.

Formula: (Actions with all required artifacts / Total actions) × 100

Target: > 99.9% for critical actions (payments, data modifications), > 95% for standard actions

Tracking: Monitor by action type, session, and agent. Alert when completeness drops below thresholds, indicating artifact capture failures, storage issues, or agent errors preventing artifact generation.

Low completeness indicates systematic artifact collection failures. Common causes include:

  • Agent crashes before artifacts flush to storage
  • Network failures preventing upload to artifact stores
  • Resource exhaustion (disk space, memory) blocking artifact creation
  • Code paths that bypass artifact collection logic

Track completeness separately by artifact type (screenshots vs. DOM snapshots vs. API logs) to identify specific collection failures.
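The completeness formula can be computed directly from per-action records. This is a sketch with hypothetical type and field names; treating an empty action list as 100% complete is an assumption.

```typescript
interface ActionRecord {
  actionType: string;
  required: string[]; // artifact types the policy requires for this action
  captured: string[]; // artifact types actually stored
}

function completenessPercent(actions: ActionRecord[]): number {
  if (actions.length === 0) return 100; // assumption: nothing required, nothing missing
  const complete = actions.filter((a) =>
    a.required.every((t) => a.captured.indexOf(t) !== -1)
  ).length;
  return (complete / actions.length) * 100;
}

const sampleActions: ActionRecord[] = [
  { actionType: "click", required: ["screenshot"], captured: ["screenshot", "dom_snapshot"] },
  { actionType: "payment", required: ["screenshot", "api_response"], captured: ["screenshot"] },
  { actionType: "navigate", required: ["dom_snapshot"], captured: ["dom_snapshot"] },
  { actionType: "scroll", required: [], captured: [] },
];

console.log(completenessPercent(sampleActions)); // 75
```

Filtering `sampleActions` by `actionType` before calling the function gives the per-type breakdown, which is what separates the 99.9% target for critical actions from the 95% target for standard ones.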

Storage cost per session

Measures the average storage cost incurred for artifacts generated during a complete agent session.

Formula: Total storage costs / Number of sessions

Target: < $0.10 per session for typical workflows, < $1.00 for complex multi-hour sessions

Optimization: Track cost distribution by artifact type to identify expensive capture strategies. Balance completeness requirements against budget constraints through selective capture policies.

Rising storage costs indicate:

  • Insufficient lifecycle policies moving cold artifacts to cheaper tiers
  • Excessive artifact capture for low-value actions
  • Poor compression ratios suggesting artifact format optimization opportunities
  • Growing session lengths or action counts requiring revised sampling strategies

Monitor storage costs by artifact type to identify optimization opportunities. Screenshots and video often dominate costs.

Artifact retrieval time

Measures the latency to retrieve artifacts during debugging, audit, or review scenarios.

Formula: Time from query to artifact availability (p50, p95, p99)

Target: < 200ms p50, < 1s p95 for hot storage; < 2s p50, < 10s p95 for warm storage

Monitoring: Track retrieval performance by storage tier, artifact type, and access pattern. Slow retrievals indicate indexing issues, cache misses, or storage tier misconfiguration.
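The p50, p95, and p99 figures in the formula can be derived from raw latency samples. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the function name and sample values are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative retrieval latencies in milliseconds
const latenciesMs = [120, 80, 950, 150, 200, 90, 110, 300, 100, 130];

console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

With only ten samples the p95 is simply the slowest request, which is why tail percentiles need a large sample window to be meaningful.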

Slow retrieval times indicate:

  • Insufficient indexing forcing full scans of artifact metadata
  • Object storage performance issues or inadequate provisioned throughput
  • Query patterns that don't align with index structures
  • Artifacts distributed across too many storage locations requiring scatter-gather operations

Optimize retrieval through:

  • Colocating related artifacts in storage (all artifacts for a session in nearby keys)
  • Maintaining denormalized indexes that support common query patterns without joins
  • Caching frequently accessed artifacts (recently failed sessions, compliance review cases)
  • Pre-computing artifact bundles for anticipated queries

Artifact storage efficiency

Measures the compression and deduplication effectiveness in reducing storage footprint.

Formula: Actual storage used / Uncompressed artifact size

Target: < 0.3 for JSON artifacts (DOM snapshots, API logs), < 0.6 for screenshots with compression

Improvement: Implement content-aware compression, deduplication of identical screenshots (common in repeated navigation), and delta encoding for sequential DOM snapshots.
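A sketch of measuring this ratio with hash-based deduplication followed by compression. It uses gzip from Node's built-in `zlib` only because it needs no extra dependency; the storage configuration in this entry names zstd, which would require a separate library. The function name is hypothetical.

```typescript
import { createHash } from "node:crypto";
import { gzipSync } from "node:zlib";

// Returns stored bytes / raw bytes after deduplicating identical artifacts
// and compressing each unique one.
function storageEfficiency(artifacts: Buffer[]): number {
  const unique = new Map<string, Buffer>();
  let rawBytes = 0;
  artifacts.forEach((a) => {
    rawBytes += a.length;
    unique.set(createHash("sha256").update(a).digest("hex"), a);
  });

  let storedBytes = 0;
  unique.forEach((body) => {
    storedBytes += gzipSync(body).length;
  });
  return rawBytes === 0 ? 1 : storedBytes / rawBytes;
}

// A repetitive JSON payload standing in for a DOM snapshot, captured twice
const snapshot = Buffer.from(
  JSON.stringify(
    Array.from({ length: 200 }, (_, i) => ({ tag: "div", cls: "cart-item", index: i }))
  )
);
const ratio = storageEfficiency([snapshot, snapshot]);
console.log(ratio.toFixed(3));
```

Delta encoding between sequential snapshots, mentioned above, would reduce the footprint further but is not shown.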

Time to resolution with artifacts

Measures how artifacts reduce debugging time compared to reproduction attempts.

Formula: Mean time to identify root cause without artifacts / Mean time to identify root cause with artifacts

Target: 5-10x faster resolution with comprehensive artifacts compared to blind reproduction

Value demonstration: Track debugging sessions and categorize by artifact availability. Quantify engineering time savings to justify artifact infrastructure investment.

Artifact retention compliance

Measures adherence to defined retention policies, ensuring artifacts are kept as long as required but deleted when permitted.

Formula: (artifacts_matching_policy / total_artifacts_subject_to_policy) × 100

Target: 100% compliance, meaning all artifacts within retention periods are accessible and all artifacts beyond retention are deleted.

Non-compliance creates legal risk (failing to retain required records) or unnecessary costs (retaining data longer than required). Implement automated compliance checking:

  • Daily scans verifying all artifacts within retention periods are accessible and uncorrupted
  • Automated deletion workflows for artifacts exceeding retention periods
  • Audit logs proving deletion occurred in accordance with policies
  • Legal hold mechanisms that preserve artifacts for litigation even beyond normal retention
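The compliance formula and the legal-hold exception can be combined in a single check. This is a sketch with hypothetical type and field names; `present` stands for "still stored and retrievable".

```typescript
interface StoredArtifact {
  ageDays: number;
  retentionDays: number;
  present: boolean;   // still stored and retrievable
  legalHold: boolean; // must be preserved regardless of age
}

function matchesPolicy(a: StoredArtifact): boolean {
  const mustRetain = a.ageDays <= a.retentionDays || a.legalHold;
  // Retained artifacts must be present; expired ones must be gone
  return mustRetain ? a.present : !a.present;
}

function retentionCompliancePercent(artifacts: StoredArtifact[]): number {
  if (artifacts.length === 0) return 100; // assumption: nothing subject to policy
  const matching = artifacts.filter(matchesPolicy).length;
  return (matching / artifacts.length) * 100;
}

const inventory: StoredArtifact[] = [
  { ageDays: 10, retentionDays: 365, present: true, legalHold: false },  // retained correctly
  { ageDays: 400, retentionDays: 365, present: false, legalHold: false }, // deleted correctly
  { ageDays: 400, retentionDays: 365, present: true, legalHold: false },  // overdue for deletion
  { ageDays: 400, retentionDays: 365, present: true, legalHold: true },   // preserved by legal hold
];

console.log(retentionCompliancePercent(inventory)); // 75
```

A daily scan would run a check like this over the metadata index and alert on any artifact that fails `matchesPolicy`.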

Related concepts

  • Proof-of-action - The complete system of documenting and verifying agent actions through artifacts, logs, and audit trails
  • Screenshots - Visual capture techniques and best practices for agent observability
  • Audit log - Structured event records that complement artifact evidence with detailed action metadata
  • Session replay - Technologies for reconstructing complete agent sessions using artifacts and interaction logs
  • Observability - The broader practice of understanding system behavior through telemetry and monitoring
  • Computer-use agent - Autonomous systems that interact with computers through UI, generating proof-of-action artifacts during operation