Redaction (fields/masks)

Redaction refers to techniques for removing or obfuscating sensitive data in screenshots, logs, and other agent artifacts. In the context of computer-use agents and agentic UI systems, redaction ensures that personally identifiable information (PII), credentials, financial data, and other sensitive content are not exposed in debugging outputs, training data, or proof-of-action records.

Why It Matters

Redaction is critical for several interconnected reasons in agentic systems:

Privacy Compliance

Computer-use agents often interact with systems containing regulated data subject to GDPR, CCPA, HIPAA, and other privacy frameworks. Screenshots captured during agent execution may inadvertently include:

  • Social Security numbers in form fields
  • Credit card details during checkout flows
  • Healthcare information in patient portals
  • Personal addresses and contact information

Without proper redaction, organizations risk regulatory violations, substantial fines, and loss of user trust. Automated redaction ensures compliance even when agents operate at scale across diverse applications.

Security Best Practices

Agent artifacts frequently contain security-sensitive information that could enable unauthorized access:

  • API keys and authentication tokens visible in browser developer tools
  • Database connection strings in configuration panels
  • Session cookies in network traffic logs
  • Administrative credentials in login screenshots

Redacting this data prevents credential leakage, reduces the attack surface for threat actors, and minimizes the impact of data breaches involving agent logs or training datasets.

Safe Debugging and Development

Developers and operators need visibility into agent behavior to diagnose failures and optimize performance. However, full data exposure creates liability:

  • Support teams may access customer PII unnecessarily
  • Development environments may retain production data
  • Third-party debugging tools could exfiltrate sensitive information

Field-level and visual redaction enables effective troubleshooting while maintaining data minimization principles, allowing teams to identify issues without exposing the underlying sensitive content.

Concrete Examples

Field-Level Masking

Structured data redaction replaces sensitive values with placeholder tokens or partial information:

// Original API response
{
  "user_id": "usr_8x7k2m",
  "email": "sarah.chen@example.com",
  "ssn": "123-45-6789",
  "credit_card": "4532-1234-5678-9010"
}

// Redacted version
{
  "user_id": "usr_8x7k2m",
  "email": "s***h@e*****e.com",
  "ssn": "***-**-6789",
  "credit_card": "****-****-****-9010"
}

Field-level masking preserves data structure and type information, enabling developers to understand data flow without accessing actual values. Common patterns include the following (a minimal sketch follows the list):

  • Last-N preservation: Retain final digits for verification (e.g., last 4 of credit cards)
  • Tokenization: Replace values with consistent hashes for tracking across logs
  • Format-preserving encryption: Maintain data format while encrypting content
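
Field-level masking can be implemented as a recursive walk over the response, applying a per-field rule where one is registered. A minimal sketch, assuming a hypothetical MASK_RULES registry keyed by field name; the rules are simplified relative to the example above:

# Hypothetical field-level masker; field names and rules are illustrative
def mask_last4(value: str, fill: str = "*") -> str:
    """Mask all alphanumeric characters except the final four."""
    return "".join(fill if ch.isalnum() else ch for ch in value[:-4]) + value[-4:]

MASK_RULES = {
    "ssn": mask_last4,
    "credit_card": mask_last4,
    "email": lambda v: v[0] + "***@" + v.split("@", 1)[1],
}

def mask_fields(record: dict) -> dict:
    """Walk a (possibly nested) dict and apply per-field masking rules."""
    masked = {}
    for key, value in record.items():
        if isinstance(value, dict):
            masked[key] = mask_fields(value)
        elif key in MASK_RULES and isinstance(value, str):
            masked[key] = MASK_RULES[key](value)
        else:
            masked[key] = value
    return masked

Applied to the API response above, this yields the last-four forms for the SSN and card number while leaving non-sensitive fields untouched.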

Visual Redaction in Screenshots

Computer-use agents capture screenshots to document actions and enable visual validation. Visual redaction applies masks to sensitive regions:

# Screenshot redaction pipeline
def redact_screenshot(image_path: str) -> Image:
    """Apply visual redaction to agent screenshot."""
    image = load_image(image_path)

    # OCR-based sensitive text detection
    text_regions = ocr_extract(image)
    sensitive_regions = [
        region for region in text_regions
        if matches_sensitive_pattern(region.text)
    ]

    # Apply blur or solid overlay
    for region in sensitive_regions:
        image = apply_blur(
            image,
            bbox=region.bbox,
            radius=15
        )

    # Object detection for sensitive UI elements
    ui_elements = detect_ui_elements(image)
    for element in ui_elements:
        if element.type in ['password_field', 'ssn_field']:
            image = apply_black_box(image, element.bbox)

    return image

Visual redaction techniques include:

  • Bounding box overlays: Solid black or colored rectangles covering sensitive areas
  • Gaussian blur: Progressive blur for readability reduction
  • Pixelation: Block-level obfuscation maintaining general layout
  • Inpainting: AI-generated fill to replace sensitive content contextually (see the sketch after this list)
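
For the inpainting variant, classical algorithms can stand in for learned models. A minimal sketch using OpenCV's cv2.inpaint, assuming bounding boxes come from a detector such as the pipeline above:

import cv2
import numpy as np

def inpaint_regions(image_bgr: np.ndarray, boxes: list) -> np.ndarray:
    """Fill sensitive regions with content synthesized from surrounding pixels."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 255  # pixels to reconstruct
    # Telea inpainting fills the masked area from its neighborhood (radius in pixels)
    return cv2.inpaint(image_bgr, mask, 5, cv2.INPAINT_TELEA)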

Log Sanitization

Agent execution logs contain structured and unstructured data requiring multi-strategy redaction:

# Original log
[2025-10-23 14:32:17] Agent executed: fill_form(
  email="john.doe@company.com",
  password="MyP@ssw0rd!2024",
  session_token="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
)

# Redacted log
[2025-10-23 14:32:17] Agent executed: fill_form(
  email="[REDACTED_EMAIL]",
  password="[REDACTED_PASSWORD]",
  session_token="[REDACTED_TOKEN_32_CHARS]"
)

Log sanitization applies regex patterns, named entity recognition (NER), and semantic analysis to identify the following (a minimal sketch appears after the list):

  • Email addresses and phone numbers
  • Passwords and API keys (using entropy analysis)
  • Authentication tokens (JWT, OAuth, session IDs)
  • IP addresses and network identifiers
  • Financial account numbers
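
A minimal, regex-only sketch of such a sanitizer, producing the placeholder style shown in the redacted log above (the patterns are illustrative and deliberately incomplete):

import re

# Illustrative patterns only; production sanitizers add NER and entropy checks
LOG_PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    (re.compile(r'password\s*=\s*"[^"]*"'), 'password="[REDACTED_PASSWORD]"'),
    (re.compile(r"eyJ[A-Za-z0-9_-]{10,}(?:\.[A-Za-z0-9_-]+)*"), "[REDACTED_TOKEN]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def sanitize_log_line(line: str) -> str:
    """Apply each pattern in turn; order matters when patterns overlap."""
    for pattern, replacement in LOG_PATTERNS:
        line = pattern.sub(replacement, line)
    return line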

Common Pitfalls

Incomplete Coverage

The most frequent redaction failure is insufficient scope:

Problem: Redacting structured API responses but neglecting URL query parameters where the same data appears:

# API body redacted, but URL exposes data
POST /api/users HTTP/1.1
Request body: {"ssn": "[REDACTED]"}

# Meanwhile in access logs:
GET /verify?ssn=123-45-6789&user=john.doe

Solution: Implement comprehensive redaction across all artifact types—screenshots, structured logs, unstructured text, network traces, and error messages. Use centralized redaction libraries that process all data flows.
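
One building block of that centralized coverage is scrubbing query parameters before URLs reach access logs. A minimal standard-library sketch (the parameter names are illustrative):

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SENSITIVE_PARAMS = {"ssn", "token", "password", "card"}  # illustrative names

def scrub_url(url: str) -> str:
    """Replace values of sensitive query parameters before a URL reaches any log."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

# scrub_url("/verify?ssn=123-45-6789&user=john.doe")
# → "/verify?ssn=REDACTED&user=john.doe"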

Reversible Masking

Weak redaction schemes may be reversible through pattern analysis:

Problem: Consistent tokenization without salting:

# Same input always produces same output
redact("123-45-6789") → "REDACTED_abc123"
redact("123-45-6789") → "REDACTED_abc123"  # Attacker can correlate

Solution: Use cryptographically secure hashing with per-session salts, or apply true anonymization that prevents correlation. For certain use cases, differential privacy techniques add noise to prevent re-identification.
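
A minimal sketch of the salted approach: an HMAC keyed with a fresh per-session salt, so tokens are stable within a session but cannot be correlated across sessions (the class name is illustrative; a TypeScript variant appears under Implementation below):

import hashlib
import hmac
import secrets

class SaltedTokenizer:
    """HMAC-based tokens: stable within a session, uncorrelated across sessions."""

    def __init__(self) -> None:
        self.salt = secrets.token_bytes(16)  # fresh random salt per session

    def tokenize(self, value: str) -> str:
        digest = hmac.new(self.salt, value.encode(), hashlib.sha256).hexdigest()
        return f"[REDACTED_{digest[:12]}]"

# Within one session the same SSN maps to the same token (useful for tracing);
# a second session's tokenizer produces a different, uncorrelatable token.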

Performance Impact

Aggressive redaction, especially visual processing, can significantly slow agent execution:

Problem: Running OCR and object detection on every screenshot introduces 2-5 seconds of latency per capture, which is unacceptable for real-time agents.

Solution:

  • Apply lazy redaction—capture raw data, redact only on access or export
  • Use GPU acceleration for visual processing
  • Implement tiered redaction—lightweight patterns for real-time, comprehensive scanning for archival
  • Cache redaction rules and pre-compile regex patterns (see the tiered sketch after this list)
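
A sketch of the tiered idea, pairing pre-compiled patterns on the hot path with a heavier pass at export time (redact_realtime and deep_scan are illustrative names, not a fixed API):

import re
from typing import Callable

# Tier 1: cheap, pre-compiled patterns applied inline on the hot path
FAST_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # SSN
    re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),   # credit card
]

def redact_realtime(text: str) -> str:
    """Lightweight pass intended to stay within a per-action latency budget."""
    for pattern in FAST_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def redact_for_archive(text: str, deep_scan: Callable[[str], str]) -> str:
    """Comprehensive pass run before export; deep_scan (NER, vision) is a placeholder."""
    return deep_scan(redact_realtime(text))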

False Positives and Negatives

Pattern-based redaction struggles with context:

Problem: Redacting "4111111111111111" in test documentation because it matches credit card format, removing legitimate test data examples.

Problem: Missing "SSN: one-two-three four-five six-seven-eight-nine" because it's written out, not numeric.

Solution: Combine multiple detection methods (regex, NER, semantic analysis), tune sensitivity based on data classification, and provide override mechanisms for known false positives. Implement feedback loops where security teams validate redaction accuracy.
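
A sketch of combining detectors behind one interface with an override list for known false positives (the detector signature and allowlist contents are assumptions):

from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the text

# Known-benign values that match sensitive patterns, e.g. documented test card numbers
ALLOWLIST = {"4111111111111111"}

def combine_detectors(text: str, detectors: List[Callable[[str], List[Span]]]) -> List[Span]:
    """Union the spans from several detectors, dropping allowlisted values."""
    spans = set()
    for detect in detectors:
        for start, end in detect(text):
            if text[start:end] not in ALLOWLIST:
                spans.add((start, end))
    return sorted(spans)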

Implementation

Effective redaction requires layered detection, configurable masking, and continuous validation:

Detection Methods

Regular Expression Patterns: Fast, deterministic matching for structured formats:

const REDACTION_PATTERNS = {
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  CREDIT_CARD: /\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g,
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  PHONE: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  API_KEY: /\b[A-Za-z0-9_-]{32,}\b/g,  // Long opaque strings; confirm with entropy analysis
};

Named Entity Recognition (NER): ML-based identification of contextual entities:

from transformers import pipeline

ner_pipeline = pipeline("ner", model="dslim/bert-base-NER")

def detect_pii_entities(text: str) -> List[Entity]:
    """Identify PII using transformer-based NER."""
    entities = ner_pipeline(text)
    return [
        entity for entity in entities
        if entity['entity'] in ['B-PER', 'I-PER', 'B-LOC', 'I-LOC']
    ]

Entropy Analysis: Detect high-randomness strings likely to be credentials:

import math
from collections import Counter

def calculate_entropy(s: str) -> float:
    """Calculate Shannon entropy of string."""
    if not s:
        return 0.0
    counts = Counter(s)
    probabilities = [count / len(s) for count in counts.values()]
    return -sum(p * math.log2(p) for p in probabilities)

def is_likely_secret(s: str) -> bool:
    """High entropy suggests cryptographic material."""
    return len(s) >= 16 and calculate_entropy(s) > 3.5

Computer Vision: Detect sensitive UI elements in screenshots:

from transformers import DetrImageProcessor, DetrForObjectDetection
import torch

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("custom/ui-element-detector")

def detect_sensitive_ui(image: Image) -> List[BoundingBox]:
    """Identify password fields, SSN inputs, etc."""
    inputs = processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

    # Post-process detections
    target_sizes = torch.tensor([image.size[::-1]])
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=0.7
    )[0]

    sensitive_classes = ['password_field', 'ssn_field', 'credit_card_field']
    return [
        bbox for bbox, label in zip(results['boxes'], results['labels'])
        if model.config.id2label[label.item()] in sensitive_classes
    ]

Masking Techniques

Partial Redaction: Preserve utility while protecting data:

function partialRedact(value: string, type: DataType): string {
  switch (type) {
    case 'EMAIL': {
      const [local, domain] = value.split('@');
      return `${local[0]}***@${domain}`;
    }

    case 'CREDIT_CARD':
      return `****-****-****-${value.slice(-4)}`;

    case 'SSN':
      return `***-**-${value.slice(-4)}`;

    case 'PHONE':
      return `***-***-${value.slice(-4)}`;

    default:
      return '[REDACTED]';
  }
}

Tokenization: Maintain referential integrity across logs:

import crypto from 'crypto';

class RedactionTokenizer {
  private tokenMap: Map<string, string> = new Map();
  private salt: string;

  constructor(sessionId: string) {
    this.salt = sessionId;
  }

  tokenize(value: string, preserveFormat: boolean = false): string {
    if (preserveFormat) {
      // Maintain character classes without revealing the underlying content
      return value.replace(/[a-zA-Z]/g, 'X')
                  .replace(/\d/g, '0')
                  .replace(/[^a-zA-Z0-9]/g, '_');
    }

    // Reuse tokens within a session so repeated values stay correlated in logs
    const cached = this.tokenMap.get(value);
    if (cached) return cached;

    const hash = crypto
      .createHmac('sha256', this.salt)
      .update(value)
      .digest('hex')
      .slice(0, 16);

    const token = `[REDACTED_${hash}]`;
    this.tokenMap.set(value, token);
    return token;
  }
}

Visual Masking: Apply overlays to screenshots:

from PIL import Image, ImageDraw, ImageFilter

def apply_visual_redaction(
    image: Image,
    regions: List[BoundingBox],
    method: str = 'blur'
) -> Image:
    """Apply visual masks to sensitive regions."""
    redacted = image.copy()

    for bbox in regions:
        x1, y1, x2, y2 = bbox

        if method == 'blur':
            region = image.crop((x1, y1, x2, y2))
            blurred = region.filter(ImageFilter.GaussianBlur(radius=15))
            redacted.paste(blurred, (x1, y1))

        elif method == 'blackbox':
            draw = ImageDraw.Draw(redacted)
            draw.rectangle([x1, y1, x2, y2], fill='black')

        elif method == 'pixelate':
            region = image.crop((x1, y1, x2, y2))
            small = region.resize(
                (region.width // 10, region.height // 10),
                Image.NEAREST
            )
            pixelated = small.resize(region.size, Image.NEAREST)
            redacted.paste(pixelated, (x1, y1))

    return redacted

Validation and Testing

Ensure redaction effectiveness through automated testing:

import pytest

class RedactionValidator:
    """Validate redaction coverage and correctness."""

    def __init__(self, redactor):
        self.redactor = redactor

    def test_pii_detection(self):
        """Ensure all PII patterns are caught."""
        test_cases = [
            ("SSN: 123-45-6789", True),
            ("Card: 4532-1234-5678-9010", True),
            ("Email: test@example.com", True),
            ("Normal text here", False),
        ]

        for text, should_redact in test_cases:
            result = self.redactor.redact(text)
            contains_sensitive = (result != text)
            assert contains_sensitive == should_redact, \
                f"Failed on: {text}"

    def test_no_false_positives(self):
        """Ensure legitimate data isn't over-redacted."""
        safe_texts = [
            "Invoice #1234-5678-9012-3456 (not a credit card)",
            "Call 123-456-7890 for info (already public)",
        ]

        for text in safe_texts:
            result = self.redactor.redact(text, strict=False)
            # Context-aware redaction should preserve these
            assert result == text or "[PUBLIC]" in result

    def test_visual_redaction_coverage(self):
        """Ensure all sensitive UI elements are masked."""
        test_screenshot = load_test_image('login_form.png')
        redacted = self.redactor.redact_image(test_screenshot)

        # OCR on redacted image should not recover passwords
        recovered_text = ocr_extract(redacted)
        assert not any(
            pattern in recovered_text
            for pattern in ['password', 'P@ssw0rd', 'secret']
        )

Key Metrics

Measure redaction system effectiveness through quantitative metrics:

Redaction Accuracy

Definition: Percentage of sensitive data correctly identified and masked.

Formula: (True Positives) / (True Positives + False Negatives) × 100

Target: > 99.5% for critical data (SSN, credit cards), > 95% for general PII

Measurement:

def calculate_redaction_accuracy(
    ground_truth: List[SensitiveSpan],
    detected: List[SensitiveSpan]
) -> float:
    """Calculate detection accuracy with span overlap."""
    true_positives = 0

    for gt_span in ground_truth:
        for det_span in detected:
            if spans_overlap(gt_span, det_span, threshold=0.8):
                true_positives += 1
                break

    return true_positives / len(ground_truth) if ground_truth else 1.0

Improvement strategies: Combine multiple detection methods, use domain-specific models, implement manual review for high-risk data.

False Positive Rate

Definition: Percentage of benign data incorrectly flagged as sensitive.

Formula: (False Positives) / (False Positives + True Negatives) × 100

Target: < 5% to minimize over-redaction and maintain utility

Measurement:

interface RedactionMetrics {
  falsePositiveRate: number;
  falsePositivesByType: Record<string, number>;
}

function calculateFalsePositiveRate(
  redactionLog: RedactionEvent[]
): RedactionMetrics {
  const falsePositives = redactionLog.filter(
    event => event.manualReview === 'not_sensitive'
  );

  const byType = falsePositives.reduce((acc, event) => {
    acc[event.detectionPattern] = (acc[event.detectionPattern] || 0) + 1;
    return acc;
  }, {} as Record<string, number>);

  return {
    // Approximation: the denominator is all logged redaction events,
    // not strictly FP + TN as in the formula above
    falsePositiveRate: falsePositives.length / redactionLog.length,
    falsePositivesByType: byType,
  };
}

Optimization: Tune pattern strictness, implement allowlists for known safe patterns, use contextual analysis.

Processing Latency

Definition: Time added to agent operations by redaction processing.

Target:

  • Text redaction: < 50ms per document
  • Screenshot redaction: < 500ms per image
  • Real-time log sanitization: < 10ms per entry

Measurement:

import time
from dataclasses import dataclass
from typing import List

@dataclass
class LatencyMetrics:
    p50: float  # Median
    p95: float  # 95th percentile
    p99: float  # 99th percentile
    max: float

class RedactionPerformanceMonitor:
    """Track redaction processing latency."""

    def __init__(self):
        self.latencies: List[float] = []

    def measure(self, func, *args, **kwargs):
        """Measure function execution time."""
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = (time.perf_counter() - start) * 1000  # Convert to ms
        self.latencies.append(elapsed)
        return result

    def get_metrics(self) -> LatencyMetrics:
        """Calculate latency percentiles."""
        sorted_latencies = sorted(self.latencies)
        n = len(sorted_latencies)

        return LatencyMetrics(
            p50=sorted_latencies[int(n * 0.5)],
            p95=sorted_latencies[int(n * 0.95)],
            p99=sorted_latencies[int(n * 0.99)],
            max=sorted_latencies[-1],
        )

Performance optimization: Pre-compile regex patterns, use GPU for visual processing, implement caching, apply lazy redaction.

Coverage Completeness

Definition: Percentage of artifact types with redaction applied.

Target: 100% coverage across screenshots, logs, API responses, error traces, and training data

Measurement: Audit all data export paths and ensure each passes through the redaction pipeline.
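
One way to make that audit concrete is a registry of export paths checked in CI. A minimal sketch (the registry and path names are hypothetical):

from typing import Dict

EXPORT_PATHS: Dict[str, bool] = {}  # export path name → passes through redaction?

def register_export(name: str, redacted: bool) -> None:
    """Record whether an export path is wired through the redaction pipeline."""
    EXPORT_PATHS[name] = redacted

def coverage_completeness() -> float:
    """Fraction of registered export paths that apply redaction."""
    return sum(EXPORT_PATHS.values()) / len(EXPORT_PATHS) if EXPORT_PATHS else 0.0

register_export("screenshots", redacted=True)
register_export("error_traces", redacted=False)  # surfaced by the audit
print(coverage_completeness())  # 0.5 → below the 100% target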

Related Concepts