PII Redaction

PII redaction is the automated removal or masking of personally identifiable information from agent observations, logs, and artifacts. This process protects sensitive data by detecting and obscuring names, addresses, social security numbers, credit card details, and other personal identifiers before they are stored, transmitted, or analyzed by AI systems.

Why It Matters

PII redaction serves as a critical safeguard in computer-use agents where automated systems interact with user interfaces containing sensitive information.

Regulatory Compliance

GDPR Article 32 requires "appropriate technical and organisational measures" to protect personal data and lists pseudonymisation among them. Redacting or tokenizing PII at the observation layer is a concrete way to demonstrate this due diligence. GDPR covers personal data of people in the EU, not only EU citizens. Its fines reach €20 million or 4% of total worldwide annual turnover, whichever is higher, for the most serious infringements, and security failures under Article 32 fall in the tier capped at €10 million or 2%. CCPA, HIPAA, and other frameworks impose similar obligations with their own penalty structures.

Data Breach Prevention

Agent systems capture screenshots, API responses, and form interactions that frequently contain sensitive data. A single verbose log line that dumps an API response or query result can expose thousands of customer records to anyone who gains access to the log store. Redacting at capture time means that even if storage systems are compromised, the exposed data has little value to an attacker.

Customer Trust and Brand Protection

Users expect that systems handling their information implement privacy-by-design principles. Visible redaction in user-facing proof-of-action artifacts signals to customers that their privacy is actively protected. Conversely, accidental PII exposure in support tickets, error reports, or training datasets damages reputation and triggers user churn. Companies like Apple and Microsoft explicitly market their PII protection measures as competitive differentiators.

Concrete Examples

Regex-Based Pattern Matching

The simplest redaction approach uses regular expressions to detect structured PII formats:

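import re

# Order matters: more specific patterns run first so that, for example,
# card numbers are not partially consumed by the phone pattern.
patterns = {
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'credit_card': re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'),
    'email': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    # (?<!\w) instead of \b so the leading "(" in "(555) 123-4567" is included
    'phone': re.compile(r'(?<!\w)(?:\+\d{1,2}\s?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}\b'),
}

def redact_text(text):
    """Replace each match with a typed placeholder such as [REDACTED_SSN]."""
    for pii_type, pattern in patterns.items():
        text = pattern.sub(f'[REDACTED_{pii_type.upper()}]', text)
    return text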

This method excels at detecting formatted data but misses unstructured PII like names in natural language.

Named Entity Recognition Models

Production systems combine regex with machine learning models trained to identify person names, locations, and organizations in freeform text:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

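analyzer = AnalyzerEngine()      # spaCy-based NER plus built-in pattern recognizers
anonymizer = AnonymizerEngine()  # default operator replaces spans with <ENTITY_TYPE>

text = "John Smith's address is 123 Main St, and his SSN is 123-45-6789"
results = analyzer.analyze(text=text, language='en')
anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
print(anonymized.text)
# Illustrative output: "<PERSON>'s address is <LOCATION>, and his SSN is <US_SSN>"
# Detected entities depend on the spaCy model, score thresholds, and recognizer
# validation; Presidio may reject placeholder SSNs such as 123-45-6789.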

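Microsoft Presidio (open source), Amazon Comprehend, and Google Cloud Sensitive Data Protection (formerly Cloud DLP) provide prebuilt detectors for common PII types. Because they use surrounding context, they catch patterns that pure regex cannot, but recall varies by entity type, language, and domain, so measure it on your own data.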

Screenshot and Image Redaction

Computer-use agents capture visual observations requiring pixel-level redaction:

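import os
import re

import pytesseract
from PIL import Image, ImageDraw

# Hypothetical word-level detector; substitute a real PII check. OCR splits text
# into words, so multi-word PII such as "(555) 123-4567" needs line-level matching.
SENSITIVE_WORD = re.compile(r'\d{3}-\d{2}-\d{4}|[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}')

def is_sensitive(word):
    return bool(SENSITIVE_WORD.search(word))

def redact_screenshot(image_path):
    img = Image.open(image_path).convert('RGB')  # normalize mode so fill='black' is valid
    ocr_data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    draw = ImageDraw.Draw(img)
    for i, text in enumerate(ocr_data['text']):
        if text.strip() and is_sensitive(text):  # skip empty layout entries
            x, y, w, h = (ocr_data['left'][i], ocr_data['top'][i],
                          ocr_data['width'][i], ocr_data['height'][i])
            draw.rectangle([x, y, x + w, y + h], fill='black')

    # Prefix only the file name so paths like 'shots/a.png' remain valid
    directory, filename = os.path.split(image_path)
    out_path = os.path.join(directory, 'redacted_' + filename)
    img.save(out_path)
    return out_path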

Advanced implementations use computer vision models to detect form fields, credit cards, and ID documents before OCR processing.

Common Pitfalls

Incomplete Redaction Patterns

Many implementations focus on US-centric formats while missing international variations. UK National Insurance numbers (AB123456C), Canadian Social Insurance Numbers (123-456-789), and IBAN codes require region-specific patterns. Date formats vary globally—MM/DD/YYYY vs DD/MM/YYYY—and partial dates combined with other data can re-identify individuals.
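Region-specific formats can be added as extra patterns, and checksums help separate real identifiers from look-alike digit strings. The sketch below is simplified and its function names are hypothetical: the NINO pattern enforces only the excluded prefix letters, the SIN check applies the Luhn algorithm that Canadian SINs use, and the IBAN check applies the standard mod-97 rule.

```python
import re

# Simplified UK National Insurance number: prefix letters exclude D, F, I, Q,
# U, V (and O in second position); suffix is A-D. Disallowed prefix pairs such
# as GB or NK are not filtered in this sketch.
UK_NINO = re.compile(
    r'\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b')
# Canadian SIN: nine digits, optionally grouped 3-3-3
CA_SIN = re.compile(r'\b\d{3}[\s-]?\d{3}[\s-]?\d{3}\b')
# IBAN: country code, two check digits, then alphanumerics in groups of up to four
IBAN = re.compile(r'\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,4})?\b')


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def iban_valid(candidate: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map A-Z to 10-35."""
    compact = candidate.replace(' ', '')
    rearranged = compact[4:] + compact[:4]
    numeric = ''.join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_international_ids(text: str) -> list[tuple[str, str]]:
    found = [('UK_NINO', m.group()) for m in UK_NINO.finditer(text)]
    for m in CA_SIN.finditer(text):
        if luhn_valid(re.sub(r'\D', '', m.group())):
            found.append(('CA_SIN', m.group()))
    for m in IBAN.finditer(text):
        if iban_valid(m.group()):
            found.append(('IBAN', m.group()))
    return found
```

For example, `find_international_ids("SIN 123-456-789")` returns an empty list: the digits match the SIN shape but fail the Luhn check, which is exactly the kind of look-alike that a pattern alone would redact.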

Multi-field PII poses detection challenges. A first name alone is not sensitive, but "Jennifer" appearing near "jennifer.smith@company.com" creates linkability. Address components scattered across form fields require state-tracking to detect full addresses.
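Linkability has to be judged across a whole record rather than one field at a time. A minimal sketch, assuming form data arrives as a dict with the hypothetical field names `first_name`, `email`, `street`, `city`, and `postal_code`: a first name is flagged only when it also appears in the email's local part, and address components are flagged once enough of them co-occur.

```python
ADDRESS_FIELDS = ('street', 'city', 'postal_code')  # hypothetical field names


def linked_pii_fields(record: dict[str, str], min_address_parts: int = 2) -> set[str]:
    """Return field names that become identifying in combination."""
    flagged: set[str] = set()

    # A first name alone is low risk, but it links to an email that contains it
    first = record.get('first_name', '').strip().lower()
    local_part = record.get('email', '').lower().split('@', 1)[0]
    if first and first in local_part:
        flagged.update({'first_name', 'email'})

    # Address components scattered across fields are tracked together
    present = [f for f in ADDRESS_FIELDS if record.get(f, '').strip()]
    if len(present) >= min_address_parts:
        flagged.update(present)

    return flagged
```

Substring matching is deliberately simple here; very short names can appear inside unrelated email addresses, so a production version would likely tokenize the local part first.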

False Positives Breaking Functionality

Overly aggressive redaction disrupts legitimate operations. Redacting "John" from system usernames breaks authentication logs. Loose phone-number patterns match order IDs that use the same digit grouping (e.g., 100-234-5678), so support teams lose tracking references. Likewise, a card-number pattern that matches transaction references can leave fraud-detection logs unusable.

The solution requires confidence scoring and context awareness: redact only when multiple signals confirm PII. A 16-digit number in a field labeled "Card Number" is far stronger evidence than the same digits in a URL parameter.
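A minimal sketch of multi-signal scoring for card numbers, with integer weights chosen purely for illustration: the digit shape is required, a Luhn checksum (which valid payment card numbers satisfy) adds evidence, and a field label containing one of a few hypothetical cue words adds more. With a threshold of 70, a number in a card field is redacted, while Luhn-valid digits in an unrelated field are not.

```python
import re

CARD_SHAPE = re.compile(r'\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}')
CONTEXT_CUES = ('card', 'credit', 'payment')  # hypothetical cue words


def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def card_score(value: str, field_label: str = '') -> int:
    """Return an illustrative 0-100 confidence that `value` is a card number."""
    if not CARD_SHAPE.fullmatch(value.strip()):
        return 0
    score = 30  # shape alone is weak evidence
    if luhn_valid(re.sub(r'\D', '', value)):
        score += 30
    if any(cue in field_label.lower() for cue in CONTEXT_CUES):
        score += 40
    return score


def should_redact_card(value: str, field_label: str = '', threshold: int = 70) -> bool:
    return card_score(value, field_label) >= threshold
```

Here `card_score("4111 1111 1111 1111", "Card Number")` scores 100, while the same digits under an `order_ref` label score 60 and are left alone.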

Post-Capture Redaction Gaps

Many teams apply redaction only to final storage while leaving intermediate buffers exposed. Agent memory stores, message queues, and application logs often contain unredacted data during processing. If the system crashes before redaction, sensitive data persists in error dumps and temporary files.

Third-party observability tools (Datadog, Sentry) often ingest logs before any redaction step runs. In one case, a developer exposed 50,000 email addresses by enabling verbose logging without checking that the observability pipeline included PII filters.
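In Python services, one way to close this gap is to redact inside the standard `logging` module, so every handler, including ones that forward to observability vendors, receives already-redacted messages. A minimal sketch with two illustrative patterns; `RedactingFilter` and `attach_redaction` are hypothetical names. Filters go on handlers rather than loggers because a logger's own filters are not consulted for records propagated from child loggers.

```python
import logging
import re

PATTERNS = [
    ('SSN', re.compile(r'\b\d{3}-\d{2}-\d{4}\b')),
    ('EMAIL', re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')),
]


class RedactingFilter(logging.Filter):
    """Rewrite each record's message before the handler formats or ships it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args so arguments are covered
        for label, pattern in PATTERNS:
            message = pattern.sub(f'[REDACTED_{label}]', message)
        record.msg = message
        record.args = None  # args are already merged into msg
        return True  # never drop the record, only rewrite it


def attach_redaction(logger: logging.Logger) -> None:
    # Handler-level filters also see records propagated from child loggers
    redactor = RedactingFilter()
    for handler in logger.handlers:
        handler.addFilter(redactor)
```

This covers the message text only; exception tracebacks attached through `exc_info` are formatted later and would need their own handling.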

Implementation

Detection Strategies

Layered Detection Pipeline: Implement multiple detection methods in sequence, each optimized for different PII types:

Format-based regex for structured identifiers (SSN, credit cards, phone numbers)
Named Entity Recognition for person names, locations, organizations
Field-level classification using form labels and HTML attributes
Custom business logic for domain-specific identifiers (customer IDs, account numbers)
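The layers above can be composed as a list of detectors whose spans are merged before any text is replaced, so overlapping detections from different layers do not corrupt each other. A minimal sketch with regex-only layers; the `CUST-` customer ID format is hypothetical, and an NER or field-label layer would be another function with the same signature.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


Detector = Callable[[str], list[Span]]


def regex_detector(label: str, pattern: str) -> Detector:
    """Wrap a regex as a detector that returns labeled character spans."""
    compiled = re.compile(pattern)

    def detect(text: str) -> list[Span]:
        return [Span(m.start(), m.end(), label) for m in compiled.finditer(text)]

    return detect


def merge_spans(spans: list[Span]) -> list[Span]:
    """Union overlapping spans; the earliest-starting span's label wins."""
    merged: list[Span] = []
    for span in sorted(spans, key=lambda s: (s.start, -s.end)):
        if merged and span.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Span(last.start, max(last.end, span.end), last.label)
        else:
            merged.append(span)
    return merged


def redact(text: str, detectors: list[Detector]) -> str:
    spans = merge_spans([span for detect in detectors for span in detect(text)])
    for span in reversed(spans):  # replace right-to-left so offsets stay valid
        text = text[:span.start] + f'[REDACTED_{span.label}]' + text[span.end:]
    return text


PIPELINE: list[Detector] = [
    regex_detector('SSN', r'\b\d{3}-\d{2}-\d{4}\b'),
    regex_detector('EMAIL', r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    regex_detector('CUSTOMER_ID', r'\bCUST-\d{6}\b'),  # hypothetical ID format
    # NER and field-label layers would be appended here with the same signature
]
```

For example, `redact("CUST-004211 emailed bob@example.com about SSN 123-45-6789", PIPELINE)` yields `"[REDACTED_CUSTOMER_ID] emailed [REDACTED_EMAIL] about SSN [REDACTED_SSN]"`.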

Pre-Trained vs Custom Models: Start with managed services (Amazon Comprehend, Google Cloud Sensitive Data Protection) or open-source frameworks such as Microsoft Presidio, and measure their precision on a labeled sample of your own data before relying on them. Fine-tune open-source models (spaCy, Hugging Face) on your domain once you have labeled examples. Healthcare systems typically need custom models for medical record numbers; fintech needs models trained on proprietary transaction formats.

Real-Time vs Batch Processing: Agent observations require real-time redaction with <100ms latency to avoid blocking action execution. Use fast regex and rule-based systems in the hot path. Schedule batch ML model processing for historical data and quality audits.
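A minimal sketch of the hot-path/batch split, assuming an in-process queue (a production system would likely use a durable one) and a hypothetical `deep_scan` callable standing in for the ML model. The hot path runs only precompiled regexes and reports its own latency, and only already-redacted text is queued, so the queue does not become another copy of raw PII.

```python
import re
import time
from collections import deque
from typing import Callable

FAST_PATTERNS = [
    ('SSN', re.compile(r'\b\d{3}-\d{2}-\d{4}\b')),
    ('CARD', re.compile(r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b')),
]

deep_scan_queue: deque[tuple[str, str]] = deque()


def redact_inline(observation_id: str, text: str) -> tuple[str, float]:
    """Hot path: regex-only redaction; returns redacted text and elapsed ms."""
    start = time.perf_counter()
    for label, pattern in FAST_PATTERNS:
        text = pattern.sub(f'[REDACTED_{label}]', text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    deep_scan_queue.append((observation_id, text))  # queue redacted text only
    return text, elapsed_ms


def run_batch(deep_scan: Callable[[str], str]) -> dict[str, str]:
    """Off the hot path: drain the queue through a slower detector."""
    results = {}
    while deep_scan_queue:
        observation_id, text = deep_scan_queue.popleft()
        results[observation_id] = deep_scan(text)
    return results
```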

Redaction Methods

Masking Strategies:

Full removal: "SSN 123-45-6789" → "SSN [REDACTED]"
Partial masking: "Card 4532-1234-5678-9010" → "Card ****-****-****-9010"
Tokenization: Replace with consistent pseudonyms—"John Smith" becomes "User_A7B2" across all logs
Synthetic replacement: Substitute realistic fake data, such as "555-0123" in place of the actual phone number

Choose based on downstream use cases. Support teams need last-4-digits for verification. Analytics systems require consistent tokens for tracking user journeys. Synthetic data enables realistic testing environments.
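The four strategies differ mainly in what they preserve. A minimal sketch, assuming a secret key kept outside the logs: tokenization uses HMAC-SHA256 so the same input always maps to the same pseudonym without a lookup table, and synthetic phone numbers are drawn from 555-0100 to 555-0199, a range reserved for fictional use in the North American Numbering Plan. Function names are illustrative.

```python
import hashlib
import hmac


def full_removal(value: str) -> str:
    return '[REDACTED]'


def partial_mask(value: str, keep_last: int = 4) -> str:
    """Mask all digits except the last `keep_last`, keeping separators intact."""
    total = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - keep_last else '*')
        else:
            out.append(ch)
    return ''.join(out)


def tokenize(value: str, key: bytes, prefix: str = 'User_', length: int = 8) -> str:
    """Deterministic pseudonym; very short tokens (e.g. 4 hex chars) collide more often."""
    digest = hmac.new(key, value.encode('utf-8'), hashlib.sha256).hexdigest()
    return prefix + digest[:length].upper()


def synthetic_phone(value: str, key: bytes) -> str:
    """Consistent fake number in the fictional 555-0100 to 555-0199 range."""
    digest = hmac.new(key, value.encode('utf-8'), hashlib.sha256).digest()
    return f'555-01{digest[0] % 100:02d}'
```

For instance, `partial_mask("4532-1234-5678-9010")` returns `"****-****-****-9010"`. Rotating the HMAC key changes every token, which breaks cross-log consistency, so key rotation has to be planned alongside analytics needs.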

Visual Redaction: For screenshots, redact regions in the raw pixel data before the image is encoded or stored:

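import cv2
import numpy as np

def redact_regions(image: np.ndarray, sensitive_boxes, blur: bool = False) -> np.ndarray:
    """Redact (x, y, width, height) boxes in a NumPy image array, in place."""
    for x, y, width, height in sensitive_boxes:
        region = image[y:y + height, x:x + width]  # a view into image
        if blur:
            # Blur looks less obvious, but low-entropy content such as digits can
            # sometimes be recovered from blurred pixels; avoid it for real PII
            region[...] = cv2.GaussianBlur(region, (51, 51), 0)
        else:
            region[...] = 0  # solid black overwrites the original pixels
    return image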

Validation and Testing

Synthetic PII Injection: Generate test data containing known PII and verify 100% detection:

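# Each case pairs input text with the exact PII values that must disappear
test_cases = [
    ("My SSN is 123-45-6789", ["123-45-6789"]),
    ("Contact me at john.doe@example.com or (555) 123-4567",
     ["john.doe@example.com", "(555) 123-4567"]),
    ("Credit card: 4532-1234-5678-9010, exp 12/25", ["4532-1234-5678-9010"]),
]

for text, pii_values in test_cases:
    redacted = redact_text(text)  # redact_text from the regex example above
    for value in pii_values:
        assert value not in redacted, f"Failed to redact {value!r} in: {text}"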

Human Review Sampling: Randomly sample 1% of redacted outputs for manual verification. Track false negative rate (missed PII) and false positive rate (incorrect redactions). Aim for <0.1% false negatives in production.
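Hashing record IDs gives a sample that is unbiased with respect to content yet reproducible, so auditors can re-pull the same records later. A minimal sketch; the reviewer count fields (`missed`, `pii_total`, `wrong`, `redactions`) are hypothetical names for what a review form might record.

```python
import hashlib


def in_review_sample(record_id: str, rate_percent: int = 1) -> bool:
    """Select roughly rate_percent% of records, stably across reruns."""
    bucket = int(hashlib.sha256(record_id.encode('utf-8')).hexdigest(), 16) % 100
    return bucket < rate_percent


def review_rates(reviews: list[dict[str, int]]) -> dict[str, float]:
    """Aggregate reviewer counts into false negative and false positive rates."""
    missed = sum(r['missed'] for r in reviews)          # PII left visible
    pii_total = sum(r['pii_total'] for r in reviews)    # all PII, redacted or missed
    wrong = sum(r['wrong'] for r in reviews)            # non-PII that was redacted
    redactions = sum(r['redactions'] for r in reviews)  # all redactions made
    return {
        'false_negative_rate': missed / pii_total if pii_total else 0.0,
        'false_positive_rate': wrong / redactions if redactions else 0.0,
    }
```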

Penetration Testing: Have security teams attempt to reconstruct PII from redacted logs. Test if combining multiple log entries enables re-identification through inference attacks.

Key Metrics to Track

Detection Accuracy

Precision: Percentage of redactions that correctly identify PII. Target ≥95% to minimize disruption from false positives. Measure per PII type—credit card precision may be 99% while name precision is 92%.

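Recall: Percentage of actual PII successfully detected. Hold high-risk categories (SSN, medical records) to a stricter target such as ≥98% recall; regulations do not generally specify a recall figure, but misses in these categories carry the greatest compliance exposure. Track using labeled test sets refreshed quarterly.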

F1 Score: Harmonic mean balancing precision and recall. Production systems should maintain F1 ≥0.96 across all PII categories.
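Per-type precision, recall, and F1 can be computed from a labeled test set by comparing detected spans with gold spans. A minimal sketch, assuming spans are `(doc_id, start, end, pii_type)` tuples and that only exact boundary matches count as hits (giving credit for partial overlap is a common alternative).

```python
LabeledSpan = tuple[str, int, int, str]  # (doc_id, start, end, pii_type)


def per_type_metrics(gold: set[LabeledSpan],
                     predicted: set[LabeledSpan]) -> dict[str, dict[str, float]]:
    """Exact-match precision, recall, and F1 for each PII type."""
    report = {}
    for pii_type in sorted({s[3] for s in gold | predicted}):
        g = {s for s in gold if s[3] == pii_type}
        p = {s for s in predicted if s[3] == pii_type}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[pii_type] = {'precision': precision, 'recall': recall, 'f1': f1}
    return report
```

Reporting per type rather than pooled keeps a strong category (such as card numbers) from masking a weak one (such as names).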

Redaction Latency

P50/P95/P99 Processing Time: Measure end-to-end latency from observation capture to redacted output. Agent systems require:

P50 < 50ms for inline redaction
P95 < 150ms to avoid user-perceivable delays
P99 < 500ms as absolute ceiling

Track text (faster) and image (slower) latency separately. Precompile patterns once rather than per call, and cache results for frequently repeated inputs.
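A minimal check of recorded latencies against the targets above, using the nearest-rank percentile definition (interpolating definitions give slightly different values); the function names are illustrative. Running it separately on text and image samples keeps the two paths apart.

```python
BUDGETS_MS = {50: 50.0, 95: 150.0, 99: 500.0}  # P50/P95/P99 targets listed above


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]


def check_latency(samples_ms: list[float]) -> dict[int, tuple[float, bool]]:
    """Map each percentile to (observed value, within budget?)."""
    report = {}
    for p, limit in BUDGETS_MS.items():
        value = percentile(samples_ms, p)
        report[p] = (value, value <= limit)
    return report
```

With 98 readings at 10 ms plus two at 600 ms and 700 ms, P50 and P95 pass at 10 ms, but P99 is 600 ms and fails the ceiling, which is the tail behavior the P99 target exists to catch.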

False Positive Rate

Functional Impact: Percentage of redactions that break intended functionality. Examples include:

Redacted usernames preventing log correlation
Masked transaction IDs breaking support workflows
Obscured product names hindering analytics

Target <2% false positive rate with zero critical-path breakages. Monitor customer support tickets mentioning "redacted" or "hidden information" as proxy metrics.

Coverage Metrics

Data Flow Coverage: Percentage of data pathways with redaction applied. Map all locations where PII might persist—application logs, database exports, API responses, error traces, analytics events, backup archives. Achieve 100% coverage before production deployment.

PII Type Coverage: Track detection rates per category. A system might catch 99% of credit card numbers yet miss 30% of passport numbers, so expand pattern libraries based on gap analysis.
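Data flow coverage reduces to a ratio over an inventory of sinks, which can double as a deployment gate. A minimal sketch; the sink names and their flags below are hypothetical placeholders for a real data-flow map.

```python
# Hypothetical inventory: True means a redaction step is verified to run
# before data reaches that sink.
SINK_INVENTORY = {
    'application_logs': True,
    'message_queue': True,
    'error_traces': True,
    'analytics_events': False,
    'backup_archives': False,
}


def data_flow_coverage(inventory: dict[str, bool]) -> tuple[float, list[str]]:
    """Return (percent of sinks covered, sorted list of uncovered sinks)."""
    gaps = sorted(name for name, covered in inventory.items() if not covered)
    percent = 100.0 * (len(inventory) - len(gaps)) / len(inventory)
    return percent, gaps


def ready_for_production(inventory: dict[str, bool]) -> bool:
    # The target is 100% coverage, so any gap blocks deployment
    _, gaps = data_flow_coverage(inventory)
    return not gaps
```

For the inventory above, `data_flow_coverage` reports 60.0% with `analytics_events` and `backup_archives` as the gaps.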

Related Concepts

Data handling: Broader practices for managing sensitive information in agent systems
Guardrails: Safety mechanisms that prevent agents from exposing or mishandling PII
Proof of action: Artifacts documenting agent behavior that require PII redaction before user review
Redaction: General techniques for obscuring sensitive content across various media types
