Rollback & Undo

Rollback and undo are mechanisms for reversing agent actions when errors are detected or users request cancellation. In agentic systems, rollback capabilities restore systems to previous states after failures or unwanted changes, while undo mechanisms allow users to explicitly reverse completed actions. Together, these capabilities provide critical safety nets that enable users to trust autonomous agents with consequential operations.

Why It Matters

Rollback and undo capabilities are essential for production-grade agentic systems: they let users delegate consequential work knowing that mistakes can be reversed.

User Confidence and Trust

Users grant agents significant autonomy—from modifying production databases to processing financial transactions. Without reliable rollback mechanisms, a single agent error could cause irreversible damage, making users hesitant to delegate important tasks. When agents provide clear undo capabilities, users feel empowered to experiment and leverage automation for high-stakes operations.

Consider a content management agent that updates website copy. Without undo, users must meticulously review every change before allowing the agent to proceed, eliminating efficiency gains. With transparent rollback—showing exactly what changed and offering one-click restoration—users confidently approve bulk updates, knowing mistakes are easily corrected.

Error Correction and Recovery

Agent errors are inevitable. Network failures interrupt multi-step workflows. Misunderstood instructions lead to incorrect actions. External system changes invalidate assumptions mid-execution. Rollback mechanisms ensure these failures don't compound into catastrophic outcomes.

A deployment agent that rolls out infrastructure changes exemplifies this need. If the agent successfully creates a database and cache cluster but fails when deploying the application, partial state is dangerous. Automatic rollback that tears down the database and cache prevents orphaned resources, billing waste, and configuration drift.

Compliance and Audit Requirements

Regulated industries require demonstrable controls over data modifications and system changes. Rollback capabilities serve dual purposes: they provide technical undo functionality while generating audit trails proving that unauthorized or erroneous changes can be reversed.

Financial services agents that process transactions typically correct errors with compensating transactions, which reverse the effect of an entry while leaving the original record in place. Healthcare agents that modify patient records need a complete change history to support HIPAA audit controls, and that same history is what makes rollback possible. In these settings, change tracking and reversal are often regulatory requirements rather than optional features.
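One way to make a rollback auditable is to record it as a new log entry that points at the change it reverses, rather than editing or deleting history. The sketch below is illustrative only: `AuditLog`, `AuditEntry`, and every field name are hypothetical, and the log is held in memory where a real system would use durable, tamper-evident storage.

```typescript
type AuditAction = 'change' | 'rollback';

// Hypothetical entry shape; all field names are assumptions for illustration.
interface AuditEntry {
  readonly seq: number;          // position in the log, starting at 1
  readonly timestamp: string;    // ISO 8601
  readonly actor: string;        // agent or user responsible
  readonly action: AuditAction;
  readonly resourceId: string;
  readonly before: unknown;
  readonly after: unknown;
  readonly revertsSeq?: number;  // on rollback entries: the entry being reversed
}

class AuditLog {
  private entries: AuditEntry[] = [];

  recordChange(actor: string, resourceId: string, before: unknown, after: unknown): AuditEntry {
    return this.append({ actor, action: 'change', resourceId, before, after });
  }

  // A rollback is appended as a new entry; the original entry stays untouched.
  recordRollback(actor: string, original: AuditEntry): AuditEntry {
    return this.append({
      actor,
      action: 'rollback',
      resourceId: original.resourceId,
      before: original.after,
      after: original.before,
      revertsSeq: original.seq
    });
  }

  history(resourceId: string): AuditEntry[] {
    return this.entries.filter(e => e.resourceId === resourceId);
  }

  private append(fields: Omit<AuditEntry, 'seq' | 'timestamp'>): AuditEntry {
    const entry: AuditEntry = Object.freeze({
      ...fields,
      seq: this.entries.length + 1,
      timestamp: new Date().toISOString()
    });
    this.entries.push(entry);
    return entry;
  }
}

// Usage: the change and its reversal both remain visible in the history.
const auditLog = new AuditLog();
const phoneChange = auditLog.recordChange(
  'agent-7', 'patient:123', { phone: '555-0100' }, { phone: '555-0199' }
);
auditLog.recordRollback('supervisor-2', phoneChange);
```

Because the rollback entry carries `revertsSeq`, a reviewer can pair each reversal with the change it undid.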

Concrete Examples

Database Transaction Rollback

A customer service agent processes a refund by updating multiple database tables:

async function processRefund(orderId: string, amount: number) {
  const transaction = await db.beginTransaction();

  try {
    // Update order status
    await transaction.query(
      'UPDATE orders SET status = $1, refund_amount = $2 WHERE id = $3',
      ['refunded', amount, orderId]
    );

    // Credit customer account
    const order = await transaction.query(
      'SELECT customer_id FROM orders WHERE id = $1',
      [orderId]
    );
    await transaction.query(
      'UPDATE customers SET balance = balance + $1 WHERE id = $2',
      [amount, order.rows[0].customer_id]
    );

    // Record refund transaction
    await transaction.query(
      'INSERT INTO refund_log (order_id, amount, timestamp) VALUES ($1, $2, $3)',
      [orderId, amount, new Date()]
    );

    // All operations succeeded - commit
    await transaction.commit();
    return { success: true, message: 'Refund processed' };

  } catch (error) {
    // Any failure lands here; roll back explicitly so none of the writes persist
    await transaction.rollback();
    console.error('Refund failed, all changes rolled back:', error);
    return { success: false, message: 'Refund failed, no changes made' };
  }
}

Wrapping the three writes in one database transaction makes the refund atomic: if any statement fails, the catch block rolls back and none of the changes persist. The customer never sees a partially processed refund.

API Compensation Actions

When integrating with third-party services, rollback requires explicit compensation logic:

interface CompensationAction {
  execute: () => Promise<void>;
  description: string;
}

class CompensatingTransaction {
  private compensations: CompensationAction[] = [];

  async execute<T>(
    action: () => Promise<T>,
    // The compensation receives the action's result, so it never has to
    // reference a variable that is still being initialized
    compensation: (result: T) => Promise<void>,
    description: string
  ): Promise<T> {
    try {
      const result = await action();
      // Newest first, so rollback undoes steps in reverse order
      this.compensations.unshift({
        execute: () => compensation(result),
        description
      });
      return result;
    } catch (error) {
      await this.rollback();
      throw error;
    }
  }

  async rollback() {
    console.log(`Rolling back ${this.compensations.length} actions...`);

    // Take ownership of the list so a second rollback() cannot re-run compensations
    const pending = this.compensations;
    this.compensations = [];

    for (const compensation of pending) {
      try {
        console.log(`Compensating: ${compensation.description}`);
        await compensation.execute();
      } catch (error) {
        console.error(`Compensation failed for ${compensation.description}:`, error);
        // Log but continue with other compensations
      }
    }
  }
}

// Usage example: Email campaign agent
async function scheduleCampaign(campaignData: CampaignData) {
  const tx = new CompensatingTransaction();

  try {
    // Create email template in marketing platform
    const template = await tx.execute(
      () => emailService.createTemplate(campaignData.template),
      (created) => emailService.deleteTemplate(created.id),
      'Create email template'
    );

    // Upload recipient list
    const audience = await tx.execute(
      () => emailService.uploadAudience(campaignData.recipients),
      (uploaded) => emailService.deleteAudience(uploaded.id),
      'Upload recipient list'
    );

    // Schedule campaign
    const campaign = await tx.execute(
      () => emailService.scheduleCampaign({
        templateId: template.id,
        audienceId: audience.id,
        sendTime: campaignData.sendTime
      }),
      (scheduled) => emailService.cancelCampaign(scheduled.id),
      'Schedule campaign'
    );

    return { success: true, campaignId: campaign.id };

  } catch (error) {
    // Automatic rollback via compensating actions
    return { success: false, message: 'Campaign scheduling failed, all changes reverted' };
  }
}

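Compensation actions reverse API calls that don't support native transactions. Each operation registers its inverse, so a failure partway through can undo the steps that already succeeded, even across external services. Unlike a database transaction, this is best effort: intermediate state is visible to other clients until it is compensated, and a compensation can itself fail.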

File System Restoration

An agent that reorganizes file structures needs rollback for safe experimentation:

import * as fs from 'fs';
import * as path from 'path';

interface FileOperation {
  type: 'move' | 'delete' | 'create' | 'modify';
  path: string;
  destination?: string; // set for 'move': where the file was moved to
  backup?: string;
  originalContent?: Buffer;
}

class FileSystemTransaction {
  private operations: FileOperation[] = [];
  private backupDir: string;

  constructor() {
    this.backupDir = `/tmp/agent-backup-${Date.now()}`;
    fs.mkdirSync(this.backupDir, { recursive: true });
  }

  // Prefix with the operation index so same-named files from different
  // directories do not overwrite each other's backups
  private backupPathFor(filePath: string): string {
    return path.join(
      this.backupDir,
      `${this.operations.length}-${path.basename(filePath)}`
    );
  }

  async moveFile(source: string, destination: string) {
    // Create backup of source
    const backupPath = this.backupPathFor(source);
    await fs.promises.copyFile(source, backupPath);

    await fs.promises.rename(source, destination);

    // Record only after the move succeeded, so rollback never
    // removes a destination this transaction did not create
    this.operations.push({
      type: 'move',
      path: source,
      destination,
      backup: backupPath
    });
  }

  async deleteFile(filePath: string) {
    // Backup before deletion
    const backupPath = this.backupPathFor(filePath);
    await fs.promises.copyFile(filePath, backupPath);

    await fs.promises.unlink(filePath);

    this.operations.push({
      type: 'delete',
      path: filePath,
      backup: backupPath
    });
  }

  async modifyFile(filePath: string, newContent: Buffer) {
    // Save original content
    const originalContent = await fs.promises.readFile(filePath);

    await fs.promises.writeFile(filePath, newContent);

    this.operations.push({
      type: 'modify',
      path: filePath,
      originalContent
    });
  }

  async commit() {
    // Clean up backup directory
    await fs.promises.rm(this.backupDir, { recursive: true, force: true });
    this.operations = [];
  }

  async rollback() {
    console.log(`Rolling back ${this.operations.length} file operations...`);

    let failures = 0;

    // Undo in reverse order, iterating over a copy so the log is not mutated
    for (const op of [...this.operations].reverse()) {
      try {
        switch (op.type) {
          case 'delete':
            // Restore deleted file from backup
            if (op.backup) {
              await fs.promises.copyFile(op.backup, op.path);
              console.log(`Restored: ${op.path}`);
            }
            break;

          case 'modify':
            // Restore original content
            if (op.originalContent) {
              await fs.promises.writeFile(op.path, op.originalContent);
              console.log(`Reverted changes: ${op.path}`);
            }
            break;

          case 'move':
            // Restore the source from backup, then remove the moved copy
            if (op.backup && op.destination) {
              await fs.promises.copyFile(op.backup, op.path);
              await fs.promises.rm(op.destination, { force: true });
              console.log(`Moved back: ${op.path}`);
            }
            break;
        }
      } catch (error) {
        failures++;
        console.error(`Failed to rollback ${op.type} on ${op.path}:`, error);
      }
    }

    if (failures > 0) {
      // Keep the backups: they are the only remaining copy of the original data
      console.error(`${failures} operations not restored; backups kept in ${this.backupDir}`);
      return;
    }

    // Clean up backup directory after a fully successful rollback
    await fs.promises.rm(this.backupDir, { recursive: true, force: true });
    this.operations = [];
  }
}

// Usage
async function reorganizeFiles() {
  const tx = new FileSystemTransaction();

  try {
    await tx.moveFile('/data/old/report.pdf', '/data/archive/report.pdf');
    await tx.deleteFile('/data/temp/cache.tmp');
    await tx.modifyFile('/data/config.json', Buffer.from('{"updated": true}'));

    // Verify everything looks correct
    const verified = await verifyFileStructure();
    if (!verified) {
      throw new Error('File structure verification failed');
    }

    await tx.commit();
  } catch (error) {
    await tx.rollback();
    console.error('File reorganization failed, all changes reverted');
  }
}

File system transactions back up data before each modification, so the original files can be restored when an operation fails or verification detects a problem.

Common Pitfalls

Non-Reversible Actions

Some operations cannot be truly reversed, creating dangerous false security:

// PROBLEMATIC: Cannot truly undo email sending
async function sendEmailWithUndo(email: EmailData) {
  const sent = await emailService.send(email);

  // This isn't real undo - email is already sent
  return {
    messageId: sent.id,
    undo: async () => {
      // Can't unsend email, only send apology
      await emailService.send({
        to: email.to,
        subject: 'Please disregard previous email',
        body: 'The previous email was sent in error.'
      });
    }
  };
}

// BETTER: Delayed execution with cancellation window
class EmailScheduler {
  private pendingEmails = new Map<string, NodeJS.Timeout>();

  async sendWithUndoWindow(email: EmailData, undoWindowMs = 30000) {
    const emailId = generateId();

    // Schedule for delayed sending
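    const timeout = setTimeout(async () => {
      // Remove first so a late undo() reports failure instead of racing the send
      this.pendingEmails.delete(emailId);
      try {
        await emailService.send(email);
        console.log(`Email ${emailId} sent (undo window expired)`);
      } catch (error) {
        // Without this catch, a failed send becomes an unhandled promise rejection
        console.error(`Email ${emailId} failed to send:`, error);
      }
    }, undoWindowMs);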

    this.pendingEmails.set(emailId, timeout);

    return {
      emailId,
      canUndo: true,
      undoDeadline: Date.now() + undoWindowMs,
      undo: () => this.cancelEmail(emailId)
    };
  }

  cancelEmail(emailId: string): boolean {
    const timeout = this.pendingEmails.get(emailId);
    if (timeout) {
      clearTimeout(timeout);
      this.pendingEmails.delete(emailId);
      console.log(`Email ${emailId} cancelled before sending`);
      return true;
    }
    return false; // Too late to cancel
  }
}

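True undo requires preventing actions from taking effect until the undo window expires. Operations with external effects (emails, notifications, payments) need delayed execution for genuine reversibility. The in-memory timers in this example are lost if the process restarts, so a production scheduler would keep pending sends in durable storage.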

Partial Rollback Failures

Rollback itself can fail, leaving systems in inconsistent intermediate states:

// PROBLEMATIC: Rollback failures aren't handled
async function processOrder(order: Order) {
  try {
    await chargePayment(order.total);
    await updateInventory(order.items);
    await sendConfirmation(order.email);
  } catch (error) {
    // What if refund fails? Customer charged but order not processed
    await refundPayment(order.total);
    throw error;
  }
}

// BETTER: Idempotent rollback with retry and fallback
class RobustRollback {
  async rollback(actions: RollbackAction[], maxAttempts = 3) {
    const failures: Array<{action: RollbackAction, error: Error}> = [];

    for (const action of actions) {
      let success = false;
      let lastError: unknown;

      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          // action.rollback() must be idempotent: a retry may follow a partial success
          await action.rollback();
          console.log(`Rolled back: ${action.description}`);
          success = true;
          break;
        } catch (error) {
          // Keep the underlying error so the incident report shows the real cause
          lastError = error;
          console.warn(`Rollback attempt ${attempt} failed for ${action.description}:`, error);

          if (attempt < maxAttempts) {
            await this.exponentialBackoff(attempt);
          }
        }
      }

      if (!success) {
        const reason = lastError instanceof Error ? lastError.message : String(lastError);
        failures.push({
          action,
          error: new Error(`Failed to rollback after ${maxAttempts} attempts: ${reason}`)
        });
      }
    }

    if (failures.length > 0) {
      // Escalate to manual intervention
      await this.createIncident({
        type: 'ROLLBACK_FAILURE',
        failures,
        severity: 'CRITICAL',
        requiresManualReview: true
      });

      throw new RollbackException(
        `${failures.length} rollback operations failed`,
        failures
      );
    }
  }

  private async exponentialBackoff(attempt: number) {
    const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
    await new Promise(resolve => setTimeout(resolve, delay));
  }

  private async createIncident(details: IncidentDetails) {
    // Log to monitoring system for ops team intervention
    await incidentManagement.createTicket(details);
    await alerting.sendPagerDutyAlert(details);
  }
}

Robust rollback handles its own failures through retries, idempotency, and escalation to human operators when automated recovery is impossible.
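Retrying a rollback is only safe when the rollback operation is idempotent. A common way to get there is an idempotency key that the downstream service uses to deduplicate requests. The sketch below assumes such a service; `FakePaymentService`, `refundWithRetry`, and the `refund:<orderId>` key format are hypothetical stand-ins, not any real provider's API.

```typescript
// In-memory stand-in for a payment provider that deduplicates by idempotency key.
class FakePaymentService {
  private appliedKeys = new Set<string>();
  refundedTotal = 0;
  dropNextResponse = false; // simulate a refund that is applied but whose response is lost

  async refund(idempotencyKey: string, amount: number): Promise<void> {
    if (!this.appliedKeys.has(idempotencyKey)) {
      this.appliedKeys.add(idempotencyKey);
      this.refundedTotal += amount;
    }
    if (this.dropNextResponse) {
      this.dropNextResponse = false;
      throw new Error('timeout: response lost');
    }
  }
}

// Retries reuse the same key, so a refund that already went through is not repeated.
// Returns the number of attempts used.
async function refundWithRetry(
  service: FakePaymentService,
  orderId: string,
  amount: number,
  maxAttempts = 3
): Promise<number> {
  const key = `refund:${orderId}`; // derived from the order, stable across retries
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await service.refund(key, amount);
      return attempt;
    } catch (error) {
      if (attempt === maxAttempts) throw error;
    }
  }
  throw new Error('maxAttempts must be at least 1');
}
```

Without the key, the retry after a lost response would credit the customer twice. With it, the second attempt is a no-op on the provider side and simply confirms the refund.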

State Inconsistency

Rollback that doesn't restore dependent state creates subtle bugs:

// PROBLEMATIC: Cache inconsistent with database after rollback
async function updateUserProfile(userId: string, updates: ProfileData) {
  const tx = await db.beginTransaction();

  try {
    // Update database
    await tx.query('UPDATE users SET name = $1, email = $2 WHERE id = $3',
      [updates.name, updates.email, userId]);

    // Update cache
    await cache.set(`user:${userId}`, updates);

    await tx.commit();
  } catch (error) {
    await tx.rollback();
    // BUG: Database rolled back but cache still has new data!
    // Next read will return cached values that were never committed
  }
}

// BETTER: Coordinated rollback across all state stores
class CoordinatedTransaction {
  private dbTransaction: DatabaseTransaction;
  private cacheOperations: Array<{key: string, oldValue: any}> = [];

  async updateWithCache(userId: string, updates: ProfileData) {
    this.dbTransaction = await db.beginTransaction();

    try {
      // Track old cache value before update
      const oldValue = await cache.get(`user:${userId}`);
      this.cacheOperations.push({
        key: `user:${userId}`,
        oldValue
      });

      // Update database
      await this.dbTransaction.query(
        'UPDATE users SET name = $1, email = $2 WHERE id = $3',
        [updates.name, updates.email, userId]
      );

      // Update cache
      await cache.set(`user:${userId}`, updates);

      await this.dbTransaction.commit();

    } catch (error) {
      await this.rollback();
      throw error;
    }
  }

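  private async rollback() {
    // Rollback database
    await this.dbTransaction.rollback();

    // Rollback cache to previous values
    for (const op of this.cacheOperations) {
      try {
        // != null also covers cache clients that return undefined on a miss
        if (op.oldValue != null) {
          await cache.set(op.key, op.oldValue);
        } else {
          await cache.delete(op.key);
        }
      } catch (error) {
        console.error(`Failed to rollback cache for ${op.key}:`, error);
        // Invalidate cache entry as fallback; guard it so the loop keeps going
        try {
          await cache.delete(op.key);
        } catch (invalidateError) {
          console.error(`Failed to invalidate ${op.key}:`, invalidateError);
        }
      }
    }

    // Reset so a reused instance does not replay old cache operations
    this.cacheOperations = [];
  }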
}

Complete rollback must restore all related state—databases, caches, search indexes, event queues—to maintain consistency across the system.

Implementation

Undo Stack Pattern

Implement explicit undo functionality with the command pattern:

interface Command {
  execute(): Promise<void>;
  undo(): Promise<void>;
  description: string;
}

class UndoStack {
  private history: Command[] = [];
  private undoneCommands: Command[] = [];
  private maxHistorySize = 50;

  async execute(command: Command) {
    await command.execute();

    this.history.push(command);

    // Clear redo stack when new command executes
    this.undoneCommands = [];

    // Limit history size
    if (this.history.length > this.maxHistorySize) {
      this.history.shift();
    }
  }

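  async undo(): Promise<boolean> {
    const command = this.history[this.history.length - 1];

    if (!command) {
      return false;
    }

    // Pop only after undo succeeds, so a failed undo leaves the command in history
    await command.undo();
    this.history.pop();
    this.undoneCommands.push(command);

    console.log(`Undone: ${command.description}`);
    return true;
  }

  async redo(): Promise<boolean> {
    const command = this.undoneCommands[this.undoneCommands.length - 1];

    if (!command) {
      return false;
    }

    // Same rule as undo: move the command only once re-execution succeeds
    await command.execute();
    this.undoneCommands.pop();
    this.history.push(command);

    console.log(`Redone: ${command.description}`);
    return true;
  }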

  canUndo(): boolean {
    return this.history.length > 0;
  }

  canRedo(): boolean {
    return this.undoneCommands.length > 0;
  }

  getHistory(): string[] {
    return this.history.map(cmd => cmd.description);
  }
}

// Example commands
class UpdateSettingCommand implements Command {
  private oldValue: any;

  constructor(
    private settingKey: string,
    private newValue: any,
    private settingsService: SettingsService
  ) {}

  async execute() {
    this.oldValue = await this.settingsService.get(this.settingKey);
    await this.settingsService.set(this.settingKey, this.newValue);
  }

  async undo() {
    await this.settingsService.set(this.settingKey, this.oldValue);
  }

  get description() {
    return `Update ${this.settingKey} to ${this.newValue}`;
  }
}

// Usage in agent
const undoStack = new UndoStack();

async function agentUpdateSettings(key: string, value: any) {
  const command = new UpdateSettingCommand(key, value, settingsService);
  await undoStack.execute(command);
}

// User can undo
await undoStack.undo(); // Reverts last change

Undo stacks provide explicit user-triggered rollback with full history visibility and redo capability.

Compensation Pattern for Distributed Operations

Handle rollback across distributed services:

interface SagaStep {
  name: string;
  action: () => Promise<any>;
  compensation: (result?: any) => Promise<void>;
}

class SagaOrchestrator {
  async execute(steps: SagaStep[]) {
    const completedSteps: Array<{step: SagaStep, result: any}> = [];

    try {
      for (const step of steps) {
        console.log(`Executing: ${step.name}`);
        const result = await step.action();
        completedSteps.push({ step, result });
      }

      return { success: true };

    } catch (error) {
      console.error('Saga failed, compensating...', error);

      const compensationFailures: string[] = [];

      // Compensate in reverse order
      for (const { step, result } of completedSteps.reverse()) {
        try {
          console.log(`Compensating: ${step.name}`);
          await step.compensation(result);
        } catch (compensationError) {
          console.error(
            `Compensation failed for ${step.name}:`,
            compensationError
          );
          // Continue compensating other steps, but remember what was left behind
          compensationFailures.push(step.name);
        }
      }

      // Callers need to know if any step could not be undone
      return { success: false, error, compensationFailures };
    }
  }
}

// Example: Multi-service booking saga
async function bookTrip(tripData: TripData) {
  const saga = new SagaOrchestrator();

  const steps: SagaStep[] = [
    {
      name: 'Reserve flight',
      action: () => flightService.reserve(tripData.flight),
      compensation: (reservation) => flightService.cancel(reservation.id)
    },
    {
      name: 'Book hotel',
      action: () => hotelService.book(tripData.hotel),
      compensation: (booking) => hotelService.cancel(booking.id)
    },
    {
      name: 'Rent car',
      action: () => carService.rent(tripData.car),
      compensation: (rental) => carService.cancel(rental.id)
    },
    {
      name: 'Charge payment',
      action: () => paymentService.charge(tripData.total),
      compensation: (charge) => paymentService.refund(charge.id)
    }
  ];

  return await saga.execute(steps);
}

The saga pattern orchestrates distributed transactions with compensating actions, so the system converges on a consistent outcome even when services fail independently.

State Snapshot for Complete Restoration

Capture complete state snapshots for point-in-time recovery:

interface StateSnapshot {
  timestamp: number;
  version: string;
  state: Record<string, any>;
  metadata: Record<string, any>;
}

class SnapshotManager {
  private snapshots: StateSnapshot[] = [];
  private maxSnapshots = 10;

  createSnapshot(state: Record<string, any>, metadata = {}): string {
    const snapshot: StateSnapshot = {
      timestamp: Date.now(),
      version: generateVersion(),
      state: deepClone(state),
      metadata
    };

    this.snapshots.push(snapshot);

    // Limit snapshot history
    if (this.snapshots.length > this.maxSnapshots) {
      this.snapshots.shift();
    }

    console.log(`Created snapshot ${snapshot.version} at ${new Date(snapshot.timestamp)}`);
    return snapshot.version;
  }

  restoreSnapshot(version: string): Record<string, any> | null {
    const snapshot = this.snapshots.find(s => s.version === version);

    if (!snapshot) {
      console.error(`Snapshot ${version} not found`);
      return null;
    }

    console.log(`Restoring snapshot ${version} from ${new Date(snapshot.timestamp)}`);
    return deepClone(snapshot.state);
  }

  getSnapshotHistory(): Array<{version: string, timestamp: number, metadata: any}> {
    return this.snapshots.map(s => ({
      version: s.version,
      timestamp: s.timestamp,
      metadata: s.metadata
    }));
  }

  rollbackToTime(targetTime: number): Record<string, any> | null {
    // Find latest snapshot before target time
    const snapshot = this.snapshots
      .filter(s => s.timestamp <= targetTime)
      .sort((a, b) => b.timestamp - a.timestamp)[0];

    if (!snapshot) {
      console.error(`No snapshot found before ${new Date(targetTime)}`);
      return null;
    }

    return this.restoreSnapshot(snapshot.version);
  }
}

// Usage in stateful agent
class StatefulAgent {
  private state: AgentState = {};
  private snapshots = new SnapshotManager();

  async executeOperation(operation: Operation) {
    // Create snapshot before risky operation
    const snapshotVersion = this.snapshots.createSnapshot(
      this.state,
      { operation: operation.name }
    );

    try {
      this.state = await operation.execute(this.state);
    } catch (error) {
      console.error(`Operation ${operation.name} failed, rolling back...`);

      // Restore to pre-operation state
      const restoredState = this.snapshots.restoreSnapshot(snapshotVersion);
      if (restoredState) {
        this.state = restoredState;
      }

      throw error;
    }
  }

  async undoLastOperation() {
    const history = this.snapshots.getSnapshotHistory();
    if (history.length === 0) {
      return false;
    }

    // Snapshots are taken before each operation, so the most recent one
    // holds the state from just before the last operation ran.
    // The snapshot is not discarded, so a repeated call restores the same state.
    const lastSnapshot = history[history.length - 1];
    const restoredState = this.snapshots.restoreSnapshot(lastSnapshot.version);

    if (restoredState) {
      this.state = restoredState;
      console.log(`Rolled back to ${new Date(lastSnapshot.timestamp)}`);
      return true;
    }

    return false;
  }
}

State snapshots enable rollback to any previous point, useful for complex agents where tracking individual command inversions is impractical.
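The snapshot example calls `deepClone` and `generateVersion` without defining them. A minimal sketch of both follows, assuming the agent state is plain JSON-serializable data. These are illustrative helpers, not the original author's implementations.

```typescript
// Deep copy for JSON-serializable state only: Dates become strings, and
// Maps, Sets, functions, and undefined values are not preserved.
function deepClone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

let versionCounter = 0;

// Unique within one process: an incrementing counter plus the current time.
function generateVersion(): string {
  versionCounter += 1;
  return `v${versionCounter}-${Date.now()}`;
}
```

The clone matters because a snapshot that shares nested objects with live state would be silently changed by later operations, which defeats the point of taking it.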

Key Metrics

Measuring rollback effectiveness ensures these safety mechanisms work when needed:

Rollback Success Rate: Percentage of rollback attempts that successfully restore valid system state:

Rollback Success Rate = (Successful Rollbacks / Total Rollback Attempts) × 100

Target: > 99% for production systems. Anything less indicates rollback mechanisms themselves are unreliable, which is catastrophic.

Undo Latency: Mean time from rollback initiation to complete state restoration:

Undo Latency = Σ(Rollback Completion Time) / Number of Rollbacks

Target: < 2 seconds for simple operations, < 30 seconds for complex distributed rollbacks. High latency frustrates users and increases risk of additional errors during rollback.

State Consistency Score: Percentage of rollbacks that maintain system invariants and constraints:

State Consistency = (Rollbacks Ending in Valid State / Total Rollbacks) × 100

Target: 100%. Any inconsistency indicates partial rollback failures or incomplete compensation logic. Measure through automated validation checks after each rollback.

Rollback Invocation Rate: Frequency of rollback usage relative to total operations:

Rollback Rate = (Rollbacks Triggered / Total Agent Operations) × 100

Target: < 5% for automatic rollbacks, varies for manual undo. High rates suggest agents make too many errors or users lack confidence in agent actions.

Compensation Cost Ratio: Resource cost of rollback relative to forward operation:

Compensation Cost = Total Rollback Resource Usage / Total Forward Operation Resource Usage

Target: < 1.0. Rollback should not cost more than the original operation. Ratios > 1.0 indicate inefficient compensation strategies.
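The formulas above can be computed from a log of rollback events. The sketch below is one possible shape: `RollbackRecord`, its fields, and `computeRollbackMetrics` are hypothetical names, and the cost ratio assumes forward cost is measured only for the operations that were rolled back.

```typescript
// Hypothetical record of one rollback; field names are assumptions.
interface RollbackRecord {
  succeeded: boolean;   // rollback ran to completion
  stateValid: boolean;  // invariant checks passed afterwards
  durationMs: number;   // initiation to complete restoration
  rollbackCost: number; // resource units spent rolling back
  forwardCost: number;  // resource units spent on the original operation
}

interface RollbackMetrics {
  successRatePct: number;
  meanLatencyMs: number;
  consistencyPct: number;
  invocationRatePct: number;
  compensationCostRatio: number;
}

// Returns null when there is nothing to measure.
function computeRollbackMetrics(
  records: RollbackRecord[],
  totalOperations: number
): RollbackMetrics | null {
  const n = records.length;
  if (n === 0 || totalOperations <= 0) return null;

  const succeeded = records.filter(r => r.succeeded).length;
  const valid = records.filter(r => r.stateValid).length;
  const totalDuration = records.reduce((sum, r) => sum + r.durationMs, 0);
  const rollbackCost = records.reduce((sum, r) => sum + r.rollbackCost, 0);
  const forwardCost = records.reduce((sum, r) => sum + r.forwardCost, 0);

  return {
    successRatePct: (succeeded * 100) / n,
    meanLatencyMs: totalDuration / n,
    consistencyPct: (valid * 100) / n,
    invocationRatePct: (n * 100) / totalOperations,
    // Undefined when no forward cost was recorded
    compensationCostRatio: forwardCost > 0 ? rollbackCost / forwardCost : Number.NaN
  };
}
```

For example, four rollbacks out of 200 operations, three of them successful and consistent, give a 75% success rate and a 2% invocation rate. The first is well below the 99% target even though the second is within its 5% target.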

Related Concepts

Rollback and undo integrate with broader agent reliability patterns:

  • Error Recovery: Rollback is one recovery strategy; error recovery encompasses the full spectrum of failure handling approaches
  • Fail-safes: Preventive mechanisms that reduce the need for rollback by catching errors before they cause damage
  • Idempotency: Operations designed to be safely retried enable simpler rollback through re-execution rather than complex compensation
  • Proof of Action: Rollback mechanisms benefit from proof artifacts that document exactly what changed and provide evidence of successful restoration