Rollback & Undo
Rollback and undo are mechanisms for reversing agent actions when errors are detected or users request cancellation. In agentic systems, rollback capabilities restore systems to previous states after failures or unwanted changes, while undo mechanisms allow users to explicitly reverse completed actions. Together, these capabilities provide critical safety nets that enable users to trust autonomous agents with consequential operations.
Why It Matters
Rollback and undo capabilities are essential for production-grade agentic systems, transforming agents from risky automation experiments into trustworthy tools users can deploy with confidence.
User Confidence and Trust
Users grant agents significant autonomy—from modifying production databases to processing financial transactions. Without reliable rollback mechanisms, a single agent error could cause irreversible damage, making users hesitant to delegate important tasks. When agents provide clear undo capabilities, users feel empowered to experiment and leverage automation for high-stakes operations.
Consider a content management agent that updates website copy. Without undo, users must meticulously review every change before allowing the agent to proceed, eliminating efficiency gains. With transparent rollback—showing exactly what changed and offering one-click restoration—users confidently approve bulk updates, knowing mistakes are easily corrected.
Error Correction and Recovery
Agent errors are inevitable. Network failures interrupt multi-step workflows. Misunderstood instructions lead to incorrect actions. External system changes invalidate assumptions mid-execution. Rollback mechanisms ensure these failures don't compound into catastrophic outcomes.
A deployment agent that rolls out infrastructure changes exemplifies this need. If the agent successfully creates a database and cache cluster but fails when deploying the application, partial state is dangerous. Automatic rollback that tears down the database and cache prevents orphaned resources, billing waste, and configuration drift.
Compliance and Audit Requirements
Regulated industries require demonstrable controls over data modifications and system changes. Rollback capabilities serve dual purposes: they provide technical undo functionality while generating audit trails proving that unauthorized or erroneous changes can be reversed.
Financial services agents that process transactions generally need rollback in the form of compensating transactions, because posted ledger entries are corrected with offsetting entries rather than deleted. Healthcare agents that modify patient records need a complete change history to support HIPAA audit controls. In regulated settings these capabilities are requirements, not optional features.
Concrete Examples
Database Transaction Rollback
A customer service agent processes a refund by updating multiple database tables:
async function processRefund(orderId: string, amount: number) {
const transaction = await db.beginTransaction();
try {
// Update order status
await transaction.query(
'UPDATE orders SET status = $1, refund_amount = $2 WHERE id = $3',
['refunded', amount, orderId]
);
// Credit customer account (fail fast if the order does not exist)
const order = await transaction.query(
'SELECT customer_id FROM orders WHERE id = $1',
[orderId]
);
if (order.rows.length === 0) {
throw new Error(`Order ${orderId} not found`);
}
await transaction.query(
'UPDATE customers SET balance = balance + $1 WHERE id = $2',
[amount, order.rows[0].customer_id]
);
// Record refund transaction
await transaction.query(
'INSERT INTO refund_log (order_id, amount, timestamp) VALUES ($1, $2, $3)',
[orderId, amount, new Date()]
);
// All operations succeeded - commit
await transaction.commit();
return { success: true, message: 'Refund processed' };
} catch (error) {
// Any failure triggers automatic rollback
await transaction.rollback();
console.error('Refund failed, all changes rolled back:', error);
return { success: false, message: 'Refund failed, no changes made' };
}
}
Database transactions make rollback atomic: when any statement fails, the catch block calls rollback() and every change made inside the transaction reverts. Other sessions never observe a partially processed refund.
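The commit-or-rollback structure above can be factored into a reusable helper so that individual agent actions cannot forget the rollback path. The following is a minimal sketch, not a specific driver's API: the `Tx` interface, the `withTransaction` name, and the `FakeTx` stand-in are all assumptions made for illustration.

```typescript
// Hypothetical minimal transaction shape; real database clients differ.
interface Tx {
  commit(): Promise<void>;
  rollback(): Promise<void>;
}

// Runs `work` inside a transaction: commits on success, rolls back on any error.
async function withTransaction<T extends Tx, R>(
  begin: () => Promise<T>,
  work: (tx: T) => Promise<R>
): Promise<R> {
  const tx = await begin();
  try {
    const result = await work(tx);
    await tx.commit();
    return result;
  } catch (error) {
    try {
      await tx.rollback();
    } catch (rollbackError) {
      // Report the rollback failure without hiding the original error.
      console.error('Rollback failed:', rollbackError);
    }
    throw error;
  }
}

// In-memory stand-in used only to demonstrate the control flow.
class FakeTx implements Tx {
  log: string[] = [];
  async commit() { this.log.push('commit'); }
  async rollback() { this.log.push('rollback'); }
}
```

With a helper of this shape, the refund logic would only contain the three statements, and the caller decides how to report the rethrown error.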
API Compensation Actions
When integrating with third-party services, rollback requires explicit compensation logic:
interface CompensationAction {
execute: () => Promise<void>;
description: string;
}
class CompensatingTransaction {
private compensations: CompensationAction[] = [];
async execute<T>(
action: () => Promise<T>,
// The compensation receives the action's result, so it never has to close over
// a variable that is still being initialized
compensation: (result: T) => Promise<void>,
description: string
): Promise<T> {
try {
const result = await action();
// Track compensation in reverse order
this.compensations.unshift({ execute: () => compensation(result), description });
return result;
} catch (error) {
await this.rollback();
throw error;
}
}
async rollback() {
console.log(`Rolling back ${this.compensations.length} actions...`);
// Take ownership of the list so a second rollback() call does not repeat compensations
const pending = this.compensations;
this.compensations = [];
for (const compensation of pending) {
try {
console.log(`Compensating: ${compensation.description}`);
await compensation.execute();
} catch (error) {
console.error(`Compensation failed for ${compensation.description}:`, error);
// Log but continue with other compensations
}
}
}
}
// Usage example: Email campaign agent
async function scheduleCampaign(campaignData: CampaignData) {
const tx = new CompensatingTransaction();
try {
// Create email template in marketing platform
const template = await tx.execute(
() => emailService.createTemplate(campaignData.template),
(created) => emailService.deleteTemplate(created.id),
'Create email template'
);
// Upload recipient list
const audience = await tx.execute(
() => emailService.uploadAudience(campaignData.recipients),
(uploaded) => emailService.deleteAudience(uploaded.id),
'Upload recipient list'
);
// Schedule campaign
const campaign = await tx.execute(
() => emailService.scheduleCampaign({
templateId: template.id,
audienceId: audience.id,
sendTime: campaignData.sendTime
}),
(scheduled) => emailService.cancelCampaign(scheduled.id),
'Schedule campaign'
);
return { success: true, campaignId: campaign.id };
} catch (error) {
// Automatic rollback via compensating actions
return { success: false, message: 'Campaign scheduling failed, all changes reverted' };
}
}
Compensation actions reverse API calls that don't support native transactions. Each operation registers its inverse, enabling complete rollback even across external services.
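Compensations against external services may run after the resource has already disappeared, for example when an earlier rollback attempt partly succeeded. A compensation that treats "already gone" as success is easier to retry. The sketch below assumes errors carry a numeric `status` field with 404 meaning "not found"; that shape and the `tolerateMissing` name are hypothetical, and real clients expose this differently.

```typescript
// Wraps a compensation so that "resource already gone" counts as success.
// Assumption: the client throws an object with a numeric `status` property.
function tolerateMissing(
  compensation: () => Promise<void>
): () => Promise<void> {
  return async () => {
    try {
      await compensation();
    } catch (error) {
      const status =
        typeof error === 'object' && error !== null
          ? (error as { status?: number }).status
          : undefined;
      if (status === 404) {
        return; // Nothing left to undo.
      }
      throw error; // Anything else is a real compensation failure.
    }
  };
}
```

A compensation could then be registered as `tolerateMissing(() => emailService.deleteTemplate(id))`, so that repeating it after a partial rollback does not report a spurious failure.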
File System Restoration
An agent that reorganizes file structures needs rollback for safe experimentation:
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
interface FileOperation {
type: 'move' | 'delete' | 'create' | 'modify';
path: string;
destination?: string; // Only set for 'move'
backup?: string;
originalContent?: Buffer;
}
class FileSystemTransaction {
private operations: FileOperation[] = [];
private backupDir: string;
constructor() {
// mkdtempSync appends random characters, so concurrent transactions never share a directory
this.backupDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-backup-'));
}
// Prefix backups with the operation index so files that share a basename do not overwrite each other
private backupPathFor(filePath: string): string {
return path.join(this.backupDir, `${this.operations.length}-${path.basename(filePath)}`);
}
async moveFile(source: string, destination: string) {
// Create backup of source
const backupPath = this.backupPathFor(source);
await fs.promises.copyFile(source, backupPath);
await fs.promises.rename(source, destination);
// Record only after the move succeeded so rollback never removes a destination it did not create
this.operations.push({
type: 'move',
path: source,
destination,
backup: backupPath
});
}
async deleteFile(filePath: string) {
// Backup before deletion
const backupPath = this.backupPathFor(filePath);
await fs.promises.copyFile(filePath, backupPath);
await fs.promises.unlink(filePath);
this.operations.push({
type: 'delete',
path: filePath,
backup: backupPath
});
}
async modifyFile(filePath: string, newContent: Buffer) {
// Save original content
const originalContent = await fs.promises.readFile(filePath);
// Record before writing so a partially written file is still restored
this.operations.push({
type: 'modify',
path: filePath,
originalContent
});
await fs.promises.writeFile(filePath, newContent);
}
async commit() {
// Clean up backup directory
await fs.promises.rm(this.backupDir, { recursive: true, force: true });
this.operations = [];
}
async rollback() {
console.log(`Rolling back ${this.operations.length} file operations...`);
let failures = 0;
// Undo in reverse order; copy the array so the recorded history is not mutated mid-loop
for (const op of [...this.operations].reverse()) {
try {
switch (op.type) {
case 'delete':
// Restore deleted file from backup
if (op.backup) {
await fs.promises.copyFile(op.backup, op.path);
console.log(`Restored: ${op.path}`);
}
break;
case 'modify':
// Restore original content
if (op.originalContent) {
await fs.promises.writeFile(op.path, op.originalContent);
console.log(`Reverted changes: ${op.path}`);
}
break;
case 'move':
// Restore the source from backup, then remove the moved copy
if (op.backup && op.destination) {
await fs.promises.copyFile(op.backup, op.path);
await fs.promises.rm(op.destination, { force: true });
console.log(`Moved back: ${op.path}`);
}
break;
}
} catch (error) {
failures++;
console.error(`Failed to rollback ${op.type} on ${op.path}:`, error);
}
}
this.operations = [];
if (failures === 0) {
// Clean up backup directory after rollback
await fs.promises.rm(this.backupDir, { recursive: true, force: true });
} else {
// Keep the backups so an operator can finish the restore by hand
console.error(`${failures} operations not restored; backups kept in ${this.backupDir}`);
}
}
}
// Usage
async function reorganizeFiles() {
const tx = new FileSystemTransaction();
try {
await tx.moveFile('/data/old/report.pdf', '/data/archive/report.pdf');
await tx.deleteFile('/data/temp/cache.tmp');
await tx.modifyFile('/data/config.json', Buffer.from('{"updated": true}'));
// Verify everything looks correct
const verified = await verifyFileStructure();
if (!verified) {
throw new Error('File structure verification failed');
}
await tx.commit();
} catch (error) {
await tx.rollback();
console.error('File reorganization failed, all changes reverted');
}
}
File system transactions back up data before each modification, enabling restoration when operations fail or verification detects problems.
Common Pitfalls
Non-Reversible Actions
Some operations cannot be truly reversed, creating dangerous false security:
// PROBLEMATIC: Cannot truly undo email sending
async function sendEmailWithUndo(email: EmailData) {
const sent = await emailService.send(email);
// This isn't real undo - email is already sent
return {
messageId: sent.id,
undo: async () => {
// Can't unsend email, only send apology
await emailService.send({
to: email.to,
subject: 'Please disregard previous email',
body: 'The previous email was sent in error.'
});
}
};
}
// BETTER: Delayed execution with cancellation window
class EmailScheduler {
private pendingEmails = new Map<string, NodeJS.Timeout>();
async sendWithUndoWindow(email: EmailData, undoWindowMs = 30000) {
const emailId = generateId();
// Schedule for delayed sending
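const timeout = setTimeout(async () => {
// Remove first so cancelEmail() cannot report success while the send is in flight
this.pendingEmails.delete(emailId);
try {
await emailService.send(email);
console.log(`Email ${emailId} sent (undo window expired)`);
} catch (error) {
// Without this catch, a failed send becomes an unhandled promise rejection
console.error(`Email ${emailId} failed to send:`, error);
}
}, undoWindowMs);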
this.pendingEmails.set(emailId, timeout);
return {
emailId,
canUndo: true,
undoDeadline: Date.now() + undoWindowMs,
undo: () => this.cancelEmail(emailId)
};
}
cancelEmail(emailId: string): boolean {
const timeout = this.pendingEmails.get(emailId);
if (timeout) {
clearTimeout(timeout);
this.pendingEmails.delete(emailId);
console.log(`Email ${emailId} cancelled before sending`);
return true;
}
return false; // Too late to cancel
}
}
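True undo requires holding the action back until the undo window expires. Operations with external effects (emails, notifications, payments) need delayed execution for genuine reversibility. Timers held in memory are lost if the process restarts, so production systems usually persist pending actions instead.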
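One way to make the undo window survive a restart is to store each pending action as a record with a send time and let a background worker pick up whatever is due. The sketch below models only the decision logic with plain objects. The `OutboxEntry` shape and both function names are assumptions for illustration; the storage layer is left out.

```typescript
// Hypothetical record shape for a persisted outbox row.
interface OutboxEntry {
  id: string;
  sendAt: number;     // Epoch milliseconds when the undo window closes
  cancelled: boolean;
  sent: boolean;
}

// Marks an entry as cancelled if its undo window is still open.
function cancelEntry(entry: OutboxEntry, now: number): boolean {
  if (entry.sent || entry.cancelled || now >= entry.sendAt) {
    return false; // Too late, or already handled
  }
  entry.cancelled = true;
  return true;
}

// Returns the entries a background worker should send at time `now`.
function dueEntries(entries: OutboxEntry[], now: number): OutboxEntry[] {
  return entries.filter(e => !e.sent && !e.cancelled && now >= e.sendAt);
}
```

Because the cancellation check and the worker query both read the same persisted fields, an undo request and a send cannot both succeed for the same entry, provided the store updates the row atomically.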
Partial Rollback Failures
Rollback itself can fail, leaving systems in inconsistent intermediate states:
// PROBLEMATIC: Rollback failures aren't handled
async function processOrder(order: Order) {
try {
await chargePayment(order.total);
await updateInventory(order.items);
await sendConfirmation(order.email);
} catch (error) {
// What if refund fails? Customer charged but order not processed
await refundPayment(order.total);
throw error;
}
}
// BETTER: Idempotent rollback with retry and fallback
class RobustRollback {
async rollback(actions: RollbackAction[], maxAttempts = 3) {
const failures: Array<{action: RollbackAction, error: Error}> = [];
for (const action of actions) {
let success = false;
let lastError: unknown;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
// Use idempotent rollback operations
await action.rollback();
console.log(`Rolled back: ${action.description}`);
success = true;
break;
} catch (error) {
// Keep the underlying error so the incident report shows why rollback failed
lastError = error;
console.warn(`Rollback attempt ${attempt} failed for ${action.description}`);
if (attempt < maxAttempts) {
await this.exponentialBackoff(attempt);
}
}
}
if (!success) {
failures.push({
action,
error: new Error(`Failed to rollback after ${maxAttempts} attempts: ${String(lastError)}`)
});
}
}
if (failures.length > 0) {
// Escalate to manual intervention
await this.createIncident({
type: 'ROLLBACK_FAILURE',
failures,
severity: 'CRITICAL',
requiresManualReview: true
});
throw new RollbackException(
`${failures.length} rollback operations failed`,
failures
);
}
}
private async exponentialBackoff(attempt: number) {
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
await new Promise(resolve => setTimeout(resolve, delay));
}
private async createIncident(details: IncidentDetails) {
// Log to monitoring system for ops team intervention
await incidentManagement.createTicket(details);
await alerting.sendPagerDutyAlert(details);
}
}
Robust rollback handles its own failures through retries, idempotency, and escalation to human operators when automated recovery is impossible.
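Retrying a rollback is only safe if repeating it has no additional effect. A common way to get that property is to key each compensation on a stable identifier and record that it has been applied. The sketch below uses an in-memory set purely for illustration; the `RefundLedger` class is hypothetical, and a real system would persist the record, for example behind a unique constraint.

```typescript
// Hypothetical in-memory ledger used to illustrate an idempotent refund.
class RefundLedger {
  private refunded = new Set<string>();
  private totalRefunded = 0;

  // Refunds a charge at most once, keyed by the charge ID.
  refund(chargeId: string, amount: number): 'refunded' | 'already_refunded' {
    if (this.refunded.has(chargeId)) {
      return 'already_refunded'; // A retry is a harmless no-op
    }
    this.refunded.add(chargeId);
    this.totalRefunded += amount;
    return 'refunded';
  }

  total(): number {
    return this.totalRefunded;
  }
}
```

With this shape, a retry loop can call `refund` again after a timeout without knowing whether the first attempt reached the ledger.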
State Inconsistency
Rollback that doesn't restore dependent state creates subtle bugs:
// PROBLEMATIC: Cache inconsistent with database after rollback
async function updateUserProfile(userId: string, updates: ProfileData) {
const tx = await db.beginTransaction();
try {
// Update database
await tx.query('UPDATE users SET name = $1, email = $2 WHERE id = $3',
[updates.name, updates.email, userId]);
// Update cache
await cache.set(`user:${userId}`, updates);
await tx.commit();
} catch (error) {
await tx.rollback();
// BUG: Database rolled back but cache still has new data!
// Next read will serve values that were never committed
}
}
// BETTER: Coordinated rollback across all state stores
class CoordinatedTransaction {
// Assigned in updateWithCache(); the `!` tells strict TypeScript it is set before use
private dbTransaction!: DatabaseTransaction;
private cacheOperations: Array<{key: string, oldValue: any}> = [];
async updateWithCache(userId: string, updates: ProfileData) {
this.dbTransaction = await db.beginTransaction();
try {
// Track old cache value before update
const oldValue = await cache.get(`user:${userId}`);
this.cacheOperations.push({
key: `user:${userId}`,
oldValue
});
// Update database
await this.dbTransaction.query(
'UPDATE users SET name = $1, email = $2 WHERE id = $3',
[updates.name, updates.email, userId]
);
// Update cache
await cache.set(`user:${userId}`, updates);
await this.dbTransaction.commit();
} catch (error) {
await this.rollback();
throw error;
}
}
private async rollback() {
// Rollback database
await this.dbTransaction.rollback();
// Rollback cache to previous values
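for (const op of this.cacheOperations) {
try {
// Treat both null and undefined as "no previous entry"
if (op.oldValue != null) {
await cache.set(op.key, op.oldValue);
} else {
await cache.delete(op.key);
}
} catch (error) {
console.error(`Failed to rollback cache for ${op.key}:`, error);
try {
// Invalidate cache entry as fallback
await cache.delete(op.key);
} catch (invalidateError) {
// Keep going so one bad key does not block the remaining restores
console.error(`Failed to invalidate cache for ${op.key}:`, invalidateError);
}
}
}
this.cacheOperations = [];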
}
}
Complete rollback must restore all related state—databases, caches, search indexes, event queues—to maintain consistency across the system.
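An alternative that reduces how much dependent state needs restoring is to postpone writes to non-transactional stores until the database commit has succeeded. The sketch below illustrates that ordering under stated assumptions: the `CommitTx` interface and the `commitThenApply` name are hypothetical, and the side effects are passed in as plain async functions.

```typescript
// Hypothetical minimal transaction shape for illustration only.
interface CommitTx {
  commit(): Promise<void>;
  rollback(): Promise<void>;
}

// Commits first, then applies side effects such as cache writes.
// If the commit fails, no side effect has run, so there is nothing to undo.
async function commitThenApply(
  tx: CommitTx,
  effects: Array<() => Promise<void>>
): Promise<void> {
  try {
    await tx.commit();
  } catch (error) {
    await tx.rollback();
    throw error;
  }
  for (const effect of effects) {
    await effect();
  }
}
```

The trade-off is that a side effect can still fail after the commit. In that case the database is correct and the cache is merely out of date, which is usually recoverable by invalidating the affected keys.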
Implementation
Undo Stack Pattern
Implement explicit undo functionality with the command pattern:
interface Command {
execute(): Promise<void>;
undo(): Promise<void>;
description: string;
}
class UndoStack {
private history: Command[] = [];
private undoneCommands: Command[] = [];
private maxHistorySize = 50;
async execute(command: Command) {
await command.execute();
this.history.push(command);
// Clear redo stack when new command executes
this.undoneCommands = [];
// Limit history size
if (this.history.length > this.maxHistorySize) {
this.history.shift();
}
}
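async undo(): Promise<boolean> {
const command = this.history[this.history.length - 1];
if (!command) {
return false;
}
// Pop only after undo succeeds so a failed undo does not drop the command from history
await command.undo();
this.history.pop();
this.undoneCommands.push(command);
console.log(`Undone: ${command.description}`);
return true;
}
async redo(): Promise<boolean> {
const command = this.undoneCommands[this.undoneCommands.length - 1];
if (!command) {
return false;
}
// Same rule as undo: keep the command on the redo stack if execute() throws
await command.execute();
this.undoneCommands.pop();
this.history.push(command);
console.log(`Redone: ${command.description}`);
return true;
}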
canUndo(): boolean {
return this.history.length > 0;
}
canRedo(): boolean {
return this.undoneCommands.length > 0;
}
getHistory(): string[] {
return this.history.map(cmd => cmd.description);
}
}
// Example commands
class UpdateSettingCommand implements Command {
private oldValue: any;
constructor(
private settingKey: string,
private newValue: any,
private settingsService: SettingsService
) {}
async execute() {
this.oldValue = await this.settingsService.get(this.settingKey);
await this.settingsService.set(this.settingKey, this.newValue);
}
async undo() {
await this.settingsService.set(this.settingKey, this.oldValue);
}
get description() {
return `Update ${this.settingKey} to ${this.newValue}`;
}
}
// Usage in agent
const undoStack = new UndoStack();
async function agentUpdateSettings(key: string, value: any) {
const command = new UpdateSettingCommand(key, value, settingsService);
await undoStack.execute(command);
}
// User can undo
await undoStack.undo(); // Reverts last change
Undo stacks provide explicit user-triggered rollback with full history visibility and redo capability.
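Agents often perform several low-level changes for one user-visible action, and users generally expect a single undo to reverse all of them. One way to get that behavior is a composite command that groups other commands. The sketch below is self-contained, so it declares its own `UndoableCommand` interface mirroring the `Command` interface above; the `CompositeCommand` class is an illustrative assumption rather than part of the pattern as described.

```typescript
interface UndoableCommand {
  execute(): Promise<void>;
  undo(): Promise<void>;
  description: string;
}

// Treats several commands as one undoable unit.
class CompositeCommand implements UndoableCommand {
  private executed: UndoableCommand[] = [];

  constructor(
    public description: string,
    private commands: UndoableCommand[]
  ) {}

  async execute() {
    this.executed = [];
    for (const command of this.commands) {
      try {
        await command.execute();
        this.executed.push(command);
      } catch (error) {
        await this.undo(); // Reverse the part of the group that already ran
        throw error;
      }
    }
  }

  async undo() {
    // Undo in reverse order; remove each command only after its undo succeeds.
    while (this.executed.length > 0) {
      const command = this.executed[this.executed.length - 1];
      await command.undo();
      this.executed.pop();
    }
  }
}
```

A composite can be pushed onto an undo stack like any other command, so the stack itself needs no changes.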
Compensation Pattern for Distributed Operations
Handle rollback across distributed services:
interface SagaStep {
name: string;
action: () => Promise<any>;
compensation: (result?: any) => Promise<void>;
}
class SagaOrchestrator {
async execute(steps: SagaStep[]) {
const completedSteps: Array<{step: SagaStep, result: any}> = [];
try {
for (const step of steps) {
console.log(`Executing: ${step.name}`);
const result = await step.action();
completedSteps.push({ step, result });
}
return { success: true };
} catch (error) {
console.error('Saga failed, compensating...', error);
// Compensate in reverse order
for (const { step, result } of completedSteps.reverse()) {
try {
console.log(`Compensating: ${step.name}`);
await step.compensation(result);
} catch (compensationError) {
console.error(
`Compensation failed for ${step.name}:`,
compensationError
);
// Continue compensating other steps despite failures
}
}
return { success: false, error };
}
}
}
// Example: Multi-service booking saga
async function bookTrip(tripData: TripData) {
const saga = new SagaOrchestrator();
const steps: SagaStep[] = [
{
name: 'Reserve flight',
action: () => flightService.reserve(tripData.flight),
compensation: (reservation) => flightService.cancel(reservation.id)
},
{
name: 'Book hotel',
action: () => hotelService.book(tripData.hotel),
compensation: (booking) => hotelService.cancel(booking.id)
},
{
name: 'Rent car',
action: () => carService.rent(tripData.car),
compensation: (rental) => carService.cancel(rental.id)
},
{
name: 'Charge payment',
action: () => paymentService.charge(tripData.total),
compensation: (charge) => paymentService.refund(charge.id)
}
];
return await saga.execute(steps);
}
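The saga pattern coordinates a distributed operation as a sequence of local steps, each paired with a compensating action. It provides eventual consistency rather than isolation: other services can observe intermediate state until compensation finishes, and a failed compensation still needs manual follow-up.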
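Callers usually need to know not only that a saga failed but also which compensations did not complete, since those are the steps that require manual cleanup. The following sketch is a simplified, self-contained variant of the orchestrator above that reports this. The `SketchStep` and `SagaOutcome` types and the `runSaga` function are assumptions introduced for illustration.

```typescript
interface SketchStep {
  name: string;
  action: () => Promise<unknown>;
  compensation: (result: unknown) => Promise<void>;
}

interface SagaOutcome {
  success: boolean;
  failedStep?: string;
  uncompensated: string[]; // Steps whose compensation threw
}

async function runSaga(steps: SketchStep[]): Promise<SagaOutcome> {
  const completed: Array<{ step: SketchStep; result: unknown }> = [];
  for (const step of steps) {
    try {
      completed.push({ step, result: await step.action() });
    } catch (error) {
      const uncompensated: string[] = [];
      // Compensate completed steps in reverse order.
      for (let i = completed.length - 1; i >= 0; i--) {
        const done = completed[i];
        try {
          await done.step.compensation(done.result);
        } catch (compensationError) {
          uncompensated.push(done.step.name);
        }
      }
      return { success: false, failedStep: step.name, uncompensated };
    }
  }
  return { success: true, uncompensated: [] };
}
```

An agent could surface a non-empty `uncompensated` list directly to the user or attach it to an incident, instead of reporting a generic failure.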
State Snapshot for Complete Restoration
Capture complete state snapshots for point-in-time recovery:
interface StateSnapshot {
timestamp: number;
version: string;
state: Record<string, any>;
metadata: Record<string, any>;
}
class SnapshotManager {
private snapshots: StateSnapshot[] = [];
private maxSnapshots = 10;
createSnapshot(state: Record<string, any>, metadata = {}): string {
const snapshot: StateSnapshot = {
timestamp: Date.now(),
version: generateVersion(),
state: deepClone(state),
metadata
};
this.snapshots.push(snapshot);
// Limit snapshot history
if (this.snapshots.length > this.maxSnapshots) {
this.snapshots.shift();
}
console.log(`Created snapshot ${snapshot.version} at ${new Date(snapshot.timestamp)}`);
return snapshot.version;
}
restoreSnapshot(version: string): Record<string, any> | null {
const snapshot = this.snapshots.find(s => s.version === version);
if (!snapshot) {
console.error(`Snapshot ${version} not found`);
return null;
}
console.log(`Restoring snapshot ${version} from ${new Date(snapshot.timestamp)}`);
return deepClone(snapshot.state);
}
getSnapshotHistory(): Array<{version: string, timestamp: number, metadata: any}> {
return this.snapshots.map(s => ({
version: s.version,
timestamp: s.timestamp,
metadata: s.metadata
}));
}
rollbackToTime(targetTime: number): Record<string, any> | null {
// Find latest snapshot before target time
const snapshot = this.snapshots
.filter(s => s.timestamp <= targetTime)
.sort((a, b) => b.timestamp - a.timestamp)[0];
if (!snapshot) {
console.error(`No snapshot found before ${new Date(targetTime)}`);
return null;
}
return this.restoreSnapshot(snapshot.version);
}
}
// Usage in stateful agent
class StatefulAgent {
private state: AgentState = {};
private snapshots = new SnapshotManager();
async executeOperation(operation: Operation) {
// Create snapshot before risky operation
const snapshotVersion = this.snapshots.createSnapshot(
this.state,
{ operation: operation.name }
);
try {
this.state = await operation.execute(this.state);
} catch (error) {
console.error(`Operation ${operation.name} failed, rolling back...`);
// Restore to pre-operation state
const restoredState = this.snapshots.restoreSnapshot(snapshotVersion);
if (restoredState) {
this.state = restoredState;
}
throw error;
}
}
async undoLastOperation() {
const history = this.snapshots.getSnapshotHistory();
if (history.length === 0) {
return false;
}
// Snapshots are taken before each operation, so the most recent snapshot
// holds the state from just before the last operation.
// Repeated calls restore the same snapshot; multi-level undo would need
// to discard snapshots as they are restored.
const lastSnapshot = history[history.length - 1];
const restoredState = this.snapshots.restoreSnapshot(lastSnapshot.version);
if (restoredState) {
this.state = restoredState;
console.log(`Rolled back to ${new Date(lastSnapshot.timestamp)}`);
return true;
}
return false;
}
}
State snapshots enable rollback to any retained snapshot (the last 10 in this example), which is useful for complex agents where tracking individual command inversions is impractical.
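The snapshot code above calls `deepClone` and `generateVersion` without defining them. The following is one possible minimal sketch of both helpers, offered as an assumption rather than the original implementation. The JSON round trip is adequate only for plain data: it drops functions and `undefined` values and turns `Date` objects into strings.

```typescript
// JSON round-trip clone; suitable for plain, serializable state only.
function deepClone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

let versionCounter = 0;

// Monotonic, process-local version label; not unique across processes.
function generateVersion(): string {
  versionCounter += 1;
  return `v${versionCounter}`;
}
```

The clone must be deep because a snapshot that shares nested objects with live state would change whenever the agent mutates that state, which defeats the purpose of the snapshot.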
Key Metrics
Measuring rollback effectiveness ensures these safety mechanisms work when needed:
Rollback Success Rate: Percentage of rollback attempts that successfully restore valid system state:
Rollback Success Rate = (Successful Rollbacks / Total Rollback Attempts) × 100
Target: > 99% for production systems. A lower rate means the rollback mechanism itself is unreliable, so failures can no longer be contained.
Undo Latency: Time from rollback initiation to complete state restoration:
Undo Latency = Σ(Rollback Completion Time) / Number of Rollbacks
Target: < 2 seconds for simple operations, < 30 seconds for complex distributed rollbacks. High latency frustrates users and increases risk of additional errors during rollback.
State Consistency Score: Percentage of rollbacks that maintain system invariants and constraints:
State Consistency = (Valid State After Rollback / Total Rollbacks) × 100
Target: 100%. Any inconsistency indicates partial rollback failures or incomplete compensation logic. Measure through automated validation checks after each rollback.
Rollback Invocation Rate: Frequency of rollback usage relative to total operations:
Rollback Rate = (Rollbacks Triggered / Total Agent Operations) × 100
Target: < 5% for automatic rollbacks, varies for manual undo. High rates suggest agents make too many errors or users lack confidence in agent actions.
Compensation Cost Ratio: Resource cost of rollback relative to forward operation:
Compensation Cost = Total Rollback Resource Usage / Total Forward Operation Resource Usage
Target: < 1.0. Rollback should not cost more than the original operation. Ratios > 1.0 indicate inefficient compensation strategies.
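Four of the formulas above can be computed directly from a log of rollback attempts. The sketch below assumes a hypothetical `RollbackEvent` record emitted once per rollback; the field names and the `computeRollbackMetrics` function are illustrative. The compensation cost ratio is omitted because it depends on how resource usage is measured.

```typescript
// Hypothetical record emitted once per rollback attempt.
interface RollbackEvent {
  succeeded: boolean;       // Rollback ran to completion
  consistentAfter: boolean; // Post-rollback validation checks passed
  durationMs: number;       // Time from initiation to restored state
}

interface RollbackMetrics {
  successRate: number;      // Percent
  consistencyScore: number; // Percent
  meanLatencyMs: number;
  invocationRate: number;   // Percent of all agent operations
}

function computeRollbackMetrics(
  events: RollbackEvent[],
  totalOperations: number
): RollbackMetrics | null {
  if (events.length === 0 || totalOperations <= 0) {
    return null; // Ratios are undefined without data
  }
  const succeeded = events.filter(e => e.succeeded).length;
  const consistent = events.filter(e => e.consistentAfter).length;
  const totalMs = events.reduce((sum, e) => sum + e.durationMs, 0);
  return {
    successRate: (succeeded * 100) / events.length,
    consistencyScore: (consistent * 100) / events.length,
    meanLatencyMs: totalMs / events.length,
    invocationRate: (events.length * 100) / totalOperations,
  };
}
```

For example, four rollbacks out of 100 operations, three of which succeeded, give a success rate of 75% and an invocation rate of 4%.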
Related Concepts
Rollback and undo integrate with broader agent reliability patterns:
- Error Recovery: Rollback is one recovery strategy; error recovery encompasses the full spectrum of failure handling approaches
- Fail-safes: Preventive mechanisms that reduce the need for rollback by catching errors before they cause damage
- Idempotency: Operations that can be safely retried let rollback and compensation steps be re-run after a failure without applying their effects twice
- Proof of Action: Rollback mechanisms benefit from proof artifacts that document exactly what changed and provide evidence of successful restoration