Guided Mode
Guided mode is an operational mode where agents present planned actions to users for approval before execution. Rather than performing tasks autonomously, the agent pauses before each action (or set of actions), displays what it intends to do, and waits for explicit user confirmation or modification before proceeding.
Why It Matters
Building Trust Through Transparency
Guided mode serves as the foundation for establishing user trust in AI agents. When users can see and approve each action before it executes, they develop confidence in the agent's decision-making capabilities. This transparency is particularly crucial during initial deployments where users are unfamiliar with the agent's behavior patterns.
For example, a code deployment agent might show: "I plan to merge PR #342 into production and trigger a deployment to 15,000 servers." This visibility allows users to catch potential issues before they impact production systems.
High-Stakes Operations Require Human Oversight
In domains where errors carry significant consequences—financial transactions, medical decisions, infrastructure changes—guided mode provides a critical safety net. A financial trading agent operating in guided mode might display: "Sell 10,000 shares of ACME at market price (estimated $450,000 transaction)" and wait for explicit approval before submitting the order.
Healthcare scheduling agents use guided mode when proposing to cancel or reschedule patient appointments, ensuring that no critical care gets interrupted without human judgment.
Learning Phase for Agent Calibration
Guided mode generates valuable data during the agent training and calibration phase. By tracking which actions users approve immediately, which they modify, and which they reject, teams can identify gaps in the agent's judgment. An agent that consistently proposes actions requiring modification needs refinement before it can operate more autonomously.
Concrete Examples
Approval Workflows
Database Migration Agent: Before applying schema changes, the agent presents:
- Current schema state
- Proposed SQL migrations
- Affected tables and row counts
- Estimated downtime window
- Rollback plan
The user reviews this plan, potentially edits the migration script, and then approves execution.
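A minimal sketch of the plan payload such an agent might present for review, using hypothetical type and field names (illustrative only, not a specific tool's API):
interface MigrationPlan {
  currentSchemaVersion: string;
  proposedSql: string[];                               // migration statements the user can review or edit
  affectedTables: Array<{ table: string; rowCount: number }>;
  estimatedDowntime: { start: string; durationMinutes: number };
  rollbackSql: string[];                               // how to revert if the migration fails
}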
Email Response Agent: When drafting customer responses, the agent shows:
- Original customer inquiry
- Proposed response text
- Relevant support articles being referenced
- Tone analysis (formal/casual)
- Estimated resolution time commitment
Action Preview UIs
Effective guided mode interfaces show not just what the agent will do, but the expected outcomes:
Calendar Management Agent:
Proposed Action: Reschedule "Q4 Planning Meeting"
Current: Thursday 2pm-4pm (14 attendees confirmed)
New: Friday 10am-12pm
Conflicts: 3 attendees have tentative conflicts
Travel Impact: 2 attendees need to adjust flights
Confidence: 78%
[Edit Details] [Approve] [Reject & Suggest Alternative]
Infrastructure Agent:
Planned: Scale production cluster
Current: 8 nodes (CPU 45%, Memory 62%)
Proposed: 12 nodes (+4 nodes)
Cost Impact: +$840/month
Expected Response Time: Improve from 340ms to 210ms
Startup Time: 6-8 minutes
[Modify Scale] [Approve] [Cancel]
Edit-Before-Execute Patterns
The most sophisticated guided mode implementations allow inline editing of proposed actions:
Content Publishing Agent: Presents a draft blog post with inline editing enabled. Users can modify headlines, adjust tone, add sections, or request regeneration of specific paragraphs. Once editing is complete, the user approves the final version for publication.
Data Analysis Agent: Shows proposed SQL query with syntax highlighting. Users can modify the query directly in the preview interface, see the updated execution plan, and then approve the modified query.
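One way to model this edit-then-approve loop, sketched with hypothetical types and names:
// Hypothetical editable proposal: the user may revise the payload before approving.
interface EditableProposal<T> {
  original: T;          // what the agent proposed
  current: T;           // the user's working copy, starts as a clone of original
  edited: boolean;
}

// Example flow for the data analysis agent: the user tweaks the SQL, then approves.
function applyUserEdit(p: EditableProposal<{ sql: string }>, newSql: string): void {
  p.current = { sql: newSql };
  p.edited = p.current.sql !== p.original.sql;
}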
Common Pitfalls
Approval Fatigue
When agents request approval for every minor action, users begin rubber-stamping approvals without careful review. An email agent that requires approval for every "Thank you for contacting us" response creates excessive friction.
Solution: Group related actions into batches, implement risk-based approval thresholds, or use progressive autonomy where low-risk actions (confirmed by user feedback patterns) get auto-approved after a learning period.
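A minimal sketch of such a risk-based gate, assuming a hypothetical policy object (the names and thresholds are illustrative):
// Hypothetical policy: low-risk action types with a strong approval history are auto-approved.
interface AutonomyPolicy {
  autoApproveRisk: 'low';            // never auto-approve medium- or high-risk actions
  minApprovalsBeforeAuto: number;    // e.g. 25 unmodified approvals of this action type
}

function requiresApproval(
  risk: 'low' | 'medium' | 'high',
  unmodifiedApprovals: number,
  policy: AutonomyPolicy
): boolean {
  const eligible =
    risk === policy.autoApproveRisk &&
    unmodifiedApprovals >= policy.minApprovalsBeforeAuto;
  return !eligible; // everything else still goes to the user
}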
Unclear Action Descriptions
Vague descriptions like "I will update the database" or "I plan to modify the configuration" don't give users enough context to make informed approval decisions.
Bad: "Send email to customer" Good: "Send email to customer@example.com confirming refund of $127.50 for order #8432, processing time 3-5 business days"
Bad: "Restart service" Good: "Restart payment-processor service on prod-server-03 (will cause 20-30 second downtime, affecting ~12 active transactions which will retry automatically)"
No Bulk Approval Options
When an agent proposes 15 related actions that should logically execute together, forcing individual approval for each creates unnecessary friction.
Solution: Implement "Approve All," "Approve Low-Risk," or "Approve Similar" options. For example, if an agent proposes updating 10 npm packages without breaking changes, allow approving all dependency updates as a single decision.
Missing Context for Approval Decisions
Users can't make informed decisions without understanding why the agent chose a specific action.
Insufficient:
Action: Delete file "user_data_2023.csv"
[Approve] [Reject]
Comprehensive:
Action: Delete file "user_data_2023.csv"
Reason: File is 18 months old, backup exists in archive storage,
cleanup policy specifies deletion after 12 months
Size: 2.3 GB (will free disk space currently at 94% capacity)
Last Accessed: 8 months ago
Backup Location: s3://archive/user_data_2023.csv (verified)
[View File Details] [Approve] [Keep File] [Archive Instead]
Implementation
Action Planning Phase
Before requesting approval, the agent must generate a complete, specific action plan:
interface PlannedAction {
actionType: string;
target: string;
parameters: Record<string, any>;
reasoning: string;
estimatedImpact: {
scope: string;
risk: 'low' | 'medium' | 'high';
reversible: boolean;
affectedResources: string[];
};
alternatives: Array<{
description: string;
tradeoffs: string;
}>;
executionTime: number; // milliseconds
}
The planning phase should include impact analysis: "This will affect 14 customer records" or "This requires 45 seconds of service downtime."
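For example, a populated PlannedAction for the earlier infrastructure scenario might look like this (the values are illustrative):
const scaleAction: PlannedAction = {
  actionType: 'scale_cluster',
  target: 'production-cluster',
  parameters: { currentNodes: 8, proposedNodes: 12 },
  reasoning: 'CPU at 45% but response times trending up; add headroom before peak traffic',
  estimatedImpact: {
    scope: 'all production API traffic',
    risk: 'medium',
    reversible: true,
    affectedResources: ['production-cluster', 'autoscaling-group-prod'],
  },
  alternatives: [
    { description: 'Scale to 10 nodes only', tradeoffs: 'Lower cost, smaller latency improvement' },
  ],
  executionTime: 480000, // 6-8 minute startup, expressed in milliseconds
};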
Preview Generation
Generate rich previews that help users understand the action:
- For Code Changes: Show syntax-highlighted diffs, not just file names
- For Data Operations: Show sample records that will be affected (first 5 of 1,200)
- For API Calls: Display the actual HTTP request with headers and body
- For Infrastructure: Visualize before/after state diagrams
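A sketch of the data-operation case, sampling the first few affected records for the preview (the types and names are hypothetical):
// Hypothetical preview for a data operation: a small sample plus the total count.
interface DataOperationPreview {
  totalAffected: number;
  sample: Record<string, unknown>[];   // e.g. the first 5 of 1,200 affected rows
}

function buildDataPreview(affectedRows: Record<string, unknown>[], sampleSize = 5): DataOperationPreview {
  return {
    totalAffected: affectedRows.length,
    sample: affectedRows.slice(0, sampleSize),
  };
}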
Approval Interfaces
Effective approval UIs provide multiple response options beyond binary approve/reject:
enum ApprovalResponse {
APPROVE = 'approve', // Execute as planned
APPROVE_ALL = 'approve_all', // Execute this and similar actions
EDIT = 'edit', // Modify then approve
REJECT = 'reject', // Don't execute
SUGGEST = 'suggest', // User provides alternative approach
DEFER = 'defer', // Ask me again later
AUTO_APPROVE = 'auto_approve' // Approve this type automatically
}
Keyboard Shortcuts: Implement keyboard navigation for power users (Enter to approve, E to edit, R to reject).
Approval History: Show recent approvals so users can reference past decisions: "You approved similar actions 12 times in the past week."
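For instance, the shortcuts mentioned above can map directly onto the ApprovalResponse enum (the defer binding is an added assumption):
// Hypothetical keyboard-shortcut mapping onto the ApprovalResponse enum above.
const shortcutMap: Record<string, ApprovalResponse> = {
  Enter: ApprovalResponse.APPROVE,
  e: ApprovalResponse.EDIT,
  r: ApprovalResponse.REJECT,
  d: ApprovalResponse.DEFER,   // assumed binding, not stated above
};

function responseFromKey(key: string): ApprovalResponse | undefined {
  return shortcutMap[key];
}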
Timeout Handling
Define clear behavior when users don't respond to approval requests:
- Critical actions: Never timeout, wait indefinitely
- Time-sensitive actions: Show countdown timer with context ("Meeting starts in 15 minutes")
- Low-priority actions: Auto-expire after defined period with notification
interface ApprovalRequest {
action: PlannedAction;
timeout?: number; // milliseconds
timeoutBehavior: 'cancel' | 'retry' | 'defer' | 'escalate';
escalationTarget?: string; // user or role to notify
}
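A sketch of how the timeout behavior might be resolved when no response arrives (the handler is a placeholder; the branch comments describe intent rather than a specific implementation):
// Hypothetical handling when an approval request expires without a user response.
function onApprovalTimeout(request: ApprovalRequest): void {
  switch (request.timeoutBehavior) {
    case 'cancel':
      // Drop the action and notify the requester that nothing was executed.
      break;
    case 'retry':
      // Re-present the request, e.g. the next time the user is active.
      break;
    case 'defer':
      // Park the action in a queue the user can review later.
      break;
    case 'escalate':
      // Forward to request.escalationTarget (a user or on-call role).
      break;
  }
}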
Key Metrics to Track
Approval Rate
Definition: Percentage of proposed actions that users approve without modification.
Target: roughly 90-98%. A rate below 90% may indicate the agent needs calibration; a rate above 98% might suggest approval fatigue, where users aren't carefully reviewing.
Calculation: approved_actions / total_proposed_actions × 100
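The calculation is simple to track alongside the other metrics (a minimal sketch):
// Approval rate as defined above: unmodified approvals over all proposed actions.
function approvalRate(approvedActions: number, totalProposedActions: number): number {
  if (totalProposedActions === 0) return 0;
  return (approvedActions / totalProposedActions) * 100;
}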
Use Case: An approval rate of 65% on a new deployment agent indicates it needs refinement before operating more autonomously. If the rate climbs to 94% over three months, consider graduating to shadow mode or autonomous operation for similar actions.
Time to Approve
Definition: Median time between when an approval request is presented and when the user responds.
Target: Should decrease over time as users gain confidence. Initial: 2-5 minutes, Mature: 10-30 seconds.
Tracking:
type RiskLevel = 'low' | 'medium' | 'high';

interface ApprovalMetrics {
timeToApprove: {
p50: number; // median
p95: number; // catch unusually slow approvals
};
timeByActionType: Map<string, number>;
timeByRiskLevel: Map<RiskLevel, number>;
}
Use Case: If time-to-approve increases from 20 seconds to 90 seconds over several weeks, users may be losing confidence or action descriptions may have become less clear.
Edit Frequency
Definition: Percentage of actions that users modify before approving.
Target: <15% suggests the agent proposes accurate actions; >40% indicates systematic issues with the agent's judgment.
Calculation: edited_actions / (approved_actions + edited_actions) × 100
Analysis: Track what types of edits users make:
- Parameter adjustments (e.g., changing quantities or thresholds)
- Scope reductions (agent proposed too broad an action)
- Timing changes (right action, wrong time)
- Complete rewrites (agent misunderstood the goal)
Use Case: If 35% of proposed email responses get edited, analyze the edits to identify patterns. If users consistently make the tone more formal, adjust the agent's tone parameter. If users add specific details the agent missed, improve the agent's context gathering.
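A sketch of how those edit categories might be recorded for later analysis (the category union and field names are assumptions based on the list above):
// Hypothetical edit tracking: tag each user modification so patterns can be analyzed later.
type EditCategory = 'parameter_adjustment' | 'scope_reduction' | 'timing_change' | 'complete_rewrite';

interface EditRecord {
  actionType: string;       // e.g. 'send_email'
  category: EditCategory;
  description: string;      // e.g. 'made tone more formal'
}

// Count edits per category for a given action type, e.g. proposed email responses.
function editBreakdown(edits: EditRecord[], actionType: string): Map<EditCategory, number> {
  const counts = new Map<EditCategory, number>();
  for (const edit of edits) {
    if (edit.actionType !== actionType) continue;
    counts.set(edit.category, (counts.get(edit.category) ?? 0) + 1);
  }
  return counts;
}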
Rejection Reasons
Track why users reject proposed actions:
enum RejectionReason {
INCORRECT_TARGET = 'incorrect_target',
WRONG_TIMING = 'wrong_timing',
INSUFFICIENT_CONTEXT = 'insufficient_context',
TOO_RISKY = 'too_risky',
BETTER_ALTERNATIVE = 'better_alternative',
NO_LONGER_NEEDED = 'no_longer_needed'
}
High rejection rates for "incorrect_target" suggest the agent's context understanding needs improvement.
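Aggregating these reasons over time makes such patterns visible (a sketch using the enum above):
// Count rejections by reason so recurring problems (e.g. incorrect_target) stand out.
function rejectionBreakdown(rejections: RejectionReason[]): Map<RejectionReason, number> {
  const counts = new Map<RejectionReason, number>();
  for (const reason of rejections) {
    counts.set(reason, (counts.get(reason) ?? 0) + 1);
  }
  return counts;
}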
Related Concepts
- Guided vs Autonomous - Comparing operational modes and when to use each approach
- Shadow Mode - Running agents in observation-only mode to build confidence before deployment
- Human-in-the-Loop - Broader pattern of incorporating human judgment in AI systems
Last updated: 2025-10-23