Azdoit¶

This directory contains documentation for the azdoit enhancement to azlin, which provides intelligent Azure operations through natural language commands.

Overview¶

Azdoit is an intelligent orchestration layer that: - Parses natural language commands into Azure operations - Selects optimal execution strategies - Provides cost estimation before execution - Handles failures with automatic recovery - Maintains state across operations

Architecture¶

The system consists of several key components:

Strategy System - Four implementation strategies: - Azure CLI strategy (direct CLI commands) - Terraform strategy (infrastructure as code) - MCP Server strategy (Model Context Protocol) - Custom code strategy (Python code generation)

State Management - JSON-based persistence: - Tracks objective execution - Stores configuration preferences - Maintains operation history

Cost Estimation - Azure Pricing API integration: - Estimates costs before execution - Respects user-defined cost limits - Provides cost breakdowns

Recovery Agent - MS Learn integration: - Analyzes failures - Researches solutions - Retries with corrections

Key Principles¶

Bricks & Studs Philosophy¶

Bricks (Modules): - Self-contained with clear boundaries - Single responsibility - No circular dependencies - Independently testable

Studs (Interfaces): - Stable public APIs - Full type hints - Clear contracts - Observable results

Design Goals¶

Minimal - Every module must justify its existence
Clear - Obvious what each module does
Testable - Easy to write unit and integration tests
Regeneratable - Can rebuild any module from contracts
Observable - Returns structured results, no hidden state

Module Summary¶

Core Modules (11 new files)¶

strategy_selector.py - Choose optimal execution strategy
objective_state.py - Persist objectives to JSON
cost_estimator.py - Estimate Azure resource costs
recovery_agent.py - Research and retry failures
terraform_generator.py - Generate Terraform HCL
mcp_client.py - Azure MCP Server client
mslearn_client.py - MS Learn documentation search
strategies/base.py - Strategy interface
strategies/azure_cli.py - Azure CLI strategy
strategies/terraform.py - Terraform strategy
strategies/mcp_server.py - MCP Server strategy
strategies/custom_code.py - Python code gen strategy

Plus: - errors.py - Exception hierarchy - config.py - Configuration management

Integration Points¶

Existing: IntentParser, CommandExecutor
New: Strategy system, state management, cost estimation, recovery

JSON Schemas¶

Objective State - Persistent execution state
Configuration - User preferences and settings

Implementation Status¶

Example Usage¶

Basic Flow¶

from azlin.agentic import handle_do_command

# User runs: azlin do "create 3 VMs called test-{1,2,3}"

result = handle_do_command(
    "create 3 VMs called test-{1,2,3}",
    resource_group="my-rg",
    cost_limit=50.0,
    prefer_terraform=True
)

if result.success:
    print(f"Created {len(result.resources_created)} resources")
    print(f"Total cost: ${result.cost_estimate:.2f}")
else:
    print(f"Failed: {result.error}")

Strategy Selection¶

from azlin.agentic.strategy_selector import StrategySelector
from azlin.agentic.strategies.base import ExecutionContext

# Create context
context = ExecutionContext(
    intent={"intent": "provision_vm", ...},
    parameters={"vm_name": "test", "count": 3},
    resource_group="my-rg",
    cost_limit=50.0
)

# Select strategy
selector = StrategySelector()
strategy = selector.select(context, preferences={"prefer_terraform": True})

# Execute
result = strategy.execute(context)

State Management¶

from azlin.agentic.objective_state import ObjectiveStateManager

# Create objective
manager = ObjectiveStateManager()
objective = manager.create_objective(
    user_request="create 3 VMs",
    intent={...},
    parameters={...}
)

# Update with result
manager.update_objective(
    objective.objective_id,
    status="completed",
    result=execution_result
)

# List all objectives
objectives = manager.list_objectives(status="completed")

Recovery Flow¶

from azlin.agentic.recovery_agent import RecoveryAgent

# Execution failed
if not result.success:
    recovery = RecoveryAgent()

    # Analyze failure and create recovery plan
    plan = recovery.analyze_failure(context, result)

    if plan.strategy == "retry":
        # Attempt recovery
        recovered = recovery.attempt_recovery(context, result, max_attempts=3)

        if recovered and recovered.success:
            print("Recovery succeeded!")

Testing Strategy¶

Unit Tests¶

Each module has comprehensive unit tests:

pytest tests/unit/test_strategy_selector.py -v
pytest tests/unit/test_objective_state.py -v
pytest tests/unit/strategies/ -v

Integration Tests¶

End-to-end scenarios:

pytest tests/integration/test_end_to_end.py -v
pytest tests/integration/test_recovery_flow.py -v

Coverage Goals¶

Unit tests: >80% coverage per module
Integration tests: All critical paths
Error scenarios: All exception types

Configuration¶

User Configuration File¶

~/.azlin/azdoit/config.json:

{
  "strategy_preferences": {
    "prefer_terraform": false,
    "max_cost": 100.0
  },
  "cost_estimation": {
    "enabled": true,
    "confirmation_threshold": 10.0
  },
  "recovery": {
    "enabled": true,
    "max_attempts": 3
  },
  "mcp_server": {
    "enabled": false,
    "url": "https://azure-mcp.azurewebsites.net"
  }
}

State Directory¶

~/.azlin/azdoit/state/: - Contains JSON files for each objective - One file per objective: {uuid}.json - Automatic cleanup of old objectives

CLI Commands¶

Enhanced `azlin do` Command¶

# Basic usage
azlin do "create a VM called test"

# With strategy preference
azlin do "create 3 VMs" --prefer-terraform

# With cost limit
azlin do "provision fleet" --cost-limit 50

# Dry run
azlin do "delete all VMs" --dry-run

# Disable recovery
azlin do "create VM" --no-recovery

# View objectives
azlin do list
azlin do show --objective-id {uuid}

Dependencies¶

New Dependencies¶

anthropic>=0.40.0 - Already in PR #156
No additional dependencies needed for core functionality

Optional Dependencies¶

hcl2 - For Terraform HCL parsing (if needed)
requests - For MCP Server client (standard library alternative: urllib)

Performance¶

Expected Performance¶

Intent parsing: <2 seconds
Strategy selection: <100ms
Azure CLI execution: 30-300 seconds (depends on operation)
Terraform execution: 60-600 seconds (depends on complexity)
State persistence: <10ms

Optimization Opportunities¶

Cache pricing data (reduce API calls)
Parallel strategy scoring (if multiple strategies viable)
Lazy load strategies (only load when needed)
Background state cleanup (don't block user)

Security¶

Considerations¶

Command Injection - Use list args, not shell=True
API Key Exposure - Never log full keys
Path Traversal - Validate file paths
Cost Limits - Respect user-defined limits
Resource Cleanup - Always rollback on failure

Best Practices¶

Validate all user inputs
Use subprocess safely
Sanitize file paths
Respect cost constraints
Clean up temporary files

Troubleshooting¶

Common Issues¶

Strategy not selected: - Check can_handle() implementation - Verify intent type matches - Check cost limits

State file corrupted: - Delete file from ~/.azlin/azdoit/state/ - Run cleanup: azlin do cleanup

Cost estimation fails: - Check network connectivity - Verify Azure Pricing API access - Use fallback estimates

Recovery fails: - Check error message - Verify MS Learn API access - Try manual recovery

Contributing¶

When adding new features:

Design API contract first - Update contracts document
Update architecture - Add to architecture diagram
Write tests - Before implementation
Implement module - Follow implementation guide
Update documentation - Keep docs in sync

Philosophy Alignment¶

This design follows azlin's core philosophy:

Ruthless Simplicity - Every module justified
Bricks & Studs - Clear interfaces, isolated modules
Zero-BS - No stubs, no TODOs, real implementations
Quality over Speed - Well-tested, robust code
Regeneratable - Can rebuild from contracts

Next Steps¶

Review contracts - Provide feedback on APIs
Approve architecture - Verify design is sound
Begin implementation - Follow 5-week plan
Iterate - Improve based on real usage
Document learnings - Update guides with insights

Questions?¶

For more information, see the main AI Command Documentation or open an issue for clarification.

Design Status: ✅ Complete and ready for review

Implementation Status: 🚧 Pending approval

Integration Status: ⏳ Awaiting implementation