Recipe Resilience: Branch Sanitization & Sub-Recipe Recovery¶

This document describes two resilience improvements to the amplihack recipe runner:

Branch Name Sanitization (Issue #2952) — default-workflow step 4 now produces valid git branch names from any task_description, including multi-line prompts and strings with special characters.
Sub-Recipe Agentic Recovery (Issue #2953) — when a sub-recipe step fails, the runner invokes an agent recovery step before raising a hard error, giving the workflow a chance to self-heal.

Branch Name Sanitization¶

Problem¶

step-04-setup-worktree in amplifier-bundle/recipes/default-workflow.yaml creates a git worktree using a branch name derived from task_description. Before this fix, the raw value was used directly. Multi-line task descriptions (common when pasting from issue bodies or commit messages) produced branch names containing newlines, which git rejects immediately:

fatal: 'fix login bug\nwith oauth' is not a valid branch name

Other common failure modes:

Input character	Git error
Uppercase letters	Branch created but inconsistent with tooling
`(`, `)`, `/`, `:`	Branch name parsing failures in some git versions
Names > 60 chars	Unwieldy; can hit filesystem path-length limits
Trailing `.` or `-`	`git check-ref-format` rejects them

Solution: Sanitization Pipeline¶

Step 4 now runs task_description through a shell pipeline before constructing the branch name:

newlines → spaces
→ strip leading/trailing whitespace
→ lowercase
→ replace invalid chars ([^a-z0-9_.-]) with hyphens
→ collapse consecutive hyphens
→ truncate to 60 characters
→ strip trailing hyphens and dots
→ validate with git check-ref-format --branch
→ fallback to {prefix}/issue-{n}-task if invalid

The resulting slug is inserted into the branch name as:

{prefix}/issue-{issue_number}-{slug}

Example transformations:

`task_description`	Branch slug
`Fix login bug`	`fix-login-bug`
`Fix authentication bug\nThis affects oauth`	`fix-authentication-bug-this-affects-oauth`
`Add User Authentication`	`add-user-authentication`
`fix: auth/login (oauth2)`	`fix-auth-login-oauth2`
`fix_login_bug`	`fix_login_bug` (underscore preserved)
`bump version 1.2.3`	`bump-version-1.2.3` (dot preserved)
120 'a' characters	`aaaa...` (truncated to 60 chars)
`!@#$%^&*()`	fallback `{prefix}/issue-{n}-task`

Fallback Behavior¶

If the sanitized slug is empty or fails git check-ref-format --branch, the branch name falls back to:

{prefix}/issue-{issue_number}-task

This ensures step 4 never blocks the workflow due to a pathological task_description.

Security Note¶

The task_description value is always passed via a named environment variable ($TASK_DESC), never interpolated directly into the shell command string. This prevents shell injection from attacker-influenced task descriptions.

Sub-Recipe Agentic Recovery¶

Problem¶

When a sub-recipe step failed, RecipeRunner._execute_sub_recipe() raised StepExecutionError immediately. The entire parent workflow halted with no opportunity to recover, even when the failure was transient (e.g., a flaky network call) or when an agent could trivially complete the remaining work.

Solution: Recovery Agent Dispatch¶

On sub-recipe failure, _execute_sub_recipe() now invokes _attempt_agent_recovery() before raising. The recovery agent receives full failure context and can either complete the work or confirm that the failure is unrecoverable.

Recovery Flow¶

sub-recipe fails
    │
    ▼
_attempt_agent_recovery()
    │
    ├─ adapter is None ──────────────────────► return None (no recovery possible)
    │
    ├─ adapter.execute_agent_step() raises ──► return None (log warning)
    │
    ├─ response is empty ────────────────────► return None
    │
    ├─ response contains "UNRECOVERABLE" ────► return None (log warning)
    │
    └─ response is non-empty ────────────────► return recovery_output
         │
         ▼
_execute_sub_recipe():
    ├─ recovery_output is not None ──────────► return recovery_output (success)
    └─ recovery_output is None ──────────────► raise StepExecutionError

Recovery Prompt¶

The recovery agent receives a structured prompt containing:

Sub-recipe name
Names of the failed steps
Original error message
First 500 characters of partial outputs from the failed run
A redacted summary of the current recipe context (up to 20 keys × 80 chars)

Example prompt skeleton:

A sub-recipe execution failed and requires your assessment.

Sub-recipe: build-and-test
Failed steps: step-03-run-tests
Error: Sub-recipe 'build-and-test' failed
Partial outputs (first 500 chars):
...

Please assess whether this failure is recoverable:
1. If you can complete the work that the sub-recipe was supposed to do,
   do so now and provide the result.
2. If the failure is not recoverable (missing prerequisites,
   unresolvable conflicts, etc.), respond with 'UNRECOVERABLE: <reason>'.

Current context summary:
  issue_number: 42
  task_description: fix authentication bug
  api_key: [REDACTED]

Signaling Unrecoverable Failures¶

The recovery agent signals an unrecoverable failure by including the token UNRECOVERABLE anywhere in its response (case-insensitive):

UNRECOVERABLE: the test environment requires Docker which is not installed

Any other non-empty response is treated as a successful recovery and returned to the parent workflow as the step output.

API Reference¶

`RecipeRunner._execute_sub_recipe(step, ctx) -> str`¶

Executes a sub-recipe step. On failure, attempts agent recovery before raising.

Returns: Output string from the sub-recipe (or recovery agent on recovery).

Raises: StepExecutionError if:

Recursion depth exceeds MAX_RECIPE_DEPTH
The recipe field is missing or the recipe file is not found
Both the sub-recipe and the recovery agent fail

`RecipeRunner._attempt_agent_recovery(step, ctx, sub_recipe_name, error_message, failed_step_names, partial_outputs) -> str | None`¶

Builds a recovery prompt and dispatches to adapter.execute_agent_step().

Parameters:

Parameter	Type	Description
`step`	`Step`	The recipe step that triggered the sub-recipe
`ctx`	`RecipeContext`	Current recipe execution context
`sub_recipe_name`	`str`	Name of the failed sub-recipe
`error_message`	`str`	Error message from the original failure
`failed_step_names`	`list[str]`	Names of the failed steps; joined to a comma-separated string internally before prompt construction
`partial_outputs`	`str`	Raw partial output from the failed run; truncated to 500 chars internally before prompt construction

Returns: Agent output string on successful recovery, None otherwise.

Never raises. All adapter exceptions are caught and logged at WARNING level so the caller can decide how to handle the None return.

`RecipeRunner._summarise_context(ctx) -> str`¶

Produces a redacted, human-readable summary of the recipe context for inclusion in recovery prompts.

Caps at 20 keys (remaining keys silently omitted)
Truncates each value preview to 80 characters
Redacts keys whose names contain token, secret, password, or key (case-insensitive substring match)

Working Directory Resolution¶

The recovery agent step uses the same working directory as the step that triggered the sub-recipe:

step.working_dir if set
runner.working_dir otherwise

Logging¶

Event	Level	Message
Sub-recipe failure, recovery starting	`WARNING`	`"Sub-recipe '{name}' failed (step '{steps}'). Attempting agent recovery."`
Recovery prompt constructed	`DEBUG`	`"Recovery prompt for sub-recipe '{name}': {prompt}"`
Recovery succeeded	`INFO`	`"Agent recovery succeeded for sub-recipe '{name}' (step '{step_id}')"`
No adapter configured	`WARNING`	`"Cannot attempt agent recovery: no adapter configured"`
Adapter raised during recovery	`WARNING`	`"Agent recovery invocation failed for sub-recipe '{name}': {exc}"`
Empty recovery response	`WARNING`	`"Agent recovery returned empty output for sub-recipe '{name}'"`
UNRECOVERABLE signal	`WARNING`	`"Agent recovery reported unrecoverable failure for sub-recipe '{name}': {preview}"`

Recovery prompts are logged at DEBUG level only — not INFO — to avoid partial output content (which may be sensitive) appearing in standard logs.

Security Notes¶

partial_outputs is truncated to 500 characters inside _attempt_agent_recovery before prompt construction, regardless of how much raw output the caller passes in, so attacker-influenced content cannot exceed the budget.
Context keys matching sensitive patterns are redacted in _summarise_context.
The recovery agent uses the existing adapter credentials — no new authentication surface is introduced.

Configuration¶

Neither feature introduces new configuration knobs. The sanitization pipeline and recovery flow are always active.

To disable recovery for a specific sub-recipe step (not currently supported via YAML), set adapter=None when constructing RecipeRunner.

Testing¶

Both features have dedicated test suites in tests/:

File	Tests	Coverage
`tests/test_branch_name_sanitization.py`	16	Newlines, special chars, truncation, trailing chars, fallback, `git check-ref-format` validation
`tests/test_sub_recipe_recovery.py`	21	Recovery success, UNRECOVERABLE signal (case-insensitive), empty response, adapter exception, no adapter, successful sub-recipe (no recovery invoked), prompt content, working directory resolution

Run with:

.venv/bin/python -m pytest tests/test_branch_name_sanitization.py tests/test_sub_recipe_recovery.py -x -q

Expected output:

37 passed in ...s

Recipe Resilience: Branch Sanitization & Sub-Recipe Recovery¶

Branch Name Sanitization¶

Problem¶

Solution: Sanitization Pipeline¶

Fallback Behavior¶

Security Note¶

Sub-Recipe Agentic Recovery¶

Problem¶

Solution: Recovery Agent Dispatch¶

Recovery Flow¶

Recovery Prompt¶

Signaling Unrecoverable Failures¶

API Reference¶

RecipeRunner._execute_sub_recipe(step, ctx) -> str¶

RecipeRunner._attempt_agent_recovery(step, ctx, sub_recipe_name, error_message, failed_step_names, partial_outputs) -> str | None¶

RecipeRunner._summarise_context(ctx) -> str¶

Working Directory Resolution¶

Logging¶

Security Notes¶

Configuration¶

Testing¶

`RecipeRunner._execute_sub_recipe(step, ctx) -> str`¶

`RecipeRunner._attempt_agent_recovery(step, ctx, sub_recipe_name, error_message, failed_step_names, partial_outputs) -> str | None`¶

`RecipeRunner._summarise_context(ctx) -> str`¶