Error Handling
Perstack is designed to recover from errors automatically when possible. Most errors are fed back to the LLM for self-correction rather than crashing the run.
How errors are handled
| Error type | Behavior |
|---|---|
| Tool/MCP errors | Fed back to LLM — it can retry or try a different approach |
| LLM generation errors | Automatic retry with error context |
| Fatal errors | Run stops with stoppedByError |
This design lets the LLM handle transient failures (network issues, rate limits, invalid tool arguments) without human intervention.
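The three rows of the table can be pictured as a small dispatch function. This is a conceptual sketch only, not Perstack's actual implementation: `ErrorKind`, `Handling`, and `handleError` are hypothetical names introduced for illustration, and only the `stoppedByError` status comes from this page.

```typescript
// Conceptual sketch: these types and this function are hypothetical,
// not part of the Perstack runtime API.
type ErrorKind = "tool" | "generation" | "fatal";

type Handling =
  | { action: "feedBackToLlm"; message: string }
  | { action: "retryGeneration"; errorContext: string }
  | { action: "stopRun"; status: "stoppedByError" };

function handleError(kind: ErrorKind, detail: string): Handling {
  switch (kind) {
    case "tool":
      // Tool/MCP errors become input for the LLM's next step,
      // so it can retry or try a different approach.
      return { action: "feedBackToLlm", message: `Tool call failed: ${detail}` };
    case "generation":
      // Generation errors trigger a retry that carries the error context.
      return { action: "retryGeneration", errorContext: detail };
    case "fatal":
      // Fatal errors end the run with the stoppedByError status.
      return { action: "stopRun", status: "stoppedByError" };
  }
}
```

Modelling the outcome as a discriminated union keeps each behavior explicit, and the compiler flags any error kind left unhandled.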
Stop reasons
A Run ends with one of these checkpoint statuses:
| Status | Meaning |
|---|---|
| completed | LLM called attemptCompletion with no remaining todos; task done |
| stoppedByExceededMaxSteps | Job’s maxSteps limit reached |
| stoppedByInteractiveTool | Waiting for user input (Coordinator only) |
| stoppedByDelegate | Waiting for delegate Expert |
| stoppedByError | Unrecoverable error |
When a Run stops with stoppedByExceededMaxSteps, you can resume from the last checkpoint. See State Management.
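When handling these statuses in your own code, it may help to map each one to a follow-up action. The sketch below takes the status names from the table above. The `StopStatus` and `NextAction` types and the `nextActionFor` helper are hypothetical, not runtime exports, and the choice of "investigate" for `stoppedByError` is an assumption about what a caller would typically do.

```typescript
// Status names are taken from the table above; everything else here
// is a hypothetical helper, not part of @perstack/runtime.
type StopStatus =
  | "completed"
  | "stoppedByExceededMaxSteps"
  | "stoppedByInteractiveTool"
  | "stoppedByDelegate"
  | "stoppedByError";

type NextAction =
  | "done"
  | "resumeFromCheckpoint"
  | "awaitUserInput"
  | "awaitDelegate"
  | "investigate";

function nextActionFor(status: StopStatus): NextAction {
  switch (status) {
    case "completed":
      return "done";
    case "stoppedByExceededMaxSteps":
      // The run can be resumed from the last checkpoint.
      return "resumeFromCheckpoint";
    case "stoppedByInteractiveTool":
      return "awaitUserInput";
    case "stoppedByDelegate":
      return "awaitDelegate";
    case "stoppedByError":
      // Assumption: an unrecoverable error needs a human to look at it.
      return "investigate";
  }
}
```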
Delegation errors
When a Delegated Expert fails, the Job continues — the error is returned to the Coordinator, which decides how to handle it. See Delegation failure handling for details.
Events for monitoring
Use errorRun events to monitor failures:
```bash
npx perstack run my-expert "query" | jq 'select(.type == "errorRun")'
```
For programmatic access:
```typescript
import { run } from "@perstack/runtime"

// `params` holds your run parameters, defined elsewhere
await run(params, {
  eventListener: (event) => {
    if (event.type === "errorRun") {
      // Log, alert, or handle the error
    }
  }
})
```
Common issues
MCP server not starting: Check that requiredEnv variables are set and the command is correct.
Tool call failures: The LLM receives the error and usually retries. If failures persist, check the tool’s input requirements.
Rate limits: The runtime retries automatically. For high-volume usage, configure provider rate limits or add delays.
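One way to "add delays" in your own calling code is exponential backoff between attempts. The following is a minimal sketch under stated assumptions: `backoffDelayMs` and `withDelays` are hypothetical helpers, not part of `@perstack/runtime`, and the default base delay, cap, and attempt count are illustrative values rather than recommendations from Perstack.

```typescript
// Hypothetical helpers for spacing out calls; not part of @perstack/runtime.

// Delay doubles with each attempt (attempt 0 = baseMs), capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `task`, waiting with exponential backoff between failed attempts.
// Rethrows the last error once maxAttempts is exhausted.
async function withDelays<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Skip the wait after the final attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A caller could wrap any promise-returning function in `withDelays`, for example the function that starts a run. Because the runtime already retries on its own, a wrapper like this is mainly useful for spacing out many separate runs.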
What’s next
- State Management — resuming after failures
- Runtime — how the agent loop handles errors