How Retries Work
When a step throws an error:
- The step attempt is marked as failed and the error is recorded
- The workflow run is rescheduled with exponential backoff
- When the workflow resumes, it replays to the failed step
- The step function executes again

This cycle repeats until the step succeeds or its retries are exhausted, at which point the workflow is marked as failed.
To prevent runaway executions, each workflow run also has a hard cap of 1000
total step attempts. If the run reaches that cap, it fails immediately and is
not retried.
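The retry cycle and the attempt cap can be illustrated with a small in-memory model. This is only a sketch: `executeRun`, `RunState`, and `Step` are hypothetical names rather than part of the library, backoff delays are omitted, and step functions are synchronous here to keep the example short.

```typescript
// Hypothetical in-memory model of one workflow run (not the library's API).
interface Step {
  name: string;
  fn: () => string;
}

interface RunState {
  cache: { [name: string]: string };    // results of completed steps
  attempts: { [name: string]: number }; // attempts made per step
  totalAttempts: number;                // every step attempt in this run
  status: "running" | "completed" | "failed";
}

const MAX_TOTAL_STEP_ATTEMPTS = 1000; // hard cap per run, as described above

function executeRun(steps: Step[], maximumAttempts: number): RunState {
  const state: RunState = { cache: {}, attempts: {}, totalAttempts: 0, status: "running" };
  // Each pass of the outer loop models one resume of the run after a reschedule.
  for (;;) {
    let rescheduled = false;
    for (const step of steps) {
      // Replay: completed steps return their cached result and are not re-run.
      if (state.cache[step.name] !== undefined) continue;
      // Assumption: the cap is checked before a new attempt starts.
      if (state.totalAttempts >= MAX_TOTAL_STEP_ATTEMPTS) {
        state.status = "failed";
        return state;
      }
      state.totalAttempts += 1;
      const attempt = (state.attempts[step.name] || 0) + 1;
      state.attempts[step.name] = attempt;
      try {
        state.cache[step.name] = step.fn();
      } catch (err) {
        // The attempt failed. maximumAttempts = 0 means unlimited.
        if (maximumAttempts !== 0 && attempt >= maximumAttempts) {
          state.status = "failed";
          return state;
        }
        rescheduled = true; // reschedule, then replay from the top
        break;
      }
    }
    if (!rescheduled) {
      state.status = "completed";
      return state;
    }
  }
}

// Usage: "load" succeeds once; "flaky" fails twice before succeeding.
let loadCalls = 0;
let flakyCalls = 0;
const result = executeRun(
  [
    { name: "load", fn: () => { loadCalls += 1; return "loaded"; } },
    {
      name: "flaky",
      fn: () => {
        flakyCalls += 1;
        if (flakyCalls < 3) throw new Error("transient failure");
        return "ok";
      },
    },
  ],
  10,
);
```

In this model, "load" runs only once even though the run resumes three times, because its cached result is returned on replay.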
Step Retries
Steps that throw are retried automatically.
Step Retry Policy
Each step can define its own retry policy. If omitted, steps use these defaults:

| Field | Default | Description |
|---|---|---|
| initialInterval | "1s" | Delay before the first retry |
| backoffCoefficient | 2 | Multiplier applied to each subsequent retry delay |
| maximumInterval | "100s" | Upper bound for retry delay |
| maximumAttempts | 10 | Total attempts including the initial one (0 = unlimited) |

With the default policy, the delay before each attempt is:

| Attempt | Delay |
|---|---|
| 1 | Immediate |
| 2 | ~1s |
| 3 | ~2s |
| 4 | ~4s |
| 5 | ~8s |
| … | … |

Retries also stop early if the run has a deadlineAt and the next retry would exceed it.
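The delay schedule follows directly from the policy fields. The sketch below assumes the delay before attempt n (for n ≥ 2) is initialInterval × backoffCoefficient^(n − 2), capped at maximumInterval. The function names and the millisecond fields are illustrative assumptions, not the library's API, and real delays are approximate (the table shows "~").

```typescript
// Illustrative policy shape using milliseconds instead of duration strings.
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // 0 = unlimited
}

// Defaults taken from the table above ("1s", 2, "100s", 10).
const DEFAULT_RETRY_POLICY: RetryPolicy = {
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 100000,
  maximumAttempts: 10,
};

// Delay in milliseconds before a given attempt (1-based).
// Attempt 1 is the initial execution and runs immediately.
function delayBeforeAttempt(attempt: number, policy: RetryPolicy = DEFAULT_RETRY_POLICY): number {
  if (attempt <= 1) return 0;
  const raw = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, attempt - 2);
  return Math.min(raw, policy.maximumIntervalMs);
}

// Whether another attempt should be scheduled.
// Assumption: a retry is skipped when its scheduled time would pass the deadline.
function shouldScheduleAttempt(
  nextAttempt: number,
  nowMs: number,
  deadlineAtMs: number | null,
  policy: RetryPolicy = DEFAULT_RETRY_POLICY,
): boolean {
  if (policy.maximumAttempts !== 0 && nextAttempt > policy.maximumAttempts) return false;
  if (deadlineAtMs !== null && nowMs + delayBeforeAttempt(nextAttempt, policy) > deadlineAtMs) {
    return false;
  }
  return true;
}

// Delays for attempts 1 through 5 under the defaults: 0, 1000, 2000, 4000, 8000 ms.
const firstDelays = [1, 2, 3, 4, 5].map((n) => delayBeforeAttempt(n));
```

Under these assumptions the uncapped delay before attempt 9 would be 128s, so with the defaults attempts 9 and 10 both wait the 100s maximum.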
Workflow Retries
Errors thrown outside of step.run(...) are workflow-level failures.
Workflow-level failures are not retried by default; the workflow is
marked as failed.
To enable workflow-level retries, set a retryPolicy on the workflow spec.
Step retries and workflow retries are independent. Step failures use the
step’s own retry policy. The workflow retry policy only applies to errors
thrown outside steps.
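The separation between the two policies can be modeled as a lookup keyed on where the error originated. The types and the `policyFor` helper below are hypothetical and only mirror the field names from the step policy table. The exact shape of the workflow spec, and whether its retryPolicy uses the same fields, is an assumption.

```typescript
// Hypothetical shapes for illustration; the real spec may differ.
interface RetryPolicy {
  initialInterval: string;
  backoffCoefficient: number;
  maximumInterval: string;
  maximumAttempts: number;
}

interface WorkflowSpec {
  name: string;
  retryPolicy?: RetryPolicy; // omitted: workflow-level failures are not retried
}

// Step defaults from the table in "Step Retry Policy".
const STEP_DEFAULTS: RetryPolicy = {
  initialInterval: "1s",
  backoffCoefficient: 2,
  maximumInterval: "100s",
  maximumAttempts: 10,
};

// One attempt and no retries: models the default workflow-level behavior.
const NO_RETRY: RetryPolicy = {
  initialInterval: "1s",
  backoffCoefficient: 2,
  maximumInterval: "100s",
  maximumAttempts: 1,
};

// Pick the policy that governs an error, based on where it was thrown.
function policyFor(
  origin: "step" | "workflow",
  spec: WorkflowSpec,
  stepPolicy?: RetryPolicy,
): RetryPolicy {
  if (origin === "step") return stepPolicy || STEP_DEFAULTS;
  return spec.retryPolicy || NO_RETRY;
}

const withoutRetries: WorkflowSpec = { name: "send-invoice" };
const withRetries: WorkflowSpec = {
  name: "send-invoice",
  retryPolicy: {
    initialInterval: "5s",
    backoffCoefficient: 2,
    maximumInterval: "60s",
    maximumAttempts: 3,
  },
};
```

Note that setting a workflow retryPolicy does not change how step failures are handled: `policyFor("step", withRetries)` still resolves to the step's own policy or the step defaults.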
Missing Workflow Definitions
If a worker claims a run but doesn’t have the matching workflow registered, it reschedules the run with exponential backoff (starting at 5s, capped at 5min). This keeps the run alive during rolling deploys or multi-worker setups where the right worker hasn’t started yet. Once a worker with the correct definition comes online, it claims the run and executes normally.
What Triggers a Retry
Retries happen when:
- A step function throws an error or returns a rejected promise
- A worker crashes during step execution (the step is re-executed on recovery)

Retries do not happen for:
- Completed steps (cached results are returned)
- Explicitly canceled workflows
- Workflow-level errors (unless a workflow retryPolicy is configured)
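The rules above amount to a small decision table. The sketch below encodes them as a function. The event names are invented for illustration and do not correspond to identifiers in the library.

```typescript
// Hypothetical event names covering the cases listed above.
type RunEvent =
  | "step-threw"
  | "step-promise-rejected"
  | "worker-crashed-mid-step"
  | "step-already-completed"
  | "workflow-canceled"
  | "workflow-level-error";

// Whether an event leads to a retry, ignoring policy limits such as
// maximumAttempts and deadlineAt.
function triggersRetry(event: RunEvent, hasWorkflowRetryPolicy: boolean): boolean {
  switch (event) {
    case "step-threw":
    case "step-promise-rejected":
    case "worker-crashed-mid-step":
      return true;
    case "step-already-completed": // cached result is returned instead
    case "workflow-canceled":
      return false;
    case "workflow-level-error":
      return hasWorkflowRetryPolicy;
  }
}
```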
Error Handling
You can catch step errors inside a workflow to run fallback logic.
When you catch an error, the workflow continues normally. The step is still
recorded as failed, but no retry is triggered.
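The catch-and-fall-back pattern might look like the sketch below. `runStep` is a simplified, synchronous stand-in for step.run(...) written for this example. It only records the outcome of each attempt and rethrows, and the step names are made up. Because the source notes that steps can return rejected promises, real workflow code would likely use async step functions.

```typescript
// Simplified stand-in for the engine's step runner (hypothetical).
interface AttemptRecord {
  step: string;
  status: "completed" | "failed";
}

const attemptLog: AttemptRecord[] = [];

function runStep<T>(name: string, fn: () => T): T {
  try {
    const value = fn();
    attemptLog.push({ step: name, status: "completed" });
    return value;
  } catch (err) {
    // The attempt is recorded as failed either way.
    attemptLog.push({ step: name, status: "failed" });
    // Rethrow so the workflow code can decide: if it catches the error,
    // the workflow continues and no retry is scheduled.
    throw err;
  }
}

function chargeWorkflow(): string {
  let receipt: string;
  try {
    receipt = runStep<string>("charge-primary", (): string => {
      throw new Error("primary gateway unavailable");
    });
  } catch (err) {
    // Fallback logic: use a different step instead of retrying the first.
    receipt = runStep<string>("charge-backup", () => "backup-receipt");
  }
  return receipt;
}

const outcome = chargeWorkflow();
```

After this runs, the log holds a failed record for "charge-primary" and a completed record for "charge-backup", and the workflow returns the fallback result.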
Terminal Failures
A workflow is permanently marked failed when step retries are exhausted
(maximumAttempts reached), deadlineAt expires, or the run reaches the step
attempt cap.
Once terminal, no more automatic retries occur. You can inspect and manually
retry failed workflows from the dashboard.