The Snapshot
The Snapshot is the most critical component for achieving statelessness. It is a serializable JSON object that captures the entire state of a workflow execution at any given moment. By saving and rehydrating this object, you can pause, resume, and retry workflows, even across different processes or machines.
Anatomy of a Snapshot
typescript
interface Snapshot {
workflowId: string;
status: "active" | "paused" | "error" | "completed" | "failed";
currentNodeId: string | null;
context: Context;
version: number;
lastStartedAt?: number;
totalExecutionTime?: number;
metadata: { [key: string]: unknown };
retryState?: {
nodeId: string;
attempts: number;
nextRetryAt?: number;
};
}workflowId: The ID of the workflow being executed.status: The current status of the execution.active: The workflow is currently running.paused: The workflow is waiting for an external event (e.g., human input or a delay).error: The workflow encountered a recoverable error and is waiting for a retry.completed: The workflow finished successfully.failed: The workflow encountered a non-recoverable error or exhausted its retries.
currentNodeId: The ID of the node that is about to be executed or that has just been executed.context: A record of all data produced by the executed nodes. See Context for more details.version: A number that increments with each step. This is crucial for implementing optimistic locking when persisting the snapshot in a database, preventing race conditions in distributed environments.lastStartedAt/totalExecutionTime: Timestamps for monitoring and performance tracking.metadata: An open object to store any custom data related to the execution.retryState: If the workflow is in anerrorstatus, this object contains information about the pending retry, such as the number of attempts and when the next attempt should be scheduled.
The Role of the Snapshot in the Execution Cycle
- Start: A new workflow begins by creating an initial snapshot with
status: "active". - Execute Step: The
WorkflowEnginetakes a snapshot, executes thecurrentNodeId, and produces a new snapshot with the updated state (context,version,currentNodeId, etc.). - Pause: If a node returns
__pause: true, the engine returns a snapshot withstatus: "paused". This snapshot can be saved to a database. - Resume: To resume, you pass the saved snapshot back to the
engine.execute()method, optionally with anexternalPayload. The engine sets the status back toactiveand continues from where it left off. - Error & Retry: If a node fails, the
handleErrormethod checks itsRetryPolicy. If a retry is warranted, it returns a snapshot withstatus: "error"and theretryState. An external system can then decide when to re-execute based on thenextRetryAttimestamp.