5 Critical Mistakes to Avoid with AWS Lambda Durable Functions

While researching AWS Lambda Durable Functions, we found several operational “gotchas” that can cause silent failures or unexpected behavior in production. Because Durable Functions rely on a strict replay model, coding patterns that work fine in standard functions can break durable ones.

Key Takeaways

Watch out for these top 5 issues:

  • Non-Determinism: Random values or timestamps computed outside a step break the replay.
  • IAM Permissions: Checkpointing requires specific permissions on the execution role.
  • Versioning: You cannot invoke $LATEST; publish a version and use its qualified ARN.
  • Cross-Account Invocations: Not supported directly.
  • SDK Drift: Always bundle your SDK.

Main Explanation

1. The Non-Determinism Trap

The most dangerous pitfall is using non-deterministic logic outside of a checkpointed step. When a function resumes, it replays the handler code from the first line. If you have const now = new Date() or Math.random() at the top of your handler (outside a step), that expression is evaluated again and yields a different value on every replay.

The replayed logic then diverges from the saved history, which confuses the runtime.

Fix: Always perform time calculations or randomization inside a context.step() so the result is frozen in the checkpoint.
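To make the replay behavior concrete, here is a toy model of checkpoint-and-replay. It is not the real Durable Execution SDK: ToyContext, its synchronous step() method, and the handler are hypothetical stand-ins written only to show why a value computed inside a step survives a replay while a value computed outside does not.

```typescript
// Toy model only. ToyContext is a hypothetical stand-in, NOT the real SDK,
// and its step() is synchronous to keep the sketch short.
class ToyContext {
  private readonly checkpoints: Map<string, unknown>;

  constructor(checkpoints: Map<string, unknown>) {
    this.checkpoints = checkpoints;
  }

  // Runs fn the first time a step name is seen and stores the result.
  // On later replays it returns the stored result without calling fn.
  step<T>(name: string, fn: () => T): T {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T;
    }
    const result = fn();
    this.checkpoints.set(name, result);
    return result;
  }
}

// Hypothetical handler: one random value drawn outside a step, one inside.
function handler(ctx: ToyContext): { unsafe: number; safe: number } {
  // Re-evaluated on every replay, so it drifts from the saved history.
  const unsafe = Math.random();
  // Evaluated once, then read back from the checkpoint on replay.
  const safe = ctx.step("pick-number", () => Math.random());
  return { unsafe, safe };
}

// Simulate a first run and a replay that share one checkpoint store.
const checkpointStore = new Map<string, unknown>();
const firstRun = handler(new ToyContext(checkpointStore));
const replayRun = handler(new ToyContext(checkpointStore));
```

After both calls, firstRun.safe and replayRun.safe are identical because the second call reads the checkpoint, while the two unsafe values come from separate Math.random() calls.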

2. Missing IAM Permissions

Durable functions need to read and write their state to a backend service. A standard Lambda execution role does not include the permissions for this. If your function starts and then immediately fails (or hangs), check the execution role first.

Fix: You must attach lambda:CheckpointDurableExecutions and lambda:GetDurableExecutionState to the execution role.
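As a rough sketch, the policy statement could be generated like this. The two action names are copied from the text above; the helper name, the example ARN, and the way the resource is scoped are assumptions for illustration, so check the resulting document against the current IAM documentation before attaching it.

```typescript
// Standard IAM policy document shape (Version / Statement / Effect / Action / Resource).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper. The action names come from the article text;
// the caller decides how narrowly to scope `resource` (an assumption here).
function buildDurableExecutionPolicy(resource: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "lambda:CheckpointDurableExecutions",
          "lambda:GetDurableExecutionState",
        ],
        Resource: resource,
      },
    ],
  };
}

// Example with a placeholder function ARN; ":*" to cover all versions is an assumption.
const examplePolicyJson = JSON.stringify(
  buildDurableExecutionPolicy(
    "arn:aws:lambda:us-east-1:123456789012:function:order-workflow:*",
  ),
  null,
  2,
);
```

The resulting JSON string can be pasted into an inline policy on the execution role or fed to whatever infrastructure tooling you already use.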

3. The Unqualified ARN Error

You cannot trigger a durable execution against $LATEST, which is what an unqualified function ARN resolves to. Durable executions must be tied to a specific code snapshot.

Fix: Publish a version (e.g., with aws lambda publish-version) and invoke the function through its qualified ARN (ending in a version number such as :1 or :2).
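A small guard can catch an unqualified ARN before it reaches the invoke call. The sketch below parses the standard Lambda function ARN layout (arn:aws:lambda:region:account:function:name, optionally followed by :qualifier). Both function names are hypothetical, and rejecting $LATEST simply follows the rule described above.

```typescript
// Returns the qualifier (version number or alias) of a Lambda function ARN,
// or undefined when the ARN is unqualified.
function lambdaQualifier(arn: string): string | undefined {
  // Layout: arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
  const parts = arn.split(":");
  if (parts.length < 7 || parts[2] !== "lambda" || parts[5] !== "function") {
    throw new Error("Not a Lambda function ARN: " + arn);
  }
  const qualifier = parts[7];
  return qualifier ? qualifier : undefined;
}

// Hypothetical pre-flight check: true only when the ARN pins a qualifier
// other than $LATEST, per the rule described in this section.
function isPinnedForDurableInvoke(arn: string): boolean {
  const qualifier = lambdaQualifier(arn);
  return qualifier !== undefined && qualifier !== "$LATEST";
}
```

For example, isPinnedForDurableInvoke("arn:aws:lambda:us-east-1:123456789012:function:order-workflow:2") returns true, while the same ARN without the trailing :2 returns false.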

4. Cross-Account Limitations

Do not attempt to use context.invoke() to call a Lambda function in a different AWS account. Our research indicates that the current architecture requires invoked functions to reside in the same account as the orchestrator.
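One way to fail fast is to compare account IDs before the orchestrator ever calls context.invoke(). The helpers below are hypothetical and only inspect the ARN strings, relying on the fact that the account ID is the fifth colon-separated field of an ARN. They do not call any AWS API.

```typescript
// Extracts the 12-digit account ID from an ARN (fifth colon-separated field).
function accountIdOf(arn: string): string {
  const parts = arn.split(":");
  const account = parts[4];
  if (parts[0] !== "arn" || account === undefined || !/^\d{12}$/.test(account)) {
    throw new Error("Cannot read an account ID from: " + arn);
  }
  return account;
}

// Hypothetical guard: throws when the target function lives in a different
// account than the orchestrator, per the limitation described above.
function assertSameAccount(orchestratorArn: string, targetArn: string): void {
  const source = accountIdOf(orchestratorArn);
  const target = accountIdOf(targetArn);
  if (source !== target) {
    throw new Error(
      "Cross-account invoke blocked: orchestrator is in " + source +
        ", target is in " + target,
    );
  }
}
```

Calling assertSameAccount() at the top of the orchestration logic turns a confusing runtime failure into an explicit error message that names both accounts.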

5. Relying on Runtime SDKs

While AWS provides the Durable Execution SDK in the runtime (e.g., Node.js 24), relying on that pre-installed copy can lead to version “drift” when the runtime is updated.

Fix: Include the Durable Execution SDK in your deployment package (via npm install or pip install) rather than relying on the pre-installed version. This keeps your production code stable regardless of underlying platform updates.
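For Node.js projects, a quick sanity check is to confirm that the SDK is listed under dependencies in package.json, since many packaging steps install only production dependencies. The sketch below assumes that kind of packaging. The function name is hypothetical, and "example-durable-sdk" is a placeholder, not the real package name.

```typescript
// Minimal view of the package.json fields this check needs.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Hypothetical check: true when `name` is a production dependency, which is
// what a production-only install (an assumption about your build) would bundle.
function isBundledDependency(manifest: PackageManifest, name: string): boolean {
  const deps = manifest.dependencies;
  return deps !== undefined && Object.prototype.hasOwnProperty.call(deps, name);
}

// "example-durable-sdk" is a placeholder package name for illustration only.
const exampleManifest: PackageManifest = {
  dependencies: { "example-durable-sdk": "1.2.3" },
  devDependencies: { typescript: "5.6.0" },
};
const sdkIsBundled = isBundledDependency(exampleManifest, "example-durable-sdk");
```

A check like this can run in CI so that a build which silently falls back to the runtime's copy of the SDK is caught before deployment.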

Conclusion

Durable Functions are powerful, but they are strict. By adhering to deterministic coding practices, ensuring proper IAM setup, and managing versions correctly, you can deploy reliable workflows that leverage the full power of the checkpoint-and-replay model.