#719 Parallel branch retry creates orphaned waits due to absurd_run_id in join_event_name
Description
FIXED: When a workflow with parallel branches is retried (e.g., after a worker crash), join_event_name was built from absurd_run_id, which changes on each retry. Branches spawned before the crash emit events under the OLD absurd_run_id, but after the retry wait_for_branch listens for the NEW absurd_run_id, so the wait is orphaned and ends in a timeout or deadlock.
Root cause: operators.py line 644 used ctx.absurd_run_id (changes on retry) instead of ctx.workflow_run_id (stable).
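A minimal sketch of the mismatch, not the actual code from operators.py. The Ctx class, the join_event_name_buggy helper, and the "join:..." name format are hypothetical and only illustrate why a per-attempt ID breaks the join:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for the engine's run context."""
    workflow_run_id: str  # stable for the whole workflow run
    absurd_run_id: str    # new value on every retry attempt


def join_event_name_buggy(ctx: Ctx, step: str) -> str:
    # Assumed name format; the point is only that it embeds absurd_run_id.
    return f"join:{ctx.absurd_run_id}:{step}"


first_attempt = Ctx(workflow_run_id="wf-1", absurd_run_id="run-a")
# A retry keeps workflow_run_id but gets a fresh absurd_run_id.
retry_attempt = replace(first_attempt, absurd_run_id="run-b")

# Branches spawned before the crash emit under the first attempt's name.
emitted = join_event_name_buggy(first_attempt, "fanout")
# After the retry, the waiter listens under a different name.
awaited = join_event_name_buggy(retry_attempt, "fanout")

print(emitted == awaited)
```

Because the two names differ, the event emitted by the pre-crash branch never reaches the post-retry waiter.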
Fix: Changed parallel_id generation to use workflow_run_id:
- engine/interpreters/operators.py:644
- engine/replay/replay_context.py:207
The idempotency_key at line 685 already used workflow_run_id correctly, so branches won't be duplicated on retry. The fix ensures join_event_name is stable across retries.
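The effect of the fix can be sketched as below, under the assumption that join_event_name is derived from parallel_id. RunContext, the helper bodies, the name formats, and the in-memory event set are all hypothetical, not the engine's real API:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunContext:
    """Hypothetical stand-in for the engine's run context."""
    workflow_run_id: str  # stable across retries
    absurd_run_id: str    # changes on every retry


def parallel_id(ctx: RunContext, step: str) -> str:
    # Keyed on the stable workflow_run_id, never on absurd_run_id.
    return f"{ctx.workflow_run_id}:{step}"


def join_event_name(ctx: RunContext, step: str) -> str:
    return f"join:{parallel_id(ctx, step)}"


def simulate_retry() -> bool:
    """Return True if a waiter created after a retry sees the earlier event."""
    emitted_events = set()

    # Attempt 1: a branch finishes and emits its join event, then the worker crashes.
    before = RunContext(workflow_run_id="wf-1", absurd_run_id="attempt-1")
    emitted_events.add(join_event_name(before, "fanout"))

    # Attempt 2: same workflow run, new absurd_run_id.
    after = replace(before, absurd_run_id="attempt-2")
    return join_event_name(after, "fanout") in emitted_events


print(simulate_retry())
```

A check of this shape could serve as a regression test: it fails if the join name ever depends on a per-attempt value again.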
All 447 tests pass.