/highway-workflow-engine
Edit Issue #752
Update issue details
Title *
Description
ROOT CAUSE ANALYSIS:
A worker crash (SystemExit, OOM, etc.) leaves a transaction in the 'idle in transaction' state. This zombie transaction holds locks on workflow_state rows. When the fail_stuck_tasks cron job tries to UPDATE r_highway_default to mark runs as failed, it blocks waiting for these locks. Every subsequent cron job execution queues up behind it, creating a lock convoy. Result: stuck task detection is completely broken, and orphaned tasks never get cleaned up.

EVIDENCE FROM PRODUCTION:
- PID 532152: idle in transaction, holding transactionid lock 4974063
- 16+ fail_stuck_tasks queries blocked for 2-18+ minutes
- Workflow run 2adf633f-4824-4f49-a66b-9311035c006a stuck in pending forever

FIXES REQUIRED:
1. Add PostgreSQL timeout settings (idle_in_transaction_session_timeout, lock_timeout)
2. Add SET LOCAL lock_timeout in the fail_stuck_tasks function
3. Ensure transaction cleanup on worker crash (handle SystemExit properly)
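A rough sketch of how fixes 2 and 3 might be combined on the worker side, not the project's actual code. It assumes a DB-API 2.0 connection (for example psycopg2) with autocommit off. The name guarded_transaction and the '5s' timeout are hypothetical and chosen only for illustration.

```python
from contextlib import contextmanager

# Hypothetical default; the right value depends on the workload.
LOCK_TIMEOUT = "5s"


@contextmanager
def guarded_transaction(conn, lock_timeout=LOCK_TIMEOUT):
    """Run a block in a transaction that is always ended, even on SystemExit.

    Assumes `conn` is a DB-API 2.0 connection with autocommit off, so the
    first execute() implicitly opens a transaction.
    """
    cur = conn.cursor()
    try:
        # SET LOCAL lasts only until the end of the current transaction.
        # The value is interpolated directly, so it must come from trusted
        # configuration, never from user input.
        cur.execute(f"SET LOCAL lock_timeout = '{lock_timeout}'")
        yield cur
        conn.commit()
    except BaseException:
        # BaseException also covers SystemExit and KeyboardInterrupt, which
        # `except Exception` would let through without a rollback.
        conn.rollback()
        raise
    finally:
        cur.close()
```

A caller would wrap the UPDATE in `with guarded_transaction(conn) as cur:` so that a statement blocked on a lock fails after the timeout instead of joining the convoy. This only helps when the interpreter gets a chance to unwind. A worker killed outright (for example by the OOM killer) cannot roll back, which is why the server-side idle_in_transaction_session_timeout from fix 1 is still needed.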