#721 Workers become zombies after asyncio crash - container healthy but all threads dead
Description
OBSERVED: Workers can crash with asyncio.exceptions.CancelledError during HTTP operations, causing all internal threads (TimeoutService, task processing loop, etc.) to die while the container remains 'running' and 'healthy'.
Symptoms:
- Container status: running/healthy
- Zero log activity
- Stuck workflows not cleaned up
- TimeoutService not running
Root cause: asyncio task group error during HTTP connection:
asyncio.exceptions.CancelledError: Cancelled via cancel scope ...
Impact: Workflows get stuck in 'running' state indefinitely because TimeoutService (which cleans up expired claims) is dead.
Workaround: Restart workers with 'docker compose restart worker'
Suggested fix: Add a watchdog that monitors thread activity and restarts the worker if all threads are dead. Alternatively, improve asyncio error handling so that a cancelled HTTP operation cannot cascade and take down the other threads.
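A minimal sketch of the watchdog idea, not the worker's actual code. The thread names passed in, the `dead_threads` and `watchdog` helpers, and the `on_dead` callback are all hypothetical. It assumes the worker's internal threads are started with stable names. Note that it fires as soon as any expected thread is missing, which is stricter than waiting for all of them to die.

```python
import threading
from typing import Callable, Iterable, List, Optional


def dead_threads(expected: Iterable[str]) -> List[str]:
    """Return the expected thread names that are no longer alive."""
    alive = {t.name for t in threading.enumerate() if t.is_alive()}
    return [name for name in expected if name not in alive]


def watchdog(
    expected: Iterable[str],
    on_dead: Callable[[List[str]], None],
    interval: float = 5.0,
    stop: Optional[threading.Event] = None,
) -> None:
    """Poll every `interval` seconds; call `on_dead` once if a thread is gone."""
    expected = list(expected)
    stop = stop or threading.Event()
    # Event.wait returns False on timeout, True once `stop` is set.
    while not stop.wait(interval):
        missing = dead_threads(expected)
        if missing:
            on_dead(missing)
            return
```

One possible wiring is to run `watchdog` in the main thread and have `on_dead` log the missing names and then exit the process with a non-zero status, so that the container stops and a Docker restart policy (if one is configured) brings the worker back. That would replace the manual 'docker compose restart worker' step with an automatic one.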