#721 Workers become zombies after asyncio crash - container healthy but all threads dead

Description

OBSERVED: Workers can crash with asyncio.exceptions.CancelledError during HTTP operations, causing all internal threads (TimeoutService, task processing loop, etc.) to die while the container remains 'running' and 'healthy'.

Symptoms:
- Container status: running/healthy
- Zero log activity
- Stuck workflows not cleaned up
- TimeoutService not running

Root cause: asyncio task group error during HTTP connection:
asyncio.exceptions.CancelledError: Cancelled via cancel scope ...

Impact: Workflows get stuck in 'running' state indefinitely because TimeoutService (which cleans up expired claims) is dead.

Workaround: Restart workers with 'docker compose restart worker'.

Suggested fix: Add a watchdog that monitors thread activity and restarts the worker if all threads are dead. Alternatively, improve asyncio error handling to prevent cascade failures.
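A minimal sketch of the watchdog idea from the suggested fix, assuming each internal thread can call a heartbeat function on every loop iteration. The class name HeartbeatWatchdog, its methods, and the thread names used below are hypothetical and not taken from the worker codebase; the 30-second threshold is only an example value.

```python
import threading
import time


class HeartbeatWatchdog:
    """Hypothetical helper: tracks per-thread heartbeats and reports silent threads."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self._max_silence_s = max_silence_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = {}
        self._lock = threading.Lock()

    def beat(self, name):
        """Called by a worker thread on each loop iteration to signal it is alive."""
        with self._lock:
            self._last_beat[name] = self._clock()

    def stale(self):
        """Return the sorted names of threads silent for longer than max_silence_s."""
        now = self._clock()
        with self._lock:
            return sorted(
                name
                for name, last in self._last_beat.items()
                if now - last > self._max_silence_s
            )

    def all_dead(self):
        """True when at least one thread is registered and every one is stale."""
        with self._lock:
            registered = len(self._last_beat)
        return registered > 0 and len(self.stale()) == registered
```

One possible use: a small supervisor loop (or the container health check) polls all_dead() and exits the process with a non-zero status when it returns True, so that the container stops reporting 'healthy' and the restart policy can take over. Whether to trigger on all threads being dead or on any single stale thread (for example TimeoutService alone) is a design choice this sketch leaves open.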
