#724

## Problem

Two issues with worker signal handling prevent a proper graceful shutdown.

### Issue 1: Delayed Shutdown Response

- The signal handler sets `_shutdown_requested = True`.
- But `wakeup_event` is a local variable in `worker_command()`, so the signal handler can't reach it to wake the blocked `Event.wait()`.
- As a result, the worker waits up to `poll_interval` (1-5 s) before noticing SIGTERM.

### Issue 2: In-Flight Tasks Abandoned

- After the main loop breaks, the `finally` block stops services immediately.
- Nothing waits for tasks in `orchestrator.bulkhead` to complete.
- Tasks are killed mid-execution.
- They stay claimed until the heartbeat timeout, then re-execute from scratch.

## Impact

- K8s rolling updates will kill tasks mid-execution.
- Compute is wasted on re-executing those tasks.
- Non-idempotent tasks risk data corruption.

## Solution

1. Make `wakeup_event` module-level so the signal handler can access it.
2. Add graceful drain logic, as `activity_worker` already has:
   - Wait up to 30 s for in-flight tasks.
   - Call `bulkhead.shutdown(wait=True)`.

## Files

- `engine/cli/worker.py`

## Testing

- Send SIGTERM to a worker with active tasks.
- Verify the tasks complete before the worker exits.
- Verify the worker reacts to SIGTERM within seconds instead of waiting out `poll_interval`.

closed · high · Created 2025-12-28 00:21 · Updated 2025-12-28 00:37
