#453 LLM tool asyncio.run() causes worker zombie state via anyio corruption
Description
In llm.py lines 573-577, when no event loop is running:
```python
try:
    asyncio.get_running_loop()
except RuntimeError:
    return asyncio.run(coro)  # No timeout!
```
When anyio (used by httpx) has corrupted internal state (e.g., after sandbox fallback), asyncio.run() can hang indefinitely because:
1. anyio callbacks fail with 'no running event loop'
2. HTTP request never completes
3. asyncio.run() waits forever
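The shape of this hang can be illustrated without anyio or httpx. The sketch below does not reproduce the corruption itself. It only stands in for step 2 with a future that nobody ever resolves, and bounds the wait with asyncio.wait_for so the example terminates. The names request_that_never_completes and run_bounded are hypothetical and do not come from llm.py.

```python
import asyncio


async def request_that_never_completes():
    # Hypothetical stand-in for an HTTP call whose completion callback was lost:
    # the future is created but nothing ever calls set_result() on it.
    loop = asyncio.get_running_loop()
    never_resolved = loop.create_future()
    await never_resolved


def run_bounded(seconds):
    # Without the wait_for() wrapper, asyncio.run() would block here forever,
    # which is the behaviour described in step 3 above.
    async def bounded():
        await asyncio.wait_for(request_that_never_completes(), timeout=seconds)

    try:
        asyncio.run(bounded())
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"
```

Whether an in-loop timeout like this would fire in the real failure depends on whether the loop itself is still servicing timers, which the logs do not confirm.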
**Evidence from logs:**
- 10:01:07 asyncio error: RuntimeError: no running event loop -> InvalidStateError
- Workers stopped claiming tasks entirely after this
- LISTEN thread still ran (daemon), making workers appear 'alive'
- 590 pending + 7000+ completed tasks, zero being processed
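The LISTEN thread observation suggests liveness was being inferred from a thread that does not do the work. One possible way to surface this state earlier is a heartbeat that only the main claim loop updates, checked by a separate watchdog thread. This is a minimal sketch under that assumption; Heartbeat, start_watchdog, and the on_stall callback are hypothetical names, not existing worker code.

```python
import threading
import time


class Heartbeat:
    """Timestamp that only the main claim loop is expected to update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        with self._lock:
            self._last = time.monotonic()

    def age(self):
        with self._lock:
            return time.monotonic() - self._last


def start_watchdog(heartbeat, max_age, on_stall, poll_interval=1.0):
    """Call on_stall(age) once if the heartbeat is older than max_age seconds."""

    def watch():
        while True:
            age = heartbeat.age()
            if age > max_age:
                on_stall(age)
                return
            time.sleep(poll_interval)

    thread = threading.Thread(target=watch, daemon=True, name="main-loop-watchdog")
    thread.start()
    return thread
```

In this sketch the main loop would call beat() after every claim attempt, so a worker blocked inside asyncio.run() stops beating even though its daemon threads keep running. What on_stall does (log, alert, or exit so a supervisor restarts the process) is left open here.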
**Root cause:** Mixing sync sandbox fallback -> direct execution -> LLM async call corrupts anyio state
**Fix options:**
1. Add timeout to asyncio.run() path (wrap in thread with timeout)
2. Ensure httpx client lifecycle is properly managed
3. Add watchdog to detect zombie main loop
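A rough sketch of option 1, assuming the no-running-loop branch can hand the coroutine to a helper thread and wait on it with a deadline. run_coro_with_timeout is a hypothetical helper, not existing code in llm.py, and the timeout value would need to be chosen per call site.

```python
import asyncio
import threading


def run_coro_with_timeout(coro, timeout):
    """Run coro via asyncio.run() on a helper thread; raise TimeoutError if it overruns."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # captured here and re-raised in the caller
            outcome["error"] = exc

    thread = threading.Thread(target=runner, daemon=True, name="llm-asyncio-run")
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        # The helper thread is abandoned, not killed: Python cannot stop it.
        raise TimeoutError(f"coroutine did not finish within {timeout}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

The trade-off is that a hung call still leaks a daemon thread and whatever connections it holds, so this only frees the worker to keep claiming tasks. It does not address the underlying anyio state, which is why options 2 and 3 may still be needed.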