#696 OBS-01: No system-level alerting for worker failures

closed critical Created 2025-12-25 02:56 · Updated 2025-12-25 03:26

Description

Location: Not implemented.
Issue: If workers crash at 2 AM, on-call engineers receive no notification. There is no integration with PagerDuty or OpsGenie.
Fix: Implement an AlertingService that monitors get_health_metrics() and sends an alert when any of these conditions holds: expired claims > 0, online workers = 0, or failure rate > threshold.
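A minimal sketch of the proposed fix, under stated assumptions. The ticket does not say what get_health_metrics() returns, so the dictionary keys (expired_claims, online_workers, failure_rate), the notify callback, and the default threshold are all hypothetical. The notify callback stands in for a PagerDuty or OpsGenie integration; no real paging API is called here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertingService:
    # Assumed shape: get_health_metrics() returns a dict with the keys
    # "expired_claims", "online_workers", and "failure_rate" (hypothetical).
    get_health_metrics: Callable[[], Dict[str, float]]
    # Hypothetical hook; a real deployment would forward to a paging service.
    notify: Callable[[str], None]
    # Illustrative default; the ticket does not specify a threshold.
    failure_rate_threshold: float = 0.05

    def check_once(self) -> List[str]:
        """Evaluate the three conditions from the ticket and send one alert per hit."""
        metrics = self.get_health_metrics()
        alerts: List[str] = []

        expired = metrics.get("expired_claims", 0)
        if expired > 0:
            alerts.append(f"expired claims: {expired}")

        # A missing worker count is treated as zero so the check fails loudly.
        if metrics.get("online_workers", 0) == 0:
            alerts.append("no online workers")

        rate = metrics.get("failure_rate", 0.0)
        if rate > self.failure_rate_threshold:
            alerts.append(
                f"failure rate {rate:.2%} exceeds {self.failure_rate_threshold:.2%}"
            )

        for message in alerts:
            self.notify(message)
        return alerts


if __name__ == "__main__":
    # Stubbed metrics simulating a full worker outage.
    service = AlertingService(
        get_health_metrics=lambda: {
            "expired_claims": 4,
            "online_workers": 0,
            "failure_rate": 0.2,
        },
        notify=print,
    )
    service.check_once()
```

In practice check_once() would run on a timer or scheduler, and repeated alerts for the same condition would likely need deduplication before they reach on-call engineers.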
