#696 OBS-01: No system-level alerting for worker failures
Description
Location: Not implemented.
Issue: If workers crash at 2 AM, no notification reaches the on-call engineers. There is no integration with PagerDuty or OpsGenie.
Fix: Implement an AlertingService that monitors get_health_metrics() and sends an alert when any of the following holds: expired claims > 0, online workers = 0, or failure rate > threshold.
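A minimal sketch of how the proposed AlertingService could evaluate the three conditions is given here. It is illustrative only and makes several assumptions that the ticket does not state: get_health_metrics() is taken to return a dict with the hypothetical keys expired_claims, online_workers, and failure_rate; the default threshold of 0.1 is a placeholder; and notify is an injected callable standing in for whatever PagerDuty or OpsGenie client is eventually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alert:
    key: str      # short identifier for the condition that fired
    message: str  # human-readable text for the on-call engineer


class AlertingService:
    """Checks health metrics and forwards alerts to a notifier (sketch)."""

    def __init__(
        self,
        get_health_metrics: Callable[[], Dict[str, float]],
        notify: Callable[[Alert], None],
        failure_rate_threshold: float = 0.1,  # assumed default, not from the ticket
    ) -> None:
        self._get_health_metrics = get_health_metrics
        self._notify = notify
        self._failure_rate_threshold = failure_rate_threshold

    def evaluate(self, metrics: Dict[str, float]) -> List[Alert]:
        """Return one alert per condition from the ticket that currently holds."""
        alerts: List[Alert] = []
        # Metric key names below are hypothetical.
        if metrics["expired_claims"] > 0:
            alerts.append(Alert(
                "expired_claims",
                f"{metrics['expired_claims']} expired claim(s) detected",
            ))
        if metrics["online_workers"] == 0:
            alerts.append(Alert("no_online_workers", "No workers are online"))
        if metrics["failure_rate"] > self._failure_rate_threshold:
            alerts.append(Alert(
                "failure_rate",
                f"Failure rate {metrics['failure_rate']:.2f} exceeds "
                f"threshold {self._failure_rate_threshold:.2f}",
            ))
        return alerts

    def check_once(self) -> List[Alert]:
        """Fetch metrics, send every triggered alert, and return them."""
        alerts = self.evaluate(self._get_health_metrics())
        for alert in alerts:
            self._notify(alert)
        return alerts


if __name__ == "__main__":
    # Stand-in metrics source and notifier for a local dry run.
    service = AlertingService(
        get_health_metrics=lambda: {
            "expired_claims": 0, "online_workers": 0, "failure_rate": 0.0,
        },
        notify=lambda alert: print(f"[ALERT] {alert.key}: {alert.message}"),
    )
    service.check_once()
```

In practice check_once() would be run on a schedule (for example from a cron job or a background loop), and the notifier would likely need deduplication so that a persistent condition does not page repeatedly. Both are left out of this sketch.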