#164 Activity worker monitoring API - single view for task status, health, logs

closed high Created 2025-11-29 04:36 · Updated 2025-11-29 05:28

Description

Edit
## Problem Investigating liveliness and healthiness of activity workers is very difficult. Currently requires: - Multiple database queries to highway.activity_queue - Checking systemd journal for each worker - No centralized view of what's running, what's stuck, what failed - No easy way to correlate activity with workflow run ## Current State - activity_queue table has: status, worker_id, last_heartbeat, attempt, claim_expires_at - No API endpoint to query activity status - No aggregated health view - Heartbeat only stores last_heartbeat (no history) - Must manually run: `SELECT * FROM highway.activity_queue WHERE status='running'` ## Required API Endpoints ### GET /api/v1/activities/status Returns real-time status of all activity workers and their current tasks: ```json { "workers": [ { "worker_id": "activity-worker-1", "status": "healthy", "current_activity": { "activity_id": "uuid", "function": "tools.python.run", "running_seconds": 45, "last_heartbeat": "2025-11-29T04:30:00Z" } } ], "summary": { "total_workers": 4, "healthy": 4, "busy": 3, "idle": 1 }, "queue": { "pending": 0, "running": 3, "completed_last_hour": 12, "failed_last_hour": 0 } } ``` ### GET /api/v1/activities/{activity_id} Returns detailed info about specific activity: - Current status - Worker executing it - Function being called - Parameters - Start time, duration - Heartbeat history (if we add it) - Link to workflow run - Link to logs (DataShard or journal) ### GET /api/v1/workflows/{workflow_run_id}/activities Returns all activities spawned by a workflow run ## Additional Requirements - Consider adding heartbeat_history table for debugging stuck activities - Add stale activity detection (last_heartbeat > N seconds old) - Add activity timeout alerting ## Context Discovered during Kafka ETL pipeline implementation. User asked 'give me list of all heartbeats of this workflow' - only last_heartbeat stored, no history. Debugging required multiple manual DB queries.

Comments

Loading comments...

Context

Loading context...

Audit History

View All
Loading audit history...