#164 Activity worker monitoring API - single view for task status, health, logs
## Problem
Investigating the liveness and health of activity workers is very difficult today:
- It takes multiple database queries against highway.activity_queue
- The systemd journal must be checked for each worker
- There is no centralized view of what is running, what is stuck, and what has failed
- There is no easy way to correlate an activity with its workflow run
## Current State
- activity_queue table has: status, worker_id, last_heartbeat, attempt, claim_expires_at
- No API endpoint to query activity status
- No aggregated health view
- Heartbeat only stores last_heartbeat (no history)
- Must manually run: `SELECT * FROM highway.activity_queue WHERE status='running'`
## Required API Endpoints
### GET /api/v1/activities/status
Returns real-time status of all activity workers and their current tasks:
```json
{
  "workers": [
    {
      "worker_id": "activity-worker-1",
      "status": "healthy",
      "current_activity": {
        "activity_id": "uuid",
        "function": "tools.python.run",
        "running_seconds": 45,
        "last_heartbeat": "2025-11-29T04:30:00Z"
      }
    }
  ],
  "summary": {
    "total_workers": 4,
    "healthy": 4,
    "busy": 3,
    "idle": 1
  },
  "queue": {
    "pending": 0,
    "running": 3,
    "completed_last_hour": 12,
    "failed_last_hour": 0
  }
}
```
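A rough sketch of how this response could be assembled from `activity_queue` rows, to make the aggregation concrete. Everything beyond `status`, `worker_id` and `last_heartbeat` is an assumption: the `activity_id`, `function` and `started_at` fields, the `known_workers` list, the 60-second staleness threshold, and the rule that an idle worker counts as healthy (the table alone cannot tell us otherwise). `completed_last_hour` and `failed_last_hour` are left out because they would need a completion timestamp that is not listed above.

```python
from datetime import datetime, timedelta, timezone


def build_status(rows, known_workers, now, stale_after=timedelta(seconds=60)):
    """Aggregate activity_queue rows into the proposed status shape.

    rows: list of dicts, one per activity_queue row (field names are assumed).
    known_workers: worker IDs expected to exist (hypothetical input).
    """
    running = [r for r in rows if r["status"] == "running"]
    by_worker = {r["worker_id"]: r for r in running}

    workers = []
    for worker_id in known_workers:
        row = by_worker.get(worker_id)
        status = "healthy"  # assumption: idle workers are reported as healthy
        current = None
        if row is not None:
            if now - row["last_heartbeat"] > stale_after:
                status = "unhealthy"
            current = {
                "activity_id": row["activity_id"],
                "function": row["function"],
                "running_seconds": int((now - row["started_at"]).total_seconds()),
                "last_heartbeat": row["last_heartbeat"].isoformat(),
            }
        workers.append(
            {"worker_id": worker_id, "status": status, "current_activity": current}
        )

    busy = sum(1 for w in workers if w["current_activity"] is not None)
    return {
        "workers": workers,
        "summary": {
            "total_workers": len(workers),
            "healthy": sum(1 for w in workers if w["status"] == "healthy"),
            "busy": busy,
            "idle": len(workers) - busy,
        },
        "queue": {
            "pending": sum(1 for r in rows if r["status"] == "pending"),
            "running": len(running),
        },
    }


# Example data (made up for illustration).
now = datetime(2025, 11, 29, 4, 30, 45, tzinfo=timezone.utc)
rows = [
    {
        "activity_id": "a1",
        "function": "tools.python.run",
        "status": "running",
        "worker_id": "activity-worker-1",
        "started_at": datetime(2025, 11, 29, 4, 30, 0, tzinfo=timezone.utc),
        "last_heartbeat": datetime(2025, 11, 29, 4, 30, 30, tzinfo=timezone.utc),
    },
    {
        "activity_id": "a2",
        "function": "tools.python.run",
        "status": "pending",
        "worker_id": None,
        "started_at": None,
        "last_heartbeat": None,
    },
]
status = build_status(rows, ["activity-worker-1", "activity-worker-2"], now)
```

Passing `now` in explicitly keeps the aggregation deterministic and easy to test; a real handler would presumably pass the current UTC time.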
### GET /api/v1/activities/{activity_id}
Returns detailed information about a specific activity:
- Current status
- Worker executing it
- Function being called
- Parameters
- Start time, duration
- Heartbeat history (if we add it)
- Link to workflow run
- Link to logs (DataShard or journal)
### GET /api/v1/workflows/{workflow_run_id}/activities
Returns all activities spawned by a workflow run.
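
A minimal sketch of the lookup behind this endpoint, using an in-memory SQLite database as a stand-in for the real one. The `workflow_run_id`, `activity_id` and `function_name` columns are assumptions for illustration; whether `activity_queue` already links activities to workflow runs still needs to be confirmed.

```python
import sqlite3

# In-memory stand-in for highway.activity_queue (column set is assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity_queue (
        activity_id TEXT PRIMARY KEY,
        workflow_run_id TEXT,
        function_name TEXT,
        status TEXT,
        worker_id TEXT,
        attempt INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO activity_queue VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a1", "run-1", "tools.python.run", "running", "activity-worker-1", 1),
        ("a2", "run-1", "tools.python.run", "pending", None, 0),
        ("a3", "run-2", "tools.python.run", "failed", "activity-worker-2", 3),
    ],
)


def activities_for_run(conn, workflow_run_id):
    """Return every activity belonging to one workflow run as a list of dicts."""
    cur = conn.execute(
        "SELECT activity_id, function_name, status, worker_id, attempt "
        "FROM activity_queue WHERE workflow_run_id = ? ORDER BY activity_id",
        (workflow_run_id,),  # parameterized to avoid SQL injection
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


run_activities = activities_for_run(conn, "run-1")
```

An unknown run ID simply yields an empty list here; the API could map that to either an empty response or a 404.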
## Additional Requirements
- Consider adding heartbeat_history table for debugging stuck activities
- Add stale activity detection (last_heartbeat more than N seconds old)
- Add activity timeout alerting
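
The stale activity check could look roughly like the sketch below. It assumes rows are available as dicts with `status` and `last_heartbeat`, and that a running activity with no heartbeat at all should also count as stale; `find_stale` and the `activity_id` field are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def find_stale(rows, now, max_age_seconds):
    """Return running activities whose last heartbeat is older than N seconds."""
    cutoff = now - timedelta(seconds=max_age_seconds)
    return [
        r
        for r in rows
        if r["status"] == "running"
        # Assumption: a missing heartbeat on a running activity is stale too.
        and (r["last_heartbeat"] is None or r["last_heartbeat"] < cutoff)
    ]


# Example data (made up for illustration).
now = datetime(2025, 11, 29, 4, 35, 0, tzinfo=timezone.utc)
rows = [
    {"activity_id": "fresh", "status": "running",
     "last_heartbeat": now - timedelta(seconds=10)},
    {"activity_id": "stuck", "status": "running",
     "last_heartbeat": now - timedelta(seconds=300)},
    {"activity_id": "done", "status": "completed",
     "last_heartbeat": now - timedelta(seconds=900)},
]
stale = find_stale(rows, now, max_age_seconds=120)
```

The same cutoff comparison could be pushed into a SQL `WHERE` clause, and the result could feed the timeout alerting mentioned above.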
## Context
Discovered during the Kafka ETL pipeline implementation. A user asked 'give me list of all heartbeats of this workflow', but only last_heartbeat is stored, with no history. Debugging required multiple manual DB queries.