#135 Health metrics timeline recorder script for test analysis
Description
Create a script that continuously records health metrics to DataShard for timeline analysis during test runs.
PURPOSE:
Monitor worker distribution and system health during integration tests to visualize how workers behave under load, during disruptions, and across test scenarios.
REQUIREMENTS:
1. Poll get_health_metrics() every 0.5 seconds
2. Write each poll's results to a DataShard table with a timestamp
3. Use DataShard timetravel to query historical snapshots
4. Output should enable timeline visualization of:
- Worker count changes (online/offline/paused/draining)
- Batch capacity fluctuations
- Task throughput over time
- Failure spikes and recovery patterns
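A minimal sketch of the polling requirement, for discussion only. The names fetch_metrics and write_snapshot are hypothetical callables (the first would wrap get_health_metrics(), the second would write to the DataShard table); neither is an existing engine or DataShard API. The loop schedules against a monotonic clock so that a slow poll does not push every later poll back.

```python
import threading
import time
import uuid
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical shapes: fetch_metrics returns {metric_name: value} for one poll,
# write_snapshot persists one poll. Neither is a DataShard or engine API.
FetchFn = Callable[[], dict]
WriteFn = Callable[[str, datetime, dict], None]


def record(fetch_metrics: FetchFn, write_snapshot: WriteFn,
           interval: float = 0.5, duration: Optional[float] = None,
           stop: Optional[threading.Event] = None) -> int:
    """Poll at a fixed interval until stopped; return the number of snapshots."""
    if stop is None:
        stop = threading.Event()
    start = time.monotonic()
    next_tick = start
    snapshots = 0
    while not stop.is_set():
        if duration is not None and time.monotonic() - start >= duration:
            break
        # One snapshot_id and one timestamp are shared by all metrics of a poll.
        write_snapshot(uuid.uuid4().hex, datetime.now(timezone.utc), fetch_metrics())
        snapshots += 1
        # Advance the schedule by a fixed step so slow polls do not accumulate drift.
        next_tick += interval
        # Event.wait doubles as an interruptible sleep.
        stop.wait(max(0.0, next_tick - time.monotonic()))
    return snapshots
```

A Ctrl+C handler could set the same event, e.g. signal.signal(signal.SIGINT, lambda *_: stop.set()), so the loop exits cleanly and the returned count can feed the exit summary.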
IMPLEMENTATION:
1. Create script: scripts/health_timeline_recorder.py
2. DataShard table schema:
- timestamp (primary key or indexed)
- metric_name (text)
- metric_value (bigint)
- category (text)
- severity (text)
- snapshot_id (for grouping metrics from same poll)
3. Script features:
- CLI args: --interval (default 0.5s), --duration (optional), --output-table
- Signal handling for graceful stop (Ctrl+C)
- Print summary on exit (total snapshots, duration, etc.)
4. Query helper for timetravel analysis:
- Get metrics at specific timestamp
- Get metric changes over time range
- Aggregate stats (min/max/avg) per metric
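To make the schema and the three query helpers concrete, here is a rough in-memory sketch. It assumes rows have already been loaded from the table into Python objects; the MetricRow class and the function names are hypothetical, and a real implementation would presumably push these queries down to DataShard timetravel rather than filter lists. Column types for timestamp and snapshot_id are assumptions, since the schema above does not fix them.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class MetricRow:
    """One metric from one poll; field names mirror the proposed table schema."""
    timestamp: datetime      # assumed type
    metric_name: str
    metric_value: int
    category: str
    severity: str
    snapshot_id: str         # assumed type; shared by all rows of one poll


def metrics_at(rows, when):
    """Return the rows of the latest snapshot taken at or before `when`."""
    earlier = [r for r in rows if r.timestamp <= when]
    if not earlier:
        return []
    latest = max(earlier, key=lambda r: r.timestamp).snapshot_id
    return [r for r in earlier if r.snapshot_id == latest]


def changes(rows, metric_name, start, end):
    """Return (timestamp, value) pairs where the metric differs from the previous sample."""
    series = sorted((r.timestamp, r.metric_value) for r in rows
                    if r.metric_name == metric_name and start <= r.timestamp <= end)
    out = []
    for ts, value in series:
        if not out or out[-1][1] != value:
            out.append((ts, value))
    return out


def aggregate(rows):
    """Return {metric_name: (min, max, avg)} over all given rows."""
    by_metric = {}
    for r in rows:
        by_metric.setdefault(r.metric_name, []).append(r.metric_value)
    return {name: (min(v), max(v), mean(v)) for name, v in by_metric.items()}
```

Keeping the helpers as pure functions over rows means they can be unit-tested without a live table, whichever storage query ends up feeding them.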
USAGE EXAMPLE:
```bash
# Terminal 1: Start recorder
python scripts/health_timeline_recorder.py --interval 0.5
# Terminal 2: Run tests
pytest -n 4 tests/integration/
# Terminal 1: Ctrl+C to stop, then analyze
python scripts/health_timeline_analyzer.py --from '5 minutes ago' --metric registry_online_workers
```
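The analyzer call above passes a relative time such as '5 minutes ago'. One possible way to handle that, sketched under assumptions: parse_relative and build_parser are hypothetical names, only seconds, minutes, and hours are accepted, and times are resolved in UTC. The issue does not specify which expressions must be supported or whether the flags are mandatory.

```python
import argparse
import re
from datetime import datetime, timedelta, timezone

# Maps the singular unit in the expression to the timedelta keyword argument.
_UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours"}


def parse_relative(text, now=None):
    """Turn '<N> <unit>[s] ago' into an absolute UTC datetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    match = re.fullmatch(r"\s*(\d+)\s+(second|minute|hour)s?\s+ago\s*", text)
    if match is None:
        # argparse reports a ValueError from a type callable as a usage error.
        raise ValueError(f"unsupported time expression: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})


def build_parser():
    """Argument parser covering only the two flags shown in the usage example."""
    parser = argparse.ArgumentParser(prog="health_timeline_analyzer.py")
    # dest is needed because "from" is a Python keyword.
    parser.add_argument("--from", dest="since", type=parse_relative)
    parser.add_argument("--metric")
    return parser
```

Resolving the expression once at startup gives the analyzer a fixed lower bound, so repeated queries within one run see the same time window.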
CONTEXT:
- get_health_metrics() SQL function already exists in engine/sql/queries/health_metrics.sql
- DataShard v0.2.1 is installed (pip install datashard==0.2.1)
- See docs/agent-docs/DATASHARD_INTEGRATION.md for DataShard usage
- Worker registration implemented in Issue #134 (worker_registry table)
DEPENDENCIES:
- DataShard timetravel feature
- Existing get_health_metrics() function (40 metrics across 9 categories)
FILES TO CREATE:
- scripts/health_timeline_recorder.py (main recorder script)
- scripts/health_timeline_analyzer.py (query/visualization helper)
NICE TO HAVE:
- ASCII chart output for quick terminal visualization
- Export to CSV for external analysis
- Integration with existing monitoring (health.py API endpoint)