#135 Health metrics timeline recorder script for test analysis

closed medium Created 2025-11-27 11:05 · Updated 2025-11-27 11:28

Description

Create a script that continuously records health metrics to DataShard for timeline analysis during test runs.

PURPOSE:
Monitor worker distribution and system health during integration tests to visualize how workers behave under load, during disruptions, and across test scenarios.

REQUIREMENTS:
1. Script polls get_health_metrics() every 0.5 seconds
2. Writes results to DataShard table with timestamp
3. Use DataShard timetravel to query historical snapshots
4. Output should enable timeline visualization of:
   - Worker count changes (online/offline/paused/draining)
   - Batch capacity fluctuations
   - Task throughput over time
   - Failure spikes and recovery patterns

IMPLEMENTATION:
1. Create script: scripts/health_timeline_recorder.py
2. DataShard table schema:
   - timestamp (primary key or indexed)
   - metric_name (text)
   - metric_value (bigint)
   - category (text)
   - severity (text)
   - snapshot_id (for grouping metrics from same poll)
3. Script features:
   - CLI args: --interval (default 0.5s), --duration (optional), --output-table
   - Signal handling for graceful stop (Ctrl+C)
   - Print summary on exit (total snapshots, duration, etc.)
4. Query helper for timetravel analysis:
   - Get metrics at specific timestamp
   - Get metric changes over time range
   - Aggregate stats (min/max/avg) per metric

USAGE EXAMPLE:
```bash
# Terminal 1: Start recorder
python scripts/health_timeline_recorder.py --interval 0.5

# Terminal 2: Run tests
pytest -n 4 tests/integration/

# Terminal 1: Ctrl+C to stop, then analyze
python scripts/health_timeline_analyzer.py --from '5 minutes ago' --metric registry_online_workers
```

CONTEXT:
- get_health_metrics() SQL function already exists in engine/sql/queries/health_metrics.sql
- DataShard v0.2.1 is installed (pip install datashard==0.2.1)
- See docs/agent-docs/DATASHARD_INTEGRATION.md for DataShard usage
- Worker registration implemented in Issue #134 (worker_registry table)

DEPENDENCIES:
- DataShard timetravel feature
- Existing get_health_metrics() function (40 metrics across 9 categories)

FILES TO CREATE:
- scripts/health_timeline_recorder.py (main recorder script)
- scripts/health_timeline_analyzer.py (query/visualization helper)

NICE TO HAVE:
- ASCII chart output for quick terminal visualization
- Export to CSV for external analysis
- Integration with existing monitoring (health.py API endpoint)
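SKETCH (not the final implementation): The recorder's CLI args, Ctrl+C handling, and poll loop could look roughly like the following. This is a minimal sketch under stated assumptions. The database call and the DataShard write are passed in as plain callables (`fetch` and `write`, both hypothetical names) because this issue does not spell out either API. The `health_timeline` default for --output-table is a placeholder, and the four-field tuple shape returned by `fetch` is assumed from the schema above, not taken from get_health_metrics() itself.

```python
import argparse
import signal
import time
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed row shape from one poll: (metric_name, metric_value, category, severity)
MetricRow = Tuple[str, int, str, str]


def build_parser() -> argparse.ArgumentParser:
    """CLI args listed in the issue; the --output-table default is a placeholder."""
    parser = argparse.ArgumentParser(description="Record health metrics on a fixed interval.")
    parser.add_argument("--interval", type=float, default=0.5, help="seconds between polls")
    parser.add_argument("--duration", type=float, default=None, help="optional run time in seconds")
    parser.add_argument("--output-table", default="health_timeline", help="destination table")
    return parser


def install_sigint_flag() -> Callable[[], bool]:
    """Turn Ctrl+C into a flag so the loop can finish its current poll and exit."""
    state = {"stop": False}

    def handler(signum, frame):
        state["stop"] = True

    signal.signal(signal.SIGINT, handler)
    return lambda: state["stop"]


def record(
    fetch: Callable[[], Iterable[MetricRow]],
    write: Callable[[List[Dict[str, object]]], None],
    interval: float = 0.5,
    duration: Optional[float] = None,
    should_stop: Callable[[], bool] = lambda: False,
) -> Dict[str, float]:
    """Poll fetch() until stopped; tag each poll's rows with one shared snapshot_id."""
    started = time.monotonic()
    snapshots = 0
    rows_written = 0
    while not should_stop():
        if duration is not None and time.monotonic() - started >= duration:
            break
        snapshot_id = uuid.uuid4().hex
        polled_at = datetime.now(timezone.utc).isoformat()
        rows = [
            {
                "timestamp": polled_at,
                "metric_name": name,
                "metric_value": value,
                "category": category,
                "severity": severity,
                "snapshot_id": snapshot_id,
            }
            for name, value, category, severity in fetch()
        ]
        write(rows)  # hypothetical sink; the real script would write to DataShard here
        snapshots += 1
        rows_written += len(rows)
        time.sleep(interval)
    return {
        "snapshots": snapshots,
        "rows": rows_written,
        "elapsed_seconds": time.monotonic() - started,
    }
```

In the real script, `should_stop` would come from `install_sigint_flag()`, and the returned dict would feed the exit summary. Keeping `fetch` and `write` injectable also lets the loop be exercised in tests without a database or a DataShard table.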
