#114 Stress Test

closed critical Created 2025-11-27 04:12 · Updated 2025-11-27 08:11

Description

Edit
I am going to run pytest in another terminal. You need to write a bash, in scripts/disrupt.sh, where first you list all of the highway-* services (except api), including internal worker, activity worker, ..., then you need to randomly kill one or two of those, every 10 second. the system should operate with 60% of capacity, you should kill those by pid, you should make sure they will be restarted. you need to do this around 3 minutes. at the end of the script, all services must be back to normal. again, api shouldn't be killed, the rest can. the bash script must be comprehensive, and not specific to this system's setting, so any deployment can run the disruption. after creating the script, I will run it, and will report back results to you. The goal is, all tests must pass as workers are available, but in lower capacity. --- IMPLEMENTATION COMPLETE: Script: scripts/disrupt.sh Protected services (never killed): - highway-api.service - highway-scheduler.service Killable services (auto-discovered): - highway-worker@{1..8}.service (8 workers) - highway-activity-worker.service - highway-internal-worker.service - highway-frontend.service - dsl-generator.service Capacity math: - 13 total killable services - 60% minimum = ceil(13 * 0.6) = 8 must stay running - Max 5 can be down simultaneously - Kills 1-2 per cycle (random) Chaos pattern: - 10-second intervals - 3-minute duration = 18 cycles - PID-based kill -9 - systemd auto-restart handles recovery - Final cleanup verifies all services back

Comments

Loading comments...

Context

Loading context...

Audit History

View All
Loading audit history...