/highway-workflow-engine
Edit Issue #263
## Problem

tools.python.run imports code dynamically from sys.path (local disk). If a worker is updated with new code while processing an old workflow event, it may execute new logic against old state, causing non-deterministic replay failures.

## Impact

- Replay debugging becomes unreliable if code changes between execution and replay
- Production workflows may behave differently on retry if code was deployed mid-execution
- Non-deterministic failures that are impossible to reproduce

## Technical Details

- Current: tools.python.run imports from local sys.path
- Problem: Code on disk may differ from code that originally created the workflow
- Scenario:
  1. Workflow v1.0 starts execution
  2. Code deployed: v1.1 with breaking changes
  3. Worker restarts mid-workflow
  4. Workflow resumes with v1.1 code but v1.0 state
  5. Non-deterministic behavior or crash

## Proposed Solution: Strict Artifact Loading

Workers should prioritize loading the code version hash referenced in the workflow_run (stored in DataShard) over local disk code.

## Implementation Steps

### Phase 1: Code Hash Tracking

1. On workflow submission, compute SHA256 of relevant Python modules
2. Store code_version_hash in workflow_runs table
3. Store actual code artifacts in DataShard

### Phase 2: Artifact Loading

1. Modify tools.python.run to check for workflow code version
2. If version mismatch detected:
   - Option A: Load code from DataShard artifact
   - Option B: Fail with clear error message
3. Add configuration flag: strict_code_versioning (default: warn)

### Phase 3: Validation

1. Add startup check comparing worker code hash vs active workflows
2. Log warning if mismatched workflows detected
3. Add CLI command: hwe validate-code-versions

## Configuration Options

strict_code_versioning:

- disabled: No version checking (current behavior)
- warn: Log warning but continue execution
- strict: Fail on version mismatch
- artifact: Load from DataShard artifact

## Migration Strategy

1. Deploy with strict_code_versioning=warn initially
2. Monitor for version mismatch warnings
3. Enable strict mode once confident

## Acceptance Criteria

- [ ] Code version hash stored on workflow creation
- [ ] Version mismatch detection implemented
- [ ] Warning mode logs mismatches
- [ ] Strict mode fails on mismatch
- [ ] Documentation updated with versioning strategy

## Tags: reliability, replay, versioning, determinism
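
## Sketch: Computing the Code Version Hash

The hashing step in Phase 1 could look roughly like the sketch below. It is illustrative only: the function name `compute_code_version_hash` and the choice to hash file name, length, and contents are assumptions, not part of the existing codebase, and the rule for selecting the "relevant Python modules" is left to the caller.

```python
import hashlib
from pathlib import Path


def compute_code_version_hash(module_paths):
    """Return a hex SHA-256 digest over the given source files.

    Hypothetical helper: paths are sorted so the digest does not depend
    on the order in which the caller lists the modules.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in module_paths):
        data = path.read_bytes()
        # A NUL separator after the name and an 8-byte length prefix keep
        # file boundaries unambiguous inside the combined byte stream.
        digest.update(path.name.encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

The resulting string is what would be stored as code_version_hash on the workflow run. Hashing raw bytes means whitespace-only edits also change the hash, which is the conservative choice for replay determinism.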
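
## Sketch: Applying strict_code_versioning on Mismatch

The four configuration modes above map onto a small decision function. The following is a minimal sketch under assumptions: `VersioningMode`, `CodeVersionMismatch`, and `resolve_code_source` are hypothetical names, and the actual loading of an artifact from DataShard is out of scope here. The function only decides where code should come from.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class VersioningMode(Enum):
    # Values mirror the strict_code_versioning options in this issue.
    DISABLED = "disabled"
    WARN = "warn"
    STRICT = "strict"
    ARTIFACT = "artifact"


class CodeVersionMismatch(RuntimeError):
    """Hypothetical error raised in strict mode when hashes differ."""


def resolve_code_source(mode, workflow_hash, worker_hash):
    """Return "local" or "artifact"; raise in strict mode on mismatch."""
    if mode is VersioningMode.DISABLED or workflow_hash == worker_hash:
        return "local"
    if mode is VersioningMode.WARN:
        logger.warning(
            "code version mismatch: workflow=%s worker=%s",
            workflow_hash,
            worker_hash,
        )
        return "local"
    if mode is VersioningMode.STRICT:
        raise CodeVersionMismatch(
            f"workflow expects code {workflow_hash}, worker has {worker_hash}"
        )
    # Remaining mode is ARTIFACT: the caller should load the stored code.
    return "artifact"
```

For example, `resolve_code_source(VersioningMode("warn"), "abc", "def")` logs a warning and returns `"local"`, which matches the proposed default behavior during migration.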