#360 CRITICAL: DataShard script storage uses workflow_name+version instead of definition_hash - causes stale data collisions
## Problem
When storing Python DSL scripts in DataShard, the key is based on workflow_name + version instead of definition_hash.
This causes a CRITICAL bug:
1. Submit workflow `foo` v1 (ForEach) - stored at `workflow-scripts/foo/v1/python`
2. `make db-recreate` clears PostgreSQL but NOT DataShard (S3)
3. Submit a different workflow named `foo` - it is assigned v1 again (the DB is empty) and has a NEW definition_hash
4. DataShard lookup keys on workflow_name + version - it returns the OLD, stale code!
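The collision can be demonstrated in a few lines. This is a minimal sketch, not the actual DataShard code: `script_key` stands in for whatever builds the storage key today, and `definition_hash` is assumed to be a content hash of the DSL source.

```python
import hashlib

def script_key(workflow_name: str, version: int) -> str:
    # Current (buggy) key scheme: derived only from name + version,
    # so the definition content never influences the key.
    return f"workflow-scripts/{workflow_name}/v{version}/python"

def definition_hash(source: str) -> str:
    # Assumed content hash of the DSL source text.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

# Two different workflows end up with the same name/version after a DB wipe:
old_src = "ForEach(...)"   # workflow submitted before db-recreate
new_src = "Map(...)"       # different workflow, same name, v1 again

assert script_key("foo", 1) == script_key("foo", 1)          # same key -> collision
assert definition_hash(old_src) != definition_hash(new_src)  # yet different content
```

Because the key is identical while the hashes differ, a lookup for the new workflow silently returns the old source.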
## Impact
- Data integrity violation: Different definition_hash can return wrong DSL source
- Audit trail corruption: Cannot reliably trace back to original source code
- Security risk: Could serve outdated/insecure code if DB is restored
## Fix Required
1. Add definition_hash field to WORKFLOW_SCRIPTS_SCHEMA
2. Use definition_hash as the primary filter for storage/retrieval
3. Update store_source_script() to accept and store definition_hash
4. Update get_script_from_datashard() to filter by definition_hash
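The fix above could look roughly like the following. This is a hedged sketch, not the real service: `_datashard` is a hypothetical in-memory stand-in for the DataShard/S3 backend, and the function signatures are assumptions about how `store_source_script()` and `get_script_from_datashard()` might change in `workflow_versioning_service.py`.

```python
import hashlib
from typing import Optional

# Hypothetical stand-in for DataShard storage; the real backend is S3.
_datashard: dict[str, dict] = {}

def compute_definition_hash(source: str) -> str:
    # Assumed: the hash is derived from the DSL source text itself.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def store_source_script(workflow_name: str, version: int, source: str) -> str:
    """Store the script keyed by definition_hash, not name+version."""
    def_hash = compute_definition_hash(source)
    # definition_hash is the primary key: identical name/version pairs
    # with different content can never collide.
    _datashard[def_hash] = {
        "workflow_name": workflow_name,   # kept for audit/trace purposes
        "version": version,
        "definition_hash": def_hash,
        "source": source,
    }
    return def_hash

def get_script_from_datashard(def_hash: str) -> Optional[str]:
    """Retrieve the script by its definition_hash filter."""
    record = _datashard.get(def_hash)
    return record["source"] if record else None
```

With this scheme, replaying step 3 of the repro stores the new workflow under a new hash, and the stale v1 entry becomes unreachable by the new workflow's lookups while remaining addressable for audit.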
## Files to modify
- engine/services/workflow_versioning_service.py