#360 CRITICAL: DataShard script storage uses workflow_name+version instead of definition_hash - causes stale data collisions

closed critical Created 2025-12-11 10:26 · Updated 2025-12-11 10:33

Description

## Problem

When storing Python DSL scripts in DataShard, the key is based on `workflow_name` + `version` instead of `definition_hash`. This causes a critical bug:

1. Submit workflow `foo` v1 (ForEach): stored at `workflow-scripts/foo/v1/python`.
2. `make db-recreate` clears PostgreSQL but NOT DataShard (S3).
3. Submit a different workflow `foo`: it gets v1 again (the DB is empty) with a NEW `definition_hash`.
4. The DataShard lookup uses `workflow_name` + `version` and returns the OLD, stale code!

## Impact

- Data integrity violation: different `definition_hash` values can return the wrong DSL source.
- Audit trail corruption: cannot reliably trace back to the original source code.
- Security risk: could serve outdated/insecure code if the DB is restored.

## Fix Required

1. Add a `definition_hash` field to `WORKFLOW_SCRIPTS_SCHEMA`.
2. Use `definition_hash` as the primary filter for storage and retrieval.
3. Update `store_source_script()` to accept and store `definition_hash`.
4. Update `get_script_from_datashard()` to filter by `definition_hash`.

## Files to modify

- `engine/services/workflow_versioning_service.py`
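A minimal sketch of the intended fix, assuming the real `store_source_script()` / `get_script_from_datashard()` live in `workflow_versioning_service.py` and that `definition_hash` is a content hash of the DSL source. The dict stand-in for DataShard (S3) and the key layout here are illustrative, not the actual implementation:

```python
import hashlib
from typing import Optional

# Hypothetical in-memory stand-in for DataShard (S3-backed in the real system).
_datashard: dict[str, str] = {}


def definition_hash(dsl_source: str) -> str:
    """Content-address the workflow definition: identical sources share a key."""
    return hashlib.sha256(dsl_source.encode("utf-8")).hexdigest()


def store_source_script(workflow_name: str, version: int, dsl_source: str) -> str:
    """Store the script keyed by definition_hash, not workflow_name + version."""
    d_hash = definition_hash(dsl_source)
    _datashard[f"workflow-scripts/{d_hash}/python"] = dsl_source
    return d_hash


def get_script_from_datashard(d_hash: str) -> Optional[str]:
    """Retrieve by definition_hash; a stale name+version entry can never match."""
    return _datashard.get(f"workflow-scripts/{d_hash}/python")


# Two different submissions of "foo v1" no longer collide:
h1 = store_source_script("foo", 1, "print('original workflow')")
h2 = store_source_script("foo", 1, "print('different workflow')")
assert h1 != h2
assert get_script_from_datashard(h1) == "print('original workflow')"
```

Because the key is derived purely from the script content, a `make db-recreate` that wipes PostgreSQL but not S3 cannot cause a lookup for a new `definition_hash` to hit old data.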


