Lessons

Category Lesson Issue Date
testing Python unittest mock patches: when a function is imported inside another function's body, patch it at its source module (engine.config.get_config), not in the module that calls it. The import is resolved at call time, so the mock target must be the module the import comes from. #263 2025-12-05
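A minimal sketch of this rule, with json.loads standing in for engine.config.get_config (the module paths here are illustrative stand-ins): because the import runs inside the function body, the name is looked up in the source module at call time, so that is where the patch must point.

```python
from unittest.mock import patch

def read_settings(raw):
    # The import executes on every call, so the name is resolved
    # against the json module at call time: patch "json.loads" there.
    from json import loads
    return loads(raw)

with patch("json.loads", return_value={"mocked": True}) as mock_loads:
    assert read_settings("{}") == {"mocked": True}
    mock_loads.assert_called_once_with("{}")

# Outside the patch, the real implementation runs again.
assert read_settings('{"a": 1}') == {"a": 1}
```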
observability Sidecar pattern for crash-safe logging: use a separate autocommit connection pool for lifecycle events. Small pool (2-5 connections), autocommit=True for immediate persistence, and a fire-and-forget design so telemetry failures never crash the workflow; log before the transaction starts and after commit. #262 2025-12-05
security OAuth authentication != authorization. After OAuth verifies identity, ALWAYS check user has actual permissions in the system before issuing JWT tokens. Never trust OAuth state parameters for authorization decisions. #260 2025-12-04
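The rule above can be sketched as follows; every name here (handle_oauth_callback, user_store) is hypothetical, but the shape is the point: identity comes from OAuth, authorization comes from our own records, and only then is a token minted.

```python
def handle_oauth_callback(oauth_identity, user_store):
    # OAuth has only proven WHO this is; check our own permission store
    # before issuing anything.
    user = user_store.get(oauth_identity["email"])
    if user is None or not user.get("active"):
        # Authenticated is not authorized: no JWT for unknown users.
        raise PermissionError("authenticated but not authorized")
    # Build claims only from permissions we verified ourselves,
    # never from the OAuth state parameter.
    return {"sub": oauth_identity["email"], "perms": sorted(user["permissions"])}

store = {"a@example.com": {"active": True, "permissions": {"workflows:read"}}}
claims = handle_oauth_callback({"email": "a@example.com"}, store)
assert claims["perms"] == ["workflows:read"]
```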
security API key rotation security: Default behavior must be SECURE (immediate invalidation), with opt-in grace period. Never leave compromised credentials valid by default. #257 2025-12-04
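A sketch of the secure-by-default shape (function and field names are illustrative, not the real API): immediate invalidation unless a grace period is explicitly requested.

```python
from datetime import datetime, timedelta, timezone

def rotate_key(old_key, new_key_value, grace_period=None):
    """Secure default: the old key is invalid immediately.
    A grace period must be requested explicitly (opt-in)."""
    now = datetime.now(timezone.utc)
    old_key["expires_at"] = now if grace_period is None else now + grace_period
    return {"value": new_key_value, "expires_at": None}

old = {"value": "k-old", "expires_at": None}
rotate_key(old, "k-new")  # default: dead right now
assert old["expires_at"] <= datetime.now(timezone.utc)

old2 = {"value": "k-old2", "expires_at": None}
rotate_key(old2, "k-new2", grace_period=timedelta(minutes=10))  # opt-in grace
assert old2["expires_at"] > datetime.now(timezone.utc)
```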
security API Security Hardening: Use @require_permission decorators (after @spec.validate), g.tenant_id not headers, AST validation for code execution (whitelist not blacklist) #248 2025-12-04
general SECURITY: All API endpoints must have @require_permission decorator. New endpoints require security review before deployment. Tenant isolation must use g.tenant_id from middleware, NEVER request headers directly. - 2025-12-04
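A minimal sketch of such a decorator, assuming nothing about the real implementation: a ContextVar stands in for Flask's per-request g, and the permission check runs before the endpoint body.

```python
from contextvars import ContextVar
from functools import wraps

# Stand-in for Flask's g: populated by middleware, never by request headers.
current_user = ContextVar("current_user")

def require_permission(perm):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            user = current_user.get()
            if perm not in user["permissions"]:
                raise PermissionError(f"missing permission: {perm}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_permission("workflows:read")
def list_workflows():
    return ["wf-1"]

current_user.set({"permissions": {"workflows:read"}})
assert list_workflows() == ["wf-1"]
```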
general Expected behavior when the worker is killed: the shell process dies with it (process group termination). For long-running activities to survive worker restarts, you need a retry policy. With one configured, a restart proceeds as: 1. Worker dies → shell command gets SIGTERM. 2. Activity is marked as failed (attempt 1 of 5). 3. After a 5-second delay, a new worker picks it up. 4. The HTTP server restarts automatically. Without a retry policy, the activity fails permanently on worker crash; with one, it auto-recovers when a new worker starts. - 2025-12-03
architecture ActivityContext pattern for long-running activities: Use ActivityContext (not DurableContext) for activities to avoid holding DB connections. ActivityContext.get_connection() provides on-demand short-lived connections that auto-commit. DurableContext.get_connection() just yields its held connection. This prevents connection pool exhaustion during high concurrency of long-running activities. #244 2025-12-03
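The get_connection() contrast can be sketched like this, with sqlite3 standing in for the real pool (the class here is illustrative only, not the actual ActivityContext): the connection exists only inside the with block, so nothing is held while the long-running work runs.

```python
import sqlite3
from contextlib import contextmanager

class ActivityContextSketch:
    """Illustrative only: opens a connection on demand and releases it
    immediately, instead of holding one for the activity's lifetime."""

    def __init__(self, dsn):
        self.dsn = dsn

    @contextmanager
    def get_connection(self):
        conn = sqlite3.connect(self.dsn, isolation_level=None)  # autocommit
        try:
            yield conn
        finally:
            conn.close()  # released before any long-running work resumes

ctx = ActivityContextSketch(":memory:")
with ctx.get_connection() as conn:
    assert conn.execute("SELECT 1").fetchone() == (1,)
# ... long-running shell command or HTTP call happens here,
# with no connection checked out ...
```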
general DB Connection Management: Never hold database connections during long-running operations (shell commands, HTTP requests, etc.). Connections should be acquired, used briefly, and released. For long-running activities, acquire connection only for: 1) initial setup/variable resolution, 2) status updates. Use separate short-lived connections for periodic updates (heartbeat, PID storage). This prevents connection pool exhaustion when scaling to hundreds of concurrent activities. - 2025-12-03
architecture MCP integration: DurableMCPClient wraps ctx.step() for checkpointing. Tools: tools.mcp.invoke, tools.mcp.list_tools, tools.mcp.read_resource. DSL: WorkflowBuilder.mcp_tool(). DB: mcp_server_config with tenant isolation. Transports: stdio and HTTP. Credentials in Vault. #240 2025-12-03
general Spectree validates responses when resp= is specified. UUIDs must be converted with str(uuid); the response must match the Pydantic models exactly. - 2025-12-02
replay Saga pattern implemented in DurableContext: (1) step_with_compensation() registers compensation functions, (2) run_compensations() executes in reverse order (LIFO), (3) saga() context manager auto-runs compensations on failure. All compensations are durable via ctx.step() for idempotency. - 2025-12-02
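The three pieces above can be sketched in a few lines; this is a minimal illustration, not the real DurableContext, and it omits the durability that the real version gets by wrapping each compensation in ctx.step().

```python
from contextlib import contextmanager

class SagaSketch:
    def __init__(self):
        self._compensations = []

    def step_with_compensation(self, action, compensation):
        result = action()
        self._compensations.append(compensation)  # register undo for this step
        return result

    def run_compensations(self):
        while self._compensations:                # LIFO: undo newest first
            self._compensations.pop()()

    @contextmanager
    def saga(self):
        try:
            yield self
        except Exception:
            self.run_compensations()              # auto-run on failure
            raise

undone = []
ctx = SagaSketch()
try:
    with ctx.saga() as s:
        s.step_with_compensation(lambda: "a", lambda: undone.append("undo-a"))
        s.step_with_compensation(lambda: "b", lambda: undone.append("undo-b"))
        raise RuntimeError("step 3 failed")
except RuntimeError:
    pass
assert undone == ["undo-b", "undo-a"]             # reverse order of registration
```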
replay Highway replay uses two modes: Display Mode (historical data from checkpoints + audit log) and Simulation Mode (time-travel debugging that re-executes code with mocked side effects). Determinism is NOT enforced at runtime; it relies on developer discipline to use ctx.now, ctx.get_random, and ctx.step. - 2025-12-02
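Why wrapping side effects in ctx.step makes replay deterministic can be shown with a toy journal; this is a sketch of the general checkpoint-replay idea, not Highway's actual implementation.

```python
import random

class CtxSketch:
    """First execution records each step's result in a journal;
    replay returns the recorded values instead of re-running side effects."""

    def __init__(self, journal=None):
        self.journal = list(journal) if journal else []
        self._cursor = 0

    def step(self, fn):
        if self._cursor < len(self.journal):
            result = self.journal[self._cursor]   # replay from checkpoint
        else:
            result = fn()                         # live run: execute and record
            self.journal.append(result)
        self._cursor += 1
        return result

live = CtxSketch()
first = live.step(lambda: random.random())        # nondeterministic side effect

replay = CtxSketch(journal=live.journal)
assert replay.step(lambda: random.random()) == first  # same value on replay
```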
general Scheduling a Workflow When you "schedule" a workflow (e.g., via POST /v1/workflows), the following happens: 1. API Endpoint: The request hits api/blueprints/v1/workflows.py. 2. Versioning: The WorkflowVersioningService hashes the definition and stores it in the workflow_definition table. 3. Tracking: A workflow_run record is created in the database with status pending. 4. Enqueuing: The API calls absurd_client.spawn_task to insert a new task into the absurd task queue (specifically t_{queue_name}). The task name is typically tools.workflow.execute. Atomicity: this insertion happens within a database transaction; once committed, the workflow is effectively "scheduled" for immediate execution. - 2025-11-30
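The enqueue-plus-atomicity step can be sketched as below; this is a deliberate simplification with sqlite3 standing in for Postgres, and spawn_task here is a hypothetical reduction of absurd_client.spawn_task to its essence: one INSERT inside the caller's transaction.

```python
import json
import sqlite3

def spawn_task(conn, queue_name, task_name, payload):
    # One row in the queue table; visible to workers only after commit.
    conn.execute(
        f"INSERT INTO t_{queue_name} (task_name, payload, status) "
        "VALUES (?, ?, 'pending')",
        (task_name, json.dumps(payload)),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_default (task_name TEXT, payload TEXT, status TEXT)")
with conn:  # the commit is what makes the workflow 'scheduled'
    spawn_task(conn, "default", "tools.workflow.execute", {"run_id": "wf-1"})
assert conn.execute("SELECT COUNT(*) FROM t_default").fetchone()[0] == 1
```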
general The execution flow is as follows: 1. The orchestrator claims a task, which corresponds to the execute_workflow function. 2. The orchestrator instantiates a DurableContext, which involves initiating a database transaction and creating an AbsurdClient. 3. The orchestrator then calls execute_workflow, passing in the newly created ctx. 4. execute_workflow instantiates a WorkflowInterpreter. 5. interpreter.start_workflow(ctx=ctx, ...) is called. 6. This, in turn, calls inline_executor.start_workflow(ctx=ctx, ...). 7. Finally, the InlineExecutor executes the workflow graph, passing the same ctx object to all execute_task_inline calls. - 2025-11-30
general DRAIN LOOP BUG PATTERN: When using while-True drain loops with LIMIT queries, always track seen IDs to prevent infinite loops. If query returns same rows (race condition prevents state change), loop runs forever. Fix: seen_ids set + break if all rows duplicates + MAX_PER_CYCLE limit. - 2025-11-30
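The fix described above can be sketched as follows (names are illustrative; fetch_batch stands in for the LIMIT query): seen-ID tracking guarantees termination even when a race keeps returning the same rows.

```python
def drain(fetch_batch, process, max_per_cycle=1000):
    """Safe drain loop: track seen IDs, stop when a batch is all
    duplicates (or empty), and cap total work per cycle."""
    seen_ids = set()
    processed = 0
    while processed < max_per_cycle:
        rows = fetch_batch()                 # e.g. SELECT ... LIMIT n
        fresh = [r for r in rows if r["id"] not in seen_ids]
        if not fresh:
            break  # empty batch, or a race kept the same rows coming back
        for row in fresh:
            seen_ids.add(row["id"])
            process(row)
            processed += 1
    return processed

# A stuck backend that always returns the same two rows (state never changes):
stuck_rows = [{"id": 1}, {"id": 2}]
handled = []
assert drain(lambda: stuck_rows, handled.append) == 2  # terminates, no spin
assert [r["id"] for r in handled] == [1, 2]
```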
general DSL_VARIABLE_INTERPOLATION: Never echo large variable content like {{result.stdout}} or {{result}} in shell tasks. The full content gets interpolated causing 'Argument list too long' errors. Use specific small fields like {{result.returncode}}, {{result.status_code}}, or truncate large outputs. - 2025-11-29
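A small guard illustrates the safe alternative; the helper name and 4096-byte limit are illustrative, not part of the DSL. Interpolating the full stdout would put megabytes onto a command line and overflow the kernel's argv size limit (the E2BIG behind 'Argument list too long').

```python
def safe_field(value, limit=4096):
    """Illustrative guard: interpolate only a truncated string so a huge
    stdout can never blow past the argv size limit."""
    text = str(value)
    return text if len(text) <= limit else text[:limit] + "...[truncated]"

result = {"returncode": 0, "stdout": "x" * 10_000_000}
# Bad: interpolating {{result.stdout}} would put ~10 MB on a command line.
# Good: interpolate a small field like {{result.returncode}}, or truncate first.
assert safe_field(result["returncode"]) == "0"
assert len(safe_field(result["stdout"])) <= 4096 + len("...[truncated]")
```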