State Persistence

Persistence Subsystem

Data integrity is handled by the `StateStore` (defined in engine/state.py). Filesystem writes are protected against corruption, including interruption by SIGKILL or an OS-level failure, because every write goes through an atomic swap.

Atomic Swaps (`atomic_replace`)

Instead of writing directly to the final artifact (e.g., state.json), the persistence layer serializes data into a temporary .tmp file. Once serialization completes, `os.replace` swaps the temporary file over the destination in a single atomic step, so a reader sees either the old contents or the new contents and never a partial write. A retry with backoff handles the `PermissionError` raised when host EDR/antivirus software briefly locks a newly created file while scanning it.
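
The write-then-swap sequence described above can be sketched as follows. This is a minimal illustration, not the code in engine/state.py: the signature, the `retries` and `base_delay` parameters, and the JSON payload are assumptions made for the example.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def atomic_replace(dest, payload, retries=5, base_delay=0.05):
    """Write payload to a temp file beside dest, then swap it into place.

    Hypothetical signature; retries and base_delay are illustrative defaults.
    """
    dest = Path(dest)
    # Keep the temp file in the destination directory so the swap
    # never has to cross a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # flush bytes to disk before the swap
        for attempt in range(retries):
            try:
                os.replace(tmp_name, dest)
                return
            except PermissionError:
                # A scanner may hold a short-lived lock; back off and retry.
                if attempt == retries - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)
    finally:
        # Clean up the temp file if the swap never happened.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
```

If the process is killed before `os.replace` runs, the destination still holds the previous complete file and only a stray temporary file is left behind.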

Schema Versioning

All state objects are tagged with an internal `STATE_VERSION`. If the engine encounters schema drift during deserialization (e.g., reading a v1 JSON under a v2 runtime), it explicitly deprecates the state object rather than risking unpredictable mutation bugs from a mismatched schema.
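
A version check of this kind might look like the sketch below. The `"version"` key, the value `2`, and the `load_state` helper are assumptions for illustration. How the engine actually deprecates a mismatched object is not specified here, so the sketch simply returns `None` and leaves the decision to the caller.

```python
import json
from typing import Any, Dict, Optional

STATE_VERSION = 2  # assumed value, for illustration only


def load_state(raw: str) -> Optional[Dict[str, Any]]:
    """Decode a state document, or return None on a version mismatch.

    Hypothetical helper: the "version" key name is an assumption.
    """
    data = json.loads(raw)
    if data.get("version") != STATE_VERSION:
        # Schema drift: refuse to interpret fields written by another version.
        return None
    return data
```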

Workspace Hierarchy

work/<pipeline_name>/
├── run.json             # Serialized `RunState`: Execution metrics and pipeline metadata
├── pipeline.yaml        # Immutable snapshot of the pipeline YAML config during initialization
├── run_manifest.json    # Aggregated macro-overview of all samples (used by Starlette API)
└── samples/
    └── <sample_id>/     # Isolated sandbox
        ├── state.json   # Serialized `SampleState`: Complete ledger of `StageOutput` maps
        ├── context.md   # Synthesized LLM context generated via `engine/context.py`
        └── ...          # Processor-specific artifacts
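
The layout above can be resolved with a couple of `pathlib` helpers. This is a sketch only: the function names and the `work_root` parameter are hypothetical, and only the directory and file names come from the tree.

```python
from pathlib import Path


def sample_dir(work_root, pipeline_name, sample_id):
    """Directory holding one sample's artifacts (hypothetical helper)."""
    return Path(work_root) / pipeline_name / "samples" / sample_id


def sample_state_path(work_root, pipeline_name, sample_id):
    """Location of the serialized SampleState for one sample."""
    return sample_dir(work_root, pipeline_name, sample_id) / "state.json"
```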

The `history` Ledger

The `SampleState.history` dictionary maps each processor stage name to a `StageOutput` instance, so every stage's output lives in its own namespace. A downstream processor that needs a metric from an upstream stage queries `history["upstream_processor"].data.get("metric")` (see Building Custom Processors).

This namespacing prevents field collisions between processors that use different analysis techniques, even when they emit keys with the same name.
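
The lookup pattern can be illustrated with simplified stand-ins for the two classes. The real `SampleState` and `StageOutput` presumably carry more fields; the definitions, processor names, and metric values below are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StageOutput:
    # Simplified stand-in: only the data mapping is modeled here.
    data: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SampleState:
    # Keyed by processor stage name.
    history: Dict[str, StageOutput] = field(default_factory=dict)


state = SampleState()
# Two processors emit the same key without overwriting each other.
state.history["upstream_processor"] = StageOutput(data={"metric": 0.42})
state.history["other_processor"] = StageOutput(data={"metric": "high"})

# A downstream processor reads only the namespace it depends on.
value = state.history["upstream_processor"].data.get("metric")
```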
