Every message flows through these layers in sequence. Total latency is ~60ms for routine messages and ~200ms for complex analysis, with zero LLM calls; every layer is deterministic Python.
Unlike earlier versions, which used parallel LLM agents for memory search, v9 replaces them with a single deterministic index. Every canonical memory file is parsed line by line into a searchable index that combines BM25 and FAISS results via reciprocal rank fusion. Lookups return exact matches in <30ms with zero LLM calls.
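Reciprocal rank fusion can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the doc IDs, the `k=60` constant, and the two hard-coded rankings (standing in for real BM25 and FAISS results) are assumptions.

```python
# Minimal sketch of reciprocal rank fusion (RRF) over two ranked result
# lists, e.g. one from a BM25 index and one from a FAISS vector index.
# Doc IDs and the k constant are illustrative, not from the source.

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs via RRF: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["mem_07", "mem_12", "mem_03"]   # lexical ranking (hypothetical)
faiss_hits = ["mem_12", "mem_99", "mem_07"]  # vector ranking (hypothetical)
fused = rrf_fuse([bm25_hits, faiss_hits])
```

Documents that appear high in both rankings (here `mem_12` and `mem_07`) float to the top of the fused list, which is what makes RRF a cheap, deterministic way to merge lexical and vector search without tuning score scales.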
Each layer maps to an anatomical brain region. Tool-based layers are pure Python, deterministic, and run at $0 cost.
Three cascading stages keep the context window clean: budget-based pruning, staged dumping, and idle-gated consolidation. All stages cost $0.
Budget-based pruning runs on every context assembly: a soft trim at configurable thresholds, and a hard clear that preserves the last N assistant messages.
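A minimal sketch of the soft-trim / hard-clear cascade, assuming the thresholds, the 4-chars-per-token estimate, and the message shape; none of these names come from the source.

```python
# Hedged sketch of budget-based pruning. Thresholds, the token
# heuristic, and the message dicts are assumptions for illustration.

def estimate_tokens(messages):
    # Crude 4-characters-per-token heuristic (assumption)
    return sum(len(m["content"]) // 4 for m in messages)

def prune(messages, soft_budget=8000, hard_budget=12000, keep_last_n=2):
    if estimate_tokens(messages) > hard_budget:
        # Hard clear: keep only the last N assistant messages
        assistant = [m for m in messages if m["role"] == "assistant"]
        return assistant[-keep_last_n:]
    pruned = list(messages)
    # Soft trim: drop oldest messages until under the soft budget
    while pruned and estimate_tokens(pruned) > soft_budget:
        pruned.pop(0)
    return pruned
```

Running this on every context assembly keeps the cost bounded: the common case is a cheap length check, and the hard clear only fires when the conversation blows past the hard budget.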
A periodic cron job dumps pruned content to a staging area for later consolidation into long-term memory.
Idle-gated consolidation extracts facts from the staging area into canonical long-term memory using a lightweight model, deduplicating against existing entries.
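The dedup step can be sketched as a normalized-text comparison; the source doesn't specify how deduplication works, so the whitespace/case normalization and the fact strings below are assumptions.

```python
# Hedged sketch of dedup during consolidation. The normalization rules
# and data shapes are assumptions; the source only says new facts are
# deduplicated against existing canonical entries.

def normalize(fact):
    # Collapse whitespace and case so near-identical lines match
    return " ".join(fact.lower().split())

def consolidate(staged_facts, canonical):
    seen = {normalize(f) for f in canonical}
    merged = list(canonical)
    for fact in staged_facts:
        key = normalize(fact)
        if key not in seen:
            seen.add(key)
            merged.append(fact)
    return merged

canonical = ["User prefers dark mode."]
staged = ["user prefers  dark mode.", "User's timezone is UTC+2."]
merged = consolidate(staged, canonical)
```

In this sketch the first staged fact is recognized as a duplicate of the existing entry and skipped, while the genuinely new fact is appended; a real implementation might instead use embedding similarity from the same FAISS index.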