LEGBA's architecture mirrors biological brain structure through independent, containerized modules that communicate via high-speed ZeroMQ IPC, creating a distributed brain within a single machine.
- **Modules:** Each cognitive function (Observer, Thinker, Planner, Actor, Verifier) is isolated as a containerized service; the modules operate independently yet cohesively.
- **IPC:** Modules coordinate over local ZeroMQ inter-process communication using PUB/SUB, PUSH/PULL, and REQ/REP patterns, with heartbeats and security for reliable coordination (a minimal sketch follows this list).
- **Conductor:** Orchestrates scheduling, routing, and queue priorities across all modules with microsecond-level precision.
- **Memory:** A dual-layer design: a short-term vector store for immediate recall and long-term LoRA adapters for persistent learning.
- **Models:** An NVIDIA-optimized TensorRT-LLM ensemble hosting sub-models (Nemotron or equivalent) with multi-GPU concurrency.
- **Hardware:** A single high-performance GPU node (e.g., DGX-class), enabling the entire cognitive system to run on one machine without distributed networking.
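The IPC layer above can be pictured with a small pyzmq PUB/SUB example over the local `ipc://` transport. The endpoint path, topic name, and message fields below are illustrative assumptions, not LEGBA's actual wire format.

```python
# PUB/SUB sketch over local ZeroMQ IPC (pyzmq).
# Endpoint path, topic name, and message fields are assumptions for illustration.
import json
import threading
import time

import zmq

ENDPOINT = "ipc:///tmp/legba-bus"  # hypothetical local IPC endpoint


def thinker(ctx: zmq.Context) -> None:
    """Thinker module: subscribes to the observations topic and handles one message."""
    sub = ctx.socket(zmq.SUB)
    sub.connect(ENDPOINT)
    sub.setsockopt(zmq.SUBSCRIBE, b"observations")
    topic, body = sub.recv_multipart()
    print(topic.decode(), json.loads(body))
    sub.close()


def observer(ctx: zmq.Context) -> None:
    """Observer module: publishes one observation on the shared bus."""
    pub = ctx.socket(zmq.PUB)
    pub.bind(ENDPOINT)
    time.sleep(0.5)  # let the subscriber connect (PUB/SUB slow-joiner)
    msg = {"module": "observer", "priority": 5, "payload": "frame-001"}
    pub.send_multipart([b"observations", json.dumps(msg).encode()])
    pub.close()


ctx = zmq.Context.instance()
t = threading.Thread(target=thinker, args=(ctx,))
t.start()
observer(ctx)
t.join()
ctx.term()
```

PUSH/PULL and REQ/REP exchanges follow the same pattern with different socket types; the heartbeating and security mentioned above would layer on top of this transport.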
Data flows through the system in structured envelopes, each carrying metadata that enables intelligent routing and prioritization.
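One way to picture such an envelope, with field names that are assumptions rather than LEGBA's actual schema: a small routing header (source, target, priority, trace ID, timestamp) wrapped around an opaque payload.

```python
# Illustrative envelope structure; field names are assumptions, not LEGBA's schema.
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class Envelope:
    source: str    # originating module, e.g. "observer"
    target: str    # destination module or topic, e.g. "thinker"
    priority: int  # consumed by the Conductor for queue ordering
    payload: dict  # module-specific content
    trace_id: str = field(default_factory=lambda: uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_bytes(self) -> bytes:
        """Serialize the envelope for transport over the local ZeroMQ bus."""
        return json.dumps(asdict(self)).encode()


env = Envelope(source="observer", target="thinker", priority=5,
               payload={"kind": "observation", "data": "frame-001"})
print(env.to_bytes())
```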
| Component | Implementation | Notes |
|---|---|---|
| Conductor | High-performance runtime | IPC orchestrator and scheduler |
| Modules | Containerized (Docker/K8s) | Each brain region isolated |
| Models | TensorRT-LLM runtime | Multi-GPU concurrency |
| Memory | MinIO + PostgreSQL/pgvector | Dual-layer STM/LTM |
| Hardware | Single high-performance GPU node | e.g., DGX-class system |
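The Conductor row above describes an orchestrator that schedules and routes envelopes by priority. A minimal sketch of that idea, assuming a simple priority-queue discipline (lower number = more urgent) that the source does not specify:

```python
# Priority-based envelope scheduling sketch; the queue discipline is an assumption.
import heapq
import itertools


class ConductorQueue:
    """Orders envelopes by priority; FIFO among envelopes with equal priority."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker preserving insertion order

    def submit(self, priority: int, envelope: dict) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), envelope))

    def next_envelope(self) -> dict:
        _, _, envelope = heapq.heappop(self._heap)
        return envelope


q = ConductorQueue()
q.submit(5, {"source": "observer", "payload": "routine frame"})
q.submit(1, {"source": "verifier", "payload": "verification failed"})
print(q.next_envelope()["source"])  # -> "verifier": more urgent work is served first
```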
Unlike distributed AI systems that require complex networking, LEGBA runs its entire cognitive architecture on a single high-performance compute node. This design eliminates network latency, simplifies deployment, and ensures sub-millisecond communication between modules, which is critical for real-time cognitive processing.
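The same single-node property applies to memory: short-term recall is a local pgvector query rather than a network hop. A minimal sketch, assuming a psycopg connection and an `stm_entries` table with `content` and `embedding` columns (none of which are specified here):

```python
# Short-term memory lookup sketch against a local pgvector store.
# Connection string, table name, column names, and embedding size are assumptions.
import psycopg

EMBEDDING_DIM = 768  # assumed embedding size


def recall(query_embedding: list[float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k nearest short-term memories by cosine distance."""
    with psycopg.connect("dbname=legba") as conn:
        rows = conn.execute(
            """
            SELECT content, embedding <=> %s::vector AS distance
            FROM stm_entries
            ORDER BY distance
            LIMIT %s
            """,
            (str(query_embedding), k),
        ).fetchall()
    return rows


# Example call (placeholder embedding values):
# recall([0.01] * EMBEDDING_DIM)
```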
Learn how LEGBA's architecture enables continuous learning through its cognitive cycle.