SOVEREIGN: The Unified Architecture — A Magnum Opus for Local-First AI Systems That Think for Themselves
The capstone synthesis of every system I have built — Dynamic Persona MoE RAG, agentic knowledge graphs, Control Boundary governance, local inference stacks, and spec-driven code generation — collapsed into one unified sovereign AI architecture called SOVEREIGN. This is the project blueprint.
Daniel Kliewer
Author, Sovereign AI

"The mind that runs on borrowed infrastructure answers to its landlord. Build your own floor."
Preface: Why This Post Exists
Every system I have built over the last several years was an answer to a problem I could not ignore.
SynthInt answered the problem of opaque identity: why should the values baked into an AI's persona belong to someone else? Dynamic Persona MoE RAG answered the problem of context drift: why should yesterday's dead context contaminate today's reasoning? The Private Knowledge Graph answered the problem of relational amnesia: why should the connections between ideas collapse into similarity scores that lose their meaning? DeerFlow 2.0 answered the problem of isolated execution: why should agents be monoliths when they can be swarms? OpenClaw answered the problem of cloud dependency: why should inference require a network request? SpecGen answered the problem of the blank page: why should code generation be non-deterministic when the specification is precise? mcbot01 answered the problem of foundation: why should every project rebuild the local-first scaffold from scratch?
Each of these was a partial answer. A module. A proof-of-concept that one piece of the sovereignty puzzle could be built, deployed, and owned.
This post is the synthesis.
SOVEREIGN — Self-owned Orchestration of Versatile Expert Reasoning, Evaluation, Intelligence, Governance, and Network — is the unified architecture that collapses all of these systems into a single coherent project. It is not a rewrite. It is an integration. Every module you have read about on this site is a subsystem in the larger machine. This post is the blueprint for assembling that machine.
I am writing this for myself first. Then for you — the person who read the Sovereignty Manifesto, who runs Ollama on local hardware, who understands intuitively that the architecture you choose encodes your values. You already know why this matters. This post is about how to build it.
And specifically: this post is written so that a coding agent — given nothing but this document as context — can construct the entire SOVEREIGN system from scratch. The architecture is fully specified here. The scaffolding is complete. The philosophy is embedded in the structure itself, because in sovereign AI, the code is always the philosophy.
I. The Thesis: One Problem, Seven Partial Answers, One Synthesis
The core problem of AI in 2026 is not capability. It is ownership.
The most capable models in the world run on hardware you do not control, store context you did not authorize, evolve in directions you did not choose, and serve objectives that were never yours. You interact with them through an interface that was designed to maximize your dependency, not your agency. The extraction is architectural. It was designed in.
I have spent the better part of a decade building the counter-architecture. Not as a rejection of capability — the sovereign stack I describe here is extraordinarily capable — but as a rejection of the trade embedded in every cloud AI interaction: your context in exchange for their compute.
The seven systems that SOVEREIGN synthesizes each resolved one dimension of this problem:
| System | Problem Solved | Core Contribution |
|---|---|---|
| SynthInt / Dynamic Persona MoE RAG | Opaque identity, static personas | Personas as versioned, auditable JSON; MoE routing to specialized reasoning agents |
| Private Knowledge Graph | Relational amnesia, flat vector retrieval | Explicit semantic relationships via NetworkX/Neo4j; provenance-tracked multi-hop reasoning |
| DeerFlow 2.0 | Monolithic agent execution | SuperAgent harness; AIO sandbox; persistent memory across agent invocations |
| OpenClaw | Cloud inference dependency | Fully local agent runtime via Ollama + llama.cpp; zero-telemetry execution paths |
| SpecGen | Non-deterministic code generation | Spec-driven, RAG-grounded code generation; deterministic output from structured input |
| mcbot01 | Fragmented local-first scaffolding | Reactive UI + async FastAPI backend as the reusable foundation layer |
| Control Boundary Engine | No governance in the execution path | Intent evaluation before execution; audit-ready pipelines; Colorado AI Act "Reasonable Care" compliance |
SOVEREIGN does not replace these systems. It is the environment in which they all run together, passing context between each other through a shared memory substrate, governed by a unified evaluation loop, exposed through a single interface.
The result is not merely a better RAG system. It is a local-first AI operating system — a platform for thought that you own completely.
II. Architecture Overview: The Seven Layers
SOVEREIGN is organized as seven concentric layers. Each layer is independently deployable, testable, and replaceable. The boundaries between layers are explicit interfaces, not implementation assumptions. This is the sovereignty principle applied to architecture itself: no layer should be dependent on the internal implementation of another.
```text
┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 7: INTERFACE LAYER                                           │
│  Next.js 16 (App Router) + React + TypeScript                       │
│  Conversational UI · Session Management · Persona Selector          │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 6: API GATEWAY LAYER                                         │
│  FastAPI · REST/GraphQL · WebSocket streaming · Auth middleware     │
│  Request validation · Rate limiting · Audit log emission            │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 5: ORCHESTRATION LAYER                                       │
│  MoE Orchestrator · Agent Swarm Router · DeerFlow SuperAgent        │
│  Intent classification · Persona activation · Result aggregation    │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 4: GOVERNANCE LAYER                                          │
│  Control Boundary Engine · Evaluation Loop · Audit Trail            │
│  Intent evaluation · Output scoring · Hallucination detection       │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 3: REASONING LAYER                                           │
│  Dynamic Persona Engine · Specialist Agent Pool · SpecGen           │
│  Persona lifecycle · Bounded trait evolution · Code synthesis       │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 2: MEMORY LAYER                                              │
│  Knowledge Graph (Neo4j/NetworkX) · Vector Store (ChromaDB)         │
│  Episodic memory · Semantic graph · Embedding index · Pruning       │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 1: INFERENCE LAYER                                           │
│  Ollama · llama.cpp · Local model registry                          │
│  On-prem inference · Zero telemetry · Reproducible seeds            │
└─────────────────────────────────────────────────────────────────────┘
```
Every request in SOVEREIGN flows downward through these layers and returns upward. The path is never short-circuited. There is no "fast path" that skips governance. There is no "trusted caller" that bypasses the evaluation loop. The architecture enforces the principle that accountability is not optional — it is structural.
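One way to make the "no fast path" rule structural rather than a convention is to expose a single entry point that always runs the governance check before and after reasoning. The sketch below is a minimal illustration under that assumption; `Request`, `GovernedPipeline`, `GovernanceRejected`, and the toy `deny_destructive_commands` policy are hypothetical names, not part of any SOVEREIGN module.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Request:
    """Hypothetical request envelope that records every stage it passes through."""
    text: str
    trace: List[str] = field(default_factory=list)


class GovernanceRejected(Exception):
    """Raised when the governance check refuses an input or an output."""


class GovernedPipeline:
    """Runs governance, reasoning, then governance again. There is no bypass method."""

    def __init__(self, check: Callable[[str], bool], reason: Callable[[str], str]):
        self._check = check
        self._reason = reason

    def handle(self, request: Request) -> str:
        request.trace.append("governance:intent")
        if not self._check(request.text):
            raise GovernanceRejected(request.text)
        request.trace.append("reasoning")
        output = self._reason(request.text)
        request.trace.append("governance:output")
        if not self._check(output):
            raise GovernanceRejected(output)
        return output


def deny_destructive_commands(text: str) -> bool:
    """Toy policy for illustration only: refuse anything mentioning 'rm -rf'."""
    return "rm -rf" not in text


# The lambda stands in for the reasoning layer.
pipeline = GovernedPipeline(check=deny_destructive_commands, reason=lambda t: t.upper())
ok = Request("summarize my notes")
result = pipeline.handle(ok)
print(result, ok.trace)
```

Because the trace is written by the pipeline itself, an audit log built from it shows the governance stages for every request, including the rejected ones.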
III. The Memory Substrate: Dual-Layer Sovereign Memory
The most important architectural decision in SOVEREIGN is the structure of memory. Memory determines what the system knows, what it can reason about, and what it forgets.
SOVEREIGN uses a dual-substrate memory architecture: a semantic knowledge graph for relational, provenance-tracked long-term memory, and a vector store for high-dimensional similarity retrieval. These are not interchangeable. They are complementary, and the architecture uses them for different reasoning tasks.
3.1 The Semantic Knowledge Graph
The knowledge graph in SOVEREIGN is a persistent, typed, directional graph built on Neo4j (for production persistence) with a NetworkX in-memory layer for query-scoped reasoning. The graph is not a flat document store. It is a living model of your knowledge domain.
Every node in the graph carries:
- A unique identifier and type
- A source document reference (provenance)
- A creation timestamp and last-accessed timestamp
- A relevance decay coefficient (used by the pruning engine)
- A confidence weight (updated by the evaluation loop)
Every edge in the graph carries:
- A typed relationship label (CAUSES, SUPPORTS, CONTRADICTS, PRECEDES, DERIVES_FROM, etc.)
- A weight (0.0–1.0) representing relationship strength
- A source (which agent or document established this relationship)
- A timestamp
This structure makes multi-hop reasoning explicit and auditable. When the system traces a path from Concept A to Claim B through Relationship R, that path is a first-class data structure you can inspect, export, and challenge. It is not a black-box attention pattern.
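To make "a path is a first-class data structure" concrete, here is a minimal standard-library sketch that returns a reasoning path as a list of (source, relationship, target) hops. The toy graph, its node names, and the `trace_path` helper are hypothetical and exist only for illustration; the class in this section uses NetworkX rather than a hand-written search.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy graph: node -> list of (relationship, neighbor).
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "rate_hike": [("CAUSES", "higher_borrowing_cost")],
    "higher_borrowing_cost": [("CAUSES", "lower_investment")],
    "lower_investment": [("SUPPORTS", "claim_slower_growth")],
}

Hop = Tuple[str, str, str]  # (source, relationship, target)


def trace_path(start: str, goal: str) -> Optional[List[Hop]]:
    """Breadth-first search returning the labeled hops, or None if unreachable."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for relationship, neighbor in EDGES.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + [(node, relationship, neighbor)]))
    return None


chain = trace_path("rate_hike", "claim_slower_growth")
for source, relationship, target in chain:
    print(f"{source} -[{relationship}]-> {target}")
```

Each hop names the relationship that justifies it, so the chain can be exported or challenged one edge at a time.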
```python
# sovereign/memory/knowledge_graph.py

import re
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

import networkx as nx


def _utc_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class KGNode:
    """A typed, provenance-tracked node in the sovereign knowledge graph."""
    id: str
    label: str                        # Entity type: CONCEPT, CLAIM, DOCUMENT, AGENT, EVENT
    content: str                      # Human-readable representation
    source_document_id: str           # Provenance anchor
    confidence: float = 1.0           # Updated by evaluation loop
    access_count: int = 0             # Used by LRU-style pruning
    decay_coefficient: float = 0.95   # Per-session relevance decay
    created_at: str = field(default_factory=_utc_now)
    last_accessed_at: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class KGEdge:
    """A typed, weighted, traceable relationship in the sovereign knowledge graph."""
    id: str
    source_id: str
    target_id: str
    relationship: str                 # CAUSES, SUPPORTS, CONTRADICTS, PRECEDES, DERIVES_FROM
    weight: float = 1.0
    established_by: str = "system"    # Agent ID or document ID that created this edge
    created_at: str = field(default_factory=_utc_now)
    metadata: Dict[str, Any] = field(default_factory=dict)


class SovereignKnowledgeGraph:
    """
    Dual-substrate knowledge graph: persistent Neo4j backend with
    NetworkX in-memory layer for query-scoped reasoning.

    Design principle: every reasoning path is traceable.
    Every node has provenance. Every edge has an author.
    Nothing is inferred without a trail.
    """

    # Relationship labels are interpolated into Cypher (relationship types
    # cannot be passed as query parameters), so restrict them to identifiers.
    _RELATIONSHIP_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.in_memory_graph = nx.DiGraph()
        self.nodes: Dict[str, KGNode] = {}
        self.edges: List[KGEdge] = []
        # Pruned items are moved here rather than discarded.
        self.archived_nodes: Dict[str, KGNode] = {}
        self.archived_edges: List[KGEdge] = []
        self._neo4j_driver = None
        self._init_neo4j()

    def _init_neo4j(self):
        """Initialize Neo4j connection if configured; fall back to pure NetworkX."""
        try:
            from neo4j import GraphDatabase
            driver = GraphDatabase.driver(
                self.config.get("neo4j_uri", "bolt://localhost:7687"),
                auth=(
                    self.config.get("neo4j_user", "neo4j"),
                    self.config.get("neo4j_password", "sovereign")
                )
            )
            # The driver connects lazily. Verify now so an unreachable server
            # triggers the fallback here instead of failing on the first write.
            driver.verify_connectivity()
            self._neo4j_driver = driver
        except Exception:
            # Graceful degradation: operate as pure in-memory graph
            self._neo4j_driver = None

    def add_node(self, label: str, content: str, source_document_id: str,
                 confidence: float = 1.0, metadata: Optional[Dict] = None) -> KGNode:
        node = KGNode(
            id=str(uuid.uuid4()),
            label=label,
            content=content,
            source_document_id=source_document_id,
            confidence=confidence,
            metadata=metadata or {}
        )
        self.nodes[node.id] = node
        self.in_memory_graph.add_node(
            node.id,
            label=label,
            content=content,
            confidence=confidence
        )
        if self._neo4j_driver:
            self._persist_node_to_neo4j(node)
        return node

    def add_edge(self, source_id: str, target_id: str, relationship: str,
                 weight: float = 1.0, established_by: str = "system") -> Optional[KGEdge]:
        if not self._RELATIONSHIP_PATTERN.match(relationship):
            raise ValueError(f"Invalid relationship label: {relationship!r}")
        if source_id not in self.nodes or target_id not in self.nodes:
            return None
        edge = KGEdge(
            id=str(uuid.uuid4()),
            source_id=source_id,
            target_id=target_id,
            relationship=relationship,
            weight=weight,
            established_by=established_by
        )
        self.edges.append(edge)
        self.in_memory_graph.add_edge(
            source_id, target_id,
            relationship=relationship,
            weight=weight
        )
        if self._neo4j_driver:
            self._persist_edge_to_neo4j(edge)
        return edge

    def find_reasoning_path(self, source_id: str, target_id: str,
                            relationship_filter: Optional[List[str]] = None) -> List[KGNode]:
        """
        Find an explicit, auditable reasoning path between two nodes.

        This is not similarity search. This is structured inference.
        The path returned is a chain of evidence, not a probability distribution.
        """
        try:
            path_ids = nx.shortest_path(self.in_memory_graph, source_id, target_id)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return []
        path_nodes = [self.nodes[nid] for nid in path_ids if nid in self.nodes]
        if relationship_filter:
            # Keep only the hops whose relationship type is allowed
            path_nodes = self._filter_path_by_relationship(path_ids, relationship_filter)
        # Update access counts: the memory knows it has been used
        for node in path_nodes:
            node.access_count += 1
            node.last_accessed_at = _utc_now()
        return path_nodes

    def apply_temporal_decay(self, decay_factor: float = 0.95):
        """
        Apply temporal decay to the confidence of nodes that were never accessed.

        Design philosophy: memory that is never accessed should fade.
        The system forgets gracefully, not catastrophically.
        Forgetting is not failure. It is discernment.
        """
        for node in self.nodes.values():
            if node.last_accessed_at is None:
                # Floor at 0.01 so confidence never reaches exactly zero
                node.confidence = max(0.01, node.confidence * decay_factor)

    def prune_low_confidence_nodes(self, threshold: float = 0.1) -> List[str]:
        """
        Remove nodes whose confidence has decayed below the threshold.
        Returns list of pruned node IDs for audit logging.

        What is pruned is not destroyed; it is archived.
        Sovereignty includes the right to forget deliberately.
        """
        pruned_ids = [
            nid for nid, node in self.nodes.items()
            if node.confidence < threshold
        ]
        for nid in pruned_ids:
            # remove_node also drops the node's incident edges from the graph
            self.in_memory_graph.remove_node(nid)
            self.archived_nodes[nid] = self.nodes.pop(nid)
        # Keep the edge list consistent with the graph: move edges that
        # touched a pruned node into the archive.
        pruned = set(pruned_ids)
        kept_edges = []
        for edge in self.edges:
            if edge.source_id in pruned or edge.target_id in pruned:
                self.archived_edges.append(edge)
            else:
                kept_edges.append(edge)
        self.edges = kept_edges
        return pruned_ids

    def export_subgraph(self, node_ids: List[str]) -> Dict[str, Any]:
        """Export a subgraph for inspection, audit, or external analysis."""
        wanted = set(node_ids)
        subgraph_nodes = [self.nodes[nid] for nid in node_ids if nid in self.nodes]
        subgraph_edges = [
            e for e in self.edges
            if e.source_id in wanted and e.target_id in wanted
        ]
        return {
            "nodes": [vars(n) for n in subgraph_nodes],
            "edges": [vars(e) for e in subgraph_edges],
            "exported_at": _utc_now()
        }

    def _persist_node_to_neo4j(self, node: KGNode):
        with self._neo4j_driver.session() as session:
            session.run(
                "MERGE (n:Node {id: $id}) "
                "SET n.label = $label, n.content = $content, "
                "n.source_document_id = $source_document_id, "
                "n.confidence = $confidence, n.created_at = $created_at",
                id=node.id, label=node.label, content=node.content,
                source_document_id=node.source_document_id,
                confidence=node.confidence, created_at=node.created_at
            )

    def _persist_edge_to_neo4j(self, edge: KGEdge):
        # edge.relationship was validated in add_edge before interpolation
        with self._neo4j_driver.session() as session:
            session.run(
                "MATCH (a:Node {id: $source_id}), (b:Node {id: $target_id}) "
                f"MERGE (a)-[r:{edge.relationship} {{id: $edge_id}}]->(b) "
                "SET r.weight = $weight, r.established_by = $established_by",
                source_id=edge.source_id, target_id=edge.target_id,
                edge_id=edge.id, weight=edge.weight,
                established_by=edge.established_by
            )

    def _filter_path_by_relationship(self, path_ids: List[str],
                                     allowed_relationships: List[str]) -> List[KGNode]:
        """Return the nodes on both ends of every allowed hop, in path order."""
        filtered: List[KGNode] = []
        seen = set()
        for src, dst in zip(path_ids, path_ids[1:]):
            edge_data = self.in_memory_graph.get_edge_data(src, dst)
            if edge_data and edge_data.get("relationship") in allowed_relationships:
                # Include the target as well as the source so the final
                # node of the path is not silently dropped.
                for nid in (src, dst):
                    if nid in self.nodes and nid not in seen:
                        seen.add(nid)
                        filtered.append(self.nodes[nid])
        return filtered
```
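The defaults in this class (decay factor 0.95, prune threshold 0.1, confidence floor 0.01) imply a fixed lifetime for a node that is never accessed. Solving 0.95^n < 0.1 gives n > ln(0.1) / ln(0.95) ≈ 44.9, so such a node becomes prunable after 45 decay passes. The helper below is a hypothetical sketch that checks this arithmetic by simulation; `passes_until_pruned` is not part of the class above.

```python
import math


def passes_until_pruned(decay_factor: float = 0.95, threshold: float = 0.1,
                        start: float = 1.0, floor: float = 0.01) -> int:
    """Count decay passes until confidence falls below the prune threshold."""
    if not 0.0 < decay_factor < 1.0:
        raise ValueError("decay_factor must be strictly between 0 and 1")
    if threshold <= floor:
        # Confidence is clamped at the floor, so it could never drop below
        # a threshold at or under that floor.
        raise ValueError("threshold must be above the confidence floor")
    confidence, passes = start, 0
    while confidence >= threshold:
        confidence = max(floor, confidence * decay_factor)
        passes += 1
    return passes


simulated = passes_until_pruned()
closed_form = math.ceil(math.log(0.1) / math.log(0.95))
print(simulated, closed_form)
```

If a decay pass runs once per session, that is roughly 45 sessions of neglect before a fresh node is archived. Raising the decay factor toward 1.0 lengthens that window quickly, so the two parameters are best tuned together.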
3.2 The Vector Store Integration
The vector store (ChromaDB in development, Qdrant in production) handles the similarity retrieval that the knowledge graph cannot: dense semantic search across large document corpora where the exact relational structure is not yet known.
The critical design decision here is that the vector store feeds the knowledge graph, not the other way around. Vector retrieval surfaces candidate documents. The knowledge graph determines how those documents relate to each other and to the current query context. The vector store is a search index. The knowledge graph is the mind.
```python
# sovereign/memory/vector_store.py

import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

import chromadb
from chromadb.config import Settings


class SovereignVectorStore:
    """
    Local-first vector store with zero cloud dependency.

    ChromaDB in development (file-backed, no server required).
    Qdrant in production (local server, same guarantee).

    The embeddings are yours. The index is yours.
    Nothing is sent to an external endpoint.
    """

    def __init__(self, config: Dict[str, Any]):
        self.persist_directory = config.get("persist_directory", "./data/chromadb")
        self.collection_name = config.get("collection_name", "sovereign_documents")
        # Recorded for the embedding pipeline. No embedding function is passed
        # to the collection below, so Chroma uses its default embedder unless
        # one is wired in.
        self.embedding_model = config.get("embedding_model", "nomic-embed-text")

        # File-backed persistence: data survives restarts on your hardware
        self.client = chromadb.PersistentClient(
            path=self.persist_directory,
            settings=Settings(anonymized_telemetry=False)  # Explicit: no telemetry
        )
        self.collection = self.client.get_or_create_collection(
            name=self.collection_name,
            metadata={"hnsw:space": "cosine"}
        )

    def embed_and_store(self, documents: List[Dict[str, Any]]) -> List[str]:
        """
        Embed documents and persist to local vector store.
        Returns document IDs for graph node linkage.
        """
        doc_ids = []
        for doc in documents:
            doc_id = doc.get("id", str(uuid.uuid4()))
            self.collection.add(
                documents=[doc["content"]],
                metadatas=[{
                    "source": doc.get("source", "unknown"),
                    "doc_type": doc.get("doc_type", "text"),
                    "created_at": datetime.now(timezone.utc).isoformat(),
                    "provenance": doc.get("provenance", "")
                }],
                ids=[doc_id]
            )
            doc_ids.append(doc_id)
        return doc_ids

    def query(self, query_text: str, n_results: int = 10,
              where_filter: Optional[Dict] = None) -> List[Dict[str, Any]]:
        """
        Semantic search over local embeddings.
        Returns results with full provenance metadata.
        """
        results = self.collection.query(
            query_texts=[query_text],
            n_results=n_results,
            where=where_filter,
            include=["documents", "metadatas", "distances"]
        )
        # Results are nested per query; index 0 is the single query sent above.
        return [
            {
                "id": results["ids"][0][i],
                "content": results["documents"][0][i],
                "metadata": results["metadatas"][0][i],
                # Cosine distance converted to a similarity-style score
                "relevance_score": 1.0 - results["distances"][0][i]
            }
            for i in range(len(results["ids"][0]))
        ]
```
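The direction "vector store feeds the knowledge graph" could be sketched as follows: retrieval hits above a relevance cutoff become weighted edges from the query node to the retrieved documents. This is an illustration under stated assumptions, not the project's linkage code. The `link_hits_into_graph` helper, the `RETRIEVED_FOR` label, the 0.5 cutoff, and the sample hits are all hypothetical; the hit dictionaries mirror the shape returned by `query` above, and a plain dict stands in for the graph so the example runs without external services.

```python
from typing import Any, Dict, List


def link_hits_into_graph(query_node_id: str, hits: List[Dict[str, Any]],
                         graph: Dict[str, List[Dict[str, Any]]],
                         min_relevance: float = 0.5) -> List[str]:
    """Attach sufficiently relevant hits to the query node as weighted edges."""
    linked: List[str] = []
    edges = graph.setdefault(query_node_id, [])
    for hit in hits:
        if hit["relevance_score"] < min_relevance:
            continue  # weak matches stay in the index and never reach the graph
        edges.append({
            "target_id": hit["id"],
            "relationship": "RETRIEVED_FOR",        # hypothetical edge label
            "weight": round(hit["relevance_score"], 3),
            "established_by": "vector_store",       # provenance of the edge
        })
        linked.append(hit["id"])
    return linked


# Sample hits shaped like the output of SovereignVectorStore.query
hits = [
    {"id": "doc-1", "content": "Notes on graph pruning", "metadata": {}, "relevance_score": 0.82},
    {"id": "doc-2", "content": "Unrelated recipe", "metadata": {}, "relevance_score": 0.31},
]
graph: Dict[str, List[Dict[str, Any]]] = {}
linked_ids = link_hits_into_graph("query-42", hits, graph)
print(linked_ids, graph)
```

The similarity score survives only as an edge weight. From that point on, reasoning operates on typed relationships instead of distances.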
IV. The Inference Layer: Local Execution, Zero Dependency
The inference layer is non-negotiable. It is the foundation of every sovereignty guarantee in the system. If inference is remote, the entire stack is a thin wrapper over someone else's infrastructure. Sovereignty is not a frontend feature. It begins at the model.
SOVEREIGN's inference layer supports three execution modes:
Mode 1: Ollama (Primary) — HTTP interface to locally served models. Fast, easy to configure, supports quantized variants of Llama, Qwen, Mistral, Phi, and Gemma families.
Mode 2: llama.cpp (Fallback/Air-Gap) — Direct binary execution. No server process. No HTTP overhead. Used when network interface is unacceptable (air-gapped environments, maximum-security deployments).
Mode 3: Hybrid — Different specialist agents use different models. The orchestrator routes to the fastest suitable model for the current task. Code tasks go to a code-optimized model. Long-context tasks go to a high-context-window model. All models are local.
```python
# sovereign/inference/local_engine.py

import json
import os
import shutil
import subprocess
from typing import Any, Dict, Generator, Optional

import requests


class LocalInferenceEngine:
    """
    Unified interface to local model execution.

    Design invariant: no request leaves this machine.
    The api_endpoint, even in Ollama mode, resolves to localhost.
    There is no fallback to a cloud endpoint.
    If local inference fails, the system fails loudly rather than
    silently falling back to the cloud.
    """

    EXECUTION_MODES = ["ollama", "llama_cpp", "hybrid"]
    DEFAULT_SYSTEM_PROMPT = "You are a helpful, precise assistant."

    def __init__(self, config: Dict[str, Any]):
        self.mode = config.get("execution_mode", "ollama")
        if self.mode not in self.EXECUTION_MODES:
            raise ValueError(f"Unknown execution mode: {self.mode}")
        self.ollama_endpoint = config.get("ollama_endpoint", "http://localhost:11434")
        self.llama_cpp_binary = config.get("llama_cpp_binary", "./bin/llama-cli")
        self.model_registry = config.get("model_registry", {})
        self.default_model = config.get("default_model", "llama3.2")
        self.seed = config.get("seed", 42)  # Reproducibility by default
        self.default_temperature = config.get("temperature", 0.1)

        self._validate_local_availability()

    def _validate_local_availability(self):
        """
        Refuse to initialize if no local inference backend is reachable.

        This is a hard failure, not a warning.
        Failing loudly protects sovereignty; a silent fallback would not.
        """
        if self.mode in ("ollama", "hybrid"):
            try:
                response = requests.get(f"{self.ollama_endpoint}/api/tags", timeout=5)
                response.raise_for_status()
            except Exception as e:
                raise RuntimeError(
                    f"SOVEREIGN requires local inference. Ollama is not reachable at "
                    f"{self.ollama_endpoint}. Start Ollama with `ollama serve` and retry.\n"
                    f"Original error: {e}"
                )
        else:
            # llama_cpp mode: the binary must exist on disk or on PATH
            binary = self.llama_cpp_binary
            if not (os.path.isfile(binary) or shutil.which(binary)):
                raise RuntimeError(
                    f"SOVEREIGN requires local inference. llama.cpp binary not found at {binary}."
                )

    def generate(self, prompt: str, system_prompt: str = "",
                 model: Optional[str] = None, temperature: Optional[float] = None,
                 max_tokens: int = 2000, seed: Optional[int] = None) -> str:
        """
        Generate a response from the local model.
        Returns the complete response text.
        """
        effective_model = model or self.default_model
        effective_temperature = temperature if temperature is not None else self.default_temperature
        effective_seed = seed if seed is not None else self.seed

        if self.mode in ("ollama", "hybrid"):
            # Hybrid mode serves every specialist model through Ollama; the
            # per-task model choice happens in route_to_specialist().
            return self._generate_ollama(
                prompt, system_prompt, effective_model,
                effective_temperature, max_tokens, effective_seed
            )
        return self._generate_llama_cpp(
            prompt, system_prompt, effective_model,
            effective_temperature, max_tokens, effective_seed
        )

    def generate_stream(self, prompt: str, system_prompt: str = "",
                        model: Optional[str] = None) -> Generator[str, None, None]:
        """
        Stream tokens from local inference for real-time UI updates.
        Every token comes from your hardware.

        Streaming always goes through the Ollama endpoint, whatever the mode.
        """
        effective_model = model or self.default_model
        payload = {
            "model": effective_model,
            "messages": [
                {"role": "system", "content": system_prompt or self.DEFAULT_SYSTEM_PROMPT},
                {"role": "user", "content": prompt}
            ],
            "options": {"temperature": self.default_temperature, "seed": self.seed},
            "stream": True
        }
        with requests.post(
            f"{self.ollama_endpoint}/api/chat",
            json=payload,
            stream=True,
            timeout=120
        ) as response:
            response.raise_for_status()
            # Ollama streams one JSON object per line
            for line in response.iter_lines():
                if line:
                    chunk = json.loads(line)
                    if not chunk.get("done"):
                        yield chunk.get("message", {}).get("content", "")

    def route_to_specialist(self, task_type: str, prompt: str,
                            system_prompt: str = "") -> str:
        """
        Route to the best local model for the given task type.

        The routing table is yours. You decide which model handles what.
        The routing logic is explicit, auditable, and modifiable.
        """
        routing_table = self.model_registry.get("routing", {})
        specialist_model = routing_table.get(task_type, self.default_model)
        return self.generate(prompt, system_prompt, model=specialist_model)

    def _generate_ollama(self, prompt: str, system_prompt: str, model: str,
                         temperature: float, max_tokens: int, seed: int) -> str:
        payload = {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt or self.DEFAULT_SYSTEM_PROMPT},
                {"role": "user", "content": prompt}
            ],
            "options": {
                "temperature": temperature,
                "seed": seed,
                "num_predict": max_tokens
            },
            "stream": False
        }
        response = requests.post(
            f"{self.ollama_endpoint}/api/chat",
            json=payload,
            timeout=120
        )
        response.raise_for_status()
        return response.json()["message"]["content"]

    def _generate_llama_cpp(self, prompt: str, system_prompt: str, model: str,
                            temperature: float, max_tokens: int, seed: int) -> str:
        model_path = self.model_registry.get("paths", {}).get(model, model)
        # This chat template is model-specific; adjust it to match the model
        # family you actually load.
        full_prompt = f"<|system|>{system_prompt}<|user|>{prompt}<|assistant|>"
        # Flags differ between llama.cpp releases; confirm them against
        # `llama-cli --help` for the build you run.
        result = subprocess.run(
            [
                self.llama_cpp_binary,
                "-m", model_path,
                "-p", full_prompt,
                "--temp", str(temperature),
                "-n", str(max_tokens),
                "--seed", str(seed),
                "--no-display-prompt"
            ],
            capture_output=True, text=True, timeout=300
        )
        if result.returncode != 0:
            raise RuntimeError(f"llama.cpp execution failed: {result.stderr}")
        return result.stdout.strip()
```
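A configuration for hybrid routing might look like the sketch below. The key names (`execution_mode`, `model_registry`, `routing`, `paths`, and so on) are the ones the engine reads. The specialist model tags and the GGUF path are placeholders chosen for illustration, and `resolve_model` and `is_loopback` are hypothetical helpers that are not part of the engine. `is_loopback` shows one way the "no request leaves this machine" invariant could be checked before the engine starts.

```python
from typing import Any, Dict
from urllib.parse import urlparse

# Hypothetical configuration: model tags and file paths are placeholders.
config: Dict[str, Any] = {
    "execution_mode": "hybrid",
    "ollama_endpoint": "http://localhost:11434",
    "default_model": "llama3.2",
    "seed": 42,
    "temperature": 0.1,
    "model_registry": {
        "routing": {
            "code": "qwen2.5-coder:7b",       # placeholder code-oriented model
            "long_context": "llama3.1:8b",    # placeholder long-context model
        },
        "paths": {
            "llama3.2": "./models/llama3.2.gguf",  # placeholder GGUF location
        },
    },
}


def resolve_model(task_type: str, cfg: Dict[str, Any]) -> str:
    """Mirror the routing lookup: the routed model if present, else the default."""
    routing = cfg.get("model_registry", {}).get("routing", {})
    return routing.get(task_type, cfg.get("default_model", "llama3.2"))


def is_loopback(endpoint: str) -> bool:
    """Return True only when the endpoint host is a loopback name or address."""
    host = urlparse(endpoint).hostname
    return host in ("localhost", "127.0.0.1", "::1")


print(resolve_model("code", config))       # routed specialist
print(resolve_model("summarize", config))  # falls back to the default model
print(is_loopback(config["ollama_endpoint"]))
```

Keeping the routing table in configuration rather than in code means a model swap is a reviewable one-line diff.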
V. The Persona Engine: Identity as a First-Class Data Structure
Every prior system I have built has wrestled with the same question: what is an AI persona, exactly? In corporate systems, it is a system prompt — a string of text injected at the top of the context window, ephemeral, invisible, unversioned, unauditable. You accept it as a default and interact with a character whose values you did not choose.
In SOVEREIGN, a persona is a typed, versioned, evolvable data structure with a complete lifecycle. It has traits (numeric weights that shape how the reasoning engine processes queries), expertise domains (which determine routing priority), an activation cost (used by the MoE orchestrator to balance resource allocation), and a performance history (updated by the evaluation loop after every query).
The persona is not the model. The model is a reasoning engine. The persona is a constraint vector applied to that engine. You can have dozens of personas sharing a single model instance. You can swap personas without changing the model. You can evolve a persona's trait weights based on its performance without retraining anything. The separation is total.
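As a rough picture of "identity as data", a persona document on disk might look like the dictionary below. The field names follow the `Persona` dataclass defined in this section; every value, and the `trait_weight` helper, is hypothetical. A trait can be written either as a full object or as a bare weight, since the loader in this section accepts both forms.

```python
import json
from typing import Any

# Hypothetical persona document: field names follow the Persona dataclass,
# values are illustrative only.
analyst = {
    "id": "persona-analyst-001",
    "name": "Analyst",
    "description": "Careful, evidence-first reasoning.",
    "traits": {
        "analytical_rigor": {
            "name": "analytical_rigor",
            "weight": 0.8,
            "description": "Prefers explicit evidence",
            "evolution_rate": 0.05,
        },
        "creativity": 0.35,  # shorthand form: a bare weight
    },
    "expertise": ["finance", "statistics"],
    "activation_cost": 0.3,
    "status": "active",
    "version": 1,
}


def trait_weight(value: Any) -> float:
    """Accept either the full trait object or the bare-weight shorthand."""
    return float(value["weight"]) if isinstance(value, dict) else float(value)


# Round-trip through JSON, as a file on disk would
serialized = json.dumps(analyst, indent=2, sort_keys=True)
restored = json.loads(serialized)
weights = {name: trait_weight(v) for name, v in restored["traits"].items()}
print(weights)
```

Because the document is plain JSON, it can be diffed, versioned in git, and reviewed like any other source file.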
python1# sovereign/reasoning/persona_engine.py23from dataclasses import dataclass, field4from typing import Dict, List, Optional, Any5from datetime import datetime6import json7import os8import uuid91011@dataclass12class PersonaTrait:13 name: str14 weight: float # 0.0 to 1.015 description: str16 evolution_rate: float = 0.05 # How quickly this trait responds to feedback171819@dataclass20class PersonaPerformance:21 total_queries: int = 022 total_score: float = 0.023 last_used: Optional[str] = None24 success_rate: float = 0.025 domain_scores: Dict[str, float] = field(default_factory=dict)2627 @property28 def average_score(self) -> float:29 if self.total_queries == 0:30 return 0.031 return self.total_score / self.total_queries323334@dataclass35class Persona:36 """37 A sovereign persona: fully owned, fully auditable, fully evolvable.3839 This is not a system prompt. It is a data structure with history,40 with traits that evolve according to rules you define,41 with performance metrics that you evaluate,42 and with a lifecycle that you control.43 """44 id: str45 name: str46 description: str47 traits: Dict[str, PersonaTrait]48 expertise: List[str]49 activation_cost: float = 0.350 status: str = "experimental" # experimental → active → stable → pruned51 version: int = 152 created_at: str = field(default_factory=lambda: datetime.utcnow().isoformat())53 updated_at: Optional[str] = None54 performance: PersonaPerformance = field(default_factory=PersonaPerformance)55 evolution_log: List[Dict[str, Any]] = field(default_factory=list)56 system_prompt_template: str = ""5758 def get_system_prompt(self, context: str = "") -> str:59 """Generate the system prompt from trait weights and context."""60 trait_descriptions = []61 for trait_name, trait in self.traits.items():62 if trait.weight > 0.6:63 trait_descriptions.append(f"strong {trait_name.replace('_', ' ')}")64 elif trait.weight > 0.3:65 trait_descriptions.append(f"moderate {trait_name.replace('_', ' ')}")6667 trait_string = ", 
".join(trait_descriptions) if trait_descriptions else "balanced reasoning"68 return (69 f"You are {self.name}. {self.description} "70 f"Your reasoning is characterized by: {trait_string}. "71 f"Your areas of expertise are: {', '.join(self.expertise)}. "72 f"{self.system_prompt_template} "73 f"{f'Current context: {context}' if context else ''}"74 ).strip()7576 def apply_bounded_update(self, feedback_vector: Dict[str, float]) -> Dict[str, Any]:77 """78 Apply the bounded update function: Δw = f(feedback) × (1 − w)7980 The (1 − w) term ensures convergence — high-weight traits resist81 extreme changes. This prevents runaway specialization.82 Stability is a design feature, not a constraint.83 """84 evolution_entry = {85 "timestamp": datetime.utcnow().isoformat(),86 "version": self.version,87 "changes": []88 }8990 for trait_name, trait in self.traits.items():91 feedback_value = feedback_vector.get(trait_name, 0.0)92 delta = feedback_value * trait.evolution_rate * (1.0 - trait.weight)93 new_weight = max(0.0, min(1.0, trait.weight + delta))9495 evolution_entry["changes"].append({96 "trait": trait_name,97 "from": trait.weight,98 "to": new_weight,99 "delta": new_weight - trait.weight,100 "feedback": feedback_value101 })102 trait.weight = new_weight103104 self.version += 1105 self.updated_at = datetime.utcnow().isoformat()106 self.evolution_log.append(evolution_entry)107 return evolution_entry108109110class PersonaEngine:111 """112 Manages the complete lifecycle of sovereign personas.113114 Active → Stable → Pruned → Cold Storage → Recalled.115 The lifecycle is yours to govern.116 Nothing is deleted without your explicit instruction.117 Cold storage preserves everything for potential recall.118 """119120 LIFECYCLE_STATES = ["experimental", "active", "stable", "pruned"]121 PERSONAS_DIR = "./data/personas"122123 def __init__(self, config: Dict[str, Any]):124 self.config = config125 self.active_personas: Dict[str, Persona] = {}126 self.cold_storage: Dict[str, Persona] = {}127 
        self.personas_dir = config.get("personas_dir", self.PERSONAS_DIR)
        self._ensure_directory_structure()
        self._load_active_personas()

    def _ensure_directory_structure(self):
        for state in self.LIFECYCLE_STATES:
            os.makedirs(os.path.join(self.personas_dir, state), exist_ok=True)
        os.makedirs(os.path.join(self.personas_dir, "cold_storage"), exist_ok=True)

    def _load_active_personas(self):
        for state in ["experimental", "active", "stable"]:
            state_dir = os.path.join(self.personas_dir, state)
            for fname in os.listdir(state_dir):
                if fname.endswith(".json"):
                    with open(os.path.join(state_dir, fname)) as f:
                        data = json.load(f)
                    persona = self._deserialize_persona(data)
                    self.active_personas[persona.id] = persona

    def route_to_persona(self, query: str, query_domain: str) -> List[Persona]:
        """
        Select the best personas for the current query using multi-factor routing.

        Routing considers: domain expertise match, activation cost,
        historical performance in the query domain, and current lifecycle state.
        Only stable and active personas participate in production routing.
        """
        candidates = [
            p for p in self.active_personas.values()
            if p.status in ("active", "stable")
        ]

        scored_candidates = []
        for persona in candidates:
            domain_match = 1.0 if query_domain in persona.expertise else 0.3
            historical_score = persona.performance.domain_scores.get(query_domain, 0.5)
            cost_penalty = 1.0 - persona.activation_cost
            composite_score = (
                0.4 * domain_match +
                0.4 * historical_score +
                0.2 * cost_penalty
            )
            scored_candidates.append((persona, composite_score))

        scored_candidates.sort(key=lambda x: x[1], reverse=True)
        max_parallel = self.config.get("max_parallel_personas", 3)
        return [p for p, _ in scored_candidates[:max_parallel]]

    def prune_persona(self, persona_id: str, reason: str = "performance_threshold") -> bool:
        """
        Retire a persona to cold storage. Not deletion: archival.
        The persona's full history is preserved.
        The reason is logged.
        It can be recalled if context warrants.
        """
        if persona_id not in self.active_personas:
            return False

        persona = self.active_personas[persona_id]
        persona.status = "pruned"
        persona.updated_at = datetime.utcnow().isoformat()
        persona.evolution_log.append({
            "timestamp": datetime.utcnow().isoformat(),
            "event": "pruned",
            "reason": reason
        })

        self.cold_storage[persona_id] = persona
        del self.active_personas[persona_id]
        self._save_persona_to_state(persona, "cold_storage")
        return True

    def recall_persona(self, persona_id: str, query_context: str) -> Optional[Persona]:
        """
        Attempt to recall a pruned persona based on current query context.

        The system asks: is this dormant knowledge relevant again?
        If yes, it is restored. If no, it remains dormant.
        The question is explicit. The answer is auditable.
        """
        if persona_id not in self.cold_storage:
            return None

        persona = self.cold_storage[persona_id]
        # Compute context relevance by checking domain overlap
        query_terms = set(query_context.lower().split())
        expertise_terms = set(" ".join(persona.expertise).lower().split())
        overlap = len(query_terms & expertise_terms) / max(len(expertise_terms), 1)

        recall_threshold = self.config.get("recall_threshold", 0.3)
        if overlap >= recall_threshold:
            persona.status = "active"
            persona.updated_at = datetime.utcnow().isoformat()
            persona.evolution_log.append({
                "timestamp": datetime.utcnow().isoformat(),
                "event": "recalled",
                "context_overlap": overlap
            })
            self.active_personas[persona_id] = persona
            del self.cold_storage[persona_id]
            return persona
        return None

    def _deserialize_persona(self, data: Dict[str, Any]) -> Persona:
        # Traits may be stored as full dicts or as bare weights
        traits = {
            k: PersonaTrait(**v) if isinstance(v, dict) else PersonaTrait(
                name=k, weight=float(v), description="", evolution_rate=0.05
            )
            for k, v in data.get("traits", {}).items()
        }
        performance_data = data.get("performance", {})
        performance = PersonaPerformance(
            total_queries=performance_data.get("total_queries", 0),
            total_score=performance_data.get("total_score", 0.0),
            last_used=performance_data.get("last_used"),
            success_rate=performance_data.get("success_rate", 0.0),
            domain_scores=performance_data.get("domain_scores", {})
        )
        return Persona(
            id=data.get("id", str(uuid.uuid4())),
            name=data["name"],
            description=data.get("description", ""),
            traits=traits,
            expertise=data.get("expertise", []),
            activation_cost=data.get("activation_cost", 0.3),
            status=data.get("status", "experimental"),
            version=data.get("version", 1),
            created_at=data.get("created_at", datetime.utcnow().isoformat()),
            performance=performance,
            evolution_log=data.get("evolution_log", []),
            system_prompt_template=data.get("system_prompt_template", "")
        )

    def _save_persona_to_state(self, persona: Persona, state: str):
        # asdict() recurses into nested dataclasses (traits, performance).
        # vars() alone would leave them as objects that default=str turns
        # into repr strings, which _deserialize_persona cannot read back.
        from dataclasses import asdict, is_dataclass
        payload = asdict(persona) if is_dataclass(persona) else vars(persona)
        filepath = os.path.join(self.personas_dir, state, f"{persona.id}.json")
        with open(filepath, "w") as f:
            json.dump(payload, f, indent=2, default=str)
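To make the convergence claim in `apply_bounded_update` concrete, here is a minimal standalone sketch of the same rule, Δw = feedback × rate × (1 − w), applied repeatedly to a low-weight and a high-weight trait. The helper names `bounded_update` and `simulate` are hypothetical and exist only for this illustration; the default rate of 0.05 mirrors the fallback `evolution_rate` used in `_deserialize_persona`.

```python
from typing import List


def bounded_update(weight: float, feedback: float, rate: float = 0.05) -> float:
    """One step of delta_w = feedback * rate * (1 - w), clamped to [0, 1]."""
    delta = feedback * rate * (1.0 - weight)
    return max(0.0, min(1.0, weight + delta))


def simulate(weight: float, feedback: float, steps: int, rate: float = 0.05) -> List[float]:
    """Apply the same feedback repeatedly and record the weight after each step."""
    history = []
    for _ in range(steps):
        weight = bounded_update(weight, feedback, rate)
        history.append(weight)
    return history


if __name__ == "__main__":
    # Identical positive feedback, different starting weights.
    low = simulate(weight=0.2, feedback=1.0, steps=3)
    high = simulate(weight=0.9, feedback=1.0, steps=3)
    print("low-weight trait: ", [round(w, 4) for w in low])
    print("high-weight trait:", [round(w, 4) for w in high])
```

The first step moves the 0.2 trait eight times as far as the 0.9 trait, because the step size is proportional to the remaining headroom (1 − w). The same factor also damps negative feedback on high-weight traits, so an established trait is slow to lose weight as well as slow to gain it.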
VI. The Governance Layer: The Control Boundary Engine
The Control Boundary Engine is the system's conscience. The orchestrator calls it on every request, so no response is produced without it. It evaluates intent before execution, scores outputs after generation, and emits an audit trail intended to support enterprise governance requirements such as the Colorado AI Act's "Reasonable Care" standard.
In corporate AI, governance is a post-hoc appendage — a feedback button, a content moderation layer, a logging system bolted onto the side of the architecture after the fact. In SOVEREIGN, governance is embedded in the execution path. You cannot get a response without passing through the evaluation loop. You cannot update a persona without logging the change. You cannot prune a knowledge graph node without recording the decision.
This is not compliance theater. It is the architecture of a system that answers to you.
```python
# sovereign/governance/control_boundary.py

import hashlib
import json
import os
import uuid
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Dict, Any, Optional, List


class IntentCategory(Enum):
    INFORMATIONAL = "informational"
    GENERATIVE = "generative"
    ANALYTICAL = "analytical"
    EXECUTABLE = "executable"          # Triggers higher governance scrutiny
    ADMINISTRATIVE = "administrative"  # System modification: maximum scrutiny


class GovernanceDecision(Enum):
    PROCEED = "proceed"
    PROCEED_WITH_LOGGING = "proceed_with_logging"
    REQUIRE_CONFIRMATION = "require_confirmation"
    BLOCK = "block"


@dataclass
class ControlBoundaryResult:
    request_id: str
    intent_category: IntentCategory
    governance_decision: GovernanceDecision
    risk_score: float  # 0.0 (benign) to 1.0 (high risk)
    justification: str
    audit_record: Dict[str, Any]
    timestamp: str = field(default_factory=lambda: datetime.utcnow().isoformat())
    passed: bool = True


@dataclass
class OutputEvaluationResult:
    request_id: str
    grounding_score: float        # How well anchored to source documents
    coherence_score: float        # Internal logical consistency
    coverage_score: float         # Query completeness
    hallucination_penalty: float  # Detected confabulation
    composite_score: float        # Weighted aggregate
    flagged_claims: List[str]     # Claims requiring provenance verification
    audit_record: Dict[str, Any]
    timestamp: str = field(default_factory=lambda: datetime.utcnow().isoformat())


class ControlBoundaryEngine:
    """
    The governance conscience of SOVEREIGN.

    Every request passes through here before execution.
    Every output passes through here before delivery.
    The audit trail is complete, append-only, and yours.

    This is not a security layer. It is an accountability layer.
    The distinction matters: security prevents bad actors.
    Accountability ensures the system answers to you.
    """

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.audit_log_path = config.get("audit_log_path", "./logs/audit.jsonl")
        self.risk_thresholds = config.get("risk_thresholds", {
            "block": 0.9,
            "require_confirmation": 0.7,
            "enhanced_logging": 0.4
        })
        self._init_audit_log()

    def _init_audit_log(self):
        # dirname is empty when the log path has no directory component
        log_dir = os.path.dirname(self.audit_log_path)
        if log_dir:
            os.makedirs(log_dir, exist_ok=True)

    def evaluate_request(self, query: str, session_id: str,
                         user_context: Dict[str, Any]) -> ControlBoundaryResult:
        """
        Phase 1: Evaluate intent before execution.

        The system asks itself: what is this request trying to do?
        Is the intent aligned with the configured governance policy?
        What level of scrutiny does this request warrant?
        """
        request_id = str(uuid.uuid4())
        intent_category = self._classify_intent(query)
        risk_score = self._compute_risk_score(query, intent_category, user_context)
        governance_decision = self._make_governance_decision(risk_score, intent_category)

        justification = self._generate_justification(
            intent_category, risk_score, governance_decision
        )

        audit_record = {
            "request_id": request_id,
            "session_id": session_id,
            # Digest, not raw query, for a privacy-preserving audit. SHA-256 is
            # used because built-in hash() is salted per process for strings,
            # so its values cannot be compared across runs.
            "query_hash": hashlib.sha256(query.encode("utf-8")).hexdigest(),
            "intent_category": intent_category.value,
            "risk_score": risk_score,
            "governance_decision": governance_decision.value,
            "justification": justification,
            "timestamp": datetime.utcnow().isoformat()
        }

        self._append_to_audit_log(audit_record)

        return ControlBoundaryResult(
            request_id=request_id,
            intent_category=intent_category,
            governance_decision=governance_decision,
            risk_score=risk_score,
            justification=justification,
            audit_record=audit_record,
            passed=(governance_decision != GovernanceDecision.BLOCK)
        )

    def evaluate_output(self, output: str, source_nodes: List[Dict],
                        query: str, request_id: str) -> OutputEvaluationResult:
        """
        Phase 2: Evaluate output before delivery.

        The system asks: is this response grounded in evidence?
        Does it make claims that cannot be traced to source documents?
        Is it coherent? Is it complete relative to the query?

        This is the architectural answer to hallucination.
        Not a post-hoc filter, but an embedded evaluation.
        """
        grounding_score = self._compute_grounding_score(output, source_nodes)
        coherence_score = self._compute_coherence_score(output)
        coverage_score = self._compute_coverage_score(output, query)
        hallucination_penalty = self._detect_hallucinations(output, source_nodes)
        flagged_claims = self._extract_flagged_claims(output, source_nodes)

        composite_score = (
            0.35 * grounding_score +
            0.30 * coherence_score +
            0.25 * coverage_score -
            0.10 * hallucination_penalty
        )
        composite_score = max(0.0, min(1.0, composite_score))

        audit_record = {
            "request_id": request_id,
            "grounding_score": grounding_score,
            "coherence_score": coherence_score,
            "coverage_score": coverage_score,
            "hallucination_penalty": hallucination_penalty,
            "composite_score": composite_score,
            "flagged_claims_count": len(flagged_claims),
            "timestamp": datetime.utcnow().isoformat()
        }
        self._append_to_audit_log(audit_record)

        return OutputEvaluationResult(
            request_id=request_id,
            grounding_score=grounding_score,
            coherence_score=coherence_score,
            coverage_score=coverage_score,
            hallucination_penalty=hallucination_penalty,
            composite_score=composite_score,
            flagged_claims=flagged_claims,
            audit_record=audit_record
        )

    def _classify_intent(self, query: str) -> IntentCategory:
        # Keyword heuristics, checked from most to least sensitive category
        query_lower = query.lower()
        if any(k in query_lower for k in ["delete", "modify", "update", "configure", "install"]):
            return IntentCategory.ADMINISTRATIVE
        if any(k in query_lower for k in ["execute", "run", "deploy", "create file", "write to"]):
            return IntentCategory.EXECUTABLE
        if any(k in query_lower for k in ["analyze", "compare", "evaluate", "assess"]):
            return IntentCategory.ANALYTICAL
        if any(k in query_lower for k in ["write", "generate", "create", "draft", "produce"]):
            return IntentCategory.GENERATIVE
        return IntentCategory.INFORMATIONAL

    def _compute_risk_score(self, query: str, intent: IntentCategory,
                            context: Dict[str, Any]) -> float:
        base_scores = {
            IntentCategory.INFORMATIONAL: 0.1,
            IntentCategory.GENERATIVE: 0.3,
            IntentCategory.ANALYTICAL: 0.2,
            IntentCategory.EXECUTABLE: 0.6,
            IntentCategory.ADMINISTRATIVE: 0.8
        }
        return base_scores.get(intent, 0.5)

    def _make_governance_decision(self, risk_score: float,
                                  intent: IntentCategory) -> GovernanceDecision:
        if risk_score >= self.risk_thresholds["block"]:
            return GovernanceDecision.BLOCK
        if risk_score >= self.risk_thresholds["require_confirmation"]:
            return GovernanceDecision.REQUIRE_CONFIRMATION
        if risk_score >= self.risk_thresholds["enhanced_logging"]:
            return GovernanceDecision.PROCEED_WITH_LOGGING
        return GovernanceDecision.PROCEED

    def _compute_grounding_score(self, output: str,
                                 source_nodes: List[Dict]) -> float:
        # Share of output terms that also appear in the sources, scaled by 3
        if not source_nodes:
            return 0.0
        source_terms = set()
        for node in source_nodes:
            content = node.get("content", "")
            source_terms.update(content.lower().split())
        output_terms = set(output.lower().split())
        overlap = len(output_terms & source_terms)
        return min(1.0, overlap / max(len(output_terms), 1) * 3.0)

    def _compute_coherence_score(self, output: str) -> float:
        # Length-based proxy: more sentences score higher, capped at 1.0
        sentences = [s.strip() for s in output.split(".") if s.strip()]
        if len(sentences) < 2:
            return 1.0
        return min(1.0, 0.5 + (len(sentences) / 20.0))

    def _compute_coverage_score(self, output: str, query: str) -> float:
        # Share of query terms that occur anywhere in the output
        query_terms = set(query.lower().split())
        output_text = output.lower()
        covered = sum(1 for term in query_terms if term in output_text)
        return covered / max(len(query_terms), 1)

    def _detect_hallucinations(self, output: str,
                               source_nodes: List[Dict]) -> float:
        # Treat numbers and capitalized words as specific claims
        specific_claims = [
            word for word in output.split()
            if word.replace(",", "").replace(".", "").isdigit()
            or (len(word) > 2 and word[0].isupper())
        ]
        if not specific_claims or not source_nodes:
            return 0.0
        source_content = " ".join(n.get("content", "") for n in source_nodes).lower()
        ungrounded = sum(
            1 for claim in specific_claims
            if claim.lower() not in source_content
        )
        return min(1.0, ungrounded / max(len(specific_claims), 1))

    def _extract_flagged_claims(self, output: str,
                                source_nodes: List[Dict]) -> List[str]:
        source_content = " ".join(n.get("content", "") for n in source_nodes).lower()
        sentences = [s.strip() for s in output.split(".") if s.strip()]
        flagged = []
        for sentence in sentences:
            key_terms = [w for w in sentence.split() if len(w) > 5]
            if key_terms and not any(t.lower() in source_content for t in key_terms):
                flagged.append(sentence)
        return flagged[:5]  # Return the first 5 flagged sentences

    def _generate_justification(self, intent: IntentCategory,
                                risk_score: float,
                                decision: GovernanceDecision) -> str:
        return (
            f"Intent classified as {intent.value} with risk score {risk_score:.2f}. "
            f"Governance decision: {decision.value}. "
            f"Threshold configuration: block={self.risk_thresholds['block']}, "
            f"confirm={self.risk_thresholds['require_confirmation']}."
        )

    def _append_to_audit_log(self, record: Dict[str, Any]):
        # One JSON object per line (JSONL), opened in append mode
        with open(self.audit_log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
```
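The threshold ladder in `_make_governance_decision` is easier to reason about when it is run against the base risk scores from `_compute_risk_score`. The following is a minimal standalone sketch of that mapping. The names `decide`, `DEFAULT_THRESHOLDS`, and `BASE_RISK` are hypothetical, and plain strings stand in for the enums to keep the example self-contained.

```python
from typing import Dict

# Default thresholds and base scores copied from the engine above.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "block": 0.9,
    "require_confirmation": 0.7,
    "enhanced_logging": 0.4,
}
BASE_RISK: Dict[str, float] = {
    "informational": 0.1,
    "analytical": 0.2,
    "generative": 0.3,
    "executable": 0.6,
    "administrative": 0.8,
}


def decide(risk_score: float, thresholds: Dict[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Walk the thresholds from strictest to most lenient."""
    if risk_score >= thresholds["block"]:
        return "block"
    if risk_score >= thresholds["require_confirmation"]:
        return "require_confirmation"
    if risk_score >= thresholds["enhanced_logging"]:
        return "proceed_with_logging"
    return "proceed"


if __name__ == "__main__":
    for intent, score in BASE_RISK.items():
        print(f"{intent:15s} risk={score:.1f} -> {decide(score)}")
```

One consequence worth noticing: with these defaults the highest base score is 0.8, so no intent category reaches the 0.9 block threshold on its own. A deployment that wants hard blocks would need either a lower `block` threshold or a risk function that adds contextual signals on top of the base score.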
VII. The Orchestration Layer: MoE Routing and Agent Swarms
The MoE orchestrator is the brain of SOVEREIGN's execution path. It receives a query from the API gateway, consults the governance layer for clearance, routes to the persona engine for specialist selection, dispatches parallel persona commentary passes against the knowledge graph, aggregates results through a multi-dimensional evaluation function, and returns a synthesized response with a full execution trace.
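This is not a chain. It is a graph. Execution can be parallel, recursive, or branching depending on query complexity and persona routing decisions. The reference implementation below takes the simplest path through that graph: it runs its phases in order and its persona passes one after another, which keeps the execution trace easy to read.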
```python
# sovereign/orchestration/moe_orchestrator.py

from typing import Dict, List, Any, Optional
from datetime import datetime
import uuid

from sovereign.reasoning.persona_engine import PersonaEngine, Persona
from sovereign.memory.knowledge_graph import SovereignKnowledgeGraph
from sovereign.memory.vector_store import SovereignVectorStore
from sovereign.inference.local_engine import LocalInferenceEngine
from sovereign.governance.control_boundary import ControlBoundaryEngine, GovernanceDecision


class MoEOrchestrator:
    """
    The Mixture-of-Experts orchestrator for SOVEREIGN.

    Routes queries to specialist personas, executes persona
    commentary passes, aggregates results through multi-dimensional
    evaluation, and returns synthesized responses with full execution traces.

    Every execution is reproducible.
    Every routing decision is logged.
    Every persona contribution is attributed.
    """

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.persona_engine = PersonaEngine(config.get("persona_config", {}))
        self.knowledge_graph = SovereignKnowledgeGraph(config.get("graph_config", {}))
        self.vector_store = SovereignVectorStore(config.get("vector_config", {}))
        self.inference_engine = LocalInferenceEngine(config.get("inference_config", {}))
        self.governance = ControlBoundaryEngine(config.get("governance_config", {}))

    def execute(self, query: str, session_id: str,
                user_context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """
        Full orchestration pipeline.

        Phase 1: Governance pre-check
        Phase 2: Context retrieval (vector + graph)
        Phase 3: Persona routing
        Phase 4: Persona commentary passes
        Phase 5: Aggregation and synthesis
        Phase 6: Governance post-check
        Phase 7: Persona evolution update
        Phase 8: Pruning cycle, then return with full execution trace
        """
        execution_trace = {
            "execution_id": str(uuid.uuid4()),
            "query": query,
            "session_id": session_id,
            "started_at": datetime.utcnow().isoformat(),
            "phases": []
        }

        # -- Phase 1: Governance Pre-Check --
        governance_result = self.governance.evaluate_request(
            query, session_id, user_context or {}
        )
        execution_trace["phases"].append({
            "phase": "governance_precheck",
            "result": governance_result.audit_record
        })

        if not governance_result.passed:
            return self._build_blocked_response(query, governance_result, execution_trace)

        # -- Phase 2: Context Retrieval --
        vector_results = self.vector_store.query(query, n_results=10)
        query_domain = self._infer_domain(query, vector_results)

        # Build query-scoped graph from retrieved documents
        source_node_ids = self._build_query_graph(query, vector_results)
        execution_trace["phases"].append({
            "phase": "context_retrieval",
            "vector_results_count": len(vector_results),
            "graph_nodes_constructed": len(source_node_ids),
            "inferred_domain": query_domain
        })

        # -- Phase 3: Persona Routing --
        activated_personas = self.persona_engine.route_to_persona(query, query_domain)
        execution_trace["phases"].append({
            "phase": "persona_routing",
            "activated_personas": [p.id for p in activated_personas],
            "persona_count": len(activated_personas)
        })

        if not activated_personas:
            return self._build_no_persona_response(query, execution_trace)

        # -- Phase 4: Persona Commentary --
        persona_results = self._execute_persona_passes(
            query, activated_personas, vector_results, source_node_ids
        )
        execution_trace["phases"].append({
            "phase": "persona_commentary",
            "results_count": len(persona_results)
        })

        # -- Phase 5: Aggregation and Synthesis --
        aggregated_response = self._aggregate_and_synthesize(
            query, persona_results, vector_results
        )
        execution_trace["phases"].append({
            "phase": "aggregation",
            "composite_score": aggregated_response["evaluation_score"],
            "synthesis_length": len(aggregated_response["synthesis"])
        })

        # -- Phase 6: Governance Post-Check --
        output_evaluation = self.governance.evaluate_output(
            aggregated_response["synthesis"],
            vector_results,
            query,
            governance_result.request_id
        )
        execution_trace["phases"].append({
            "phase": "governance_postcheck",
            "grounding_score": output_evaluation.grounding_score,
            "hallucination_penalty": output_evaluation.hallucination_penalty,
            "flagged_claims_count": len(output_evaluation.flagged_claims)
        })

        # -- Phase 7: Persona Evolution --
        self._update_persona_evolution(
            activated_personas, persona_results,
            aggregated_response["evaluation_score"], query_domain
        )

        # -- Phase 8: Prune underperformers --
        self._run_pruning_cycle()

        execution_trace["completed_at"] = datetime.utcnow().isoformat()

        return {
            "response": aggregated_response["synthesis"],
            "evaluation": {
                "composite_score": aggregated_response["evaluation_score"],
                "grounding_score": output_evaluation.grounding_score,
                "coherence_score": output_evaluation.coherence_score,
                "hallucination_penalty": output_evaluation.hallucination_penalty
            },
            "provenance": {
                "source_documents": [r["metadata"].get("source") for r in vector_results[:5]],
                "activated_personas": [p.name for p in activated_personas],
                "flagged_claims": output_evaluation.flagged_claims
            },
            "execution_trace": execution_trace
        }

    def _execute_persona_passes(self, query: str, personas: List[Persona],
                                vector_results: List[Dict],
                                source_node_ids: List[str]) -> List[Dict[str, Any]]:
        """
        Execute one commentary pass per persona.

        The passes are independent of each other, so they can be parallelized;
        this loop runs them one at a time.
        """
        context = self._format_context_for_inference(vector_results)
        results = []

        for persona in personas:
            start_time = datetime.utcnow()
            system_prompt = persona.get_system_prompt(context=query)

            inference_prompt = (
                f"Based on the following context, provide your expert analysis:\n\n"
                f"CONTEXT:\n{context}\n\n"
                f"QUERY: {query}\n\n"
                f"Provide a detailed analysis from your perspective as {persona.name}. "
                f"Reference specific information from the context. "
                f"Identify key insights and any limitations in the available information."
            )

            try:
                commentary = self.inference_engine.generate(
                    inference_prompt, system_prompt, max_tokens=1500
                )
                latency_ms = (datetime.utcnow() - start_time).total_seconds() * 1000

                results.append({
                    "persona_id": persona.id,
                    "persona_name": persona.name,
                    "commentary": commentary,
                    "relevance_score": self._score_relevance(commentary, query),
                    "key_insights": self._extract_key_insights(commentary),
                    "latency_ms": latency_ms,
                    "success": True
                })
            except Exception as e:
                # A failed pass is recorded, not raised, so other personas still run
                results.append({
                    "persona_id": persona.id,
                    "persona_name": persona.name,
                    "commentary": "",
                    "relevance_score": 0.0,
                    "key_insights": [],
                    "latency_ms": 0,
                    "success": False,
                    "error": str(e)
                })

        return results

    def _aggregate_and_synthesize(self, query: str, persona_results: List[Dict],
                                  vector_results: List[Dict]) -> Dict[str, Any]:
        """Synthesize persona commentaries into a unified response."""
        successful_results = [r for r in persona_results if r["success"]]

        if not successful_results:
            return {"synthesis": "No successful persona passes completed.", "evaluation_score": 0.0}

        synthesis_prompt = (
            "Synthesize the following expert analyses into a single, coherent response. "
            "Preserve the key insights from each perspective. "
            "Resolve contradictions explicitly. "
            "Be precise about what is known versus inferred.\n\n"
        )

        for result in successful_results:
            synthesis_prompt += (
                f"### {result['persona_name']} Analysis:\n"
                f"{result['commentary']}\n\n"
            )

        synthesis_prompt += f"\nQuery to address: {query}\n\nProvide a unified synthesis:"

        synthesis = self.inference_engine.generate(
            synthesis_prompt,
            system_prompt="You are a synthesis engine. Combine multiple expert perspectives into clear, grounded analysis.",
            max_tokens=2000
        )

        evaluation_score = self._evaluate_synthesis(
            [r["commentary"] for r in successful_results],
            [insight for r in successful_results for insight in r["key_insights"]],
            query
        )

        return {"synthesis": synthesis, "evaluation_score": evaluation_score}

    def _evaluate_synthesis(self, commentaries: List[str],
                            insights: List[str], query: str) -> float:
        if not commentaries:
            return 0.0

        coverage = min(1.0, len(insights) / max(len(query.split()), 1) * 2.0)

        if len(commentaries) < 2:
            coherence = 1.0
        else:
            # Mean pairwise Jaccard similarity between commentary vocabularies
            all_terms = [set(c.lower().split()) for c in commentaries]
            pairwise_overlaps = []
            for i in range(len(all_terms)):
                for j in range(i + 1, len(all_terms)):
                    union = all_terms[i] | all_terms[j]
                    intersection = all_terms[i] & all_terms[j]
                    pairwise_overlaps.append(len(intersection) / max(len(union), 1))
            coherence = sum(pairwise_overlaps) / max(len(pairwise_overlaps), 1)

        query_terms = set(query.lower().split())
        all_output = " ".join(commentaries).lower()
        relevance = sum(1 for t in query_terms if t in all_output) / max(len(query_terms), 1)

        return 0.4 * coverage + 0.3 * coherence + 0.3 * relevance

    def _build_query_graph(self, query: str,
                           vector_results: List[Dict]) -> List[str]:
        """Construct a query-scoped knowledge graph from retrieved documents."""
        node_ids = []
        for result in vector_results:
            node = self.knowledge_graph.add_node(
                label="DOCUMENT",
                content=result["content"][:500],
                source_document_id=result["id"],
                confidence=result["relevance_score"]
            )
            node_ids.append(node.id)

        # Connect each document to the next one in retrieval order
        for i in range(len(node_ids) - 1):
            self.knowledge_graph.add_edge(
                node_ids[i], node_ids[i + 1],
                relationship="RELATED_TO",
                weight=0.5,
                established_by="query_construction"
            )
        return node_ids

    def _update_persona_evolution(self, personas: List[Persona],
                                  results: List[Dict],
                                  aggregate_score: float, domain: str):
        for persona in personas:
            persona_result = next(
                (r for r in results if r["persona_id"] == persona.id), None
            )
            if not persona_result:
                continue

            individual_score = persona_result.get("relevance_score", aggregate_score)
            feedback_vector = {
                trait_name: individual_score
                for trait_name in persona.traits.keys()
            }
            persona.apply_bounded_update(feedback_vector)

            persona.performance.total_queries += 1
            persona.performance.total_score += individual_score
            persona.performance.last_used = datetime.utcnow().isoformat()
            # Exponential moving average of the per-domain score
            persona.performance.domain_scores[domain] = (
                persona.performance.domain_scores.get(domain, 0.5) * 0.8 +
                individual_score * 0.2
            )
            if individual_score >= 0.6:
                persona.performance.success_rate = (
                    persona.performance.success_rate * 0.9 + 0.1
                )

    def _run_pruning_cycle(self):
        """Retire consistently underperforming personas."""
        prune_threshold = self.config.get("prune_threshold", 0.3)
        for persona_id, persona in list(self.persona_engine.active_personas.items()):
            if (persona.performance.total_queries >= 10 and
                    persona.performance.average_score < prune_threshold):
                self.persona_engine.prune_persona(
                    persona_id,
                    reason=(
                        f"average_score {persona.performance.average_score:.2f} "
                        f"below threshold {prune_threshold}"
                    )
                )

    def _infer_domain(self, query: str, vector_results: List[Dict]) -> str:
        domain_keywords = {
            "code": ["function", "class", "algorithm", "implement", "debug", "code", "python", "typescript"],
            "research": ["analyze", "study", "evidence", "research", "paper", "data", "statistics"],
            "writing": ["write", "draft", "compose", "article", "blog", "narrative", "story"],
            "architecture": ["system", "design", "architecture", "infrastructure", "deploy", "scale"],
            "governance": ["compliance", "policy", "audit", "risk", "regulation", "governance"]
        }
        query_lower = query.lower()
        domain_scores = {}
        for domain, keywords in domain_keywords.items():
            domain_scores[domain] = sum(1 for kw in keywords if kw in query_lower)
        # Ties, including the all-zero case, resolve to the first domain listed
        return max(domain_scores, key=domain_scores.get)

    def _format_context_for_inference(self, vector_results: List[Dict]) -> str:
        context_parts = []
        for i, result in enumerate(vector_results[:5]):
            source = result["metadata"].get("source", "unknown")
            content = result["content"][:400]
            score = result["relevance_score"]
            context_parts.append(f"[Source {i+1}: {source} | Relevance: {score:.2f}]\n{content}")
        return "\n\n".join(context_parts)

    def _score_relevance(self, commentary: str, query: str) -> float:
        query_terms = set(query.lower().split())
        commentary_terms = set(commentary.lower().split())
        return len(query_terms & commentary_terms) / max(len(query_terms), 1)

    def _extract_key_insights(self, commentary: str) -> List[str]:
        sentences = [s.strip() for s in commentary.split(".") if len(s.strip()) > 40]
        return sentences[:3]

    def _build_blocked_response(self, query: str, governance_result: Any,
                                trace: Dict) -> Dict[str, Any]:
        return {
            "response": f"Request blocked by governance layer. Reason: {governance_result.justification}",
            "blocked": True,
            "governance_result": governance_result.audit_record,
            "execution_trace": trace
        }

    def _build_no_persona_response(self, query: str, trace: Dict) -> Dict[str, Any]:
        return {
            "response": "No active personas available for this query domain. Review persona configuration.",
            "no_personas": True,
            "execution_trace": trace
        }
```
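Because each persona pass depends only on the query and the shared context, the passes are natural candidates for concurrent execution. The sketch below shows one way that could look with `asyncio.gather` and `asyncio.to_thread` (Python 3.9+). It is an illustration under stated assumptions, not the orchestrator's actual method: `run_passes` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a blocking call such as `LocalInferenceEngine.generate`.

```python
import asyncio
import time
from typing import Any, Callable, Dict, List


def fake_generate(prompt: str, system_prompt: str) -> str:
    """Hypothetical stand-in for a blocking local inference call."""
    time.sleep(0.05)
    return f"[{system_prompt}] {prompt}"


async def run_passes(query: str, persona_names: List[str],
                     generate: Callable[[str, str], str] = fake_generate) -> List[Dict[str, Any]]:
    """Run one commentary pass per persona concurrently, preserving input order."""

    async def one_pass(name: str) -> Dict[str, Any]:
        try:
            # to_thread keeps the event loop free while the blocking call runs
            text = await asyncio.to_thread(generate, query, name)
            return {"persona_name": name, "commentary": text, "success": True}
        except Exception as exc:
            # Mirror the orchestrator: record the failure, do not raise
            return {"persona_name": name, "commentary": "", "success": False, "error": str(exc)}

    # gather returns results in the order the awaitables were passed in
    return list(await asyncio.gather(*(one_pass(n) for n in persona_names)))


if __name__ == "__main__":
    results = asyncio.run(run_passes("What limits throughput?", ["analyst", "critic", "builder"]))
    for r in results:
        print(r["persona_name"], r["success"])
```

Whether this yields a real speedup depends on the inference backend. A single local model on one GPU may serialize requests internally, in which case the concurrency mostly helps when passes are dispatched to separate model instances or machines.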
VIII. The SpecGen Module: Deterministic Code from Specification
One of the most powerful — and underutilized — components in the system is SpecGen: the deterministic code generation engine that produces production-ready implementations from structured technical specifications.
SpecGen was born from a frustration I could not resolve with vanilla LLM code generation: non-determinism. Given the same specification twice, most code generation systems will produce meaningfully different implementations. The patterns, the naming conventions, the error handling strategies, the test coverage — all of it varies with temperature and token sampling. This is fine for exploration. It is unacceptable for production infrastructure.
SpecGen solves this through three mechanisms: (1) a structured specification format that eliminates ambiguity before generation, (2) RAG-grounded generation that anchors output to your existing codebase patterns, and (3) a zero-temperature, fixed-seed inference call that produces repeatable output for the same specification and context, provided the model weights, quantization, and inference backend also stay the same.
python1# sovereign/specgen/spec_generator.py23from dataclasses import dataclass, field4from typing import Dict, List, Optional, Any5import json6import hashlib789@dataclass10class ComponentSpec:11 """12 A fully specified component for deterministic code generation.1314 Ambiguity in the spec means ambiguity in the output.15 Every field is required because every field shapes the generated code.16 Underspecified components produce underspecified implementations.17 """18 name: str19 component_type: str # service, model, api_endpoint, utility, test, config20 language: str # python, typescript, sql, yaml, bash21 description: str22 inputs: List[Dict[str, str]] # [{name, type, description, required}]23 outputs: List[Dict[str, str]] # [{name, type, description}]24 dependencies: List[str] # Other component names this depends on25 constraints: List[str] # Explicit behavioral constraints26 error_handling: List[str] # Error cases and handling strategies27 test_scenarios: List[Dict] # [{name, given, when, then}]28 existing_patterns: List[str] # Code patterns from codebase to follow2930 @property31 def spec_hash(self) -> str:32 """Deterministic hash of the specification — same spec = same hash = same code."""33 spec_string = json.dumps(34 {k: v for k, v in vars(self).items() if k != "spec_hash"},35 sort_keys=True36 )37 return hashlib.sha256(spec_string.encode()).hexdigest()[:12]383940class SpecGenerator:41 """42 Deterministic code generation from structured specifications.4344 The key insight: LLM code generation is non-deterministic by default45 because the prompt is underspecified and the sampling is random.46 Remove the underspecification. Fix the seed.47 Now the generation is deterministic.4849 Your codebase is a corpus. 
    New code should be grounded in existing patterns.
    SpecGen retrieves those patterns before generating.
    The result is code that looks like it was written by the same author
    as the rest of the codebase, because generation is conditioned on
    patterns retrieved from that same corpus.
    """

    def __init__(self, config: Dict[str, Any], vector_store, inference_engine):
        self.config = config
        self.vector_store = vector_store
        self.inference_engine = inference_engine
        self.generation_seed = config.get("generation_seed", 42)
        # Full results are cached by spec hash so hits and misses share one shape.
        self.spec_cache: Dict[str, Dict[str, Any]] = {}

    def generate_component(self, spec: ComponentSpec) -> Dict[str, Any]:
        """Generate a complete, production-ready component from specification."""

        # Check spec cache: the same spec always returns the same code
        if spec.spec_hash in self.spec_cache:
            return {**self.spec_cache[spec.spec_hash], "cache_hit": True}

        # Retrieve existing patterns from the codebase
        pattern_context = self._retrieve_existing_patterns(spec)

        # Build deterministic generation prompt
        generation_prompt = self._build_generation_prompt(spec, pattern_context)
        system_prompt = self._build_system_prompt(spec)

        # Generate with fixed seed for determinism
        generated_code = self.inference_engine.generate(
            generation_prompt,
            system_prompt=system_prompt,
            temperature=0.0,  # Zero temperature: maximum determinism
            seed=self.generation_seed,
            max_tokens=3000
        )

        # Generate tests in a separate pass
        test_code = self._generate_tests(spec, generated_code, pattern_context)

        result = {
            "component_name": spec.name,
            "component_type": spec.component_type,
            "language": spec.language,
            "spec_hash": spec.spec_hash,
            "implementation": generated_code,
            "tests": test_code,
            "dependencies": spec.dependencies,
            "cache_hit": False
        }

        # Cache the whole result (implementation and tests), not just the code
        self.spec_cache[spec.spec_hash] = result
        return result

    def _retrieve_existing_patterns(self, spec: ComponentSpec) -> str:
        """Retrieve relevant code patterns from the existing codebase."""
        search_query = f"{spec.component_type} {spec.language} {' '.join(spec.existing_patterns[:3])}"
        results = self.vector_store.query(
            search_query,
            n_results=5,
            where_filter={"doc_type": "code"}
        )
        if not results:
            return "No existing patterns found in codebase."
        return "\n\n".join([
            f"# Pattern from {r['metadata'].get('source', 'unknown')}:\n{r['content']}"
            for r in results
        ])

    def _build_generation_prompt(self, spec: ComponentSpec, pattern_context: str) -> str:
        return f"""Generate a production-ready {spec.language} {spec.component_type} named {spec.name}.

SPECIFICATION:
- Description: {spec.description}
- Inputs: {json.dumps(spec.inputs, indent=2)}
- Outputs: {json.dumps(spec.outputs, indent=2)}
- Dependencies: {', '.join(spec.dependencies)}
- Constraints: {chr(10).join(f'  - {c}' for c in spec.constraints)}
- Error handling: {chr(10).join(f'  - {e}' for e in spec.error_handling)}

EXISTING CODEBASE PATTERNS TO FOLLOW:
{pattern_context}

Generate ONLY the implementation code. No preamble. No explanation. No markdown fences.
The code must be complete, typed, and production-ready."""

    def _build_system_prompt(self, spec: ComponentSpec) -> str:
        language_instructions = {
            "python": "Use type hints, dataclasses, explicit error handling, and docstrings. Follow PEP 8.",
            "typescript": "Use strict TypeScript with explicit types. No `any`. Prefer interfaces over types for objects.",
            "sql": "Use explicit column names, proper indexes, and transactional safety.",
        }
        return (
            f"You are a senior software engineer generating production {spec.language} code. "
            f"{language_instructions.get(spec.language, '')} "
            f"Output ONLY valid {spec.language} code. No explanations."
        )

    def _generate_tests(self, spec: ComponentSpec, implementation: str,
                        pattern_context: str) -> str:
        test_prompt = f"""Generate comprehensive tests for this {spec.language} {spec.component_type}.

IMPLEMENTATION:
{implementation}

TEST SCENARIOS:
{json.dumps(spec.test_scenarios, indent=2)}

Generate complete test code following the patterns in the codebase.
Cover success cases, edge cases, and each error handling scenario.
Output ONLY test code."""
        return self.inference_engine.generate(
            test_prompt,
            system_prompt=f"Generate complete {spec.language} tests. Output ONLY code.",
            temperature=0.0,
            seed=self.generation_seed,
            max_tokens=2000
        )
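The cache in the generator keys on `spec.spec_hash`, but this section does not show how that hash is produced. The sketch below illustrates one plausible approach, assuming the hash is a SHA-256 digest of a canonical JSON serialization of the spec. `DemoSpec` and `compute_spec_hash` are hypothetical names used only for illustration; they are not the `ComponentSpec` defined earlier in the post.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DemoSpec:
    """Hypothetical stand-in for ComponentSpec, reduced to a few fields."""
    name: str
    language: str
    description: str
    constraints: List[str] = field(default_factory=list)


def compute_spec_hash(spec: DemoSpec) -> str:
    """Hash a spec via canonical JSON so that equal specs share a cache key."""
    # sort_keys and fixed separators make the serialization independent of
    # dict ordering and whitespace, which keeps the hash stable across runs.
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec_a = DemoSpec("RateLimiter", "python", "Token bucket limiter", ["no globals"])
spec_b = DemoSpec("RateLimiter", "python", "Token bucket limiter", ["no globals"])
spec_c = DemoSpec("RateLimiter", "python", "Token bucket limiter", ["thread-safe"])

print(compute_spec_hash(spec_a) == compute_spec_hash(spec_b))  # identical specs
print(compute_spec_hash(spec_a) != compute_spec_hash(spec_c))  # changed constraint
```

A content hash like this is what lets the cache carry the determinism guarantee: a fixed seed and zero temperature reduce variation, but identical output across different hardware or runtime versions is not something to rely on, whereas a cache hit returns the stored result unchanged.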
IX. The API Gateway: FastAPI Backend
python
# sovereign/api/main.py

import json
import uuid
from typing import Any, Dict, List, Optional

import yaml
from fastapi import BackgroundTasks, FastAPI, HTTPException, WebSocket
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field

from sovereign.governance.control_boundary import ControlBoundaryEngine
from sovereign.orchestration.moe_orchestrator import MoEOrchestrator


def load_config(path: str = "./config/sovereign.yaml") -> Dict[str, Any]:
    with open(path) as f:
        return yaml.safe_load(f)


config = load_config()
app = FastAPI(
    title="SOVEREIGN API",
    description="Self-owned local-first AI orchestration. No cloud. No telemetry. Your inference.",
    version="1.0.0"
)

# cors_origins lives under api_config in config/sovereign.yaml
app.add_middleware(
    CORSMiddleware,
    allow_origins=config.get("api_config", {}).get("cors_origins", ["http://localhost:3000"]),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

orchestrator = MoEOrchestrator(config)


class QueryRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=10000)
    session_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    persona_override: Optional[List[str]] = None
    domain_hint: Optional[str] = None
    stream: bool = False


class DocumentIngestRequest(BaseModel):
    documents: List[Dict[str, Any]]
    collection: Optional[str] = "default"
    extract_entities: bool = True
    build_graph_edges: bool = True


@app.post("/query")
async def query(request: QueryRequest) -> Dict[str, Any]:
    """
    Primary query endpoint. Runs the full 8-phase orchestration pipeline.
    Returns response with evaluation scores, provenance, and execution trace.
    """
    try:
        result = orchestrator.execute(
            query=request.query,
            session_id=request.session_id,
            user_context={"domain_hint": request.domain_hint}
        )
        return result
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.websocket("/query/stream")
async def query_stream(websocket: WebSocket):
    """
    Streaming query endpoint for real-time token delivery.
    Every token comes from local inference.
    """
    await websocket.accept()
    try:
        data = await websocket.receive_json()
        query_text = data.get("query", "")
        session_id = data.get("session_id", str(uuid.uuid4()))

        for token in orchestrator.inference_engine.generate_stream(query_text):
            await websocket.send_json({"token": token, "done": False})

        await websocket.send_json({"token": "", "done": True})
    except Exception as e:
        await websocket.send_json({"error": str(e), "done": True})
    finally:
        await websocket.close()


@app.post("/documents/ingest")
async def ingest_documents(request: DocumentIngestRequest,
                           background_tasks: BackgroundTasks) -> Dict[str, Any]:
    """Ingest documents into the memory substrate (vector store + knowledge graph)."""
    doc_ids = orchestrator.vector_store.embed_and_store(request.documents)
    return {
        "ingested_count": len(doc_ids),
        "document_ids": doc_ids,
        "collection": request.collection
    }


@app.get("/personas")
async def list_personas() -> Dict[str, Any]:
    """List all personas with their current lifecycle state and performance metrics."""
    active = {
        pid: {
            "name": p.name,
            "status": p.status,
            "expertise": p.expertise,
            "average_score": p.performance.average_score,
            "total_queries": p.performance.total_queries,
            "version": p.version
        }
        for pid, p in orchestrator.persona_engine.active_personas.items()
    }
    cold = {
        pid: {"name": p.name, "status": p.status}
        for pid, p in orchestrator.persona_engine.cold_storage.items()
    }
    return {"active": active, "cold_storage": cold}


@app.post("/personas/{persona_id}/recall")
async def recall_persona(persona_id: str, query_context: str) -> Dict[str, Any]:
    """Attempt to recall a pruned persona based on query context."""
    recalled = orchestrator.persona_engine.recall_persona(persona_id, query_context)
    if recalled:
        return {"recalled": True, "persona_name": recalled.name, "persona_id": recalled.id}
    return {"recalled": False, "reason": "Context relevance below recall threshold"}


@app.get("/audit/log")
async def get_audit_log(limit: int = 50) -> Dict[str, Any]:
    """Return the most recent audit log entries."""
    audit_path = config.get("governance_config", {}).get(
        "audit_log_path", "./logs/audit.jsonl"
    )
    entries = []
    try:
        with open(audit_path) as f:
            for line in f:
                if line.strip():
                    entries.append(json.loads(line))
    except FileNotFoundError:
        entries = []
    # entries[-0:] would return the whole list, so handle limit <= 0 explicitly
    recent = entries[-limit:] if limit > 0 else []
    return {"entries": recent, "total_count": len(entries)}


@app.get("/health")
async def health() -> Dict[str, Any]:
    return {
        "status": "sovereign",
        "inference_mode": config.get("inference_config", {}).get("execution_mode", "ollama"),
        "cloud_dependency": False,
        "telemetry": False
    }
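To make the `/query` contract concrete, here is a minimal client-side sketch that builds a request matching the `QueryRequest` model above, using only the standard library. The helper name `build_query_request` is hypothetical, and the base URL assumes the backend is listening on the port given in the configuration (8000). The request is constructed but not sent, so the snippet runs without a live server.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(base_url: str, query: str,
                        domain_hint: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    # session_id is omitted on purpose: the server-side model generates one.
    payload = {"query": query, "domain_hint": domain_hint, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("http://localhost:8000", "Summarize my notes on graph decay")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running backend (not executed here):
#   with urllib.request.urlopen(request, timeout=120) as response:
#       result = json.loads(response.read())
```

Local inference can take a while on modest hardware, so a generous client timeout is usually a safer default than the few seconds typical for cloud APIs.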
X. Complete Project Scaffolding
This is the directory structure for a coding agent to construct from scratch. Every file listed is necessary. Every directory serves a specific architectural purpose.
text
sovereign/
├── README.md
├── pyproject.toml
├── docker-compose.yml
├── Makefile
│
├── config/
│   ├── sovereign.yaml              # Master configuration
│   ├── personas/                   # Persona definition templates
│   │   ├── analytical.json
│   │   ├── creative.json
│   │   ├── technical.json
│   │   ├── critical.json
│   │   └── generalist.json
│   └── model_registry.yaml         # Local model routing table
│
├── sovereign/                      # Core Python package
│   ├── __init__.py
│   │
│   ├── inference/
│   │   ├── __init__.py
│   │   └── local_engine.py         # Ollama + llama.cpp unified interface
│   │
│   ├── memory/
│   │   ├── __init__.py
│   │   ├── knowledge_graph.py      # Dual-substrate KG (Neo4j + NetworkX)
│   │   ├── vector_store.py         # ChromaDB/Qdrant local vector store
│   │   └── document_loader.py      # PDF, Markdown, HTML, JSON loaders
│   │
│   ├── reasoning/
│   │   ├── __init__.py
│   │   ├── persona_engine.py       # Persona lifecycle + bounded evolution
│   │   └── domain_classifier.py
│   │
│   ├── orchestration/
│   │   ├── __init__.py
│   │   ├── moe_orchestrator.py     # 8-phase query execution pipeline
│   │   └── agent_swarm.py          # Multi-agent parallel execution
│   │
│   ├── governance/
│   │   ├── __init__.py
│   │   ├── control_boundary.py     # Intent evaluation + output scoring
│   │   └── audit_exporter.py       # Export audit trail to CSV/JSON
│   │
│   ├── specgen/
│   │   ├── __init__.py
│   │   ├── spec_generator.py       # Deterministic code generation
│   │   └── spec_validator.py       # Validate spec completeness before generation
│   │
│   └── api/
│       ├── __init__.py
│       ├── main.py                 # FastAPI application
│       ├── middleware.py           # Request logging, auth
│       └── models.py               # Pydantic request/response models
│
├── frontend/                       # Next.js 14 interface
│   ├── package.json
│   ├── tsconfig.json
│   ├── next.config.ts
│   ├── tailwind.config.ts
│   │
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx                # Main chat interface
│   │   ├── globals.css
│   │   │
│   │   ├── chat/
│   │   │   └── page.tsx            # Conversational query UI
│   │   ├── personas/
│   │   │   └── page.tsx            # Persona management dashboard
│   │   ├── knowledge/
│   │   │   └── page.tsx            # Knowledge graph visualization
│   │   ├── audit/
│   │   │   └── page.tsx            # Audit log viewer
│   │   └── specgen/
│   │       └── page.tsx            # SpecGen UI: spec input → code output
│   │
│   └── components/
│       ├── ChatInterface.tsx
│       ├── PersonaCard.tsx
│       ├── GraphViewer.tsx         # D3.js or Cytoscape knowledge graph viz
│       ├── AuditLog.tsx
│       ├── EvaluationScore.tsx
│       ├── ProvenancePanel.tsx
│       └── SpecForm.tsx
│
├── data/
│   ├── personas/
│   │   ├── experimental/
│   │   ├── active/
│   │   ├── stable/
│   │   ├── pruned/
│   │   └── cold_storage/
│   ├── chromadb/                   # Local vector store persistence
│   ├── graph_snapshots/            # Exported knowledge graph states
│   └── documents/                  # Source document repository
│
├── logs/
│   ├── audit.jsonl                 # Governance audit trail (append-only)
│   ├── execution_traces/           # Per-query execution traces
│   └── persona_evolution/          # Persona lifecycle change logs
│
├── scripts/
│   ├── setup.sh                    # One-command environment setup
│   ├── ingest_documents.py         # Batch document ingestion
│   ├── create_persona.py           # Interactive persona creation wizard
│   ├── export_audit.py             # Audit trail export utility
│   ├── run_specgen.py              # SpecGen CLI
│   └── graph_snapshot.py           # Export knowledge graph state
│
└── tests/
    ├── unit/
    │   ├── test_knowledge_graph.py
    │   ├── test_persona_engine.py
    │   ├── test_control_boundary.py
    │   ├── test_local_engine.py
    │   └── test_spec_generator.py
    ├── integration/
    │   ├── test_orchestration_pipeline.py
    │   └── test_api_endpoints.py
    └── fixtures/
        ├── sample_personas.json
        ├── sample_documents/
        └── sample_specs.json
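The runtime half of this tree (`data/` and `logs/`) holds state rather than source, so it has to be created on each machine. The contents of `scripts/setup.sh` are not shown in this post; the sketch below is a hypothetical Python equivalent of just the directory-creation step, with the paths copied from the tree above. It runs against a temporary directory so it can be tried without touching a real checkout.

```python
import tempfile
from pathlib import Path
from typing import List

# Runtime directories taken from the tree above (data/ and logs/ only).
RUNTIME_DIRS = [
    "data/personas/experimental",
    "data/personas/active",
    "data/personas/stable",
    "data/personas/pruned",
    "data/personas/cold_storage",
    "data/chromadb",
    "data/graph_snapshots",
    "data/documents",
    "logs/execution_traces",
    "logs/persona_evolution",
]


def create_runtime_dirs(root: Path) -> List[Path]:
    """Create the runtime directories under root; safe to run repeatedly."""
    created = []
    for relative in RUNTIME_DIRS:
        target = root / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    # touch() never truncates, so an existing append-only audit log is preserved.
    (root / "logs" / "audit.jsonl").touch(exist_ok=True)
    return created


with tempfile.TemporaryDirectory() as tmp:
    paths = create_runtime_dirs(Path(tmp))
    print(len(paths), "directories ready under", tmp)
```

Making the step idempotent matters here because the persona lifecycle directories and the audit log accumulate state; rerunning setup should never reset them.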
XI. Configuration: The Master Manifest
yaml
# config/sovereign.yaml
# Every value here is yours to set. Nothing is a default you cannot override.
# Read this file as a declaration of your own system's values.

sovereign:
  version: "1.0.0"
  environment: "development"        # development | production | air_gap

inference_config:
  execution_mode: "ollama"          # ollama | llama_cpp | hybrid
  ollama_endpoint: "http://localhost:11434"
  default_model: "llama3.2"
  seed: 42                          # Reproducibility: same seed = same output
  temperature: 0.1                  # Low temperature: precision over creativity
  max_tokens: 2000
  model_registry:
    routing:
      code: "qwen2.5-coder:7b"
      research: "llama3.2"
      writing: "mistral:7b"
      architecture: "llama3.2"
      governance: "llama3.2"
    paths: {}                       # For llama_cpp mode: model file paths

graph_config:
  neo4j_uri: "bolt://localhost:7687"
  neo4j_user: "neo4j"
  neo4j_password: "sovereign"       # Change this before production
  decay_factor: 0.95                # Temporal decay per session
  prune_confidence_threshold: 0.1

vector_config:
  persist_directory: "./data/chromadb"
  collection_name: "sovereign_documents"
  embedding_model: "nomic-embed-text"

persona_config:
  personas_dir: "./data/personas"
  max_parallel_personas: 3
  prune_threshold: 0.3
  recall_threshold: 0.3
  evolution_rate: 0.05              # How quickly persona traits respond to feedback
  min_queries_before_prune: 10

governance_config:
  audit_log_path: "./logs/audit.jsonl"
  risk_thresholds:
    block: 0.9
    require_confirmation: 0.7
    enhanced_logging: 0.4
  reasonable_care_mode: true        # Colorado AI Act alignment

specgen_config:
  generation_seed: 42
  temperature: 0.0                  # Zero temperature: maximum determinism
  cache_generated_specs: true

api_config:
  host: "0.0.0.0"
  port: 8000
  cors_origins:
    - "http://localhost:3000"

frontend_config:
  api_base_url: "http://localhost:8000"
  websocket_url: "ws://localhost:8000/query/stream"
  graph_visualization: "cytoscape"  # d3 | cytoscape
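The three `risk_thresholds` values imply a tiered policy, and a small sketch helps show how a single risk score could map onto them. This is an illustration under stated assumptions, not the Control Boundary implementation: it assumes risk scores lie in [0, 1], that a score meeting or exceeding a threshold triggers that tier, and that anything below the lowest threshold passes through. The function name `action_for_risk` and the `"allow"` label are hypothetical.

```python
from typing import Dict

# Values copied from governance_config.risk_thresholds above.
RISK_THRESHOLDS: Dict[str, float] = {
    "block": 0.9,
    "require_confirmation": 0.7,
    "enhanced_logging": 0.4,
}


def action_for_risk(score: float, thresholds: Dict[str, float] = RISK_THRESHOLDS) -> str:
    """Return the strictest action whose threshold the score meets or exceeds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0, 1], got {score}")
    # Check from the highest threshold down so the strictest action wins.
    for action, threshold in sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True):
        if score >= threshold:
            return action
    return "allow"  # hypothetical label for scores below every threshold


for score in (0.2, 0.4, 0.75, 0.95):
    print(score, action_for_risk(score))
```

Sorting by threshold rather than relying on key order means the tiers stay correct even if someone reorders or retunes the values in the YAML file.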
XII. Bootstrap: From Zero to Sovereign in Ten Commands
bash
# 1. Clone and enter
git clone https://github.com/kliewerdaniel/sovereign.git
cd sovereign

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Install spaCy language model (for entity extraction in governance layer)
python -m spacy download en_core_web_sm

# 4. Start Ollama and pull your primary model
ollama serve &
ollama pull llama3.2
ollama pull nomic-embed-text  # For local embeddings

# 5. Start Neo4j (optional: skip for pure in-memory graph)
docker run -d \
  --name sovereign-neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/sovereign \
  neo4j:latest

# 6. Create directory structure (setup.sh is a shell script, so run it with bash)
bash scripts/setup.sh

# 7. Ingest your first documents
python scripts/ingest_documents.py --source ./data/documents/

# 8. Start the API backend
uvicorn sovereign.api.main:app --reload --port 8000

# 9. Start the frontend
cd frontend && npm install && npm run dev

# 10. Open your sovereign AI at http://localhost:3000
# No API keys. No cloud. No telemetry.
# Your hardware. Your inference. Your memory.
echo "SOVEREIGN is running. You own this."
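Ten commands leave room for one of them to fail quietly, so a quick preflight check is useful before opening the browser. The following is a minimal sketch, not part of the repository as described: it only tests whether the TCP ports used in the steps above accept connections on the local machine. The `SERVICES` mapping and both function names are hypothetical.

```python
import socket
from typing import Dict

# Ports taken from the bootstrap steps above.
SERVICES: Dict[str, int] = {
    "ollama": 11434,
    "neo4j-bolt": 7687,
    "sovereign-api": 8000,
    "frontend": 3000,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Check every expected service and report which ones are reachable."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in preflight().items():
        print(f"{name:14s} {'up' if reachable else 'DOWN'}")
```

An open port only shows that something is listening. For a deeper check, the `/health` endpoint from the API section reports the inference mode the backend actually loaded.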
XIII. The Knowledge Graph of the Blog — Why This Project Is the Synthesis
Every post I have written on this blog is a node in a knowledge graph. Every project I have built is an edge between concepts. SOVEREIGN is the traversal of that graph from end to end — the path that passes through every significant node and resolves the relationships between them.
text
[local inference] ──ENABLES──▶ [data sovereignty]
[data sovereignty] ──REQUIRES──▶ [audit trails]
[audit trails] ──REQUIRES──▶ [control boundary]
[control boundary] ──GOVERNS──▶ [MoE orchestration]
[MoE orchestration] ──ROUTES_TO──▶ [persona engine]
[persona engine] ──QUERIES──▶ [knowledge graph]
[knowledge graph] ──GROUNDS──▶ [RAG retrieval]
[RAG retrieval] ──FEEDS──▶ [SpecGen]
[SpecGen] ──GENERATES──▶ [new sovereign components]
[new sovereign components] ──EXPAND──▶ [knowledge graph]
                                              ▲
                                              └── (the loop closes)
This is not a coincidence of architecture. It is the point. A sovereign AI system should be able to reason about its own architecture. The knowledge graph should contain documentation of the system itself. SpecGen should be able to generate new components for the system from its own specifications. The orchestrator should be able to route queries about how to improve the orchestrator.
The system is self-referential by design. Not self-modifying — you remain the author of every change. But self-aware in the sense that every component can be queried, explained, and improved using the system itself.
That is what sovereignty means at full depth. Not just that your data stays local. Not just that your inference is on-prem. But that the system you use to think can be used to improve the way you think, and the improvement remains yours.
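The loop in the diagram can be checked mechanically. As a small illustration, the sketch below transcribes the ten edges as (source, relation, target) triples and walks them from the first node until a node repeats. It uses plain Python structures rather than the project's knowledge graph module, and the function name `walk_until_repeat` is hypothetical.

```python
from typing import Dict, List, Tuple

# (source, relation, target) triples transcribed from the diagram above.
EDGES: List[Tuple[str, str, str]] = [
    ("local inference", "ENABLES", "data sovereignty"),
    ("data sovereignty", "REQUIRES", "audit trails"),
    ("audit trails", "REQUIRES", "control boundary"),
    ("control boundary", "GOVERNS", "MoE orchestration"),
    ("MoE orchestration", "ROUTES_TO", "persona engine"),
    ("persona engine", "QUERIES", "knowledge graph"),
    ("knowledge graph", "GROUNDS", "RAG retrieval"),
    ("RAG retrieval", "FEEDS", "SpecGen"),
    ("SpecGen", "GENERATES", "new sovereign components"),
    ("new sovereign components", "EXPAND", "knowledge graph"),
]


def walk_until_repeat(start: str) -> Tuple[List[str], str]:
    """Follow outgoing edges from start until a node is revisited.

    Returns the path walked and the node where the loop closes
    (an empty string if the walk hits a dead end instead).
    """
    # Every node in this diagram has exactly one outgoing edge.
    successor: Dict[str, str] = {src: dst for src, _, dst in EDGES}
    path, seen = [start], {start}
    current = start
    while current in successor:
        current = successor[current]
        if current in seen:
            return path, current
        path.append(current)
        seen.add(current)
    return path, ""


path, loop_node = walk_until_repeat("local inference")
print(" -> ".join(path))
print("loop closes at:", loop_node)
```

The walk re-enters at the knowledge graph, which matches the claim in the text: newly generated components feed back into the same memory that grounded their generation.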
XIV. What This Is Not
SOVEREIGN is not:
- A replacement for the best frontier models. GPT-5, Claude, and Gemini generally outperform local models on raw capability benchmarks. If raw capability is the only thing you care about, delivered on cloud hardware under someone else's data handling and telemetry, this architecture is not for you.
- A finished product. It is an architecture. A blueprint. A starting point. The personas you define will shape it. The documents you ingest will populate its memory. The governance thresholds you configure will determine its behavior. The code in this post is scaffolding, not a ceiling.
- A political statement against any particular company. It is a structural argument: systems designed to extract from you produce a different architecture than systems designed to serve you. Both exist. The choice between them is yours to make.
What this is: the most complete expression of everything I understand about building AI systems that answer to the person running them. Every module in this codebase is the distillation of a problem I could not stop thinking about until I had an implementation that solved it.
Build it. Modify it. Extend it. Publish your modifications. The graph grows in every direction from here.
Closing: The Architecture Is the Argument
The code in this post is an argument.
The bounded update function Δw = f(feedback) × (1 − w) is an argument that stability matters — that a system should resist extremes, not optimize toward them.
The query-scoped knowledge graph is an argument that memory should be deliberate — that accumulation without discernment is not intelligence, it is noise.
The governance layer in the execution path is an argument that accountability cannot be post-hoc — that a system which can only be evaluated after the fact cannot be meaningfully controlled.
The local inference requirement is an argument that the execution path should belong to the person executing — that cognitive infrastructure has an owner, and that owner should be you.
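The bounded update rule quoted above is easy to see in numbers. The sketch below is an illustration under an explicit assumption: it takes f(feedback) to be `evolution_rate × feedback`, with feedback in [0, 1] and the `evolution_rate` of 0.05 from the configuration. The post states only the general form Δw = f(feedback) × (1 − w), so the exact shape of f here is hypothetical.

```python
def bounded_update(weight: float, feedback: float, evolution_rate: float = 0.05) -> float:
    """Apply one step of delta_w = f(feedback) * (1 - w).

    Assumption: f(feedback) = evolution_rate * feedback with feedback in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if not 0.0 <= feedback <= 1.0:
        raise ValueError("feedback must be in [0, 1]")
    delta = evolution_rate * feedback * (1.0 - weight)
    return weight + delta


weight = 0.5
history = [weight]
for _ in range(200):
    # Even with maximal feedback on every step, the remaining headroom
    # (1 - w) shrinks, so each step is smaller than the one before it.
    weight = bounded_update(weight, feedback=1.0)
    history.append(weight)

print(f"after 1 step:    {history[1]:.4f}")
print(f"after 10 steps:  {history[10]:.4f}")
print(f"after 200 steps: {history[200]:.6f}")
```

The weight climbs toward 1 without reaching it, which is the stability property the formula argues for. This sketch covers positive feedback only; how negative feedback is bounded from below is not specified by the formula as quoted.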
Every design choice in SOVEREIGN is downstream of one question: who is this system for?
I built it for myself. And then I wrote it down so you could build it for yourself too.
That is what sovereignty means in practice: not the absence of dependency on everything, but the deliberate choice of which dependencies you accept and which you refuse. The cloud can keep the telemetry. You keep the mind.
Appendix A: Python Dependencies
toml
# pyproject.toml
[project]
name = "sovereign"
version = "1.0.0"
description = "Self-owned local-first AI orchestration system"
requires-python = ">=3.11"

dependencies = [
    # Core
    "fastapi>=0.110.0",
    "uvicorn[standard]>=0.29.0",
    "pydantic>=2.6.0",
    "pyyaml>=6.0",

    # Inference
    "requests>=2.31.0",

    # Memory
    "chromadb>=0.4.24",
    "networkx>=3.2",
    "neo4j>=5.18.0",

    # Document processing
    "pypdf>=4.1.0",
    "python-docx>=1.1.0",
    "markdown>=3.6",

    # NLP / Entity extraction
    "spacy>=3.7.4",

    # Utilities
    "python-multipart>=0.0.9",
    "aiofiles>=23.2.1",
    "websockets>=12.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.1.0",
    "pytest-asyncio>=0.23.0",
    "httpx>=0.27.0",
    "black>=24.3.0",
    "ruff>=0.3.0",
    "mypy>=1.9.0",
]
Appendix B: Docker Compose
yaml
# docker-compose.yml
# Complete local stack. No external services. No internet required after initial pull.

version: "3.9"

services:
  sovereign-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs
      - ./config:/app/config
    environment:
      - OLLAMA_ENDPOINT=http://ollama:11434
      - NEO4J_URI=bolt://neo4j:7687
    depends_on:
      - ollama
      - neo4j
    networks:
      - sovereign-network

  sovereign-frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:8000
    networks:
      - sovereign-network

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    networks:
      - sovereign-network

  neo4j:
    image: neo4j:5
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_AUTH=neo4j/sovereign
    volumes:
      - neo4j-data:/data
    networks:
      - sovereign-network

volumes:
  ollama-models:
  neo4j-data:

networks:
  sovereign-network:
    driver: bridge
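The compose file passes `OLLAMA_ENDPOINT` and `NEO4J_URI` into the API container, while `load_config` as shown in Section IX reads only the YAML file, where both endpoints point at `localhost`. Inside a container, `localhost` is the container itself, so something has to reconcile the two. The sketch below shows one way that could work, assuming every top-level config section is a mapping. `ENV_OVERRIDES` and `apply_env_overrides` are hypothetical names, not part of the code in this post.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical mapping: environment variable -> (config section, key).
ENV_OVERRIDES = {
    "OLLAMA_ENDPOINT": ("inference_config", "ollama_endpoint"),
    "NEO4J_URI": ("graph_config", "neo4j_uri"),
}


def apply_env_overrides(config: Dict[str, Any],
                        environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return a copy of config with any set environment variables applied."""
    # Copy each section so the caller's config is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for variable, (section, key) in ENV_OVERRIDES.items():
        value = environ.get(variable)
        if value:
            merged.setdefault(section, {})[key] = value
    return merged


file_config = {
    "inference_config": {"ollama_endpoint": "http://localhost:11434"},
    "graph_config": {"neo4j_uri": "bolt://localhost:7687"},
}
container_env = {
    "OLLAMA_ENDPOINT": "http://ollama:11434",
    "NEO4J_URI": "bolt://neo4j:7687",
}
print(apply_env_overrides(file_config, container_env))
```

Keeping the YAML as the default and letting the environment win only when a variable is set means the same config file works both for the bare-metal bootstrap and for the compose stack.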
SOVEREIGN is the synthesis of every system documented on this blog. Every component described here has a prior post that goes deeper on its individual design. The knowledge graph of danielkliewer.com is the context this post assumes you already carry. If you arrived here without that context, the blog is the prerequisite.
Repository: github.com/kliewerdaniel/sovereign
Series: Sovereignty Manifesto · Architecture as Autonomy · Architecture of Autonomy · Private Knowledge Graph · DeerFlow 2.0 · OpenClaw Guide · SOVEREIGN — This Post
