Performance & Resource Usage

Token usage (AI agent context)

Scenario	Without Knowledge Master	With Knowledge Master
"How does auth work?"	Paste 5 files (~15K tokens)	Retrieves 5 chunks (~2K tokens)
"What depends on postgres?"	Manual grep + explain (~5K tokens)	Blast radius in ~200 tokens
"Does this follow conventions?"	Describe rules manually (~1K tokens)	Structured check in ~300 tokens
Re-explaining architecture	Every session (~3K tokens)	Never (persisted in graph)

Net effect: 60-80% fewer tokens for codebase questions.
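
The per-scenario savings behind the table above can be checked with a little arithmetic. This is a minimal sketch that uses only the approximate token counts listed in the table; the dictionary and the function name are illustrative and not part of Knowledge Master. The "Re-explaining architecture" row is left out because its "with" cost is given as "Never" rather than a number.

```python
# Approximate token counts from the table above: (without, with Knowledge Master).
SCENARIOS = {
    "How does auth work?": (15_000, 2_000),
    "What depends on postgres?": (5_000, 200),
    "Does this follow conventions?": (1_000, 300),
}


def reduction_pct(without: int, with_km: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100.0 * (without - with_km) / without


for question, (without, with_km) in SCENARIOS.items():
    pct = reduction_pct(without, with_km)
    print(f"{question:32s} {pct:5.1f}% fewer tokens")
```

Per-row savings vary (here roughly 70% to 96%), so the net figure is best read as an overall estimate rather than a per-query guarantee.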

Retrieval accuracy

Method	Precision@5 (share of top-5 results that are relevant)	Notes
LLM guessing (no RAG)	~50%	Hallucinates confidently
Raw vector search	~60-70%	Finds topically related chunks
Vector + re-ranking	~80-85%	Promotes actual answers
Vector + re-ranking + graph	~85-90%	Adds relationship context
Blast radius (graph only)	~100%	Deterministic traversal
Convention check	~100%	Rule-based verification
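
The ~100% figure for blast radius follows from how a graph traversal works: it returns exactly the nodes reachable through recorded edges, with no ranking or guessing involved. Below is a minimal sketch of such a traversal, assuming dependencies are available as a mapping from each node to the nodes that depend on it. The service names and the `DEPENDENTS` dict are hypothetical; the real system would query the graph store instead.

```python
from collections import deque

# Hypothetical edges: each key is depended on by the nodes in its list.
DEPENDENTS = {
    "postgres": ["user-service", "billing-service"],
    "user-service": ["api-gateway"],
    "billing-service": ["api-gateway"],
    "api-gateway": [],
}


def blast_radius(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every node that directly or transitively depends on `start`."""
    seen = {start}
    queue = deque([start])
    affected = []
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen:  # visit each node once, even on shared paths
                seen.add(dependent)
                affected.append(dependent)
                queue.append(dependent)
    return affected


print(blast_radius(DEPENDENTS, "postgres"))
```

The same input always yields the same output, which is what "deterministic" means here. The result is only as complete as the edges recorded in the graph.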

Resource usage (idle)

Component	RAM	CPU	Disk
FalkorDB	80-128 MB	<1%	Depends on data
Postgres	30-64 MB	<1%	Minimal
Ollama (model loaded)	300 MB	0%	274 MB (model file)
Total	~400-500 MB	<1%	274 MB + data

Resource usage (during indexing)

Component	RAM	CPU	Notes
FalkorDB	200-500 MB	Low	Grows with data
Ollama	500 MB-1 GB	High (1 core)	Embedding inference
Python process	100-200 MB	Low	Chunking + I/O
Total	~1-1.5 GB	1 core

Rule of thumb: indexed data takes ~3x the raw data size once vectors and metadata are included.
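
The rule of thumb above translates into a quick capacity estimate. This is a rough sketch: the 3x factor comes from the text, while the function names and the 200 MB example figure are illustrative assumptions. `raw_size_bytes` simply sums file sizes under a directory and does not account for files the indexer would skip.

```python
from pathlib import Path

INDEX_MULTIPLIER = 3  # rule-of-thumb factor from the text above


def raw_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def estimated_index_bytes(raw_bytes: int, multiplier: float = INDEX_MULTIPLIER) -> float:
    """Estimate index size (vectors + metadata) from the raw data size."""
    return raw_bytes * multiplier


# Example: a hypothetical 200 MB corpus.
raw = 200 * 1024**2
print(f"estimated index size: {estimated_index_bytes(raw) / 1024**2:.0f} MB")
```

To estimate for a real directory, pass `raw_size_bytes("path/to/repo")` as the first argument.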

Tips

Index selectively: skip generated files, vendor dirs, and build artifacts.
Use --type docs for non-code directories (skips git history extraction).
Restart Ollama if embedding slows down (memory leak in long sessions).
Increase FalkorDB memory for large graphs: edit deploy.resources.limits.memory in docker-compose.yml.
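
The first tip above (index selectively) can be approximated with a small path filter applied before indexing. This is a standalone sketch, not Knowledge Master's own ignore mechanism. The directory names and suffixes in the skip lists are hypothetical defaults and should be adjusted to the project.

```python
from pathlib import PurePosixPath

# Hypothetical skip lists; adjust to the project being indexed.
SKIP_DIRS = {"node_modules", "vendor", "dist", "build", ".git", "__pycache__"}
SKIP_SUFFIXES = {".min.js", ".lock", ".pyc", ".map"}


def should_index(path: str) -> bool:
    """Return False for paths inside a skipped directory or with a skipped suffix."""
    p = PurePosixPath(path)
    if any(part in SKIP_DIRS for part in p.parts):
        return False
    return not any(p.name.endswith(suffix) for suffix in SKIP_SUFFIXES)


paths = [
    "src/auth/login.py",
    "vendor/lib/util.go",
    "web/static/app.min.js",
    "docs/setup.md",
]
print([p for p in paths if should_index(p)])
```

Filtering by path component rather than by substring avoids false matches such as a source file named `rebuild.py` being skipped because it contains "build".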