Token usage (AI agent context)
| Scenario | Without Knowledge Master | With Knowledge Master |
|---|---|---|
| "How does auth work?" | Paste 5 files (~15K tokens) | Retrieves 5 chunks (~2K tokens) |
| "What depends on postgres?" | Manual grep + explain (~5K tokens) | Blast radius in ~200 tokens |
| "Does this follow conventions?" | Describe rules manually (~1K tokens) | Structured check in ~300 tokens |
| Re-explaining architecture | Every session (~3K tokens) | Never (persisted in graph) |
Net effect: 60-80% fewer tokens for codebase questions.
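
As a rough sanity check, the per-scenario savings can be recomputed from the approximate token counts in the table. This is only a sketch: the function name is illustrative, and the numbers are the table's estimates, not measurements. Individual rows can land above the net figure, which presumably averages over whole sessions (an assumption, not something the table states).

```python
# Approximate token counts copied from the table: (without, with Knowledge Master).
SCENARIOS = {
    "How does auth work?": (15_000, 2_000),
    "What depends on postgres?": (5_000, 200),
    "Does this follow conventions?": (1_000, 300),
}


def reduction_pct(without: int, with_km: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100.0 * (without - with_km) / without


for question, (without, with_km) in SCENARIOS.items():
    print(f"{question}: {reduction_pct(without, with_km):.0f}% fewer tokens")
```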
Speed
| Operation | Latency | Bottleneck |
|---|---|---|
| Search (vector) | 5-10ms | FalkorDB HNSW index |
| Search (with re-ranking) | 50-80ms | Ollama re-scoring |
| Blast radius | 2-5ms | Graph traversal |
| Convention check | 1-2ms | Filesystem |
| Index (per file) | 300-500ms | Ollama embedding |
| Index (100 files) | 40-60s | Sequential embedding |
| Full MCP tool call | 100-150ms | Embed + search + rerank |
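
Because embedding is sequential, indexing time scales roughly linearly with file count. The sketch below estimates only the embedding portion from the per-file latency in the table; the function name and default range are illustrative assumptions. The 40-60s quoted for 100 files is somewhat higher than this estimate, so real runs evidently include overhead beyond embedding.

```python
def estimate_index_seconds(
    n_files: int, per_file_ms: tuple[int, int] = (300, 500)
) -> tuple[float, float]:
    """Return a (low, high) estimate in seconds for sequential embedding.

    per_file_ms defaults to the 300-500ms per-file range from the table.
    """
    low_ms, high_ms = per_file_ms
    return n_files * low_ms / 1000, n_files * high_ms / 1000


low, high = estimate_index_seconds(100)
print(f"100 files: {low:.0f}-{high:.0f}s of embedding time")
```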
Accuracy
| Method | Precision@5 | Notes |
|---|---|---|
| LLM guessing (no RAG) | ~50% | Hallucinates confidently |
| Raw vector search | ~60-70% | Finds topically related chunks |
| Vector + re-ranking | ~80-85% | Promotes actual answers |
| Vector + re-ranking + graph | ~85-90% | Adds relationship context |
| Blast radius (graph only) | ~100% | Deterministic traversal |
| Convention check | ~100% | Rule-based verification |
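
For reference, Precision@5 is the fraction of the top five retrieved results that are actually relevant. The sketch below shows the standard definition; the chunk IDs are made up for illustration, and this is not necessarily how the figures in the table were produced.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k (not len(top_k)) so that returning too few results is penalized.
    return hits / k


# Hypothetical chunk IDs, in ranked order.
retrieved = ["auth.py#1", "auth.py#2", "db.py#4", "README.md#2", "auth.py#7"]
relevant = {"auth.py#1", "auth.py#2", "auth.py#7", "session.py#3"}
print(precision_at_k(retrieved, relevant))  # 3 of the top 5 are relevant -> 0.6
```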
Resource usage
Idle (containers running, no queries)
| Component | RAM | CPU | Disk |
|---|---|---|---|
| FalkorDB | 80-128 MB | <1% | Depends on data |
| Postgres | 30-64 MB | <1% | Minimal |
| Ollama (model loaded) | 300 MB | 0% | 274 MB (model file) |
| Total | ~400-500 MB | <1% | |
During indexing
| Component | RAM | CPU | Notes |
|---|---|---|---|
| FalkorDB | 200-500 MB | Low | Grows with data |
| Ollama | 500 MB-1 GB | High (1 core) | Embedding inference |
| Python process | 100-200 MB | Low | Chunking + I/O |
| Total | ~1-1.5 GB | 1 core | |
Storage growth
| Data indexed | Vector storage | Total on disk |
|---|---|---|
| 1,000 files | ~500 MB | ~800 MB |
| 10,000 files | ~3 GB | ~5 GB |
| 50,000 files | ~15 GB | ~25 GB |
| 100 emails | ~50 MB | ~80 MB |
Rule of thumb: ~3x raw data size for vectors + metadata.
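
The rule of thumb can be turned into a quick capacity estimate. This is a minimal sketch: the function name is illustrative, and the 3x multiplier is the approximate figure quoted above, so treat the result as a ballpark only.

```python
def estimate_index_bytes(raw_bytes: int, multiplier: float = 3.0) -> int:
    """Estimate vector + metadata storage as roughly 3x the raw data size."""
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    return int(raw_bytes * multiplier)


raw = 250 * 1024**2  # e.g. 250 MB of raw source files
estimate = estimate_index_bytes(raw)
print(f"~{estimate / 1024**2:.0f} MB for vectors + metadata")
```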
Scaling limits
| Metric | Comfortable limit | Hard limit |
|---|---|---|
| Total chunks | 500K | 2M (FalkorDB memory) |
| Repos indexed | 50 | Unlimited (disk-bound) |
| Concurrent searches | 10 | ~50 (single FalkorDB instance) |
| Search latency at scale | <200ms at 1M chunks | Degrades above 2M |
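
One way a client could stay within the comfortable limit of 10 concurrent searches is to gate calls behind a semaphore. The sketch below assumes an async client; `fake_search` is a hypothetical stand-in for the real search call, not part of Knowledge Master's API.

```python
import asyncio

MAX_CONCURRENT_SEARCHES = 10  # the "comfortable limit" from the table


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search call."""
    await asyncio.sleep(0.01)
    return f"results for {query!r}"


async def run_searches(queries, search=fake_search, limit=MAX_CONCURRENT_SEARCHES):
    """Run all queries, allowing at most `limit` searches in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(query):
        async with semaphore:
            return await search(query)

    # gather() preserves the order of the input queries in its results.
    return await asyncio.gather(*(guarded(q) for q in queries))


if __name__ == "__main__":
    results = asyncio.run(run_searches([f"q{i}" for i in range(25)]))
    print(len(results))
```

A semaphore keeps the extra queries queued on the client side rather than pushing them all at a single FalkorDB instance.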
Optimization tips
- Index selectively — skip generated files, vendor dirs, build artifacts
- Use `--type docs` for non-code directories (skips git history extraction)
- Restart Ollama if embedding slows down (memory leak in long sessions)
- Increase FalkorDB memory for large graphs: edit `deploy.resources.limits.memory` in `docker-compose.yml`
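
To illustrate the first tip, selective indexing can be as simple as a path filter applied before files are sent to the indexer. This is only a sketch: the directory names and suffixes are example choices, and `should_index` is a hypothetical helper, not a Knowledge Master function.

```python
from pathlib import PurePosixPath

# Illustrative skip lists; adjust to the layout of your own repositories.
SKIP_DIRS = {"vendor", "node_modules", "dist", "build", "__pycache__", ".git"}
SKIP_SUFFIXES = (".min.js", ".map", ".lock")


def should_index(path: str) -> bool:
    """Return False for files under skipped directories or with skipped suffixes."""
    p = PurePosixPath(path)
    # Check only the directory components, not the file name itself.
    if any(part in SKIP_DIRS for part in p.parts[:-1]):
        return False
    return not p.name.endswith(SKIP_SUFFIXES)


candidates = ["src/auth/login.py", "vendor/lib/x.go", "web/dist/app.min.js", "build.py"]
print([c for c in candidates if should_index(c)])
```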