Benchmark results

Engram sets a new state of the art.

Evaluated on LOCOMO (Long-term Conversational Memory), the standard benchmark for agent memory systems: 10 conversations of ~418 turns each, with 1,540 questions across 4 categories. Engram achieves a 19.6% relative improvement over Mem0.

The same benchmark Mem0 used to claim state of the art.

Overall Accuracy

LLM-as-a-Judge score on LOCOMO benchmark

Engram: 80.0%
Mem0: 66.9%
MEMORY.md: 28.8%

Each system evaluated using its preferred/published LLM. MEMORY.md baseline uses a manually maintained memory file.

Key insight

"Recall-based beats extraction-based."

Engram invests intelligence at read time, when the query is known, rather than at write time, when you can't yet know what will matter. This is the fundamental architectural difference.
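The contrast can be sketched in a few lines. This is an illustrative toy, not Engram's actual API; all names here are hypothetical:

```typescript
type Turn = { speaker: string; text: string };

// Extraction-based (write-time): decide NOW which facts to keep.
// Anything discarded here is lost, even if a later query needed it.
function extractionWrite(turn: Turn, store: string[]): void {
  if (turn.text.length > 20) store.push(turn.text); // crude "is this a fact?" guess
}

// Recall-based (read-time): store cheaply, judge relevance only once
// the query is known, when relevance can actually be evaluated.
function recallRead(query: string, store: string[]): string[] {
  return store.filter((t) => t.toLowerCase().includes(query.toLowerCase()));
}
```

The write-time heuristic must predict future relevance blindly; the read-time filter has the query in hand.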

How we stack up

Engram vs Mem0 vs Zep vs Letta

Comparing the leading AI agent memory solutions. Engram is a persistent memory MCP server built for Claude Code, Cursor, and any AI coding agent.

LOCOMO scores from arXiv:2402.17753. DMR scores from arXiv:2310.08560 (MemGPT/MSC-Self-Instruct, 500 conversations). Engram tested with Gemini 2.5 Flash; Mem0 scores from their published results with OpenAI. Zep LOCOMO corrected per Mem0 replication. Zep and MemGPT DMR scores use GPT-4 Turbo.

| Feature | Engram | Mem0 | Zep / Graphiti | Letta / MemGPT |
| --- | --- | --- | --- | --- |
| LOCOMO Benchmark | 80.0% | 66.9% | ~58.4% (corrected by Mem0) | Not published |
| DMR Benchmark | 92.0% | Not published | 94.8% | 93.4% |
| Token Efficiency | 776 tokens/query | ~2,000+ tokens/query | High (graph traversal) | Variable (agent-managed) |
| MCP Support | Native MCP server | Community MCP wrapper | No native MCP | No native MCP |
| Setup | `npm install -g engram-sdk && engram init` | `pip install mem0ai` + Qdrant setup | Docker Compose + Neo4j + config | `pip install letta` + server setup |
| Language | TypeScript / Node.js | Python | Python | Python |
| Architecture | SQLite + sqlite-vec, single binary | Python + Qdrant / PostgreSQL | Python + Neo4j + PostgreSQL | Python agent framework + PostgreSQL |
| Consolidation | Automatic LLM consolidation with spreading activation | Basic deduplication | Graph-based temporal reasoning | Agent-managed memory editing |
| Temporal Memory | Bi-temporal (valid_from / valid_until) | No | Full bi-temporal graph | No |
| Knowledge Graph | Knowledge graph with entity extraction | Limited entity extraction | Neo4j knowledge graph (core feature) | No |
| Self-Hosted | Yes, zero dependencies | Yes, requires Qdrant | Requires Neo4j + Docker | Yes, requires PostgreSQL |
| LLM Support | Gemini, OpenAI, Groq, Ollama, any OpenAI-compatible | OpenAI, Anthropic, others via LiteLLM | OpenAI primarily | OpenAI, Anthropic, others |
| Pricing | Free (personal use) + hosted plans from $29/mo | Open source + hosted API | Open source + Zep Cloud | Open source + Letta Cloud |
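Bi-temporal memory (the valid_from / valid_until row above) means each fact records both when it became true and when it stopped being true, so the system can answer "what was true as of time T" instead of only "what is true now". A minimal sketch of that filter, with field names mirroring the table but illustrative types:

```typescript
interface Fact {
  text: string;
  validFrom: number;         // epoch ms when the fact became true
  validUntil: number | null; // epoch ms when it stopped being true; null = still true
}

// Return the facts that were valid at a given point in time.
function factsAsOf(facts: Fact[], at: number): Fact[] {
  return facts.filter(
    (f) => f.validFrom <= at && (f.validUntil === null || at < f.validUntil)
  );
}
```

This is how an updated fact ("Alice changed jobs") can supersede an old one without deleting it: the old fact's validUntil is closed, and both remain queryable at their respective times.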

Why developers choose Engram over Mem0, Zep, and Letta

Most AI agent memory solutions require heavy infrastructure. Mem0 needs a Qdrant vector database. Zep and Graphiti require Neo4j and Docker. Letta (formerly MemGPT) needs PostgreSQL and a separate server process. Engram runs as a single binary with SQLite, installs via npm, and works as a native MCP server for Claude Code and Cursor.
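For Claude Code, registering an MCP server means an entry in `.mcp.json`. The `mcpServers` shape below is Claude Code's standard format; the command and args shown for Engram are an assumption for illustration (check what `engram init` generates for the exact entry):

```json
{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp"]
    }
  }
}
```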

On the LOCOMO benchmark (1,540 questions across 10 conversations), Engram achieves 80.0% accuracy while using 96.6% fewer tokens than full-context approaches. The key architectural insight: invest intelligence at read time (when the query is known), not write time (when you don't know what will matter).

Engram supports any OpenAI-compatible LLM provider, including Gemini, Groq, Cerebras, Ollama, and Together AI, via a single environment variable. No vendor lock-in, no API key requirements beyond your existing LLM provider.

Token efficiency

93.6% fewer tokens than full context

Engram: 1,504 tokens per query
Full Context: 23,423 tokens per query
Token reduction: 93.6%

Better accuracy with 15x fewer tokens than stuffing the full conversation into context.
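The figures above can be checked directly from the two per-query token counts:

```typescript
// Verify the token-reduction figures from the chart above.
const engramTokens = 1_504;
const fullContextTokens = 23_423;

const reductionPct = (1 - engramTokens / fullContextTokens) * 100; // ≈ 93.58
const ratio = fullContextTokens / engramTokens;                    // ≈ 15.57

console.log(reductionPct.toFixed(1)); // "93.6"
console.log(Math.floor(ratio));       // 15 (the "15x" above)
```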

Methodology

How we tested

Benchmark: LOCOMO (arXiv:2402.17753)
Conversations evaluated: 10 of 10
Questions: 1,540
Scoring: LLM-as-a-Judge
Engram LLM: Gemini 2.0 Flash
Mem0 LLM: GPT-4o-mini (published)
Mem0 source: arXiv:2504.19413

Mem0 scores are from their published paper (10 conversations, 10 runs averaged). Engram was evaluated on all 10 LOCOMO conversations (1,540 questions), using the same LLM-as-a-Judge methodology as Mem0's paper.
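LLM-as-a-Judge scoring reduces to a simple loop: for each question, a judge model compares the system's answer against the gold answer, and accuracy is the fraction judged correct. A structural sketch, where `judge` stands in for the real LLM call (not Engram's actual harness):

```typescript
type QA = { question: string; gold: string; answer: string };

// Score a benchmark run: `judge` decides whether each answer matches its
// gold answer (in practice, an LLM call; here, any async predicate).
async function scoreRun(
  items: QA[],
  judge: (item: QA) => Promise<boolean>
): Promise<number> {
  let correct = 0;
  for (const item of items) {
    if (await judge(item)) correct++;
  }
  return correct / items.length; // accuracy, e.g. 0.800 for 80.0%
}
```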