Evaluated on LOCOMO (Long-term Conversational Memory), the standard benchmark for agent memory systems: 10 conversations of ~418 turns each, with 1,540 questions across 4 categories. Engram achieves a 19.6% relative improvement over Mem0.
The same benchmark Mem0 used to claim state-of-the-art.
LLM-as-a-Judge score on LOCOMO benchmark
Each system evaluated using its preferred/published LLM. MEMORY.md baseline uses a manually maintained memory file.
Key insight
"Recall-based beats extraction-based."
Engram invests intelligence at read time, when the query is known, rather than at write time, when you can't yet know what will matter. This is the fundamental architectural difference.
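The contrast can be sketched with two toy stores (illustrative only; `ExtractionStore` and `RecallStore` are hypothetical names, not Engram's API):

```typescript
// Extraction-based: decide at WRITE time which facts to keep.
class ExtractionStore {
  private facts: string[] = [];
  write(turn: string) {
    // A heuristic extractor guesses what will matter later --
    // anything it discards is gone for good.
    if (turn.includes("favorite") || turn.includes("allergic")) {
      this.facts.push(turn);
    }
  }
  read(_query: string): string[] {
    return this.facts; // can only return what the extractor kept
  }
}

// Recall-based: keep everything cheaply, spend effort at READ time,
// when the query tells you what is actually relevant.
class RecallStore {
  private turns: string[] = [];
  write(turn: string) {
    this.turns.push(turn); // cheap append, no guessing
  }
  read(query: string): string[] {
    const terms = query.toLowerCase().split(/\s+/);
    return this.turns.filter((t) =>
      terms.some((term) => t.toLowerCase().includes(term))
    );
  }
}

const turns = [
  "My favorite color is green",
  "I moved to Lisbon last spring",
];
const ex = new ExtractionStore();
const re = new RecallStore();
for (const t of turns) { ex.write(t); re.write(t); }

console.log(ex.read("Where does the user live?")); // misses the Lisbon turn
console.log(re.read("Lisbon"));                    // finds it
```

The write-time extractor silently dropped the fact the query needed; the read-time search recovered it because the query itself narrowed what mattered.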
How we stack up
Comparing the leading AI agent memory solutions. Engram is a persistent memory MCP server built for Claude Code, Cursor, and any AI coding agent.
LOCOMO scores from arXiv:2402.17753. DMR scores from arXiv:2310.08560 (MemGPT/MSC-Self-Instruct, 500 conversations). Engram tested with Gemini 2.5 Flash; Mem0 scores from their published results with OpenAI. Zep LOCOMO corrected per Mem0 replication. Zep and MemGPT DMR scores use GPT-4 Turbo.
| Feature | Engram | Mem0 | Zep / Graphiti | Letta / MemGPT |
|---|---|---|---|---|
| LOCOMO Benchmark | 80.0% | 66.9% | ~58.4% (corrected by Mem0) | Not published |
| DMR Benchmark | 92.0% | Not published | 94.8% | 93.4% |
| Token Efficiency | 776 tokens/query | ~2,000+ tokens/query | High (graph traversal) | Variable (agent-managed) |
| MCP Support | Native MCP server | Community MCP wrapper | No native MCP | No native MCP |
| Setup | npm install -g engram-sdk && engram init | pip install mem0ai + Qdrant setup | Docker Compose + Neo4j + config | pip install letta + server setup |
| Language | TypeScript / Node.js | Python | Python | Python |
| Architecture | SQLite + sqlite-vec, single binary | Python + Qdrant / PostgreSQL | Python + Neo4j + PostgreSQL | Python agent framework + PostgreSQL |
| Consolidation | Automatic LLM consolidation with spreading activation | Basic deduplication | Graph-based temporal reasoning | Agent-managed memory editing |
| Temporal Memory | Bi-temporal (valid_from / valid_until) | No | Full bi-temporal graph | No |
| Knowledge Graph | Knowledge graph with entity extraction | Limited entity extraction | Neo4j knowledge graph (core feature) | No |
| Self-Hosted | Yes, zero dependencies | Yes, requires Qdrant | Requires Neo4j + Docker | Yes, requires PostgreSQL |
| LLM Support | Gemini, OpenAI, Groq, Ollama, any OpenAI-compatible | OpenAI, Anthropic, others via LiteLLM | OpenAI primarily | OpenAI, Anthropic, others |
| Pricing | Free (personal use) + hosted plans from $29/mo | Open source + hosted API | Open source + Zep Cloud | Open source + Letta Cloud |
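The bi-temporal row above (valid_from / valid_until) means each fact carries the interval during which it held true, so a superseded fact stays queryable at past points in time. A minimal sketch of such a lookup, assuming a simple in-memory record shape (field names mirror the table but are otherwise hypothetical):

```typescript
// Bi-temporal fact records: each fact is tagged with the interval
// during which it was considered true. Field names are illustrative.
interface Fact {
  subject: string;
  value: string;
  valid_from: number;         // ms since epoch
  valid_until: number | null; // null = still valid
}

// Return the facts about `subject` that were valid at time `at`.
function factsAt(facts: Fact[], subject: string, at: number): Fact[] {
  return facts.filter(
    (f) =>
      f.subject === subject &&
      f.valid_from <= at &&
      (f.valid_until === null || at < f.valid_until)
  );
}

const t = (iso: string) => Date.parse(iso);
const facts: Fact[] = [
  { subject: "employer", value: "Acme",   valid_from: t("2022-01-01"), valid_until: t("2024-06-01") },
  { subject: "employer", value: "Globex", valid_from: t("2024-06-01"), valid_until: null },
];

// "Where did the user work in 2023?" vs. "now?"
console.log(factsAt(facts, "employer", t("2023-03-15"))[0].value); // "Acme"
console.log(factsAt(facts, "employer", t("2025-01-01"))[0].value); // "Globex"
```

A system without the two timestamps can only overwrite, losing the ability to answer questions about the past.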
Most AI agent memory solutions require heavy infrastructure. Mem0 needs a Qdrant vector database. Zep and Graphiti require Neo4j and Docker. Letta (formerly MemGPT) needs PostgreSQL and a separate server process. Engram runs as a single binary with SQLite, installs via npm, and works as a native MCP server for Claude Code and Cursor.
On the LOCOMO benchmark (1,540 questions across 10 conversations), Engram achieves 80.0% accuracy while using 96.6% fewer tokens than full-context approaches. The key architectural insight: invest intelligence at read time (when the query is known), not write time (when you don't know what will matter).
Engram supports any OpenAI-compatible LLM provider, including Gemini, Groq, Cerebras, Ollama, and Together AI, via a single environment variable. No vendor lock-in, no API key requirements beyond your existing LLM provider.
Token efficiency

| System | Tokens per query |
|---|---|
| Engram | 1,504 |
| Full Context | 23,423 |

Better accuracy with 15x fewer tokens than stuffing the full conversation into context.
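As a sanity check, the headline figures reproduce directly from the raw numbers (the LOCOMO scores from the comparison table and the tokens-per-query figures above):

```typescript
// Relative improvement on LOCOMO: (Engram - Mem0) / Mem0.
const engramScore = 80.0;
const mem0Score = 66.9;
const relImprovement = (engramScore - mem0Score) / mem0Score;

// Token ratio vs. full-context prompting.
const engramTokens = 1504;
const fullContextTokens = 23423;
const tokenRatio = fullContextTokens / engramTokens;

console.log((relImprovement * 100).toFixed(1) + "%"); // "19.6%"
console.log(tokenRatio.toFixed(1) + "x");             // "15.6x"
```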
Methodology
Mem0 scores are from their published paper (10 conversations, 10 runs averaged). Engram evaluated on all 10 LOCOMO conversations (1,540 questions). Same LLM-as-a-Judge methodology as Mem0's paper.
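LLM-as-a-Judge means a separate LLM grades each predicted answer against the gold answer rather than using exact string matching. A minimal sketch of that scoring loop, with the judge call stubbed out (`judge` is a hypothetical stand-in; the real harness would send the triple to the judging model and parse its verdict):

```typescript
interface QA { question: string; gold: string; predicted: string; }

// Stand-in for the judging LLM. In a real evaluation this would call
// the model's API; here, a trivial containment check keeps it runnable.
async function judge(item: QA): Promise<boolean> {
  return item.predicted.toLowerCase().includes(item.gold.toLowerCase());
}

// Score = fraction of answers the judge accepts.
async function llmJudgeScore(items: QA[]): Promise<number> {
  let correct = 0;
  for (const item of items) {
    if (await judge(item)) correct++;
  }
  return correct / items.length;
}

const sample: QA[] = [
  { question: "Where did Maria move?", gold: "Lisbon", predicted: "She moved to Lisbon." },
  { question: "What is her favorite color?", gold: "green", predicted: "Blue, I think." },
];

llmJudgeScore(sample).then((s) => console.log(s)); // 0.5
```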