# Git-Backed Memory for AI Agents: Why Markdown Beats Vector Stores
Make your agent's memory `.md` files in a git repo (or a Cloud Storage bucket synced to one). YAML frontmatter for metadata. BM25 search baseline; vectors as a derived index. Benefits: version history, diff, grep, easy migration, no vendor lock-in.
The default shape for AI agent memory in 2026 is "vector database with proprietary chunking." Mem0, Zep, Pinecone-backed-stores. They work. They also bury your data in vendor schemas, make audit hard, and make migration painful.
There's a different shape: markdown files, version-controlled. Your agent's memory is git-grade, not database-grade. Here's the case for it.
## What "git-backed memory" actually means
- Each memory entry is a `.md` file
- YAML frontmatter holds structured metadata (type, tags, created, updated, source)
- Files live in a directory (or bucket prefix) with semantic naming (`user_facts/preferred_name.md`, `conversations/2026-01-15/thread-abc.md`)
- The directory is the source of truth; any database/index is derived
- Optional: the directory is a git repo, with commits per change (or synced to one)
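For concreteness, writing one entry in this shape takes a few lines of stdlib Python. A minimal sketch; `write_memory` and its paths are illustrative, not a shipped API:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_memory(root: Path, namespace: str, name: str, body: str, **meta) -> Path:
    """Write one memory entry: YAML frontmatter + markdown body, path as namespace."""
    path = root / namespace / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    meta.setdefault("created", datetime.now(timezone.utc).isoformat())
    frontmatter = "\n".join(f"{k}: {v}" for k, v in meta.items())
    path.write_text(f"---\n{frontmatter}\n---\n{body}\n", encoding="utf-8")
    return path
```

Because the path encodes the namespace, `grep`, `diff`, and `rsync` all work on the result for free.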
## What you get

### Version history
Your agent updated its memory of "user prefers JSON over YAML" on Jan 15. On Feb 3 the user changed their mind. git log shows you both. The agent that learned twice has a history.
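That two-commit history is easy to demonstrate end to end. A throwaway-repo sketch (the file path and commit messages are illustrative):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email agent@example.com && git config user.name agent

# First learning: Jan 15
mkdir -p user_facts
echo "User prefers JSON over YAML." > user_facts/config_format_preference.md
git add -A && git commit -qm "learn: config format preference"

# Changed mind: Feb 3
echo "User now prefers YAML." > user_facts/config_format_preference.md
git add -A && git commit -qm "update: config format preference"

# Both versions survive in the log
git log --oneline -- user_facts/config_format_preference.md
```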
### Grep

"Did the agent ever record anything about Postgres?" Run `grep -r postgres ~/agent-memory/`. Done. No SDK, no query language, no vendor.
### Diff

"What changed in the agent's memory this week?" Run `git diff @{1.week.ago}`. Done.
### No vendor lock-in

Migration to a different agent stack: `rsync` the directory. Your data isn't trapped in Pinecone schemas or Mem0's proprietary chunking.
### Audit and compliance
Hash-chained audit logs (see our audit post) reference memory by file path. The audit trail and the memory live in the same domain — no opaque IDs to map across systems.
### Human-readable
Your agent's memory of last Tuesday's customer call is a .md file. Open it in any editor. Read it. Edit it if it's wrong. The agent picks up the change on next read. Try doing that with a vector store.
## What you give up

### Pure-vector retrieval ergonomics

Vector search isn't worse; it's just an extra layer. You compute embeddings, store them in a derived index (Firestore, pgvector, Qdrant), and search by similarity. The embeddings are derived from the markdown; the markdown is the truth.
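Deriving that index is a directory walk plus an embedding call per file. In this sketch, `embed` is a toy character-count stand-in for a real Vertex AI / OpenAI call, and `build_index` returns an in-memory dict where production code would write to pgvector or similar:

```python
from pathlib import Path

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding API call: bag-of-letters counts.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def build_index(root: Path) -> dict[str, list[float]]:
    """Walk the markdown tree and derive one vector per file.
    The files stay the source of truth; this index is a disposable cache."""
    return {
        str(p.relative_to(root)): embed(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.md"))
    }
```

Dropping the index and re-running `build_index` is always safe, which is exactly the property a derived store should have.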
### "Magic" deduplication
Mem0's deduplication and update-vs-append heuristics are slick. With markdown, you handle dedup at the agent level (or with a small library). Different tradeoff: less magic, more visibility.
### Some embedding-model-specific features
If a vendor's special sauce is "we trained a fine-tuned embedding model for agent memory," you don't get that benefit by switching to markdown + Vertex AI / OpenAI embeddings. Most use cases don't need it.
## The shape on disk
A memory entry is one file. The path encodes namespace; the body is the content; the frontmatter is structured metadata:
```markdown
# user_facts/config_format_preference.md
---
type: preference
tags: [config, format]
source: conversation:2026-01-15
created: 2026-01-15T10:23:00Z
---
User prefers JSON over YAML for config files.
```
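Reading such a file back needs only a split on the `---` delimiters. A minimal sketch that handles flat `key: value` frontmatter; real code would reach for PyYAML:

```python
def parse_entry(text: str) -> tuple[dict[str, str], str]:
    """Split a memory file into (frontmatter dict, body)."""
    if not text.startswith("---\n"):
        return {}, text.strip()
    head, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in head.splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        meta[key.strip()] = value.strip()
    return meta, body.strip()
```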
## Search
Three modes, all derived from the markdown:
- BM25: text indexing on Firestore (or postgres-fts). Fast, no embedding cost, exact-keyword.
- Vector: embed each file, similarity-search at query time. Vertex AI / OpenAI / Cohere, your choice.
- Hybrid: BM25 + vector merged, max-of-both. Best general-purpose.
Re-indexing is a directory walk. The bucket / repo is the truth; the index is a cache.
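The max-of-both merge behind the hybrid mode fits in a few lines. `hybrid_merge` is a hypothetical helper taking per-document scores from each index; reciprocal-rank fusion is a common alternative:

```python
def hybrid_merge(bm25: dict[str, float], vector: dict[str, float],
                 limit: int = 5) -> list[str]:
    """Normalize each score set to [0, 1], then keep each doc's best score."""
    def norm(scores: dict[str, float]) -> dict[str, float]:
        top = max(scores.values(), default=0.0) or 1.0  # avoid divide-by-zero
        return {doc: s / top for doc, s in scores.items()}

    merged = norm(bm25)
    for doc, score in norm(vector).items():
        merged[doc] = max(merged.get(doc, 0.0), score)
    return sorted(merged, key=merged.get, reverse=True)[:limit]
```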
## Git as the sync layer
Optional but powerful. Sync your agent's memory bucket to a git repo:
```sh
# Daily cron
gsutil rsync -r gs://my-agent-memory/ ./agent-memory-repo/
cd agent-memory-repo
git add . && git commit -m "Memory snapshot $(date -u +%F)"
git push origin main
```
Now your memory has a real git history. Your agent's evolution is auditable. You can revert. You can branch.
## The agent code stays small
```python
from ujex_recall import RecallStore

store = RecallStore(api_key=..., agent_id='my-agent')

# Write
store.put(('user_facts',), 'config_format_preference',
          'User prefers JSON over YAML.',
          metadata={'tags': ['config', 'format']})

# Read
fact = store.get(('user_facts',), 'config_format_preference')

# Search
hits = store.search(('user_facts',), query='JSON config preference', limit=5)
```
Same shape as Mem0's API. Different storage shape underneath.
## Comparing approaches
| | Vector store (Pinecone-backed) | Letta tier model | Git-backed markdown |
|---|---|---|---|
| Storage primitive | Vector + metadata | Three-tier internal format | `.md` file |
| Human-readable | ✗ | ~ | ✓ |
| Greppable | ✗ | ✗ | ✓ |
| Version history | Vendor-specific | Vendor-specific | git, native |
| Migration cost | Schema mapping | Schema mapping | `rsync` |
| Vendor lock-in | High | Medium | None |
## What we ship

Ujex Recall is the markdown-first implementation: bucket as source of truth, Firestore + Vertex AI as derived index, SDKs in Python / Go / TS, all Apache-2.0. See how it compares to Mem0 / Letta / Zep.
## FAQ

### How do I deduplicate?
Either at the agent level (use a deterministic filename based on content hash, so duplicates overwrite) or by post-processing on write (BM25 search before insert; if >X similarity, update existing instead of creating new).
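The first option, deterministic filenames, is one function. A sketch; `dedup_name` is hypothetical, and how aggressively to normalize before hashing is a judgment call:

```python
import hashlib

def dedup_name(content: str, length: int = 12) -> str:
    """Deterministic filename from normalized content: re-learning the same
    fact maps to the same file, so the write overwrites instead of duplicating."""
    normalized = " ".join(content.lower().split())  # collapse case and whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:length]
```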
### What about partial updates?
Edit the file. The next read sees the new content. The next embedding refresh updates the vector index. git commits track the diff.
### How big can the bucket get?
Cloud Storage is effectively unlimited. The Firestore index scales linearly with file count; budget about $1/month per 100k files for query cost.
### Doesn't markdown lose semantic structure?
Frontmatter holds structure. The body is the human-readable part. If you need richer schemas, embed JSON in the frontmatter.
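For example, a frontmatter key whose value is a JSON string (the `schema` key here is illustrative, not a required field):

```markdown
---
type: preference
schema: '{"format": "json", "strength": 0.9, "last_confirmed": "2026-02-03"}'
---
User prefers JSON over YAML for config files.
```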