How I gave my AI tools a shared memory using MCP and pgvector
Most "AI memory" projects treat memory as a feature of one tool: Cursor with persistence, Claude with notebooks. That misses the actual pain.
The pain is handoff. You think with one AI, code with another, debug with a third. Every switch is a context bankruptcy:
Re-paste the plan
Re-paste the file list
Re-paste the decision and the reason
Re-paste what you already tried
Hope you didn't miss anything
This is the AI dev experience nobody talks about: manual context shipping between tools.
The industry's answer is "use a longer context window." That's like saying you don't need shared file storage because your laptop has more RAM now.
What I wanted instead: a small piece of plumbing that both AIs can read from and write to. Handoffs become a tool call, not a copy-paste.
That's SessionVault.
The Workflow This Unlocks
sequenceDiagram
    participant U as You
    participant C as Claude Desktop
    participant SV as SessionVault
    participant Cur as Cursor
    U->>C: "Help me design the auth module"
    Note over C,U: Brainstorm, pick bcrypt + JWT
    U->>C: "save this session as 'auth-jwt-v1'"
    C->>SV: save_session({name, decisions, files, todos})
    SV-->>C: ok
    Note over U: Switch to Cursor
    U->>Cur: "load session 'auth-jwt-v1'"
    Cur->>SV: load_session({name:"auth-jwt-v1"})
    SV-->>Cur: full structured record
    U->>Cur: "implement what we planned"
    Note over Cur,U: Cursor builds with the full plan, no re-explaining
That's it. The handoff is one tool call in each direction.
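To make the two calls concrete, here is a minimal sketch of what the handoff payload could look like. The field names `name`, `decisions`, `files`, and `todos` come from the diagram; the exact types, the `validateSession` helper, the hint wording, and the sample values are assumptions for illustration. It is written in plain TypeScript rather than Zod so the snippet has no dependencies.

```typescript
// Hypothetical shape of a session snapshot; field names follow the diagram.
interface SessionSnapshot {
  name: string;
  decisions: string[];
  files: string[];
  todos: string[];
}

type Validation =
  | { ok: true; value: SessionSnapshot }
  | { ok: false; hint: string };

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((item) => typeof item === "string");
}

// Checks an untrusted tool-call argument and returns a hint the calling AI can act on.
function validateSession(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { ok: false, hint: "Pass an object with name, decisions, files, todos." };
  }
  const raw = input as { [key: string]: unknown };
  const sessionName = raw["name"];
  if (typeof sessionName !== "string" || sessionName.trim() === "") {
    return { ok: false, hint: "name must be a non-empty string, e.g. 'auth-jwt-v1'." };
  }
  const decisions = raw["decisions"];
  const files = raw["files"];
  const todos = raw["todos"];
  if (!isStringArray(decisions)) return { ok: false, hint: "decisions must be an array of strings." };
  if (!isStringArray(files)) return { ok: false, hint: "files must be an array of strings." };
  if (!isStringArray(todos)) return { ok: false, hint: "todos must be an array of strings." };
  return { ok: true, value: { name: sessionName, decisions, files, todos } };
}

// Example values are made up for the demo.
const good = validateSession({
  name: "auth-jwt-v1",
  decisions: ["Hash passwords with bcrypt", "Issue JWTs for sessions"],
  files: ["src/auth/login.ts"],
  todos: ["Add refresh-token rotation"],
});
const bad = validateSession({ name: "", decisions: [] });
```

Returning a hint instead of throwing gives the calling model something it can correct on its next attempt, which is the point of validating at the tool boundary.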
Why this matters:
| Without SessionVault | With SessionVault |
| --- | --- |
| Copy-paste 500–1500 tokens between tools | One tool call each side |
| Forget half the decisions | Structured fields persist verbatim |
| Lose the reasoning behind choices | decisions array survives the jump |
| Each new chat = blank slate | search_sessions finds related past work |
| Single-tool memory | Multi-tool shared context |
The killer feature isn't memory. It's interop.
Honest caveat: which AIs work today
You need MCP support on both ends. As of mid-2026:
Claude Desktop — full MCP support
Cursor — full MCP support
ChatGPT (consumer) — not yet (use Claude or Cursor instead)
Anything custom via the OpenAI API — yes, MCP works via the SDKs
So today's real-world handoff is Claude ↔ Cursor. As more clients ship MCP, this gets bigger.
Each AI client spawns its own SessionVault process over stdio. Both processes talk to the same Postgres instance. That's the shared bus — Postgres is the source of truth, the MCP servers are just thin per-client adapters.
| Layer | Tech | Role |
| --- | --- | --- |
| Integration | MCP (stdio, JSON-RPC) | Any MCP-aware client connects |
| Server | TypeScript + Zod | 5 tools, validation, error hints |
| Memory | Mem0 OSS | Fact extraction + embedding |
| Inference | LM Studio or OpenAI | LLM + embeddings |
| Storage | PostgreSQL + pgvector | Vectors + metadata, always local |
| Tests | Vitest | 15 unit tests, runs in 14 ms |
| CI | GitHub Actions | Build + test on Node 20/22 |
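The "thin adapter over a shared bus" idea can be sketched in a few lines. This is an illustration only: an in-memory `Map` stands in for Postgres, two objects stand in for the two per-client processes, and the class and method names (`SharedStore`, `VaultAdapter`) are hypothetical rather than taken from the project.

```typescript
// Stand-in for the shared Postgres instance: one store, many adapters.
class SharedStore {
  private rows = new Map<string, string>();

  put(key: string, json: string): void {
    this.rows.set(key, json);
  }

  get(key: string): string | undefined {
    return this.rows.get(key);
  }
}

// Hypothetical thin per-client adapter: it holds no session state of its own.
class VaultAdapter {
  private readonly client: string;
  private readonly store: SharedStore;

  constructor(client: string, store: SharedStore) {
    this.client = client;
    this.store = store;
  }

  saveSession(sessionName: string, payload: object): { ok: true; savedBy: string } {
    this.store.put(sessionName, JSON.stringify({ payload, savedBy: this.client }));
    return { ok: true, savedBy: this.client };
  }

  loadSession(
    sessionName: string
  ): { found: false } | { found: true; payload: unknown; savedBy: string } {
    const json = this.store.get(sessionName);
    if (json === undefined) return { found: false };
    const parsed = JSON.parse(json) as { payload: unknown; savedBy: string };
    return { found: true, payload: parsed.payload, savedBy: parsed.savedBy };
  }
}

// Two adapters, one store: what Claude's side saves, Cursor's side can load.
const store = new SharedStore();
const claudeSide = new VaultAdapter("claude-desktop", store);
const cursorSide = new VaultAdapter("cursor", store);

claudeSide.saveSession("auth-jwt-v1", { decisions: ["bcrypt + JWT"] });
const handoff = cursorSide.loadSession("auth-jwt-v1");
const missing = cursorSide.loadSession("no-such-session");
```

Because the adapters keep nothing in memory between calls, it does not matter which client started first or whether one of them restarts; the store is the only thing that has to stay up.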
The Design Call That Made It Trustworthy: Dual Storage
The first version of load_session quietly returned wrong data.
You'd save auth-jwt-v1. Later, load_session("auth-jwt-v1") would call Mem0's semantic search internally — but Mem0 extracts atomic facts, not literal session text. The facts didn't contain the literal session name. So the search would return vaguely-similar facts from other sessions. No error. Just plausible-looking garbage.
That's the worst kind of bug. Especially when one AI just handed off "the plan" to another.
The fix: store every session twice.
                       saveSession(input)
                                │
                ┌───────────────┼───────────────┐
                ▼                               ▼
  Layer 1: Verbatim raw record    Layer 2: LLM-extracted facts
  • infer:false (no LLM)          • Mem0 runs the LLM
  • Exact bytes in/out            • One row per atomic fact
  • type:session_raw              • type:session_fact
  • Source of truth for LOAD      • Powers SEMANTIC SEARCH
                │                               │
                └───────────────┬───────────────┘
                                ▼
                      PostgreSQL + pgvector
// Layer 1: bytes in, bytes out; survives even if the LLM is down
await memory.add(JSON.stringify(record), {
  userId,
  metadata: rawMetadata(record), // raw: JSON.stringify(record)
  infer: false, // skip LLM extraction entirely
});

// Layer 2: best-effort fact extraction for semantic search
try {
  const res = await memory.add(
    [{ role: "user", content: sessionText(input) }],
    { userId, metadata: factMetadata(input) }
  );
  factsExtracted = res?.results?.length ?? 0;
} catch {
  // facts are an enhancement; raw save above already succeeded
}
Why this is bulletproof:
load_session is now deterministic. It does getAll({filters: {session_name, type: "session_raw"}}), a pure metadata lookup. You get back exactly what you saved, or {found: false}. No silent wrong answers, ever. That is critical when one AI is handing off to another.
search_sessions still gets focused facts — one fact per row makes semantic recall better than embedding huge blobs.
LLM down? Raw save still succeeds. Fact extraction is wrapped in try/catch. The verbatim record is the contract.
One extra row per save. Massive correctness win.
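The three properties above can be demonstrated without Mem0 or Postgres. The sketch below is a simplified in-memory model, not the project's code: `table` stands in for the database, `FactExtractor` stands in for the LLM call, and the function signatures are assumptions. The metadata keys `session_name` and `type` follow the description above.

```typescript
type RowType = "session_raw" | "session_fact";

interface Row {
  text: string;
  metadata: { session_name: string; type: RowType };
}

// In-memory stand-in for the Postgres table.
let table: Row[] = [];

// Hypothetical extractor standing in for the LLM; it may throw when the model is down.
type FactExtractor = (text: string) => string[];

function saveSession(sessionName: string, record: object, extract: FactExtractor): number {
  // Re-saving the same name overwrites earlier rows for that session.
  table = table.filter((row) => row.metadata.session_name !== sessionName);

  // Layer 1: verbatim record, no LLM involved.
  const raw = JSON.stringify(record);
  table.push({ text: raw, metadata: { session_name: sessionName, type: "session_raw" } });

  // Layer 2: best-effort facts; a failure here must not undo Layer 1.
  let factsExtracted = 0;
  try {
    for (const fact of extract(raw)) {
      table.push({ text: fact, metadata: { session_name: sessionName, type: "session_fact" } });
      factsExtracted++;
    }
  } catch {
    // facts are an enhancement; the raw row is already stored
  }
  return factsExtracted;
}

function loadSession(sessionName: string): { found: false } | { found: true; record: unknown } {
  // Exact metadata match only; no similarity scoring is involved.
  const hit = table.find(
    (row) => row.metadata.session_name === sessionName && row.metadata.type === "session_raw"
  );
  return hit ? { found: true, record: JSON.parse(hit.text) } : { found: false };
}

// Simulate the LLM being down during a save.
const brokenLlm: FactExtractor = () => {
  throw new Error("LLM is down");
};
const facts = saveSession("auth-jwt-v1", { decisions: ["bcrypt + JWT"] }, brokenLlm);
const loaded = loadSession("auth-jwt-v1");
const absent = loadSession("auth-jwt"); // near-miss name: no fuzzy fallback
```

Note that the near-miss name returns `{ found: false }` rather than the closest session. That refusal to guess is what separates the load path from the search path.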
LM Studio vs OpenAI (Your Choice)
Set one env var: MEMORY_PROVIDER=lmstudio or openai.
| | LM Studio (default) | OpenAI |
| --- | --- | --- |
| Cost | Free (your hardware) | Pay per token |
| Privacy | Inference stays on-device | Text sent to OpenAI for extract/embed |
| Vectors | Local Postgres | Local Postgres (always) |
| Embed dims | 768 (nomic-embed-text) | 1536 (text-embedding-3-small) |
Critical: chat models (Llama, Gemma) cannot embed. LM Studio needs a dedicated embedding model loaded alongside the chat model, or /v1/embeddings hangs forever. I learned that the hard way.
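Switching providers? Run pnpm run db:reset first. The two embedding models produce vectors of different sizes (768 vs 1536), and the stored vector dimensions must match the model you embed with.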
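A startup check along these lines would catch the mismatch before the first insert fails. This is a sketch under assumptions: the dimension numbers and the `MEMORY_PROVIDER` values come from the comparison above, while `parseProvider`, `needsReset`, and the idea of passing in the existing column width are hypothetical.

```typescript
type Provider = "lmstudio" | "openai";

// Embedding widths for the two default models named above.
const EMBED_DIMS: Record<Provider, number> = {
  lmstudio: 768, // nomic-embed-text
  openai: 1536, // text-embedding-3-small
};

// Interprets the MEMORY_PROVIDER value; LM Studio is the default when it is unset.
function parseProvider(value: string | undefined): Provider {
  if (value === undefined || value === "") return "lmstudio";
  if (value === "lmstudio" || value === "openai") return value;
  throw new Error(`MEMORY_PROVIDER must be "lmstudio" or "openai", got "${value}"`);
}

// True when the existing vector column cannot hold the selected provider's embeddings.
function needsReset(existingDims: number, next: Provider): boolean {
  return existingDims !== EMBED_DIMS[next];
}

const selected = parseProvider("openai");
const mustReset = needsReset(768, selected); // table was created for LM Studio
```

pgvector fixes the dimension when the column is declared, so vectors of another length are rejected at insert time. That is why a reset, rather than a config change alone, is needed.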
The 5 MCP Tools
Every MCP tool costs ~500 tokens in the host's context just by being registered. So each one earns its slot.
| Tool | What it does |
| --- | --- |
| save_session | Dual-write: verbatim record + extracted facts. Re-save same name = overwrite |
| load_session | Deterministic exact-name lookup. brief / normal / full modes |
| search_sessions | Semantic search. Optional max_tokens cap and repo filter |
| list_sessions | Newest-first list, deduped, optional repo filter |
| delete_session | Removes raw record AND extracted facts |
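As one example of keeping tool output cheap, a `max_tokens` cap on search results could work roughly as follows. This is a guess at the behavior, not the project's implementation: the `Hit` shape, the `capByTokens` name, the sample facts, and the 4-characters-per-token estimate are all assumptions.

```typescript
interface Hit {
  session: string;
  fact: string;
  score: number; // higher means more similar
}

// Rough estimate: about 4 characters per token. An assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the best-scoring hits that fit inside the budget, in score order.
function capByTokens(hits: Hit[], maxTokens: number): Hit[] {
  const ranked = [...hits].sort((a, b) => b.score - a.score);
  const kept: Hit[] = [];
  let used = 0;
  for (const hit of ranked) {
    const cost = estimateTokens(hit.fact);
    if (used + cost > maxTokens) break; // stop at the first hit that would overflow
    kept.push(hit);
    used += cost;
  }
  return kept;
}

// Made-up search results, deliberately out of score order.
const hits: Hit[] = [
  { session: "auth-jwt-v1", fact: "JWTs carry the session", score: 0.84 },
  { session: "auth-v0", fact: "Old idea: server-side session table", score: 0.4 },
  { session: "auth-jwt-v1", fact: "Passwords are hashed with bcrypt", score: 0.91 },
];
const capped = capByTokens(hits, 16);
```

Stopping at the first overflow, rather than skipping ahead to smaller hits, keeps the result a strict prefix of the ranking, so the caller never sees a weaker fact in place of a stronger one.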
Handoff example: save in Claude, load in Cursor
In Claude Desktop:
User: "We just designed the auth module. Save this as auth-jwt-v1 for me."
LM Studio with chat + embedding models, or OpenAI API key
Claude Desktop and/or Cursor (both speak MCP)
Setup
git clone <your-repo-url>
cd mcp_memoryserver
pnpm install && pnpm run build
cp .env.example .env
pnpm run db:up   # Postgres on :5433
pnpm test        # 15/15 in ~14 ms
Wire it into Claude Desktop AND Cursor
The exact same server can serve both clients — they just spawn separate processes. Add this entry to both config files:
Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor: ~/.cursor/mcp.json
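For orientation, an MCP server entry in these files generally sits under a top-level `mcpServers` key. The snippet below prints one plausible entry as JSON. Treat every value in it as a placeholder: the `sessionvault` key, the path to the built entry point, and the choice of environment variable are assumptions, not values taken from the repository.

```typescript
// Hypothetical values: the server key, the path to the built entry point, and the
// environment variable setting are placeholders to replace with your own.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: { [key: string]: string };
}

const entry: McpServerEntry = {
  command: "node",
  args: ["/absolute/path/to/mcp_memoryserver/dist/index.js"],
  env: { MEMORY_PROVIDER: "lmstudio" },
};

// Both clients read a top-level "mcpServers" object keyed by server name.
const configJson = JSON.stringify({ mcpServers: { sessionvault: entry } }, null, 2);
console.log(configJson);
```

Use an absolute path in `args`, since the client launches the server from its own working directory rather than from the repository.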
Restart both apps → SessionVault's 5 tools appear in each. Now you can save in one and load in the other.
The Takeaway
The exciting frontier in AI tooling isn't longer context windows. It's interop — letting different AIs share state so you can use the right tool for each step of your work.
A small shared bus + structured snapshots + deterministic load gets you most of the way there. MCP makes the wiring easy. pgvector + Mem0 make the storage cheap and local.
Plan with Claude. Build in Cursor. Skip the copy-paste.
That's the whole pitch.
Built with TypeScript, the Model Context Protocol, Mem0 OSS, PostgreSQL/pgvector, and LM Studio or OpenAI. Vector data stays on your machine.
Using a different MCP-aware client? Open an issue — I'd love to expand the compatibility list.