How I gave my AI tools a shared memory using MCP and pgvector

Most "AI memory" projects treat memory as a feature of one tool: Cursor with persistence, Claude with notebooks. That misses the actual pain.

The pain is handoff. You think with one AI, code with another, debug with a third. Every switch is a context bankruptcy:

- Re-paste the plan
- Re-paste the file list
- Re-paste the decision and the reason
- Re-paste what you already tried
- Hope you didn't miss anything

This is the AI dev experience nobody talks about: manual context shipping between tools.

The industry's answer is "use a longer context window." That's like saying you don't need shared file storage because your laptop has more RAM now.

What I wanted instead: a small piece of plumbing that both AIs can read from and write to. Handoffs become a tool call, not a copy-paste.

That's SessionVault.

The Workflow This Unlocks

sequenceDiagram
    participant U as You
    participant C as Claude Desktop
    participant SV as SessionVault
    participant Cur as Cursor
 
    U->>C: "Help me design the auth module"
    Note over C,U: Brainstorm, pick bcrypt + JWT
    U->>C: "save this session as 'auth-jwt-v1'"
    C->>SV: save_session({name, decisions, files, todos})
    SV-->>C: ok
    Note over U: Switch to Cursor
    U->>Cur: "load session 'auth-jwt-v1'"
    Cur->>SV: load_session({name:"auth-jwt-v1"})
    SV-->>Cur: full structured record
    U->>Cur: "implement what we planned"
    Note over Cur,U: Cursor builds with the full plan, no re-explaining

That's it. The handoff is one tool call in each direction.
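On the wire, each of those tool calls is a single JSON-RPC 2.0 request with the method `tools/call`, as defined by MCP. Here's a rough sketch of what the two clients send. The `toolCall` helper and the id counter are made up for illustration; real clients build these messages for you.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: build the request a client would send for a given tool.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Claude Desktop side: persist the plan.
const saveRequest = toolCall("save_session", {
  name: "auth-jwt-v1",
  decisions: ["bcrypt cost 12", "1h access token"],
  files: ["src/auth.ts"],
  todos: ["Add refresh token rotation"],
});

// Cursor side: fetch it back by exact name.
const loadRequest = toolCall("load_session", { name: "auth-jwt-v1" });
```

The server's reply comes back under the same `id`, which is how each client matches a result to the request that asked for it.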

Why this matters:

| Without SessionVault | With SessionVault |
| --- | --- |
| Copy-paste 500–1500 tokens between tools | One tool call each side |
| Forget half the decisions | Structured fields persist verbatim |
| Lose the reasoning behind choices | `decisions` array survives the jump |
| Each new chat = blank slate | `search_sessions` finds related past work |
| Single-tool memory | Multi-tool shared context |

The killer feature isn't memory. It's interop.

Honest caveat: which AIs work today

You need MCP support on both ends. As of mid-2026:

- Claude Desktop: full MCP support
- Cursor: full MCP support
- ChatGPT (consumer): not yet (use Claude or Cursor instead)
- Anything custom via the OpenAI API: yes, MCP works via the SDKs

So today's real-world handoff is Claude ↔ Cursor. As more clients ship MCP, this gets bigger.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                       Your Machine                            │
│                                                               │
│   Claude Desktop ──┐                       ┌── Cursor IDE     │
│                    │                       │                  │
│            stdio (MCP, JSON-RPC)   stdio (MCP, JSON-RPC)      │
│                    │                       │                  │
│                    └─────► SessionVault ◄──┘                  │
│                                  │                            │
│                             Mem0 OSS SDK                      │
│                ┌─────────────────┴──────────────────┐         │
│                ▼                                    ▼         │
│         LLM + Embeddings                      PostgreSQL      │
│         ┌──────────────┐                      + pgvector      │
│         │ LM Studio    │                      (Docker, :5433) │
│         │  :1234       │                                      │
│         │     OR       │                                      │
│         │ OpenAI API   │                                      │
│         └──────────────┘                                      │
└──────────────────────────────────────────────────────────────┘

Each AI client spawns its own SessionVault process over stdio. Both processes talk to the same Postgres instance. That's the shared bus — Postgres is the source of truth, the MCP servers are just thin per-client adapters.

| Layer | Tech | Role |
| --- | --- | --- |
| Integration | MCP (stdio, JSON-RPC) | Any MCP-aware client connects |
| Server | TypeScript + Zod | 5 tools, validation, error hints |
| Memory | Mem0 OSS | Fact extraction + embedding |
| Inference | LM Studio or OpenAI | LLM + embeddings |
| Storage | PostgreSQL + pgvector | Vectors + metadata, always local |
| Tests | Vitest | 15 unit tests, runs in 14 ms |
| CI | GitHub Actions | Build + test on Node 20/22 |

The Design Call That Made It Trustworthy: Dual Storage

The first version of load_session quietly returned wrong data.

You'd save auth-jwt-v1. Later, load_session("auth-jwt-v1") would call Mem0's semantic search internally — but Mem0 extracts atomic facts, not literal session text. The facts didn't contain the literal session name. So the search would return vaguely-similar facts from other sessions. No error. Just plausible-looking garbage.

That's the worst kind of bug. Especially when one AI just handed off "the plan" to another.
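To see why there was no error, here's a toy reproduction. The keyword-count `embed` function and the three facts are invented for the demo (real embeddings come from a model, and this is not how Mem0 ranks results), but the shape of the failure is the same: nearest-neighbor search returns its best match even when nothing actually matches.

```typescript
// Toy embedding: occurrence counts of a few hand-picked keywords (illustration only).
const VOCAB = ["jwt", "bcrypt", "token", "css", "grid"];

function embed(text: string): number[] {
  const lower = text.toLowerCase();
  return VOCAB.map((word) => lower.split(word).length - 1);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Extracted facts: note that none of them contains its own session name.
const facts = [
  { session: "auth-jwt-v1", text: "Passwords are hashed with bcrypt at cost 12" },
  { session: "billing-v2", text: "Webhook requests carry a signed JWT token" },
  { session: "landing-page", text: "Layout uses CSS grid" },
];

// Semantic "load": return the top-1 nearest fact, whatever it happens to be.
function semanticLoad(query: string) {
  const q = embed(query);
  return facts
    .map((f) => ({ ...f, score: cosine(q, embed(f.text)) }))
    .sort((x, y) => y.score - x.score)[0];
}

const hit = semanticLoad("auth-jwt-v1");
```

`hit` ends up pointing at the `billing-v2` fact, because that is the only text sharing a keyword with the query. Nothing in the result says it's the wrong session.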

The fix: store every session twice.

                  saveSession(input)
                        │
        ┌───────────────┼───────────────┐
        ▼                               ▼
Layer 1: Verbatim raw record    Layer 2: LLM-extracted facts
• infer:false (no LLM)          • Mem0 runs the LLM
• Exact bytes in/out            • One row per atomic fact
• type:session_raw              • type:session_fact
• Source of truth for LOAD      • Powers SEMANTIC SEARCH
        │                               │
        └───────────────┬───────────────┘
                        ▼
              PostgreSQL + pgvector

// Layer 1: bytes-in, bytes-out — survives even if the LLM is down
await memory.add(JSON.stringify(record), {
  userId,
  metadata: rawMetadata(record),  // raw: JSON.stringify(record)
  infer: false,                    // skip LLM extraction entirely
});
 
// Layer 2: best-effort fact extraction for semantic search
let factsExtracted = 0; // stays 0 if extraction throws
try {
  const res = await memory.add(
    [{ role: "user", content: sessionText(input) }],
    { userId, metadata: factMetadata(input) }
  );
  factsExtracted = res?.results?.length ?? 0;
} catch {
  // facts are an enhancement; raw save above already succeeded
}

Why this is bulletproof:

- load_session is now deterministic. It does getAll({filters: {session_name, type: "session_raw"}}), a pure metadata lookup with no similarity scoring. You get back exactly what you saved, or {found: false}. No silent wrong answers, which is critical when one AI is handing off to another.
- search_sessions still gets focused facts: one fact per row makes semantic recall better than embedding huge blobs.
- LLM down? The raw save still succeeds. Fact extraction is wrapped in try/catch. The verbatim record is the contract.

One extra row per save. Massive correctness win.
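Here's the same idea boiled down to something you can run without Postgres or an LLM. The `rows` array stands in for the table and `Extractor` stands in for Mem0's fact extraction; both are assumptions for the sketch, not SessionVault's real internals.

```typescript
type SessionInput = { name: string; summary: string; decisions: string[] };
type Row = { type: "session_raw" | "session_fact"; session_name: string; body: string };

const rows: Row[] = []; // hypothetical stand-in for the Postgres table

// Hypothetical fact extractor; in SessionVault this step is Mem0 calling an LLM.
type Extractor = (input: SessionInput) => string[];

function saveSession(input: SessionInput, extract: Extractor) {
  // Re-saving a name overwrites: drop any earlier rows for it first.
  for (let i = rows.length - 1; i >= 0; i--) {
    if (rows[i].session_name === input.name) rows.splice(i, 1);
  }
  // Layer 1: verbatim record, no LLM involved.
  rows.push({ type: "session_raw", session_name: input.name, body: JSON.stringify(input) });
  // Layer 2: best-effort facts; a failure here must not undo layer 1.
  let factsExtracted = 0;
  try {
    for (const fact of extract(input)) {
      rows.push({ type: "session_fact", session_name: input.name, body: fact });
      factsExtracted++;
    }
  } catch {
    // Keep the raw record; search just has nothing extra for this session.
  }
  return { status: "saved" as const, facts_extracted: factsExtracted };
}

function loadSession(name: string): { found: true; session: SessionInput } | { found: false } {
  // Exact metadata match only; no similarity scoring anywhere on this path.
  const raw = rows.find((r) => r.type === "session_raw" && r.session_name === name);
  return raw ? { found: true, session: JSON.parse(raw.body) as SessionInput } : { found: false };
}

const input: SessionInput = {
  name: "auth-jwt-v1",
  summary: "JWT auth + bcrypt",
  decisions: ["bcrypt cost 12"],
};
const brokenLlm: Extractor = () => {
  throw new Error("LLM unreachable");
};

const saved = saveSession(input, brokenLlm);
const loaded = loadSession("auth-jwt-v1");
const missing = loadSession("auth-jwt-v2");
```

Even with the extractor throwing, `loaded` carries the exact record that went in, and the near-miss name `auth-jwt-v2` comes back as `{found: false}` instead of a lookalike.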

LM Studio vs OpenAI (Your Choice)

Set one env var: MEMORY_PROVIDER=lmstudio or openai.

| | LM Studio (default) | OpenAI |
| --- | --- | --- |
| Cost | Free (your hardware) | Pay per token |
| Privacy | Inference stays on-device | Text sent to OpenAI for extract/embed |
| Vectors | Local Postgres | Local Postgres (always) |
| Embed dims | 768 (nomic-embed-text) | 1536 (text-embedding-3-small) |

Critical: chat models (Llama, Gemma) cannot embed. LM Studio needs a dedicated embedding model loaded alongside the chat model, or /v1/embeddings hangs forever. I learned that the hard way.

Switching providers? Run pnpm run db:reset first. The pgvector column is created for one dimension, so a table built for 768-dim vectors will reject 1536-dim ones.
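A guard along these lines would catch the mismatch before a failed insert does. It's a hypothetical helper, not something the repo necessarily ships, and the dimensions are just the two from the table above; a different LM Studio embedding model would have its own size.

```typescript
type Provider = "lmstudio" | "openai";

// Dimensions taken from the comparison table (nomic-embed-text, text-embedding-3-small).
const EMBED_DIMS: Record<Provider, number> = { lmstudio: 768, openai: 1536 };

type DimCheck = { ok: true } | { ok: false; hint: string };

// Hypothetical guard: compare the provider's vector size with the size the
// existing table was created for, before any insert is attempted.
function checkDims(provider: Provider, tableDims: number): DimCheck {
  const expected = EMBED_DIMS[provider];
  if (expected === tableDims) return { ok: true };
  return {
    ok: false,
    hint: `Table stores ${tableDims}-dim vectors but ${provider} produces ${expected}; run "pnpm run db:reset" first.`,
  };
}

const sameProvider = checkDims("lmstudio", 768);
const afterSwitch = checkDims("openai", 768);
```

Returning a hint rather than throwing keeps the message close to what an MCP tool could hand back to the host as an error string.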

The 5 MCP Tools

Every MCP tool costs ~500 tokens in the host's context just by being registered. So each one earns its slot.

| Tool | What it does |
| --- | --- |
| `save_session` | Dual-write: verbatim record + extracted facts. Re-save same name = overwrite |
| `load_session` | Deterministic exact-name lookup. `brief` / `normal` / `full` modes |
| `search_sessions` | Semantic search. Optional `max_tokens` cap and `repo` filter |
| `list_sessions` | Newest-first list, deduped, optional `repo` filter |
| `delete_session` | Removes raw record AND extracted facts |
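As one example of how a tool can stay cheap for the host, here's a sketch of a `max_tokens` cap for search results. The 4-characters-per-token estimate and the `capByTokens` name are my assumptions here; the real server may count tokens differently.

```typescript
type SearchHit = { session_name: string; fact: string; score: number };

// Rough token estimate (assumption: about 4 characters per token for English text).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the best-scoring hits until the next one would exceed the budget.
function capByTokens(hits: SearchHit[], maxTokens: number): SearchHit[] {
  const kept: SearchHit[] = [];
  let used = 0;
  for (const hit of [...hits].sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(hit.fact);
    if (used + cost > maxTokens) break;
    kept.push(hit);
    used += cost;
  }
  return kept;
}

const hits: SearchHit[] = [
  { session_name: "billing-v2", fact: "Webhook requests carry a signed JWT", score: 0.52 },
  { session_name: "auth-jwt-v1", fact: "bcrypt cost 12", score: 0.91 },
  { session_name: "auth-jwt-v1", fact: "1h access token", score: 0.88 },
];

// With a 10-token budget only the two short, high-scoring facts fit.
const capped = capByTokens(hits, 10);
```

Stopping at the first hit that doesn't fit, rather than skipping it and trying smaller ones, keeps the result a strict prefix of the ranking, so the host never sees a weaker fact without the stronger one above it.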

Handoff example: save in Claude, load in Cursor

In Claude Desktop:

User: "We just designed the auth module. Save this as auth-jwt-v1 for me."

Claude calls save_session:

{
  "name": "auth-jwt-v1",
  "repo": "my-app",
  "summary": "JWT auth + bcrypt; replacing passport.js",
  "decisions": ["jsonwebtoken over passport.js (lighter)", "bcrypt cost 12", "1h access token"],
  "files": ["src/auth.ts", "src/middleware.ts"],
  "todos": ["Add refresh token rotation"],
  "errors": []
}

Response: {"status":"saved","facts_extracted":4}
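Before anything is written, the arguments get validated. The project uses Zod for that; the sketch below hand-rolls the same kind of check so it runs with no dependencies. Which fields are required, and the wording of the hints, are assumptions.

```typescript
type SaveSessionInput = {
  name: string;
  repo?: string;
  summary: string;
  decisions: string[];
  files: string[];
  todos: string[];
  errors: string[];
};

type Parsed = { ok: true; value: SaveSessionInput } | { ok: false; hint: string };

const LIST_FIELDS = ["decisions", "files", "todos", "errors"] as const;

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((item) => typeof item === "string");

// Hand-rolled stand-in for a Zod schema: returns the parsed value or a hint.
function parseSaveSession(raw: unknown): Parsed {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, hint: "arguments must be a JSON object" };
  }
  const obj = raw as Record<string, unknown>;
  const name = obj["name"];
  const summary = obj["summary"];
  const repo = obj["repo"];
  if (typeof name !== "string" || name.trim() === "") {
    return { ok: false, hint: 'name is required, e.g. "auth-jwt-v1"' };
  }
  if (typeof summary !== "string") return { ok: false, hint: "summary must be a string" };
  if (repo !== undefined && typeof repo !== "string") return { ok: false, hint: "repo must be a string" };

  const lists: Record<string, string[]> = {};
  for (const key of LIST_FIELDS) {
    const value = obj[key] ?? []; // a missing list is treated as empty
    if (!isStringArray(value)) return { ok: false, hint: `${key} must be an array of strings` };
    lists[key] = value;
  }
  return {
    ok: true,
    value: {
      name,
      summary,
      repo: typeof repo === "string" ? repo : undefined,
      decisions: lists["decisions"],
      files: lists["files"],
      todos: lists["todos"],
      errors: lists["errors"],
    },
  };
}

const good = parseSaveSession({ name: "auth-jwt-v1", summary: "JWT auth + bcrypt", decisions: ["bcrypt cost 12"] });
const bad = parseSaveSession({ summary: "no name given" });
```

The hint is written for the calling model as much as for you: a message like the one in `bad` gives the AI enough to retry the call with the missing field filled in.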

In Cursor (30 minutes later):

User: "Load session auth-jwt-v1 and implement what we planned."

Cursor calls load_session({name:"auth-jwt-v1", mode:"normal"}):

{
  "found": true,
  "session": {
    "name": "auth-jwt-v1",
    "decisions": ["jsonwebtoken over passport.js (lighter)", "bcrypt cost 12", ...],
    "files": ["src/auth.ts", "src/middleware.ts"],
    ...
  }
}

Cursor now has every decision, file, and TODO — verbatim from Claude. No copy-paste. No re-explaining.

Try It Yourself

~400 lines of TypeScript across 6 source files. Full docs in README.md and SYSTEM_DESIGN.md.

Prereqs

- Node (20+ recommended), pnpm 9+, Docker
- LM Studio with chat + embedding models, or OpenAI API key
- Claude Desktop and/or Cursor (both speak MCP)

Setup

git clone <your-repo-url>
cd mcp_memoryserver
pnpm install && pnpm run build
cp .env.example .env
pnpm run db:up         # Postgres on :5433
pnpm test              # 15/15 in ~14 ms

Wire it into Claude Desktop AND Cursor

The exact same server can serve both clients — they just spawn separate processes. Add this entry to both config files:

Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
Cursor: ~/.cursor/mcp.json

{
  "mcpServers": {
    "sessionvault": {
      "command": "node",
      "args": ["/absolute/path/to/mcp_memoryserver/dist/index.js"],
      "env": {
        "MEMORY_PROVIDER": "lmstudio",
        "LMSTUDIO_BASE_URL": "http://localhost:1234/v1",
        "LMSTUDIO_LLM_MODEL": "your-chat-model-id",
        "LMSTUDIO_EMBED_MODEL": "your-embedding-model-id",
        "LMSTUDIO_EMBED_DIMS": "768",
        "POSTGRES_HOST": "127.0.0.1",
        "POSTGRES_PORT": "5433",
        "POSTGRES_USER": "sessionvault",
        "POSTGRES_PASSWORD": "sessionvault",
        "POSTGRES_DB": "sessionvault",
        "SESSIONVAULT_USER_ID": "developer"
      }
    }
  }
}

Restart both apps → SessionVault's 5 tools appear in each. Now you can save in one and load in the other.
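If you're curious what the server side of that `env` block might look like, here's a minimal sketch that reads a subset of those variables. `readConfig`, its defaults, and its fail-fast behavior are assumptions; in a real process you'd pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

type Config = {
  provider: "lmstudio" | "openai";
  embedDims: number;
  postgres: { host: string; port: number; database: string };
  userId: string;
};

// Parse a positive integer or throw an error naming the offending variable.
function positiveInt(name: string, raw: string): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Hypothetical loader; defaults mirror the values in the JSON entry above.
function readConfig(env: Env): Config {
  const provider = env.MEMORY_PROVIDER ?? "lmstudio";
  if (provider !== "lmstudio" && provider !== "openai") {
    throw new Error(`MEMORY_PROVIDER must be "lmstudio" or "openai", got "${provider}"`);
  }
  return {
    provider,
    // 1536 is text-embedding-3-small's size; LM Studio's depends on the loaded model.
    embedDims:
      provider === "openai"
        ? 1536
        : positiveInt("LMSTUDIO_EMBED_DIMS", env.LMSTUDIO_EMBED_DIMS ?? "768"),
    postgres: {
      host: env.POSTGRES_HOST ?? "127.0.0.1",
      port: positiveInt("POSTGRES_PORT", env.POSTGRES_PORT ?? "5433"),
      database: env.POSTGRES_DB ?? "sessionvault",
    },
    userId: env.SESSIONVAULT_USER_ID ?? "developer",
  };
}

const config = readConfig({
  MEMORY_PROVIDER: "lmstudio",
  POSTGRES_PORT: "5433",
  LMSTUDIO_EMBED_DIMS: "768",
});
```

Failing at startup matters more than usual here, because the process is spawned by Claude Desktop or Cursor: a bad value that only surfaces on the first tool call is much harder to trace back to a config file.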

The Takeaway

The exciting frontier in AI tooling isn't longer context windows. It's interop — letting different AIs share state so you can use the right tool for each step of your work.

A small shared bus + structured snapshots + deterministic load gets you most of the way there. MCP makes the wiring easy. pgvector + Mem0 make the storage cheap and local.

Plan with Claude. Build in Cursor. Skip the copy-paste.

That's the whole pitch.

Built with TypeScript, the Model Context Protocol, Mem0 OSS, PostgreSQL/pgvector, and LM Studio or OpenAI. Vector data stays on your machine.

Using a different MCP-aware client? Open an issue — I'd love to expand the compatibility list.

