One brain. Every agent. Solo to enterprise.

One memory across every AI agent you use.

Claude, ChatGPT, Cursor, Cline, Continue, Kilo Code — whatever your team picks today and switches to tomorrow, they all read and write the same memory store. Hybrid keyword + vector + graph retrieval. Self-hostable on a laptop or as a multi-tenant brain for a whole company.

Warm tier
Postgres FTS
Active entries, full-text indexed, milliseconds to retrieve. Hot-path queries hit here first.
Cold tier
Qdrant vectors
Demoted entries kept as embeddings. Reactive promotion back to warm on relevant search hit.
Relations
FalkorDB graph
Edges between related memories. Surfaces neighbours of a hit so adjacent context comes for free.
novamem dashboard — metrics page showing per-token throughput, tier hit rates, store sizes

Built-in dashboard · live throughput · per-token usage · 24h history · tier hit rates

The problem

LLMs forget. RAG isolates. Each agent reinvents.

A 200K-token context isn't memory — it's a goldfish bowl that resets every session. The usual workaround, "stuff a vector DB behind it," misses everything you actually want from memory. And the moment you have more than one agent, every one of them ends up with its own private notes.

01 / context limit
Sessions lose state
Every new conversation rebuilds the agent's worldview from scratch. Decisions, preferences, ongoing projects — all evaporate at session end.
02 / vector-only
Pure vector search misses literals
Cosine similarity is great for "fuzzy" recall and useless for exact ids, function names, or hashes. You need keyword + vector + graph fused.
03 / fragmented agents
Each agent has its own brain
Ask Claude on Monday, Cursor on Tuesday, ChatGPT on Wednesday. Each builds its own siloed notes. Switch tools and the context is gone — same project, same you, three brains.
04 / no isolation
One bucket for the whole team
Most "memory" tools assume one user. Real teams need per-user isolation with deliberate sharing — sub-brains for projects, a private store for everything else.
One brain. Every agent.

Different agents. Same memory.

The agent landscape moves fast. Whatever you and your team use this quarter — and whatever you switch to next quarter — they all read and write the same novamem store. No re-onboarding. No silos. One canonical place where everything goes.

CC
Claude Code
MCP
CD
Claude Desktop
MCP stdio
CG
ChatGPT
HTTP / GPT
CR
Cursor
MCP
CL
Cline
MCP
CO
Continue
MCP
OC
OpenCode
MCP
KC
Kilo Code
MCP
RC
RooCode
MCP
CX
Codex
MCP
SDK
Agent SDKs
HTTP
+
Anything else
REST + MCP
Why it matters

Claude on Monday, ChatGPT on Tuesday — the project keeps going.

Tell Claude Code that the deploy target is k3s on 192.168.10.248. Switch to Cursor on a different machine — the same fact is there, retrievable by hybrid search, automatically scoped to the project. Switch to ChatGPT through the HTTP API for an architecture review — it sees every decision the others made. One canonical memory. Not three.

Usage

The mental model — and the seven tools.

Every entry belongs to a single user. An entry can additionally belong to a project — a sub-brain that's shareable. User-global entries are private; project entries are visible to every member of that project. Every search runs three signals in parallel and fuses them into one ranked list.
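As a sketch, the scoping rule above can be expressed as a small visibility predicate. This is an illustrative model, not the server's actual code; the field names (`user`, `project`) are assumptions:

```python
def visible(entry: dict, user_id: str, project_memberships: set) -> bool:
    """Illustrative visibility check: user-global entries are private;
    project entries are visible to every member of that project."""
    if entry.get("project") is None:
        # user-global entry: only its owner can see it
        return entry["user"] == user_id
    # project entry: visible to any member of that project
    return entry["project"] in project_memberships

# a private note vs. a shared project note
private = {"user": "alice", "project": None}
shared = {"user": "bob", "project": "deploy"}

assert visible(private, "alice", set())
assert not visible(private, "bob", {"deploy"})
assert visible(shared, "alice", {"deploy"})  # members see project entries
```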

memory_search
Hybrid retrieval. Override weights for keyword-only / vector-only / graph-heavy. Fans out across populated namespaces by default.
memory_remember
Write a new entry with optional namespace, sourceType, capturedFrom, confidence. Worthiness gate + SHA dedup are applied automatically.
memory_recent / memory_today
Newest-first feed of entries (recent: any window via since; today: last 24 h). Useful for "what did the agent learn today" digests.
memory_neighbors
Walk graph edges from a seed entry to its strongly-linked neighbours. Depth 1, 2, or 3 — adjacent context for free.
memory_update / memory_forget
In-place rewrite (preserves id + edges + hits, re-embeds content) and explicit deletion. Idempotent — second-call forget returns deleted:false.
project_*
Create / list / activate / share / unshare / delete sub-brains. Active project mode unions the project with your private store on every search.

Full usage guide → covers worthiness gates, decay maths, dream cycle, namespaces, and weight tuning.

What you get

Batteries included. No vendor SaaS.

One docker compose up -d brings the whole stack online. No external services. No proprietary embedding API. No hidden per-token charges.

Hybrid retrieval
Keyword + vector + graph fused per query. Adjustable weights. The graph signal alone is unique to novamem — it surfaces "what was related" to your hit.
Projects = sub-brains
Carve out a project, share it with teammates. Memory stays isolated by default; sharing is explicit and revocable. Active-project mode unions the project with your private store.
Synaptic decay
Old, unused entries demote to cold. Frequently-hit ones re-promote. The math (7 · log₂(hits+1)) is tunable per-tenant.
MCP + HTTP, both first-class
Model Context Protocol via SSE and stdio for Claude Code, Claude Desktop, Cursor, OpenCode, Cline, Continue. Plain JSON HTTP for everything else.
Built-in dashboard
Sign in, mint tokens, browse memories, watch the graph, monitor health and per-token throughput. No separate Grafana to wire up.
Pluggable embeddings
Local @xenova/transformers by default — no API keys, runs on CPU. Swap in any OpenAI-compatible endpoint with a single env var.
How it works

Three retrieval signals, fused. One coherent answer.

Every search runs keyword (FTS), vector (cosine), and graph (neighbour traversal) in parallel. Results fuse via min-max-normalised weighted scoring with sensible defaults you can override per call.

query → keyword FTS (Postgres) · vector (Qdrant) · graph neighbours (FalkorDB) → fuse
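The fusion step can be sketched in a few lines: min-max-normalise each signal's scores to [0, 1], then sum them under per-call weights. The default weights below are placeholders, not novamem's actual defaults:

```python
def fuse(keyword: dict, vector: dict, graph: dict, weights: dict = None) -> list:
    """Min-max-normalise each signal's raw scores, then combine with
    per-call weights. Defaults are illustrative, not novamem's."""
    w = {"keyword": 1.0, "vector": 1.0, "graph": 0.5, **(weights or {})}
    fused = {}
    for name, scores in (("keyword", keyword), ("vector", vector), ("graph", graph)):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid div-by-zero when all scores are equal
        for entry_id, s in scores.items():
            fused[entry_id] = fused.get(entry_id, 0.0) + w[name] * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

# a {keyword:1, vector:0} style override for an exact-id lookup
ranked = fuse({"adr-21": 3.0, "adr-24": 1.0}, {"adr-24": 0.9}, {},
              weights={"vector": 0})
assert ranked[0] == "adr-21"
```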
1

Remember

Write a memory entry. A worthiness gate rejects conversational filler; a SHA-256 dedupe path returns the existing id for exact duplicates. The entry lands in warm + cold + graph atomically.
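A minimal sketch of that write path, assuming the gate's rules (the 12-character threshold and filler list here are illustrative stand-ins, not the real implementation):

```python
import hashlib

SEEN: dict[str, str] = {}  # sha256 -> entry id; in-memory stand-in for one scope

def remember(content: str, next_id: str) -> dict:
    """Illustrative write path: a worthiness gate drops conversational
    filler, and SHA-256 dedupe returns the existing id for exact duplicates."""
    text = content.strip()
    if text.lower() in {"thanks", "ok"} or len(text) < 12:  # assumed filler rules
        return {"stored": False, "reason": "unworthy"}
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in SEEN:
        return {"stored": False, "id": SEEN[digest], "reason": "duplicate"}
    SEEN[digest] = next_id
    return {"stored": True, "id": next_id}

assert remember("ok", "m1")["stored"] is False
first = remember("deploy target is k3s on 192.168.10.248", "m1")
dup = remember("deploy target is k3s on 192.168.10.248", "m2")
assert first["stored"] and dup["id"] == "m1"  # dedupe returns the existing id
```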

2

Search hybrid

One query fans out to all three indexes. Results are fused with weighted scoring. Override weights per call — `{keyword:1, vector:0}` for exact-id lookups, `{vector:1}` to lean fully semantic.

3

Decay & promote

Entries decay on a synaptic schedule — effectiveDays = 7 · log₂(hits + 1). Hits in cold reactively promote back to warm. A nightly dream cycle compacts duplicates and promotes shared neighbours.
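The decay formula is simple enough to verify by hand; a direct transcription (the base factor is the per-tenant tunable):

```python
import math

def effective_days(hits: int, base: float = 7.0) -> float:
    """Synaptic decay window: effectiveDays = base * log2(hits + 1).
    Note a zero-hit entry gets 0 days under a literal reading; the
    base factor of 7 is the tunable mentioned above."""
    return base * math.log2(hits + 1)

assert effective_days(1) == 7.0    # one hit: one week warm
assert effective_days(3) == 14.0   # 7 * log2(4) = 14
assert effective_days(7) == 21.0   # 7 * log2(8) = 21
```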

4

Isolate & share

Every entry is per-user by default. Create a project to carve out a sub-brain; share it by adding members. Memory crosses user boundaries only through explicit project membership.

Three tiers, one query

What each tier does — concretely.

A simple example. Imagine you've been remembering project notes for weeks. Today you ask your agent: "How did we end up choosing Postgres for the main store?" Here's what each tier contributes — and why fusing all three beats any one alone.

Q
Your question
"How did we end up choosing Postgres for the main store?"
Warm tier · Postgres FTS
Exact & recent
Best for: literal terms, recent entries, fast keyword recall.
Full-text-indexed memory entries that are active and frequently used. Tokenises your query (postgres, main store) and returns rows where those words appear verbatim.
→ matches
"ADR-021 — Postgres for main store. Decided 2026-02-14 because of MVCC + extensibility (pgvector, FTS, jsonb)."
Cold tier · Qdrant vectors
Semantic recall
Best for: "remind me of related things even if I worded it differently."
Embedding-based recall over older entries that have decayed off the warm tier. Doesn't care about literal words — it understands the meaning.
→ matches
"We rejected SQLite back in Jan because we needed concurrent writes and FTS5 wasn't enough for our query shapes."
Graph tier · FalkorDB edges
Adjacent context
Best for: "what was related to this decision that I forgot to ask about?"
Walks edges from any matching entry to its strongly-linked neighbours. Surfaces the supporting context that lives around a hit.
→ pulls in
"ADR-024 — pgvector for embeddings, same Postgres instance" + "Cost analysis: managed Postgres on Hetzner = €40/mo"

All three signals run in parallel and fuse into one ranked list. The warm hit gets you the literal ADR. The cold hit pulls in the prior reasoning even though it never said "Postgres". The graph hit ties the decision to its supporting cost analysis. Your agent answers with all of it — not just the one tier that happened to match.

Install

Stand it up. Three paths.

Pick whichever fits your environment. All three lead to the same server image, the same dashboard, the same MCP surface.

Docker Compose · recommended
# clone, set 3 secrets, up
git clone https://github.com/azrtydxb/novamem.git
cd novamem && cp .env.example .env

echo "POSTGRES_PASSWORD=$(openssl rand -base64 24)" >> .env
echo "NOVAMEM_BOOTSTRAP_ADMIN_PASSWORD=$(openssl rand -base64 24)" >> .env
echo "NOVAMEM_COOKIE_SECRET=$(openssl rand -hex 32)" >> .env

docker compose up -d
# http://localhost:7778/admin
Single-host. ~30 s. Full walkthrough →
Manual
# bring your own Postgres + Qdrant + FalkorDB
# prereqs: Node 20+, pnpm 9+
git clone https://github.com/azrtydxb/novamem.git
cd novamem
pnpm install && pnpm build

cp .env.example .env
# point at your existing datastores
pnpm --filter @azrtydxb/novamem-server start
Local dev / custom stack. Full walkthrough →
Kubernetes
# multi-arch image on ghcr.io
# manifests in deploy/k8s/
git clone https://github.com/azrtydxb/novamem.git
cd novamem/deploy/k8s

# edit secrets.yaml + ingress.yaml host
# then:
kubectl apply -k .
kubectl -n novamem rollout status deploy/novamem
HA · multi-tenant · enterprise. Full walkthrough →
Connect

Wire every AI host on your machine. One command.

@azrtydxb/novamem-init detects 30+ supported hosts, asks for your server URL + dashboard credentials, mints a fresh bearer, and writes the config each host expects. Idempotent — won't clobber existing entries.

one command, every host
npx -y @azrtydxb/novamem-init

# detects: Claude Code · Claude Desktop · ChatGPT (via HTTP)
#          Cursor · OpenCode · Codex CLI · Cline · Continue
#          Kilo Code · RooCode · Gemini CLI · Copilot · Windsurf
#          Factory · Amazon Q · & ~16 skill-only hosts

Prefer manual setup? Per-host walkthroughs: Claude Code · Claude Desktop · Cursor · Kilo Code · Other hosts & skills

Solo to enterprise

Same product, three deploys.

Same code. Same MCP surface. Same dashboard. The only thing that changes between a personal laptop and a 5,000-engineer company is how you stand it up. Multi-tenancy and project-based sub-brains are first-class from day one.

Stage 01
Solo developer
docker compose up -d on your laptop or homelab. One user. Private memory across every AI host on your machine. Zero SaaS dependency, zero per-token cost.
Stage 02
Small team
One server, Postgres + Qdrant + FalkorDB on the same host. Each teammate signs in with their own account, mints their own bearer, gets their own private memory. Share a project, and that sub-brain becomes a team workspace — every member's agents see and contribute to the same notes.
Stage 03
Enterprise
kubectl apply -k . on your cluster. HA, multi-tenant, per-user isolation with RBAC. Project sub-brains become team workspaces; the same dashboard, MCP surface, and auth model scale from one user to thousands.
API · architecture · security

Specs & references.

Everything you need to operate, integrate, or audit. The OpenAPI spec is generated from the same Zod schemas the server uses at runtime — so it's always accurate.

OpenAPI 3.0
Generated spec at docs/api/openapi.json + human-readable reference. Live Swagger UI at /api-docs on your deployment.
Architecture diagrams
docs/architecture.md — system shape, data flow, the engine layer, mermaid diagrams of search and remember paths.
Security model
SECURITY.md — auth flows (Better Auth sessions, tenant bearers), RBAC, hardening checklist for production deploys.
Packages & releases
Per-package versioning via Changesets; see GitHub Releases. Server image at ghcr.io/azrtydxb/novamem.
Why this stack

The decisions, on the table.

novamem is opinionated about a few things — and the rest is yours.

Self-hostable, zero SaaS dependency
Postgres, Qdrant, FalkorDB. All open source. Run on your laptop, your homelab, your cluster.
Worthiness gate at write
Hard rules drop "thanks", "ok", and other sub-12-character filler. SHA-256 dedupe collapses exact duplicates within a scope. Your memory store stays signal.
Provenance on every entry
sourceType, capturedFrom, confidence — so you can filter "what did Claude infer" from "what was directly observed".
Apache 2.0, no telemetry
Use commercially. Fork it. We don't phone home. The project ships with a SECURITY.md and an audited auth model.

Give your agents a memory.

Self-host in a minute. Wire every AI host on your machine in one npx command.