One brain. Every agent. Solo to enterprise.

One memory across every AI agent you use.

Claude, ChatGPT, Cursor, Cline, Continue, Kilo Code — whatever your team picks today and switches to tomorrow, they all read and write the same memory store. Hybrid keyword + vector + graph retrieval. Self-hostable on a laptop or as a multi-tenant brain for a whole company.

Warm tier
Postgres FTS
Active entries, full-text indexed, milliseconds to retrieve. Hot-path queries hit here first.
Cold tier
Qdrant vectors
Demoted entries kept as embeddings. Reactive promotion back to warm on relevant search hit.
Relations
FalkorDB graph
Edges between related memories. Surfaces neighbours of a hit so adjacent context comes for free.
novamem dashboard — metrics page showing per-token throughput, tier hit rates, store sizes

Built-in dashboard · live throughput · per-token usage · 24h history · tier hit rates

The problem

LLMs forget. RAG isolates. Each agent reinvents.

A 200K-token context isn't memory — it's a goldfish bowl that resets every session. The usual workaround, "stuff a vector DB behind it," misses everything you actually want from memory. And the moment you have more than one agent, every one of them ends up with its own private notes.

01 / context limit
Sessions lose state
Every new conversation rebuilds the agent's worldview from scratch. Decisions, preferences, ongoing projects — all evaporate at session end.
02 / vector-only
Pure vector search misses literals
Cosine similarity is great for "fuzzy" recall and useless for exact ids, function names, or hashes. You need keyword + vector + graph fused.
03 / fragmented agents
Each agent has its own brain
Ask Claude on Monday, Cursor on Tuesday, ChatGPT on Wednesday. Each builds its own siloed notes. Switch tools and the context is gone — same project, same you, three brains.
04 / no isolation
One bucket for the whole team
Most "memory" tools assume one user. Real teams need per-user isolation with deliberate sharing — sub-brains for projects, a private store for everything else.
One brain. Every agent.

Different agents. Same memory.

The agent landscape moves fast. Whatever you and your team use this quarter — and whatever you switch to next quarter — they all read and write the same novamem store. No re-onboarding. No silos. One canonical place where everything goes.

CC
Claude Code
MCP
CD
Claude Desktop
MCP stdio
CG
ChatGPT
HTTP / GPT
CR
Cursor
MCP
CL
Cline
MCP
CO
Continue
MCP
OC
OpenCode
MCP
KC
Kilo Code
MCP
RC
RooCode
MCP
CX
Codex
MCP
SDK
Agent SDKs
HTTP
+
Anything else
REST + MCP
Why it matters

Claude on Monday, ChatGPT on Tuesday — the project keeps going.

Tell Claude Code that the deploy target is k3s on 192.168.10.248. Switch to Cursor on a different machine — the same fact is there, retrievable by hybrid search, automatically scoped to the project. Switch to ChatGPT through the HTTP API for an architecture review — it sees every decision the others made. One canonical memory. Not three.

Usage

The mental model — and the seven tools.

Every entry belongs to a single user. An entry can additionally belong to a project — a sub-brain that's shareable. User-global entries are private; project entries are visible to every member of that project. Every search runs three signals in parallel and fuses them into one ranked list.
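As a sketch, the scoping rule above can be expressed as a small visibility predicate. This is an illustrative model, not the server's actual code; the field names (`user`, `project`) are assumptions:

```python
def visible(entry: dict, user_id: str, project_memberships: set) -> bool:
    """Illustrative visibility check: user-global entries are private;
    project entries are visible to every member of that project."""
    if entry.get("project") is None:
        # user-global entry: only its owner can see it
        return entry["user"] == user_id
    # project entry: visible to any member of that project
    return entry["project"] in project_memberships

# a private note vs. a shared project note
private = {"user": "alice", "project": None}
shared = {"user": "bob", "project": "deploy"}

assert visible(private, "alice", set())
assert not visible(private, "bob", {"deploy"})
assert visible(shared, "alice", {"deploy"})  # members see project entries
```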

memory_search
Hybrid retrieval. Override weights for keyword-only / vector-only / graph-heavy. Fans out across populated namespaces by default.
memory_remember
Write a new entry with optional namespace, sourceType, capturedFrom, confidence. Worthiness gate + SHA dedup are applied automatically.
memory_recent / memory_today
Newest-first feed of entries (recent: any window via since; today: last 24 h). Useful for "what did the agent learn today" digests.
memory_neighbors
Walk graph edges from a seed entry to its strongly-linked neighbours. Depth 1, 2, or 3 — adjacent context for free.
memory_update / memory_forget
In-place rewrite (preserves id + edges + hits, re-embeds content) and explicit deletion. Idempotent — second-call forget returns deleted:false.
project_*
Create / list / activate / share / unshare / delete sub-brains. Active project mode unions the project with your private store on every search.

Full usage guide → covers worthiness gates, decay maths, dream cycle, namespaces, and weight tuning.

What you get

Batteries included. No vendor SaaS.

One docker compose up -d brings the whole stack online. No external services. No proprietary embedding API. No hidden per-token charges.

Hybrid retrieval
Keyword + vector + graph fused per query. Adjustable weights. The graph signal alone is unique to novamem — it surfaces "what was related" to your hit.
Projects = sub-brains
Carve out a project, share it with teammates. Memory stays isolated by default; sharing is explicit and revocable. Active-project mode unions the project with your private store.
Synaptic decay
Old, unused entries demote to cold. Frequently-hit ones re-promote. The math (7 · log₂(hits+1)) is tunable per-tenant.
MCP + HTTP, both first-class
Model Context Protocol via SSE and stdio for Claude Code, Claude Desktop, Cursor, OpenCode, Cline, Continue. Plain JSON HTTP for everything else.
Built-in dashboard
Sign in, mint tokens, browse memories, watch the graph, monitor health and per-token throughput. No separate Grafana to wire up.
Pluggable embeddings
Local @xenova/transformers by default — no API keys, runs on CPU. Swap in any OpenAI-compatible endpoint with a single env var.
How it works

Three retrieval signals, fused. One coherent answer.

Every search runs keyword (FTS), vector (cosine), and graph (neighbour traversal) in parallel. Results fuse via min-max-normalised weighted scoring with sensible defaults you can override per call.

query → keyword FTS (Postgres) · vector (Qdrant) · graph neighbours (FalkorDB) → fuse
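The fusion step can be sketched in a few lines: min-max-normalise each signal's scores to [0, 1], then sum them under per-call weights. The default weights below are placeholders, not novamem's actual defaults:

```python
def fuse(keyword: dict, vector: dict, graph: dict, weights: dict = None) -> list:
    """Min-max-normalise each signal's raw scores, then combine with
    per-call weights. Defaults are illustrative, not novamem's."""
    w = {"keyword": 1.0, "vector": 1.0, "graph": 0.5, **(weights or {})}
    fused = {}
    for name, scores in (("keyword", keyword), ("vector", vector), ("graph", graph)):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid div-by-zero when all scores are equal
        for entry_id, s in scores.items():
            fused[entry_id] = fused.get(entry_id, 0.0) + w[name] * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

# a {keyword:1, vector:0} style override for an exact-id lookup
ranked = fuse({"adr-21": 3.0, "adr-24": 1.0}, {"adr-24": 0.9}, {},
              weights={"vector": 0})
assert ranked[0] == "adr-21"
```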
1

Remember

Write a memory entry. A worthiness gate rejects conversational filler; a SHA-256 dedupe path returns the existing id for exact duplicates. The entry lands in warm + cold + graph atomically.
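A minimal sketch of that write path, assuming the gate's rules (the 12-character threshold and filler list here are illustrative stand-ins, not the real implementation):

```python
import hashlib

SEEN: dict[str, str] = {}  # sha256 -> entry id; in-memory stand-in for one scope

def remember(content: str, next_id: str) -> dict:
    """Illustrative write path: a worthiness gate drops conversational
    filler, and SHA-256 dedupe returns the existing id for exact duplicates."""
    text = content.strip()
    if text.lower() in {"thanks", "ok"} or len(text) < 12:  # assumed filler rules
        return {"stored": False, "reason": "unworthy"}
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in SEEN:
        return {"stored": False, "id": SEEN[digest], "reason": "duplicate"}
    SEEN[digest] = next_id
    return {"stored": True, "id": next_id}

assert remember("ok", "m1")["stored"] is False
first = remember("deploy target is k3s on 192.168.10.248", "m1")
dup = remember("deploy target is k3s on 192.168.10.248", "m2")
assert first["stored"] and dup["id"] == "m1"  # dedupe returns the existing id
```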

2

Search hybrid

One query fans out to all three indexes. Results are fused with weighted scoring. Override weights per call — `{keyword:1, vector:0}` for exact-id lookups, `{vector:1}` to lean fully semantic.

3

Decay & promote

Entries decay on a synaptic schedule — effectiveDays = 7 · log₂(hits + 1). Hits in cold reactively promote back to warm. A nightly dream cycle compacts duplicates and promotes shared neighbours.
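The decay formula is simple enough to verify by hand; a direct transcription (the base factor is the per-tenant tunable):

```python
import math

def effective_days(hits: int, base: float = 7.0) -> float:
    """Synaptic decay window: effectiveDays = base * log2(hits + 1).
    Note a zero-hit entry gets 0 days under a literal reading; the
    base factor of 7 is the tunable mentioned above."""
    return base * math.log2(hits + 1)

assert effective_days(1) == 7.0    # one hit: one week warm
assert effective_days(3) == 14.0   # 7 * log2(4) = 14
assert effective_days(7) == 21.0   # 7 * log2(8) = 21
```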

4

Isolate & share

Every entry is per-user by default. Create a project to carve out a sub-brain; share it by adding members. Memory crosses user boundaries only through explicit project membership.

Three tiers, one query

What each tier does — concretely.

A simple example. Imagine you've been remembering project notes for weeks. Today you ask your agent: "How did we end up choosing Postgres for the main store?" Here's what each tier contributes — and why fusing all three beats any one alone.

Q
Your question
"How did we end up choosing Postgres for the main store?"
Warm tier · Postgres FTS
Exact & recent
Best for: literal terms, recent entries, fast keyword recall.
Full-text-indexed memory entries that are active and frequently used. Tokenises your query (postgres, main store) and returns rows where those words appear verbatim.
→ matches
"ADR-021 — Postgres for main store. Decided 2026-02-14 because of MVCC + extensibility (pgvector, FTS, jsonb)."
Cold tier · Qdrant vectors
Semantic recall
Best for: "remind me of related things even if I worded it differently."
Embedding-based recall over older entries that have decayed off the warm tier. Doesn't care about literal words — it understands the meaning.
→ matches
"We rejected SQLite back in Jan because we needed concurrent writes and FTS5 wasn't enough for our query shapes."
Graph tier · FalkorDB edges
Adjacent context
Best for: "what was related to this decision that I forgot to ask about?"
Walks edges from any matching entry to its strongly-linked neighbours. Surfaces the supporting context that lives around a hit.
→ pulls in
"ADR-024 — pgvector for embeddings, same Postgres instance" + "Cost analysis: managed Postgres on Hetzner = €40/mo"

All three signals run in parallel and fuse into one ranked list. The warm hit gets you the literal ADR. The cold hit pulls in the prior reasoning even though it never said "Postgres". The graph hit ties the decision to its supporting cost analysis. Your agent answers with all of it — not just the one tier that happened to match.

Install

Stand it up. Three paths.

Pick whichever fits your environment. All three lead to the same server image, the same dashboard, the same MCP surface.

Docker Compose · recommended
# clone, set 3 secrets, up
git clone https://github.com/azrtydxb/novamem.git
cd novamem && cp .env.example .env

echo "POSTGRES_PASSWORD=$(openssl rand -base64 24)" >> .env
echo "NOVAMEM_BOOTSTRAP_ADMIN_PASSWORD=$(openssl rand -base64 24)" >> .env
echo "NOVAMEM_COOKIE_SECRET=$(openssl rand -hex 32)" >> .env

docker compose up -d
# http://localhost:7778/admin
Single-host. ~30 s. Full walkthrough →
Manual
# bring your own Postgres + Qdrant + FalkorDB
# prereqs: Node 20+, pnpm 9+
git clone https://github.com/azrtydxb/novamem.git
cd novamem
pnpm install && pnpm build

cp .env.example .env
# point at your existing datastores
pnpm --filter @azrtydxb/novamem-server start
Local dev / custom stack. Full walkthrough →
Kubernetes
# multi-arch image on ghcr.io
# manifests in deploy/k8s/
git clone https://github.com/azrtydxb/novamem.git
cd novamem/deploy/k8s

# edit secrets.yaml + ingress.yaml host
# then:
kubectl apply -k .
kubectl -n novamem rollout status deploy/novamem
HA · multi-tenant · enterprise. Full walkthrough →
Connect

Wire every AI host on your machine. One command.

@azrtydxb/novamem-init detects 30+ supported hosts, asks for your server URL + dashboard credentials, mints a fresh bearer, and writes the config each host expects. Idempotent — won't clobber existing entries.

one command, every host
npx -y @azrtydxb/novamem-init

# detects: Claude Code · Claude Desktop · ChatGPT (via HTTP)
#          Cursor · OpenCode · Codex CLI · Cline · Continue
#          Kilo Code · RooCode · Gemini CLI · Copilot · Windsurf
#          Factory · Amazon Q · & ~16 skill-only hosts

Prefer manual setup? Per-host walkthroughs: Claude Code · Claude Desktop · Cursor · Kilo Code · Other hosts & skills

Solo to enterprise

Same product, three deploys.

Same code. Same MCP surface. Same dashboard. The only thing that changes between a personal laptop and a 5,000-engineer company is how you stand it up. Multi-tenancy and project-based sub-brains are first-class from day one.

Stage 01
Solo developer
docker compose up -d on your laptop or homelab. One user. Private memory across every AI host on your machine. Zero SaaS dependency, zero per-token cost.
Stage 02
Small team
One server, Postgres + Qdrant + FalkorDB on the same host. Each teammate signs in with their own account, mints their own bearer, gets their own private memory. Share a project, and that sub-brain becomes a team workspace — every member's agents see and contribute to the same notes.
Stage 03
Enterprise
kubectl apply -k . on your cluster. HA, multi-tenant, per-user isolation with RBAC. Project sub-brains become team workspaces; the same dashboard, MCP surface, and auth model scale from one user to thousands.
API · architecture · security

Specs & references.

Everything you need to operate, integrate, or audit. The OpenAPI spec is generated from the same Zod schemas the server uses at runtime — so it's always accurate.

OpenAPI 3.0
Generated spec at docs/api/openapi.json + human-readable reference. Live Swagger UI at /api-docs on your deployment.
Architecture diagrams
docs/architecture.md — system shape, data flow, the engine layer, mermaid diagrams of search and remember paths.
Security model
SECURITY.md — auth flows (Better Auth sessions, tenant bearers), RBAC, hardening checklist for production deploys.
Packages & releases
Per-package versioning via Changesets; see GitHub Releases. Server image at ghcr.io/azrtydxb/novamem.
Why this stack

The decisions, on the table.

novamem is opinionated about a few things — and the rest is yours.

Self-hostable, zero SaaS dependency
Postgres, Qdrant, FalkorDB. All open source. Run on your laptop, your homelab, your cluster.
Worthiness gate at write
Hard rules drop "thanks", "ok", and other sub-12-character filler. SHA-256 dedupe collapses exact duplicates within a scope. Your memory store stays signal.
Provenance on every entry
sourceType, capturedFrom, confidence — so you can filter "what did Claude infer" from "what was directly observed".
Apache 2.0, no telemetry
Use commercially. Fork it. We don't phone home. The project ships with a SECURITY.md and an audited auth model.

Give your agents a memory.

Self-host in a minute. Wire every AI host on your machine in one npx command.