Memory infrastructure for AI agents · B2D

AI agents that actually remember

Six memory types. One REST API. Connect in minutes — no vector pipelines, chunking, or knowledge graphs to build from scratch.

Free tier · 2 instances · 100K tokens/month · No credit card

ingest + query

curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/ingest \
  -H "Authorization: Bearer ms_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"content": "User prefers brief answers with code examples.", "user_id": "user_123"}'

curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/query \
  -H "Authorization: Bearer ms_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"query": "What do we know about this user?", "user_id": "user_123"}'

• Citations in every answer

• Async ingest (202 Accepted)

• user_id scoping

The problem

Every agent team rebuilds the same memory stack

Months on infrastructure

Vector DBs, chunking, embedding pipelines, cache invalidation — instead of product work.

Retrieval ≠ memory

Semantic search finds similar chunks. Memory answers: what does the agent know, how did it change, and where did it come from?

One-off solutions

Each startup invents its own knowledge base, graph, and timeline — without audit trails or provenance.

Wrong model for extraction

Using expensive models for extraction is a common mistake. Structured pipelines with the right model tier cut cost and improve quality.

Memory types

Six memory types. Pick what your agent needs.

Each type is a managed instance with ingest, query, health metrics, and full source lineage.

RAG

Vector search over documents with hierarchical clustering and citations to sources.

Wiki

Compiling knowledge base with revisable concepts, cross-links, contradiction detection, and bi-temporal facts.

Episodic

Chronological conversation memory with decay — old episodes fade naturally.
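One common way to model "fading" is to multiply each episode's retrieval score by an exponentially decaying recency weight. The sketch below is a minimal illustration under that assumption; the half-life parameter `HALF_LIFE_DAYS`, its value, and the helper names are hypothetical, and Mnemoniqa's actual decay function is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical tuning knob: an episode's weight halves every HALF_LIFE_DAYS.
HALF_LIFE_DAYS = 30.0


@dataclass
class Episode:
    text: str
    occurred_at: datetime


def decay_weight(episode: Episode, now: datetime, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: 1.0 for a brand-new episode, 0.5 after one half-life, and so on."""
    age_days = (now - episode.occurred_at).total_seconds() / 86400
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def rank_episodes(episodes: List[Episode], similarity: Dict[str, float], now: datetime) -> List[Episode]:
    """Order episodes by similarity to the query multiplied by their recency weight."""
    return sorted(episodes, key=lambda ep: similarity[ep.text] * decay_weight(ep, now), reverse=True)


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
old = Episode("User asked about refunds", now - timedelta(days=60))
recent = Episode("User asked about invoices", now - timedelta(days=1))
similarity = {old.text: 0.9, recent.text: 0.6}  # made-up similarity scores
for ep in rank_episodes([old, recent], similarity, now):
    print(f"{decay_weight(ep, now):.3f}  {ep.text}")
```

Even though the older episode is more similar to the query, two half-lives cut its weight to 0.25, so the recent episode ranks first.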

Working

Short-term key-value memory with TTL for the current task or session.
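A TTL-based working memory can be sketched as a dictionary that stores an expiry time next to each value and evicts entries lazily on read. This is an illustrative in-process model, not the service's API; the class name `WorkingMemory` and its methods are hypothetical, and the clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class WorkingMemory:
    """Minimal key-value store where each entry expires after a TTL (in seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            # Expired: evict lazily on read and behave as if the key was never set.
            del self._data[key]
            return default
        return value


fake_now = [0.0]
wm = WorkingMemory(clock=lambda: fake_now[0])
wm.set("current_step", "collect_shipping_address", ttl=600)
print(wm.get("current_step"))
fake_now[0] = 601.0  # simulate ten minutes passing
print(wm.get("current_step"))
```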

Graph

Entities and typed relationships with ontology, invalidation, and repair queue.
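To make "ontology, invalidation, and repair queue" concrete, here is a toy in-memory graph. It assumes an ontology of allowed (subject type, relation, object type) triples, treats every relation as single-valued (a new edge invalidates the previous one instead of deleting it), and parks edges that violate the ontology in a repair queue. All names here (`ONTOLOGY`, `Graph`, `invalidated_by`) are hypothetical, not Mnemoniqa's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical ontology: which relation types may connect which entity types.
ONTOLOGY: Set[Tuple[str, str, str]] = {
    ("Person", "works_at", "Company"),
    ("Company", "uses", "Product"),
}


@dataclass
class Edge:
    subject: str
    relation: str
    obj: str
    invalidated_by: Optional[str] = None  # set instead of deleting, so history survives


@dataclass
class Graph:
    entity_types: Dict[str, str] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)
    repair_queue: List[Edge] = field(default_factory=list)

    def active(self, subject: str, relation: str) -> List[Edge]:
        return [e for e in self.edges
                if e.subject == subject and e.relation == relation and e.invalidated_by is None]

    def add_edge(self, subject: str, relation: str, obj: str) -> None:
        edge = Edge(subject, relation, obj)
        triple = (self.entity_types.get(subject), relation, self.entity_types.get(obj))
        if triple not in ONTOLOGY:
            # Doesn't fit the ontology: hold for review rather than storing it.
            self.repair_queue.append(edge)
            return
        # Assumption: relations are single-valued, so the new edge supersedes old ones.
        for old in self.active(subject, relation):
            old.invalidated_by = f"{subject} {relation} {obj}"
        self.edges.append(edge)


g = Graph(entity_types={"ana": "Person", "acme": "Company", "globex": "Company", "widget": "Product"})
g.add_edge("ana", "works_at", "acme")
g.add_edge("ana", "works_at", "globex")   # invalidates the acme edge
g.add_edge("ana", "uses", "widget")       # Person-uses-Product is not in the ontology
print([e.obj for e in g.active("ana", "works_at")], len(g.edges), len(g.repair_queue))
```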

Reflective

Behavioral, emotional, and motivational patterns extracted from long interaction history.

Compare all memory types →

How you build

Standalone, unified Agent, or self-editing memory

Standalone

One type, one instance

Perfect for FAQ bots (RAG only) or simple session state (Working only).

POST /instances/:id/query

Agent · Unified

Up to 6 layers, one query

Combine RAG + Episodic + Working + Wiki + Graph + Reflective in a single query with weighted merge and synthesis.

Layers: Working · Reflective · Episodic · RAG · Wiki · Graph
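A weighted merge across layers can be sketched as: scale each layer's hit scores by a per-layer weight, sum the scores of snippets returned by more than one layer, and keep the top results. The weights, the summing rule, and the function name below are illustrative assumptions; the page does not specify how the Agent merge actually scores results, and the synthesis step that follows the merge is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-layer weights; an actual Agent instance would have its own settings.
LAYER_WEIGHTS: Dict[str, float] = {
    "working": 1.0,
    "episodic": 0.8,
    "rag": 0.6,
}


def weighted_merge(results: Dict[str, List[Tuple[str, float]]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Merge (snippet, score) lists from several layers into one ranked list.

    Each score is multiplied by its layer weight; when the same snippet comes back
    from more than one layer, its weighted scores are summed.
    """
    combined: Dict[str, float] = defaultdict(float)
    for layer, hits in results.items():
        weight = LAYER_WEIGHTS.get(layer, 0.0)  # unknown layers contribute nothing
        for snippet, score in hits:
            combined[snippet] += weight * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


merged = weighted_merge({
    "working": [("Current task: process refund for order 881", 0.9)],
    "episodic": [("Customer was frustrated last week", 0.7)],
    "rag": [("Refunds go to the original payment method", 0.8),
            ("Customer was frustrated last week", 0.2)],
})
for snippet, score in merged:
    print(f"{score:.2f}  {snippet}")
```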

Self-editing

The agent edits itself

Manages its own memory via tool calls during conversation: core memory, recall search, archival insert/update.

core · recall · archival
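The self-editing pattern boils down to exposing memory operations as tools and routing the model's tool calls to them. The sketch below is a local toy under that assumption: the tool names (`core_memory_replace`, `recall_search`, `archival_insert`, `archival_update`) and the JSON call shape `{"name": ..., "arguments": {...}}` are hypothetical, not the service's actual tool schema.

```python
import json
from typing import Any, Callable, Dict, List


class SelfEditingMemory:
    """Toy in-process memory with core, recall, and archival tiers."""

    def __init__(self) -> None:
        self.core: Dict[str, str] = {}      # small, always shown to the model
        self.history: List[str] = []        # past conversation turns, searchable via recall
        self.archival: Dict[str, str] = {}  # long-term store keyed by document id

    def core_memory_replace(self, key: str, value: str) -> str:
        self.core[key] = value
        return f"core[{key}] updated"

    def recall_search(self, query: str) -> List[str]:
        q = query.lower()  # naive case-insensitive substring match
        return [turn for turn in self.history if q in turn.lower()]

    def archival_insert(self, doc_id: str, text: str) -> str:
        self.archival[doc_id] = text
        return f"archived {doc_id}"

    def archival_update(self, doc_id: str, text: str) -> str:
        if doc_id not in self.archival:
            raise KeyError(doc_id)
        self.archival[doc_id] = text
        return f"updated {doc_id}"


def dispatch(memory: SelfEditingMemory, tool_call: str) -> Any:
    """Route a JSON tool call emitted by the model to the matching memory method."""
    call = json.loads(tool_call)
    tools: Dict[str, Callable[..., Any]] = {
        "core_memory_replace": memory.core_memory_replace,
        "recall_search": memory.recall_search,
        "archival_insert": memory.archival_insert,
        "archival_update": memory.archival_update,
    }
    return tools[call["name"]](**call["arguments"])


mem = SelfEditingMemory()
mem.history.append("User: I moved to Berlin last month")
print(dispatch(mem, '{"name": "recall_search", "arguments": {"query": "berlin"}}'))
print(dispatch(mem, '{"name": "core_memory_replace", "arguments": {"key": "city", "value": "Berlin"}}'))
print(mem.core)
```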

Get started

From zero to a remembering agent in three steps

Create a memory instance

Choose a type (or build an Agent with multiple layers) in the dashboard. Get an API key.

Ingest your data

Upload docs, conversation logs, or structured facts. Ingest is async by default: it returns a task_id right away, and progress is reported via webhooks.

Connect your agent

REST API, Python/TypeScript SDK, or MCP for Claude Desktop / Cursor. Scope by user_id and session_id.

Why Mnemoniqa

Memory with provenance, not narrative generation

Full lineage

Every claim traces back through source → segment → concept → answer. Responses include citations such as [Concept:ID] or [Source:UUID]: verified knowledge, not “text from similar chunks”.

source → segment → concept → answer

“Cancellation is self-service in Settings.” [Concept:c_9a1] [Source:doc_44]
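When answer text arrives with inline markers like the example above, the citations can be pulled out with a regular expression. This sketch assumes markers always have the form `[Concept:<id>]` or `[Source:<id>]` with no spaces in the id; the SDK's `result.citations` (shown in the Integration section) may already provide this in structured form.

```python
import re
from typing import Dict, List

# Matches markers such as [Concept:c_9a1] or [Source:doc_44].
CITATION_RE = re.compile(r"\[(Concept|Source):([^\]\s]+)\]")


def extract_citations(answer: str) -> Dict[str, List[str]]:
    """Group cited IDs by kind, de-duplicated, in order of first appearance."""
    found: Dict[str, List[str]] = {"Concept": [], "Source": []}
    for kind, ident in CITATION_RE.findall(answer):
        if ident not in found[kind]:
            found[kind].append(ident)
    return found


def strip_citations(answer: str) -> str:
    """Remove citation markers (and the whitespace before them), e.g. for text-to-speech."""
    return re.sub(r"\s*" + CITATION_RE.pattern, "", answer).strip()


answer = "Cancellation is self-service in Settings. [Concept:c_9a1] [Source:doc_44]"
print(extract_citations(answer))
print(strip_citations(answer))
```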

Fast model: structured extraction (low reasoning, high volume)
Smart model: synthesis & gardening (high reasoning, low volume)

−48% cost · +36% quality vs. single-model extraction

Split extraction pipeline

Fast model for structured extraction, smart model for synthesis and gardening. Lower cost, higher quality on concept-heavy memory (Wiki, Graph, Reflective).
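The split can be pictured as a routing table from pipeline stage to model tier, so the high-volume extraction work goes to the cheaper model. The stage names, tier labels, and placeholder model names below are hypothetical; the page does not say which models Mnemoniqa uses or how its stages are named.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict

# Placeholder model identifiers; substitute whatever fast/smart models you use.
MODEL_TIERS: Dict[str, str] = {
    "fast": "fast-extraction-model",
    "smart": "smart-synthesis-model",
}

# Hypothetical stage-to-tier mapping following the split described above.
STAGE_TIER: Dict[str, str] = {
    "extract_entities": "fast",
    "extract_facts": "fast",
    "synthesize_concept": "smart",
    "garden_refactor": "smart",
}


@dataclass
class Job:
    stage: str
    payload: str


def route(job: Job) -> str:
    """Return the model for a job; unknown stages fall back to the cheaper fast tier."""
    return MODEL_TIERS[STAGE_TIER.get(job.stage, "fast")]


# Many extraction jobs per ingested document, few synthesis jobs.
jobs = [Job("extract_facts", f"chunk {i}") for i in range(8)] + [Job("synthesize_concept", "pricing")]
print(Counter(route(j) for j in jobs))
```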

Gardener — memory that maintains itself

Phase 0: cheap model proposes merges and splits. Phase 1: smart model applies surgical fixes. Proposals await your review — no silent auto-merge.

PHASE 0 Cheap model proposes merges & splits → queued for review

PHASE 1 Smart model applies surgical refactoring on approval

3 proposals pending · 0 auto-applied
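The "no silent auto-merge" rule amounts to a review queue: the cheap model can only add proposals, and a fix runs only when a reviewer approves it. This is a minimal sketch of that control flow; `Proposal`, `ReviewQueue`, and the `apply_fix` callback (standing in for the smart-model refactor) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Proposal:
    pid: str
    kind: str                # "merge" or "split"
    concept_ids: List[str]
    status: str = "pending"


@dataclass
class ReviewQueue:
    """Phase 0 proposals wait here; nothing changes until a reviewer approves."""
    proposals: Dict[str, Proposal] = field(default_factory=dict)
    applied: List[str] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> None:
        self.proposals[proposal.pid] = proposal

    def pending(self) -> List[Proposal]:
        return [p for p in self.proposals.values() if p.status == "pending"]

    def approve(self, pid: str, apply_fix: Callable[[Proposal], None]) -> None:
        proposal = self.proposals[pid]
        if proposal.status != "pending":
            raise ValueError(f"{pid} is already {proposal.status}")
        apply_fix(proposal)  # Phase 1: hand off to the smart model for the actual edit
        proposal.status = "applied"
        self.applied.append(pid)

    def reject(self, pid: str) -> None:
        self.proposals[pid].status = "rejected"


queue = ReviewQueue()
queue.propose(Proposal("p1", "merge", ["c_1", "c_2"]))
queue.propose(Proposal("p2", "split", ["c_3"]))
queue.propose(Proposal("p3", "merge", ["c_4", "c_5"]))
print(f"{len(queue.pending())} proposals pending · {len(queue.applied)} auto-applied")
queue.approve("p1", apply_fix=lambda p: print("refactoring", p.concept_ids))
```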

fact plan = “Pro”
valid_time 2026-01-04 → now
system_time 2026-01-04 09:12 UTC
query as-of 2025-12-01 → plan = “Starter”

Bi-temporal facts

Facts store valid time (when true in the world) and system time (when recorded). Query what the agent “knew” at any point in the past — critical for audit and compliance.
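The plan example above can be reproduced with a small bi-temporal table: each row carries a valid-time interval and a system-time interval, and a change supersedes rows instead of overwriting them. This is a textbook-style sketch, not Mnemoniqa's storage model; the class and function names are hypothetical, and the Starter start date (2025-06-01) is invented because the example does not give one.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

UTC = timezone.utc


@dataclass
class FactVersion:
    key: str
    value: str
    valid_from: datetime                      # valid time: when it became true in the world
    valid_to: Optional[datetime]              # None means "still true"
    recorded_at: datetime                     # system time: when this row was written
    superseded_at: Optional[datetime] = None  # system time: when a newer row replaced it


def _within(t: datetime, start: datetime, end: Optional[datetime]) -> bool:
    """Half-open interval check start <= t < end, where end=None is open-ended."""
    return start <= t and (end is None or t < end)


def record_change(rows: List[FactVersion], key: str, new_value: str,
                  changed_at: datetime, recorded_at: datetime) -> None:
    """Supersede the current row (never delete it) and append the new state."""
    current = next((r for r in rows if r.key == key and r.superseded_at is None
                    and r.valid_to is None), None)
    if current is not None:
        current.superseded_at = recorded_at
        # Re-record the old value with its valid-time interval now closed.
        rows.append(FactVersion(key, current.value, current.valid_from, changed_at, recorded_at))
    rows.append(FactVersion(key, new_value, changed_at, None, recorded_at))


def query(rows: List[FactVersion], key: str, valid_at: datetime, known_at: datetime) -> Optional[str]:
    """What did we believe at `known_at` about the value of `key` at `valid_at`?"""
    for row in rows:
        if (row.key == key
                and _within(known_at, row.recorded_at, row.superseded_at)
                and _within(valid_at, row.valid_from, row.valid_to)):
            return row.value
    return None


signup = datetime(2025, 6, 1, tzinfo=UTC)  # hypothetical start of the Starter plan
rows = [FactVersion("plan", "Starter", signup, None, signup)]
record_change(rows, "plan", "Pro",
              changed_at=datetime(2026, 1, 4, tzinfo=UTC),
              recorded_at=datetime(2026, 1, 4, 9, 12, tzinfo=UTC))

now = datetime(2026, 2, 1, tzinfo=UTC)
print(query(rows, "plan", valid_at=datetime(2025, 12, 1, tzinfo=UTC), known_at=now))
print(query(rows, "plan", valid_at=now, known_at=now))
```

Asking with `known_at` set to 08:00 on 2026-01-04, before the upgrade was recorded at 09:12, still returns "Starter": that is the "what did the agent know then" view that audits need.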

Use cases

Built for real agent products

🎮

Game / NPC

NPCs that remember the player across sessions.

Layers: Working · Reflective · Episodic

🧠

Personal coach

Grows with the user — goals, history, patterns.

Layers: Working · Reflective · Episodic

🎧

Support bot

Knows your product + each customer’s ticket history.

Layers: Working · Episodic · RAG

📚

Docs / KB bot

Answers with citations; concepts evolve as docs change.

Layers: Working · RAG

🎓

EdTech tutor

Remembers what each student learned and where they struggled.

Layers: Reflective · Episodic · RAG

💼

Sales / CRM agent

Every touchpoint, objection, and deal stage per contact.

Layers: Working · Episodic · Graph

🔬

Research assistant

Indexes papers, extracts concepts, finds cross-source links.

Layers: RAG · Graph · Wiki

🤝

HR onboarding

Company policies + per-employee onboarding progress.

Layers: Episodic · RAG · Wiki

Your product

Mix any layers. Build the memory your agent actually needs.

Start building →

Developer experience

More than an API endpoint

REST API + Dashboard

Playground for ingest/query, health metrics, action log, API keys.

Python & TypeScript SDK

pip install memoryservice / npm install @memoryservice/sdk. Includes automatic retries, 402 (Payment Required) handling, and async task.wait().
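For calls made outside the SDK, similar behavior can be approximated with a retry wrapper: back off exponentially on transient errors, but fail fast on 402, since retrying cannot help until the token balance is topped up. This is a sketch under assumptions: `HTTPStatusError`, `OutOfTokens`, and the set of retryable status codes are hypothetical and do not describe the SDK's actual retry policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes treated as transient in this sketch (an assumption, not a documented policy).
RETRYABLE = {429, 500, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class OutOfTokens(Exception):
    """HTTP 402: the token balance is exhausted; retrying will not help."""


def with_retries(call: Callable[[], T], attempts: int = 4, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff plus jitter; fail fast on 402."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except HTTPStatusError as err:
            if err.status == 402:
                raise OutOfTokens("token balance exhausted") from err
            if err.status not in RETRYABLE or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


# Usage: a call that fails twice with 503, then succeeds.
failures = [503, 503]
calls = []


def flaky() -> str:
    calls.append(1)
    if failures:
        raise HTTPStatusError(failures.pop(0))
    return "ok"


print(with_retries(flaky, sleep=lambda s: None), len(calls))
```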

MCP Server

Connect memory to Claude Desktop, Cursor, or any MCP client in 30 seconds.

Webhooks

React to concept.created, gardener.proposals_ready, memory.ingest.completed, tokens.threshold_reached.
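A webhook receiver typically parses the event and dispatches on its type. The sketch below assumes a JSON body shaped like `{"type": "<event name>", "data": {...}}`; that payload shape and the `on` / `handle_webhook` helpers are hypothetical, and only the event names come from this page. A production receiver should also authenticate incoming requests; that step is omitted here.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]
_handlers: Dict[str, List[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one webhook event type."""
    def register(fn: Handler) -> Handler:
        _handlers.setdefault(event_type, []).append(fn)
        return fn
    return register


def handle_webhook(body: bytes) -> int:
    """Dispatch a webhook body to its handlers; returns how many handlers ran."""
    event = json.loads(body)
    handlers = _handlers.get(event["type"], [])
    for fn in handlers:
        fn(event.get("data", {}))
    return len(handlers)


@on("memory.ingest.completed")
def ingest_done(data: dict) -> None:
    print("ingest finished:", data.get("task_id"))


@on("tokens.threshold_reached")
def low_balance(data: dict) -> None:
    print("token balance running low; consider a token pack")


handle_webhook(b'{"type": "memory.ingest.completed", "data": {"task_id": "tsk_7f2a"}}')
```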

Scoping & GDPR

One instance serves thousands of end-users via user_id / session_id. GDPR delete endpoint included.
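Scoping means every read and write carries the end-user's identity, so one instance can hold many users' data without leaking between them, and erasure can target a single user. This toy store illustrates the idea; `ScopedStore` and `delete_user` are hypothetical names, not the service's GDPR endpoint.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Scope = Tuple[str, Optional[str]]  # (user_id, session_id)


class ScopedStore:
    """One store shared by many end-users; every read and write is filtered by scope."""

    def __init__(self) -> None:
        self._items: DefaultDict[Scope, List[str]] = defaultdict(list)

    def ingest(self, text: str, user_id: str, session_id: Optional[str] = None) -> None:
        self._items[(user_id, session_id)].append(text)

    def query(self, user_id: str, session_id: Optional[str] = None) -> List[str]:
        # Without a session_id, return everything for the user across all sessions.
        return [t for (uid, sid), texts in self._items.items()
                if uid == user_id and (session_id is None or sid == session_id)
                for t in texts]

    def delete_user(self, user_id: str) -> int:
        """GDPR-style erasure: drop every item belonging to one user; return the count."""
        keys = [k for k in self._items if k[0] == user_id]
        removed = sum(len(self._items[k]) for k in keys)
        for k in keys:
            del self._items[k]
        return removed


store = ScopedStore()
store.ingest("Prefers email", "user_123")
store.ingest("Asked about invoices", "user_123", session_id="s1")
store.ingest("Prefers phone", "user_456")
print(store.query("user_123"), store.query("user_456"))
print(store.delete_user("user_123"), store.query("user_123"))
```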

Read the docs

Full API reference, guides, and recipes for every memory type.

Open docs →

Dashboard · Playground

Ingest

{"content": "Customer on Pro plan…", "user_id": "user_123"}

→ 202 Accepted · task_id: tsk_7f2a

Query

{"query": "How should I contact this customer?", "user_id": "user_123"}

→ “Reach out by email” [Concept:c_2b8]

Pricing

Start free. Scale with subscription + usage.

Monthly tokens included. Buy token packs when you need more.

Free

$0

2 instances
100K tokens / mo
100 MB storage

RAG only.

Starter

$50/mo

10 instances
1M tokens / mo
1 GB storage

RAG, Wiki, Episodic, Working.

Pro ★ POPULAR

$230/mo

50 instances
10M tokens / mo
10 GB storage

All types (including Reflective), plus Gardener and MCP.

Business

$999/mo

Unlimited instances
50M tokens / mo
100 GB storage

Priority workers, all features.

Token packs from $99 · View full pricing →

LLM usage is billed from your token balance; the platform configures and runs the language models server-side.

Integration

Five lines to persistent agent memory

Python

from memoryservice import MemoryClient

client = MemoryClient(api_key="ms_live_xxx")
rag = client.instances.create(name="FAQ Bot", memory_type="rag")

task = rag.ingest("Your documentation here...", metadata={"source": "docs"})
task.wait()

result = rag.query("How do I cancel my subscription?", user_id="user_123")
print(result.answer)
print(result.citations)

TypeScript

import { MemoryClient } from "@memoryservice/sdk";

const client = new MemoryClient({ apiKey: "ms_live_xxx" });
const rag = await client.instances.create({ name: "FAQ Bot", memoryType: "rag" });

const task = await rag.ingest("Your documentation here...", { source: "docs" });
await task.wait();

const result = await rag.query("How do I cancel my subscription?", { userId: "user_123" });
console.log(result.answer);
console.log(result.citations);

curl

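# create instance
curl -X POST https://mnemoniqa.com/api/v1/instances \
  -H "Authorization: Bearer ms_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"name": "FAQ Bot", "memory_type": "rag"}'

# ingest (async: returns 202 Accepted with a task_id)
curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/ingest \
  -H "Authorization: Bearer ms_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"content": "Your documentation here..."}'

# query (replace inst_abc with the ID returned when you created the instance)
curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/query \
  -H "Authorization: Bearer ms_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"query": "How do I cancel?", "user_id": "user_123"}'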
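For environments without the SDK, the same endpoints can be called over plain HTTP. This is a minimal sketch using only Python's standard library; it assumes the endpoints and JSON bodies shown in the curl examples and that responses are JSON (their exact fields are not specified here). `BASE_URL`, `build_request`, and `send` are illustrative names, not part of the memoryservice SDK.

```python
import json
import urllib.request

BASE_URL = "https://mnemoniqa.com/api/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl examples."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (urlopen raises HTTPError on non-2xx)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


query_req = build_request(
    "/instances/inst_abc/query",
    {"query": "How do I cancel?", "user_id": "user_123"},
    api_key="ms_live_xxx",
)
print(query_req.get_method(), query_req.full_url)
# send(query_req)  # uncomment with a real instance ID and API key
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what goes over the wire before using a live key.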

FAQ

Questions, answered

How is Mnemoniqa different from a vector DB?

Can I use one instance for many end-users?

What models power extraction and synthesis?

Is ingest synchronous?

What happens when tokens run out?

Do you support MCP and Claude Desktop?

Give your agents memory that lasts

Create your first instance in under a minute. Free tier, no credit card.