Memory infrastructure for AI agents · B2D

AI agents that actually remember

Six memory types. One REST API. Connect in minutes — no vector pipelines, chunking, or knowledge graphs to build from scratch.

Free tier · 2 instances · 100K tokens/month · No credit card

ingest + query

curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/ingest \
-H "Authorization: Bearer ms_live_xxx" \
-H "Content-Type: application/json" \
-d '{"content": "User prefers brief answers with code examples.", "user_id": "user_123"}'

curl -X POST https://mnemoniqa.com/api/v1/instances/inst_abc/query \
-H "Authorization: Bearer ms_live_xxx" \
-H "Content-Type: application/json" \
-d '{"query": "What do we know about this user?", "user_id": "user_123"}'

• Citations in every answer

• Async ingest (202 Accepted)

• user_id scoping
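For readers who prefer Python, here is a minimal standard-library sketch of the same two calls. It only builds the requests (nothing is sent), so it runs offline. The endpoint and Authorization header come from the curl example; the Content-Type header and passing user_id on ingest are our assumptions (the FAQ says user_id can be passed on ingest and query). The official Python SDK likely wraps this differently, and its API is not shown on this page.

```python
import json
import urllib.request

BASE_URL = "https://mnemoniqa.com/api/v1"  # taken from the curl example


def build_request(instance_id: str, action: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /instances/{id}/{action}, mirroring the curl calls."""
    if action not in ("ingest", "query"):
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(
        url=f"{BASE_URL}/instances/{instance_id}/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON bodies are declared explicitly
        },
        method="POST",
    )


ingest_req = build_request(
    "inst_abc", "ingest",
    {"content": "User prefers brief answers with code examples.", "user_id": "user_123"},
    "ms_live_xxx",
)
query_req = build_request(
    "inst_abc", "query",
    {"query": "What do we know about this user?", "user_id": "user_123"},
    "ms_live_xxx",
)
# Sending is one call, e.g. urllib.request.urlopen(ingest_req); per the page, ingest answers 202 Accepted.
```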

The problem

Every agent team rebuilds the same memory stack

Months on infrastructure

Vector DBs, chunking, embedding pipelines, cache invalidation — instead of product work.

Retrieval ≠ memory

Semantic search finds similar chunks. Memory answers: what does the agent know, how did it change, and where did it come from?

One-off solutions

Each startup invents its own knowledge base, graph, and timeline — without audit trails or provenance.

Wrong model for extraction

Using expensive models for extraction is a common mistake. Structured pipelines with the right model tier cut cost and improve quality.

Memory types

Six memory types. Pick what your agent needs.

Each type is a managed instance with ingest, query, health metrics, and full source lineage.

RAG

Vector search over documents with hierarchical clustering and citations to sources.

Wiki

A compiled knowledge base with revisable concepts, cross-links, contradiction detection, and bi-temporal facts.

Episodic

Chronological conversation memory with decay — old episodes fade naturally.
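The page does not say how decay is computed. A common choice, assumed here purely for illustration, is exponential decay with a half-life: an episode's weight halves every N days, so recent episodes outrank older ones of similar relevance. The 30-day half-life and the helper name decayed_weight are hypothetical.

```python
def decayed_weight(relevance: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every `half_life_days` (hypothetical default)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return relevance * 0.5 ** (age_days / half_life_days)


# (summary, relevance to the current query, age in days); illustrative data only
episodes = [
    ("asked about refunds", 0.9, 60.0),
    ("said they prefer email", 0.8, 2.0),
]

# The older episode is slightly more relevant but has decayed through two half-lives.
ranked = sorted(episodes, key=lambda e: decayed_weight(e[1], e[2]), reverse=True)
```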

Working

Short-term key-value memory with TTL for the current task or session.
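A Working-memory entry behaves like a key-value pair with an expiry. The sketch below is a hypothetical in-process model of that idea (class and method names are ours, not the API's); an injectable clock makes expiry easy to demonstrate without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class WorkingMemory:
    """Session-scoped key-value store where each entry expires after a TTL (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return None
        return value


# Demo with a fake clock measured in seconds.
fake_now = [0.0]
wm = WorkingMemory(clock=lambda: fake_now[0])
wm.set("current_step", "collect_shipping_address", ttl_seconds=900)  # 15-minute TTL
fake_now[0] = 600.0
step_at_10_min = wm.get("current_step")  # still present
fake_now[0] = 901.0
step_at_15_min = wm.get("current_step")  # expired
```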

Graph

Entities and typed relationships with ontology, invalidation, and repair queue.

Reflective

Behavioral, emotional, and motivational patterns extracted from long interaction history.

How you build

Standalone, unified Agent, or self-editing memory

Standalone

One type, one instance

Perfect for FAQ bots (RAG only) or simple session state (Working only).

POST /instances/:id/query

Agent · Unified

Up to 6 layers, one query

Combine RAG + Episodic + Working + Wiki + Graph + Reflective in a single query with weighted merge and synthesis.

Working

Reflective

Episodic

RAG

Wiki

Graph
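The page doesn't specify how the weighted merge works. One plausible reading, sketched here as an assumption: scale each layer's relevance scores by a per-layer weight, keep the best score for items returned by more than one layer, and pass the top results on to synthesis. Layer weights, item IDs, and scores below are made up.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (memory item id, layer-local score in [0, 1])


def weighted_merge(results: Dict[str, List[Hit]], weights: Dict[str, float],
                   top_k: int = 5) -> List[Tuple[str, float]]:
    """Scale each layer's scores by its weight, keep the best score per item, return top_k."""
    best: Dict[str, float] = {}
    for layer, hits in results.items():
        w = weights.get(layer, 0.0)
        if w <= 0:
            continue  # layers without a positive weight are ignored
        for item_id, score in hits:
            best[item_id] = max(best.get(item_id, 0.0), w * score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


results = {
    "working": [("w:current_task", 0.9)],
    "episodic": [("e:ticket_42", 0.7), ("shared:plan_pro", 0.6)],
    "rag": [("shared:plan_pro", 0.9), ("r:billing_doc", 0.5)],
}
weights = {"working": 1.0, "episodic": 0.8, "rag": 0.6}  # hypothetical weights
merged = weighted_merge(results, weights)
```

In a real Agent query the merged list would then feed the synthesis step; that LLM call is omitted here.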

Self-editing

The agent edits its own memory

Manages its own memory via tool calls during conversation: core memory, recall search, archival insert/update.

core · recall · archival

Get started

From zero to remembering agent in three steps

1

Create a memory instance

Choose a type (or build an Agent with multiple layers) in the dashboard. Get an API key.

2

Ingest your data

Upload docs, conversation logs, or structured facts. Ingest is async by default: it returns a task_id, and progress is reported via webhooks.

3

Connect your agent

Use the REST API, the Python or TypeScript SDK, or MCP for Claude Desktop and Cursor. Scope memory by user_id and session_id.
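Because ingest is asynchronous, a client needs some way to wait for a task_id to finish. Webhooks are the documented route, and the SDK reportedly offers task.wait(). As a fallback, polling can be sketched like this; get_status is a hypothetical callable standing in for whatever status lookup you use, and the status values "completed"/"failed" and the 300 s default timeout are assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_task(get_status: Callable[[str], Dict], task_id: str,
                  interval_s: float = 1.0, timeout_s: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll `get_status(task_id)` until it reports a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status(task_id)
        if status.get("status") in ("completed", "failed"):  # hypothetical status values
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} still running after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Demo with a fake status source that finishes on the third check.
responses = iter([{"status": "pending"}, {"status": "pending"}, {"status": "completed"}])
result = wait_for_task(lambda _tid: next(responses), "tsk_7f2a", sleep=lambda _s: None)
```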

Why Mnemoniqa

Memory with provenance, not narrative generation

Full lineage

Every claim traces back along the chain source → segment → concept → answer. Responses include citations such as [Concept:ID] or [Source:UUID]. Not “text from similar chunks”, but verified knowledge.


“Cancellation is self-service in Settings.” [Concept:c_9a1] [Source:doc_44]
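Since citations appear inline in a fixed [Kind:ID] form, a client can separate them from the answer text, for example to render footnotes or link back to sources. A small parser, assuming only the Concept and Source kinds shown above (split_citations is a hypothetical helper):

```python
import re
from typing import List, Tuple

# Matches inline citations such as [Concept:c_9a1] or [Source:doc_44].
CITATION_RE = re.compile(r"\[(Concept|Source):([^\]\s]+)\]")


def split_citations(answer: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (answer text without citation tags, list of (kind, id) pairs in order)."""
    citations = CITATION_RE.findall(answer)
    # Remove the tags, then collapse the leftover runs of whitespace.
    text = re.sub(r"\s{2,}", " ", CITATION_RE.sub("", answer)).strip()
    return text, citations


text, cites = split_citations('"Cancellation is self-service in Settings." [Concept:c_9a1] [Source:doc_44]')
```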

Fast model: structured extraction (low reasoning · high volume)
Smart model: synthesis & gardening (high reasoning · low volume)

−48% cost · +36% quality vs. single-model extraction

Split extraction pipeline

Fast model for structured extraction, smart model for synthesis and gardening. Lower cost, higher quality on concept-heavy memory (Wiki, Graph, Reflective).

Gardener — memory that maintains itself

Phase 0: cheap model proposes merges and splits. Phase 1: smart model applies surgical fixes. Proposals await your review — no silent auto-merge.

PHASE 0 Cheap model proposes merges & splits → queued for review

PHASE 1 Smart model applies surgical refactoring on approval

3 proposals pending · 0 auto-applied
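The review-gated flow can be modeled as a small state machine: proposals start as pending, only a human decision moves them to approved, and only approved proposals reach the refactoring step. This is an illustrative sketch with hypothetical class names and concept IDs, not Mnemoniqa's implementation; the two model tiers appear only as comments.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    kind: str                 # "merge" or "split"
    concept_ids: List[str]
    status: str = "pending"   # pending -> approved/rejected -> applied


@dataclass
class ReviewQueue:
    proposals: List[Proposal] = field(default_factory=list)

    def propose(self, kind: str, concept_ids: List[str]) -> Proposal:
        """Phase 0: a cheap model's suggestion is only queued, never applied."""
        p = Proposal(kind, concept_ids)
        self.proposals.append(p)
        return p

    def review(self, p: Proposal, approve: bool) -> None:
        p.status = "approved" if approve else "rejected"

    def apply_approved(self, apply_fn: Callable[[Proposal], None]) -> int:
        """Phase 1: hand only approved proposals to the (smart-model) refactoring step."""
        applied = 0
        for p in self.proposals:
            if p.status == "approved":
                apply_fn(p)
                p.status = "applied"
                applied += 1
        return applied

    def count(self, status: str) -> int:
        return sum(1 for p in self.proposals if p.status == status)


q = ReviewQueue()
first = q.propose("merge", ["c_9a1", "c_2b8"])
q.propose("split", ["c_4d0"])
q.propose("merge", ["c_7e3", "c_7e4"])
pending_before = q.count("pending")             # 3 proposals pending
applied_before = q.apply_approved(lambda p: None)  # nothing approved, nothing applied
q.review(first, approve=True)
applied_after = q.apply_approved(lambda p: None)
```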

fact plan = “Pro”
valid_time 2026-01-04 → now
system_time 2026-01-04 09:12 UTC
query as-of 2025-12-01 → plan = “Starter”

Bi-temporal facts

Facts store valid time (when true in the world) and system time (when recorded). Query what the agent “knew” at any point in the past — critical for audit and compliance.
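A minimal sketch of a bi-temporal lookup, reusing the plan example above. Each row carries a valid interval and a system (recorded) interval, and a query fixes both a world time and a knowledge time. The Starter start date (2025-06-01), the fixed NOW, and all names are hypothetical; the page does not show the actual storage model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


@dataclass(frozen=True)
class FactVersion:
    key: str
    value: str
    valid_from: datetime           # when it became true in the world
    valid_to: Optional[datetime]   # None = still true
    system_from: datetime          # when this row was recorded
    system_to: Optional[datetime]  # None = current row, not superseded


def _within(start: datetime, end: Optional[datetime], t: datetime) -> bool:
    return start <= t and (end is None or t < end)


def as_of(rows: List[FactVersion], key: str, valid_at: datetime, known_at: datetime) -> Optional[str]:
    """What was true at `valid_at`, according to what had been recorded by `known_at`."""
    for r in rows:
        if (r.key == key and _within(r.valid_from, r.valid_to, valid_at)
                and _within(r.system_from, r.system_to, known_at)):
            return r.value
    return None


UPGRADE = utc(2026, 1, 4, 9, 12)  # system_time of the "Pro" fact
rows = [
    # Original row (hypothetical start date), superseded when the upgrade was recorded.
    FactVersion("plan", "Starter", utc(2025, 6, 1), None, utc(2025, 6, 1), UPGRADE),
    # On upgrade, Starter's validity is closed at 2026-01-04 ...
    FactVersion("plan", "Starter", utc(2025, 6, 1), utc(2026, 1, 4), UPGRADE, None),
    # ... and Pro is valid from 2026-01-04 onward.
    FactVersion("plan", "Pro", utc(2026, 1, 4), None, UPGRADE, None),
]
NOW = utc(2026, 2, 1)  # fixed "now" so the example is reproducible

plan_dec_1 = as_of(rows, "plan", utc(2025, 12, 1), NOW)      # query as-of 2025-12-01
plan_now = as_of(rows, "plan", NOW, NOW)                     # current belief
believed_on_jan_1 = as_of(rows, "plan", NOW, utc(2026, 1, 1))  # what the agent "knew" before the upgrade
```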

Use cases

Built for real agent products

🎮

Game / NPC

NPCs that remember the player across sessions.

Working

Reflective

Episodic

🧠

Personal coach

Grows with the user — goals, history, patterns.

Working

Reflective

Episodic

🎧

Support bot

Knows your product + each customer’s ticket history.

Working

Episodic

RAG

📚

Docs / KB bot

Answers with citations; concepts evolve as docs change.

Working

RAG

🎓

EdTech tutor

Remembers what each student learned and where they struggled.

Reflective

Episodic

RAG

💼

Sales / CRM agent

Every touchpoint, objection, and deal stage per contact.

Working

Episodic

Graph

🔬

Research assistant

Indexes papers, extracts concepts, finds cross-source links.

RAG

Graph

Wiki

🤝

HR onboarding

Company policies + per-employee onboarding progress.

Episodic

RAG

Wiki

Your product

Mix any layers. Build the memory your agent actually needs.

Developer experience

More than an API endpoint

REST API + Dashboard

Playground for ingest/query, health metrics, action log, API keys.

Python & TypeScript SDK

pip install memoryservice / npm install @memoryservice/sdk. Includes retries, 402 handling, and async task.wait().

MCP Server

Connect memory to Claude Desktop, Cursor, or any MCP client in 30 seconds.

Webhooks

React to concept.created, gardener.proposals_ready, memory.ingest.completed, and tokens.threshold_reached events.

Scoping & GDPR

One instance serves thousands of end-users via user_id / session_id. GDPR delete endpoint included.

Read the docs

Full API reference, guides, and recipes for every memory type.
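Webhook consumers typically route on the event name. The four event names below come from the Webhooks list above; the payload shape ({"event": ..., "data": {...}}), the data field names, and the handler messages are assumptions. A production receiver should also verify that requests really come from the platform (check the docs for the mechanism); this sketch skips that.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]

HANDLERS: Dict[str, Handler] = {
    "memory.ingest.completed": lambda data: f"ingest done: {data.get('task_id')}",
    "concept.created": lambda data: f"new concept: {data.get('concept_id')}",
    "gardener.proposals_ready": lambda data: "gardener proposals waiting for review",
    "tokens.threshold_reached": lambda data: "token balance low: consider a token pack",
}


def handle_webhook(raw_body: bytes) -> str:
    """Route a webhook by event name; the JSON payload shape is an assumption."""
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        return "ignored"  # unknown events are acknowledged but not processed
    return handler(payload.get("data", {}))
```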

Dashboard · Playground

Ingest

{"content": "Customer on Pro plan…", "user_id": "user_123"}

→ 202 Accepted · task_id: tsk_7f2a

Query

{"query": "How should I contact this customer?", "user_id": "user_123"}

→ “Reach out by email” [Concept:c_2b8]

Pricing

Start free. Scale with subscription + usage.

Monthly tokens included. Buy token packs when you need more.

Free

$0
2 instances
100K tokens / mo
100 MB storage

RAG only.

Starter

$50/mo
10 instances
1M tokens / mo
1 GB storage

RAG, Wiki, Episodic, Working.

Pro ★ POPULAR

$230/mo
50 instances
10M tokens / mo
10 GB storage

All types, Gardener, Reflective, MCP.

Business

$999/mo
Unlimited instances
50M tokens / mo
100 GB storage

Priority workers, all features.

Token packs from $99.

LLM usage is billed from your token balance. The server-side language model API is configured by the platform.

Integration

Five lines to persistent agent memory

FAQ

Questions, answered

We provide memory semantics: concepts, lineage, decay, bi-temporal facts, and self-maintenance — not just embedding search.

Yes. Pass user_id and session_id on ingest/query. One instance scales to thousands of users.

Configurable per instance on Pro+. The platform manages language model API access — you don’t wire providers yourself.

Async by default (202 + task_id). Sync mode is available for small payloads (?sync=true, 120s timeout).

The API returns 402 Payment Required. Buy a token pack or upgrade your plan.

Yes, on Pro and Business plans. Copy the MCP URL from the dashboard Connect page.
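The answers above translate into simple client-side rules. The sketch below maps status codes to actions and builds the optional ?sync=true ingest URL. Treating 429 and 5xx as retryable is our assumption, not something the page states, and the function names are hypothetical.

```python
def ingest_url(base: str, instance_id: str, sync: bool = False) -> str:
    """Build the ingest URL; ?sync=true is meant for small payloads (120 s timeout per the FAQ)."""
    url = f"{base}/instances/{instance_id}/ingest"
    return url + "?sync=true" if sync else url


def next_action(status: int) -> str:
    """Map an HTTP status to a client action (a sketch, not the SDK's actual policy)."""
    if status == 202:
        return "wait_for_task"       # async ingest accepted; the response carries a task_id
    if status == 402:
        return "top_up"              # out of tokens: buy a pack or upgrade; retrying won't help
    if status == 429 or 500 <= status < 600:
        return "retry_with_backoff"  # assumption: rate limits and server errors are transient
    if 200 <= status < 300:
        return "done"
    return "fail"
```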

Give your agents memory that lasts

Create your first instance in under a minute. Free tier, no credit card.
