Overview
Memory Retrieval is how your application reads useful context from Memind.
Memind provides two retrieval strategies:
| Strategy | Latency profile | Best for |
|---|---|---|
| SIMPLE | Millisecond-level latency | Low-latency agents, chatbots, and frequent memory injection. |
| DEEP | Second-level latency | Complex questions, higher-quality retrieval, and workflows that can trade latency for completeness. |
Use SIMPLE when memory should feel instant.
Use DEEP when memory should be more complete.
This page focuses on how to call retrieval APIs, choose a strategy, use the result as agent context, configure retrieval, and debug retrieval behavior.
For how retrieval works internally, see Retrieval.
When to retrieve memory
Retrieve memory when the agent needs context that may not be present in the current prompt.
Common moments include:
- before generating an assistant response
- before planning a task
- before calling tools
- when a user asks about past preferences, decisions, or project context
- when restoring context across sessions
- when building a prompt for a long-running agent
- when the agent needs prior tool experience or reusable playbooks
You do not need to retrieve memory for every request.
Skip retrieval when the current prompt already contains enough context, or when the query does not depend on previous memory.
Choose a retrieval strategy
Choose the strategy based on latency and retrieval quality requirements.
| Need | Recommended strategy |
|---|---|
| Millisecond-level memory recall | SIMPLE |
| Low-latency chatbot responses | SIMPLE |
| Memory retrieval before frequent agent actions | SIMPLE |
| Direct fact or preference lookup | SIMPLE |
| Exact technical term or tool lookup | SIMPLE |
| Second-level retrieval is acceptable | DEEP |
| Retrieval quality matters more than latency | DEEP |
| Need more complete evidence | DEEP |
| Ambiguous or underspecified query | DEEP |
| Cross-session investigation | DEEP |
| Project-level context reconstruction | DEEP |
In most applications, start with SIMPLE.
Use DEEP selectively for harder questions where missing context is more expensive than waiting longer.
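The table above can be condensed into a small decision helper. The following is a minimal sketch, not part of Memind: `StrategyChooser`, its local `Strategy` enum (a stand-in for `RetrievalConfig.Strategy`), and the one-second cut-off are all assumptions made for illustration. Adjust the inputs and threshold to your own latency requirements.

```java
import java.time.Duration;

// Hypothetical helper; not part of Memind. The local Strategy enum stands in
// for RetrievalConfig.Strategy so the sketch compiles on its own.
class StrategyChooser {

    enum Strategy { SIMPLE, DEEP }

    // Assumed cut-off: DEEP is described as second-level latency, so any
    // budget under one second stays on SIMPLE.
    static final Duration DEEP_MIN_BUDGET = Duration.ofSeconds(1);

    static Strategy choose(Duration latencyBudget, boolean ambiguousQuery, boolean crossSession) {
        boolean canAffordDeep = latencyBudget.compareTo(DEEP_MIN_BUDGET) >= 0;
        boolean needsCompleteness = ambiguousQuery || crossSession;
        // DEEP only when the caller can wait AND the query benefits from it.
        return (canAffordDeep && needsCompleteness) ? Strategy.DEEP : Strategy.SIMPLE;
    }

    public static void main(String[] args) {
        System.out.println(choose(Duration.ofMillis(200), true, false)); // SIMPLE
        System.out.println(choose(Duration.ofSeconds(5), true, false));  // DEEP
        System.out.println(choose(Duration.ofSeconds(5), false, false)); // SIMPLE
    }
}
```

The helper defaults to SIMPLE and requires both conditions for DEEP, which mirrors the advice to start with SIMPLE and reserve DEEP for harder questions.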
Retrieve with SIMPLE
Use SIMPLE for fast memory recall.
var result = memory.retrieve(
    memoryId,
    "What does this user prefer when writing technical docs?",
    RetrievalConfig.Strategy.SIMPLE
).block();
SIMPLE is designed for low-latency agents and chatbots. It can retrieve relevant insights, memory items, and Raw Data captions without using the heavier deep-retrieval path.
Use it for:
- normal chat turns
- frequent memory injection
- direct questions
- preference lookup
- recent context recall
- latency-sensitive agent loops
Retrieve with DEEP
Use DEEP for quality-first retrieval.
var result = memory.retrieve(
    memoryId,
    "What has changed in this project direction over the last few weeks?",
    RetrievalConfig.Strategy.DEEP
).block();
DEEP is designed for complex or ambiguous queries. It can use sufficiency checking, typed query expansion, graph/thread assist, optional reranking, and evidence-backed output.
Use it for:
- cross-session investigation
- project-level questions
- ambiguous user requests
- tasks that need stronger evidence
- situations where retrieval quality matters more than latency
Use retrieval results as agent context
The easiest way to use retrieval output in an agent prompt is formattedResult().
String memoryContext = result.formattedResult();
The formatted result is designed for LLM context construction.
It may include:
- Insights
- Items
- Captions
Each section serves a different purpose.
| Section | Role |
|---|---|
| Insights | Higher-level understanding, stable preferences, learned patterns. |
| Items | Concrete memory facts, events, directives, playbooks, and resolutions. |
| Captions | Raw Data captions that provide source-level context. |
Use a guard when memory may be empty:
var memoryContext = result.isEmpty()
    ? "No relevant memory found."
    : result.formattedResult();
Example prompt assembly:
var prompt = """
    Relevant memory:
    %s

    User request:
    %s
    """.formatted(memoryContext, userInput);
This gives the agent both evidence and interpretation.
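The guard and the prompt template can be folded into one reusable helper. This is a minimal sketch under stated assumptions: `PromptAssembler` and `build` are hypothetical names, and the helper takes the already-formatted memory string (for example the output of `formattedResult()`) so that it does not depend on any Memind type. It also treats a null or blank string the same way as an empty result.

```java
// Hypothetical helper; not part of Memind. It accepts plain strings so the
// sketch runs without the library on the classpath.
class PromptAssembler {

    static final String NO_MEMORY = "No relevant memory found.";

    // memoryContext is expected to be the formatted retrieval output.
    // Null or blank input falls back to the placeholder text.
    static String build(String memoryContext, String userInput) {
        String memory = (memoryContext == null || memoryContext.isBlank())
                ? NO_MEMORY
                : memoryContext.strip();
        return """
                Relevant memory:
                %s

                User request:
                %s
                """.formatted(memory, userInput);
    }

    public static void main(String[] args) {
        System.out.println(build("- Prefers concise docs with code samples", "Draft the README intro."));
        System.out.println(build("", "Draft the README intro."));
    }
}
```

Keeping the placeholder inside the helper means the prompt always has the same two sections, whether or not memory was found.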
Retrieval returns a RetrievalResult.
RetrievalResult
  items
  insights
  rawData
  evidences
  strategy
  query
| Field | Meaning |
|---|---|
| items | Ranked Memory Items returned by retrieval. |
| insights | Higher-level understanding from the Insight Tree. |
| rawData | Aggregated Raw Data captions behind retrieved items. |
| evidences | Key evidence produced by deep retrieval when available. |
| strategy | The retrieval strategy used. |
| query | The effective query used for retrieval. |
A useful retrieval result usually combines multiple layers:
- items provide concrete facts
- insights provide interpretation
- rawData provides source-level context
- evidences provide supporting information for complex retrieval
For simple use cases, pass a strategy directly:
memory.retrieve(memoryId, query, RetrievalConfig.Strategy.SIMPLE).block();
For more control, build a RetrievalRequest with a custom RetrievalConfig.
var request = RetrievalRequest.of(
    memoryId,
    "What should the agent remember about this project?",
    RetrievalConfig.simple()
);
var result = memory.retrieve(request).block();
Retrieval configuration is organized around three memory tiers.
| Tier | Meaning |
|---|---|
| Tier 1 | Insight retrieval. |
| Tier 2 | Memory Item retrieval. |
| Tier 3 | Raw Data caption retrieval. |
Use RetrievalConfig.simple() for low-latency retrieval configuration.
var config = RetrievalConfig.simple()
    .withTier1(RetrievalConfig.TierConfig.enabled(5))
    .withTier2(RetrievalConfig.TierConfig.enabled(15))
    .withTier3(RetrievalConfig.TierConfig.enabled(5));
Use RetrievalConfig.deep() for quality-first retrieval configuration.
var config = RetrievalConfig.deep()
    .withTimeout(Duration.ofSeconds(120));
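Because DEEP trades latency for completeness, some applications may want a caller-side latency budget in addition to the retrieval timeout. The sketch below shows one possible pattern using only the Java standard library: try the slow call, and fall back to the fast one if the budget runs out. `DeepWithFallback` and its two suppliers are hypothetical; in a real application they would wrap your own DEEP and SIMPLE retrieval calls. This is an application-level pattern, not a Memind feature.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical wrapper; not part of Memind.
class DeepWithFallback {

    // deepCall and simpleCall are assumed to wrap a DEEP and a SIMPLE
    // retrieval respectively. The fast path runs only if the slow one
    // fails or exceeds the budget.
    static <T> T retrieve(Supplier<T> deepCall, Supplier<T> simpleCall, Duration budget) {
        CompletableFuture<T> deep = CompletableFuture.supplyAsync(deepCall);
        try {
            return deep.get(budget.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // cancel() marks the future as cancelled; it does not interrupt
            // work that is already running on the pool thread.
            deep.cancel(false);
            return simpleCall.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            deep.cancel(false);
            return simpleCall.get();
        }
    }

    public static void main(String[] args) {
        Supplier<String> slowDeep = () -> {
            try {
                Thread.sleep(500); // simulate a slow deep retrieval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "deep result";
        };
        String chosen = retrieve(slowDeep, () -> "simple result", Duration.ofMillis(50));
        System.out.println(chosen); // prints "simple result"
    }
}
```

The fallback keeps the agent responsive at the cost of a second retrieval call when the budget is missed, so it fits best where DEEP is used selectively.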
Common configuration areas include:
| Area | Controls |
|---|---|
| Strategy | Whether to use SIMPLE or DEEP. |
| Tier limits | How many insights, items, and Raw Data captions to retrieve. |
| Fusion scoring | How vector, keyword, temporal, graph, and thread signals are weighted. |
| Graph assist | Whether graph-connected memory can be added. |
| Thread assist | Whether long-running topic context can be added. |
| Temporal retrieval | Whether time-aware retrieval is used. |
| Rerank | Whether deep retrieval can rerank final candidates. |
| Timeout | How long retrieval may run. |
| Cache | Whether repeated retrieval requests can reuse cached results. |
| Trace | Whether retrieval traces are collected for debugging. |
Start with the default configuration. Tune only when retrieval traces show a clear need.
Retrieve by scope or category
Use a RetrievalRequest when you want to restrict retrieval scope.
For user memory:
var request = RetrievalRequest.userMemory(
    memoryId,
    "What does the user prefer?",
    RetrievalConfig.Strategy.SIMPLE
);
var result = memory.retrieve(request).block();
For agent memory:
var request = RetrievalRequest.agentMemory(
    memoryId,
    "What tool usage patterns should the agent remember?",
    RetrievalConfig.Strategy.SIMPLE
);
var result = memory.retrieve(request).block();
You can also retrieve by memory categories.
var request = RetrievalRequest.byCategories(
    memoryId,
    "What reusable workflow should the agent follow?",
    Set.of(MemoryCategory.PLAYBOOK),
    RetrievalConfig.Strategy.DEEP
);
var result = memory.retrieve(request).block();
Use filters when you know the query should target a specific memory scope or category.
Debug with retrieval traces
Memory retrieval can be difficult to debug. Memind provides retrieval traces so developers can inspect what happened during retrieval.
A trace can help answer:
- which strategy was used
- what query was executed
- whether cache was used
- which retrieval channels ran
- whether keyword search returned candidates
- whether temporal retrieval activated
- whether graph assist changed the result
- whether memory-thread assist changed the result
- whether DEEP triggered query expansion
- whether reranking was applied
- why a specific item appeared in the final result
Use traces when retrieval feels incomplete, noisy, or surprising.
Instead of treating memory as a black box, inspect the retrieval path and tune configuration based on evidence.
Best practices
Start simple:
- Use SIMPLE as the default strategy.
- Use DEEP only when the query is complex or quality matters more than latency.
- Do not use DEEP for every chatbot turn unless latency is acceptable.
Write focused retrieval queries:
- Ask retrieval for the memory you need, not the whole user prompt.
- Keep the query short, with just enough detail to express the intent clearly.
- Preserve useful time expressions such as “last week”, “recently”, or “before the release”.
Use the result as context:
- Use formattedResult() as the default prompt context format.
- Include memory only when the result is not empty.
- Let insights guide behavior, items support facts, and captions provide source context.
Debug before tuning:
- Inspect retrieval traces before changing top-k or scoring settings.
- Check whether the missing information was extracted first.
- Check Raw Data and Memory Items if retrieval cannot find expected context.
Choose the right memory scope:
- Use user memory for user preferences, facts, and history.
- Use agent memory for directives, tool experience, playbooks, and resolutions.
- Use category filters when you know what type of memory the query needs.
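The scope guidance above can be expressed as a small lookup. The following is a minimal sketch with hypothetical names: `ScopeRouter`, `Scope`, and `Need` are illustrative and are not Memind types (in particular, `Need` is not Memind's `MemoryCategory`). The mapping simply encodes the two bullets on user memory and agent memory.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical router; not part of Memind.
class ScopeRouter {

    enum Scope { USER, AGENT }

    // Illustrative labels for what a query is looking for.
    enum Need { PREFERENCE, FACT, HISTORY, DIRECTIVE, TOOL_EXPERIENCE, PLAYBOOK, RESOLUTION }

    // Agent memory: directives, tool experience, playbooks, and resolutions.
    private static final Set<Need> AGENT_NEEDS =
            EnumSet.of(Need.DIRECTIVE, Need.TOOL_EXPERIENCE, Need.PLAYBOOK, Need.RESOLUTION);

    // Everything else (preferences, facts, history) maps to user memory.
    static Scope scopeFor(Need need) {
        return AGENT_NEEDS.contains(need) ? Scope.AGENT : Scope.USER;
    }

    public static void main(String[] args) {
        System.out.println(scopeFor(Need.PREFERENCE)); // USER
        System.out.println(scopeFor(Need.PLAYBOOK));   // AGENT
    }
}
```

An application could use the returned scope to decide between a user-memory request and an agent-memory request before calling retrieval.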