Documentation Index

Fetch the complete documentation index at: https://docs.openmemind.com/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Memory Retrieval is how your application reads useful context from Memind. Memind provides two retrieval strategies:
Strategy | Latency profile | Best for
SIMPLE | Millisecond-level latency | Low-latency agents, chatbots, and frequent memory injection.
DEEP | Second-level latency | Complex questions, higher-quality retrieval, and workflows that can trade latency for completeness.
Use SIMPLE when memory should feel instant. Use DEEP when memory should be more complete. This page focuses on how to call retrieval APIs, choose a strategy, use the result as agent context, configure retrieval, and debug retrieval behavior. For how retrieval works internally, see Retrieval.

When to retrieve memory

Retrieve memory when the agent needs context that may not be present in the current prompt. Common moments include:
  • before generating an assistant response
  • before planning a task
  • before calling tools
  • when a user asks about past preferences, decisions, or project context
  • when restoring context across sessions
  • when building a prompt for a long-running agent
  • when the agent needs prior tool experience or reusable playbooks
You do not need to retrieve memory for every request. Skip retrieval when the current prompt already contains enough context, or when the query does not depend on previous memory.
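The decision above can be expressed as a small gate in front of retrieval. This sketch is illustrative only: the class, method name, and trigger phrases are hypothetical heuristics, not part of the Memind SDK.

```java
import java.util.List;

// Hypothetical heuristic gate: retrieve only when the turn likely depends on
// prior context. The cue phrases are illustrative, not an SDK feature.
final class RetrievalGate {
    private static final List<String> MEMORY_CUES = List.of(
            "remember", "last time", "previously", "my preference", "the project");

    static boolean shouldRetrieve(String userPrompt, boolean newSession) {
        if (newSession) {
            return true; // restoring context across sessions always warrants retrieval
        }
        String lower = userPrompt.toLowerCase();
        // Skip retrieval when nothing in the prompt points at previous memory.
        return MEMORY_CUES.stream().anyMatch(lower::contains);
    }
}
```

In practice you would tune the cues (or replace them with a classifier), but the shape stays the same: a cheap check in front of every turn, retrieval only when it pays off.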

Choose a retrieval strategy

Choose the strategy based on latency and retrieval quality requirements.
Need | Recommended strategy
Millisecond-level memory recall | SIMPLE
Low-latency chatbot responses | SIMPLE
Memory retrieval before frequent agent actions | SIMPLE
Direct fact or preference lookup | SIMPLE
Exact technical term or tool lookup | SIMPLE
Second-level retrieval is acceptable | DEEP
Retrieval quality matters more than latency | DEEP
Need more complete evidence | DEEP
Ambiguous or underspecified query | DEEP
Cross-session investigation | DEEP
Project-level context reconstruction | DEEP
In most applications, start with SIMPLE. Use DEEP selectively for harder questions where missing context is more expensive than waiting longer.
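One way to encode "SIMPLE by default, DEEP selectively" is a chooser keyed on the query's difficulty and the caller's latency budget. The enum below is a local stand-in for RetrievalConfig.Strategy, and the thresholds are illustrative assumptions, not SDK guidance.

```java
// Sketch of a strategy chooser. Strategy is a local stand-in for
// RetrievalConfig.Strategy; the 1-second threshold is an assumed budget
// matching DEEP's second-level latency profile.
final class StrategyChooser {
    enum Strategy { SIMPLE, DEEP }

    static Strategy choose(boolean ambiguousQuery, boolean crossSession, long latencyBudgetMillis) {
        // DEEP needs both a budget that tolerates second-level latency and a
        // question hard enough to justify the wait.
        if (latencyBudgetMillis >= 1000 && (ambiguousQuery || crossSession)) {
            return Strategy.DEEP;
        }
        return Strategy.SIMPLE; // default: millisecond-level recall
    }
}
```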

Retrieve with SIMPLE

Use SIMPLE for fast memory recall.
var result = memory.retrieve(
    memoryId,
    "What does this user prefer when writing technical docs?",
    RetrievalConfig.Strategy.SIMPLE
).block();
SIMPLE is designed for low-latency agents and chatbots. It can retrieve relevant insights, memory items, and Raw Data captions without using the heavier deep-retrieval path. Use it for:
  • normal chat turns
  • frequent memory injection
  • direct questions
  • preference lookup
  • recent context recall
  • latency-sensitive agent loops

Retrieve with DEEP

Use DEEP for quality-first retrieval.
var result = memory.retrieve(
    memoryId,
    "What has changed in this project direction over the last few weeks?",
    RetrievalConfig.Strategy.DEEP
).block();
DEEP is designed for complex or ambiguous queries. It can use sufficiency checking, typed query expansion, graph/thread assist, optional reranking, and evidence-backed output. Use it for:
  • cross-session investigation
  • project-level questions
  • ambiguous user requests
  • tasks that need stronger evidence
  • situations where retrieval quality matters more than latency

Use retrieval results as agent context

The easiest way to use retrieval output in an agent prompt is formattedResult().
String memoryContext = result.formattedResult();
The formatted result is designed for LLM context construction. It may include:
  • Insights
  • Items
  • Captions
Each section serves a different purpose.
Section | Role
Insights | Higher-level understanding, stable preferences, learned patterns.
Items | Concrete memory facts, events, directives, playbooks, and resolutions.
Captions | Raw Data captions that provide source-level context.
Use a guard when memory may be empty:
var memoryContext = result.isEmpty()
    ? "No relevant memory found."
    : result.formattedResult();
Example prompt assembly:
var prompt = """
Relevant memory:
%s

User request:
%s
""".formatted(memoryContext, userInput);
This gives the agent both evidence and interpretation.

Response format

Retrieval returns a RetrievalResult.
RetrievalResult
  items
  insights
  rawData
  evidences
  strategy
  query
Field | Meaning
items | Ranked Memory Items returned by retrieval.
insights | Higher-level understanding from the Insight Tree.
rawData | Aggregated Raw Data captions behind retrieved items.
evidences | Key evidence produced by deep retrieval when available.
strategy | The retrieval strategy used.
query | The effective query used for retrieval.
A useful retrieval result usually combines multiple layers:
  • items provide concrete facts
  • insights provide interpretation
  • rawData provides source-level context
  • evidences provide supporting information for complex retrieval
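If formattedResult() does not fit your prompt layout, the layers can be combined by hand. The record below is a local stand-in for RetrievalResult (the real SDK type exposes these fields through its own accessors), used only to illustrate assembling the sections.

```java
import java.util.List;

// Local stand-in for RetrievalResult, used only to show combining the layers
// into a single context string; not the SDK type.
record Layers(List<String> insights, List<String> items, List<String> rawData) {
    String toContext() {
        StringBuilder sb = new StringBuilder();
        if (!insights.isEmpty()) sb.append("Insights:\n").append(String.join("\n", insights)).append('\n');
        if (!items.isEmpty()) sb.append("Items:\n").append(String.join("\n", items)).append('\n');
        if (!rawData.isEmpty()) sb.append("Captions:\n").append(String.join("\n", rawData)).append('\n');
        return sb.toString();
    }
}
```

Sections that came back empty are simply omitted, so the agent prompt never carries empty headers.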

Configure retrieval

For simple use cases, pass a strategy directly:
memory.retrieve(memoryId, query, RetrievalConfig.Strategy.SIMPLE).block();
For more control, build a RetrievalRequest with a custom RetrievalConfig.
var request = RetrievalRequest.of(
    memoryId,
    "What should the agent remember about this project?",
    RetrievalConfig.simple()
);

var result = memory.retrieve(request).block();
Retrieval configuration is organized around three memory tiers.
Tier | Meaning
Tier 1 | Insight retrieval.
Tier 2 | Memory Item retrieval.
Tier 3 | Raw Data caption retrieval.
Use RetrievalConfig.simple() for low-latency retrieval configuration.
var config = RetrievalConfig.simple()
    .withTier1(RetrievalConfig.TierConfig.enabled(5))
    .withTier2(RetrievalConfig.TierConfig.enabled(15))
    .withTier3(RetrievalConfig.TierConfig.enabled(5));
Use RetrievalConfig.deep() for quality-first retrieval configuration.
var config = RetrievalConfig.deep()
    .withTimeout(Duration.ofSeconds(120));
Common configuration areas include:
Area | Controls
Strategy | Whether to use SIMPLE or DEEP.
Tier limits | How many insights, items, and Raw Data captions to retrieve.
Fusion scoring | How vector, keyword, temporal, graph, and thread signals are weighted.
Graph assist | Whether graph-connected memory can be added.
Thread assist | Whether long-running topic context can be added.
Temporal retrieval | Whether time-aware retrieval is used.
Rerank | Whether deep retrieval can rerank final candidates.
Timeout | How long retrieval may run.
Cache | Whether repeated retrieval requests can reuse cached results.
Trace | Whether retrieval traces are collected for debugging.
Start with the default configuration. Tune only when retrieval traces show a clear need.
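Several of these areas can be tuned on one builder. The sketch below assumes the withTier and withTimeout methods shown earlier chain on the same RetrievalConfig; treat it as an illustration of the pattern rather than a verified signature.

```java
// Sketch: a quality-first config with tighter tier limits and a shorter timeout.
// Assumes the builder methods shown above can be chained on one config.
var config = RetrievalConfig.deep()
    .withTier1(RetrievalConfig.TierConfig.enabled(3))   // fewer insights
    .withTier2(RetrievalConfig.TierConfig.enabled(10))  // fewer items
    .withTimeout(Duration.ofSeconds(30));               // cap deep retrieval time
```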

Retrieve by scope or category

Use a RetrievalRequest when you want to restrict retrieval scope. For user memory:
var request = RetrievalRequest.userMemory(
    memoryId,
    "What does the user prefer?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();
For agent memory:
var request = RetrievalRequest.agentMemory(
    memoryId,
    "What tool usage patterns should the agent remember?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();
You can also retrieve by memory categories.
var request = RetrievalRequest.byCategories(
    memoryId,
    "What reusable workflow should the agent follow?",
    Set.of(MemoryCategory.PLAYBOOK),
    RetrievalConfig.Strategy.DEEP
);

var result = memory.retrieve(request).block();
Use filters when you know the query should target a specific memory scope or category.

Debug with retrieval traces

Memory retrieval can be difficult to debug. Memind provides retrieval traces so developers can inspect what happened during retrieval. A trace can help answer:
  • which strategy was used
  • what query was executed
  • whether cache was used
  • which retrieval channels ran
  • whether keyword search returned candidates
  • whether temporal retrieval activated
  • whether graph assist changed the result
  • whether memory-thread assist changed the result
  • whether DEEP triggered query expansion
  • whether reranking was applied
  • why a specific item appeared in the final result
Use traces when retrieval feels incomplete, noisy, or surprising. Instead of treating memory as a black box, inspect the retrieval path and tune configuration based on evidence.

Best practices

Start simple:
  • Use SIMPLE as the default strategy.
  • Use DEEP only when the query is complex or quality matters more than latency.
  • Do not use DEEP for every chatbot turn unless latency is acceptable.
Write focused retrieval queries:
  • Ask retrieval for the memory you need, not the whole user prompt.
  • Keep the query short enough to express intent clearly.
  • Preserve useful time expressions such as “last week”, “recently”, or “before the release”.
Use the result as context:
  • Use formattedResult() as the default prompt context format.
  • Include memory only when the result is not empty.
  • Let insights guide behavior, items support facts, and captions provide source context.
Debug before tuning:
  • Inspect retrieval traces before changing top-k or scoring settings.
  • Check whether the missing information was extracted first.
  • Check Raw Data and Memory Items if retrieval cannot find expected context.
Choose the right memory scope:
  • Use user memory for user preferences, facts, and history.
  • Use agent memory for directives, tool experience, playbooks, and resolutions.
  • Use category filters when you know what type of memory the query needs.