Memory Retrieval

Overview

Memory Retrieval is how your application reads useful context from Memind. Memind provides two retrieval strategies:

Strategy	Latency profile	Best for
`SIMPLE`	Millisecond-level latency	Low-latency agents, chatbots, and frequent memory injection.
`DEEP`	Second-level latency	Complex questions, higher-quality retrieval, and workflows that can trade latency for completeness.

Use SIMPLE when memory should feel instant. Use DEEP when memory should be more complete. This page focuses on how to call retrieval APIs, choose a strategy, use the result as agent context, configure retrieval, and debug retrieval behavior. For how retrieval works internally, see Retrieval.

When to retrieve memory

Retrieve memory when the agent needs context that may not be present in the current prompt. Common moments include:

before generating an assistant response
before planning a task
before calling tools
when a user asks about past preferences, decisions, or project context
when restoring context across sessions
when building a prompt for a long-running agent
when the agent needs prior tool experience or reusable playbooks

You do not need to retrieve memory for every request. Skip retrieval when the current prompt already contains enough context, or when the query does not depend on previous memory.

Choose a retrieval strategy

Choose the strategy based on latency and retrieval quality requirements.

Need	Recommended strategy
Millisecond-level memory recall	`SIMPLE`
Low-latency chatbot responses	`SIMPLE`
Memory retrieval before frequent agent actions	`SIMPLE`
Direct fact or preference lookup	`SIMPLE`
Exact technical term or tool lookup	`SIMPLE`
Retrieval latency on the order of seconds is acceptable	`DEEP`
Retrieval quality matters more than latency	`DEEP`
Need more complete evidence	`DEEP`
Ambiguous or underspecified query	`DEEP`
Cross-session investigation	`DEEP`
Project-level context reconstruction	`DEEP`

In most applications, start with SIMPLE. Use DEEP selectively for harder questions where missing context is more expensive than waiting longer.
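
The guidance above can be folded into a small routing helper. The following is only a sketch under stated assumptions: the nested `Strategy` enum is a local stand-in for `RetrievalConfig.Strategy`, and `StrategyRouter`, `QueryProfile`, and its fields are hypothetical names that are not part of Memind. It assumes Java 17 or later.

```java
/** Illustrative routing helper; every name here is a stand-in, not Memind API. */
final class StrategyRouter {
    /** Local mirror of RetrievalConfig.Strategy so the sketch compiles on its own. */
    enum Strategy { SIMPLE, DEEP }

    /** Hypothetical query traits an application might track for each request. */
    record QueryProfile(boolean latencySensitive, boolean ambiguous, boolean crossSession) {}

    private StrategyRouter() {}

    /** Defaults to SIMPLE and picks DEEP only for harder queries that can afford to wait. */
    static Strategy choose(QueryProfile q) {
        if (q.latencySensitive()) {
            return Strategy.SIMPLE; // the latency budget wins over completeness
        }
        if (q.ambiguous() || q.crossSession()) {
            return Strategy.DEEP;   // harder question, waiting is acceptable
        }
        return Strategy.SIMPLE;     // direct lookups stay on the fast path
    }

    public static void main(String[] args) {
        // An ambiguous query with no latency constraint is routed to DEEP.
        System.out.println(choose(new QueryProfile(false, true, false))); // prints DEEP
    }
}
```

Putting the latency check first is a design choice, not a rule: it keeps latency-sensitive agent loops on SIMPLE even when the query is ambiguous. An application that values completeness more could reorder the checks.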

Retrieve with SIMPLE

Use SIMPLE for fast memory recall.

var result = memory.retrieve(
    memoryId,
    "What does this user prefer when writing technical docs?",
    RetrievalConfig.Strategy.SIMPLE
).block();

SIMPLE is designed for low-latency agents and chatbots. It can retrieve relevant insights, memory items, and Raw Data captions without using the heavier deep-retrieval path. Use it for:

normal chat turns
frequent memory injection
direct questions
preference lookup
recent context recall
latency-sensitive agent loops

Retrieve with DEEP

Use DEEP for quality-first retrieval.

var result = memory.retrieve(
    memoryId,
    "What has changed in this project direction over the last few weeks?",
    RetrievalConfig.Strategy.DEEP
).block();

DEEP is designed for complex or ambiguous queries. It can use sufficiency checking, typed query expansion, graph/thread assist, optional reranking, and evidence-backed output. Use it for:

cross-session investigation
project-level questions
ambiguous user requests
tasks that need stronger evidence
situations where retrieval quality matters more than latency

Use retrieval results as agent context

The easiest way to use retrieval output in an agent prompt is formattedResult().

String memoryContext = result.formattedResult();

The formatted result is designed for LLM context construction. It may include:

Insights
Items
Captions

Each section serves a different purpose.

Section	Role
`Insights`	Higher-level understanding, stable preferences, learned patterns.
`Items`	Concrete memory facts, events, directives, playbooks, and resolutions.
`Captions`	Raw Data captions that provide source-level context.

Use a guard when memory may be empty:

var memoryContext = result.isEmpty()
    ? "No relevant memory found."
    : result.formattedResult();

Example prompt assembly:

var prompt = """
Relevant memory:
%s

User request:
%s
""".formatted(memoryContext, userInput);

This gives the agent both evidence and interpretation.
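
The guard and the prompt template can also be combined into one helper that takes plain strings, which makes prompt assembly easy to unit test without a live memory store. This is a minimal sketch: `MemoryPrompt` and `build` are hypothetical names, and unlike the guard above it omits the memory section entirely when nothing was retrieved.

```java
/** Illustrative prompt builder; MemoryPrompt is a hypothetical helper, not Memind API. */
final class MemoryPrompt {
    private MemoryPrompt() {}

    /** Builds the prompt, leaving out the memory section when no memory was retrieved. */
    static String build(String memoryContext, String userInput) {
        if (memoryContext == null || memoryContext.isBlank()) {
            return "User request:\n" + userInput + "\n";
        }
        return """
            Relevant memory:
            %s

            User request:
            %s
            """.formatted(memoryContext, userInput);
    }

    public static void main(String[] args) {
        System.out.print(build("Prefers concise technical docs.", "Draft the README intro."));
    }
}
```

At the call site, the first argument could be `result.isEmpty() ? "" : result.formattedResult()`, so that empty results add no tokens to the prompt.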

Response format

Retrieval returns a RetrievalResult.

RetrievalResult
  items
  insights
  rawData
  evidences
  strategy
  query

Field	Meaning
`items`	Ranked Memory Items returned by retrieval.
`insights`	Higher-level understanding from the Insight Tree.
`rawData`	Aggregated Raw Data captions behind retrieved items.
`evidences`	Key evidence produced by deep retrieval when available.
`strategy`	The retrieval strategy used.
`query`	The effective query used for retrieval.

A useful retrieval result usually combines multiple layers:

items provide concrete facts
insights provide interpretation
rawData provides source-level context
evidences provide supporting information for complex retrieval
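
When checking which layers a retrieval call returned, a one-line summary per call can be handy in logs. The sketch below does not use the real `RetrievalResult` type: `ResultView` is a hypothetical stand-in that copies the field names from the table above and simplifies every element to a `String`, so the actual accessors and element types may differ.

```java
import java.util.List;

/** Illustrative logging helper; ResultView is a simplified stand-in for RetrievalResult. */
final class ResultSummary {
    /** Field names follow the response format table; element types are assumed. */
    record ResultView(List<String> items, List<String> insights, List<String> rawData,
                      List<String> evidences, String strategy, String query) {}

    private ResultSummary() {}

    /** Renders how many entries each layer contributed to one retrieval call. */
    static String summarize(ResultView r) {
        return "strategy=%s items=%d insights=%d rawData=%d evidences=%d query=\"%s\""
            .formatted(r.strategy(), r.items().size(), r.insights().size(),
                       r.rawData().size(), r.evidences().size(), r.query());
    }

    public static void main(String[] args) {
        var view = new ResultView(List.of("fact A", "fact B"), List.of("prefers brevity"),
                                  List.of(), List.of(), "SIMPLE", "doc preferences");
        System.out.println(summarize(view));
    }
}
```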

Configure retrieval

For simple use cases, pass a strategy directly:

memory.retrieve(memoryId, query, RetrievalConfig.Strategy.SIMPLE).block();

For more control, build a RetrievalRequest with a custom RetrievalConfig.

var request = RetrievalRequest.of(
    memoryId,
    "What should the agent remember about this project?",
    RetrievalConfig.simple()
);

var result = memory.retrieve(request).block();

Retrieval configuration is organized around three memory tiers.

Tier	Meaning
Tier 1	Insight retrieval.
Tier 2	Memory Item retrieval.
Tier 3	Raw Data caption retrieval.

Use RetrievalConfig.simple() for low-latency retrieval configuration.

var config = RetrievalConfig.simple()
    .withTier1(RetrievalConfig.TierConfig.enabled(5))
    .withTier2(RetrievalConfig.TierConfig.enabled(15))
    .withTier3(RetrievalConfig.TierConfig.enabled(5));

Use RetrievalConfig.deep() for quality-first retrieval configuration.

var config = RetrievalConfig.deep()
    .withTimeout(Duration.ofSeconds(120));

Common configuration areas include:

Area	Controls
Strategy	Whether to use `SIMPLE` or `DEEP`.
Tier limits	How many insights, items, and Raw Data captions to retrieve.
Fusion scoring	How vector, keyword, temporal, graph, and thread signals are weighted.
Graph assist	Whether graph-connected memory can be added.
Thread assist	Whether long-running topic context can be added.
Temporal retrieval	Whether time-aware retrieval is used.
Rerank	Whether deep retrieval can rerank final candidates.
Timeout	How long retrieval may run.
Cache	Whether repeated retrieval requests can reuse cached results.
Trace	Whether retrieval traces are collected for debugging.

Start with the default configuration. Tune only when retrieval traces show a clear need.
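
If the caller also needs its own latency budget, one option is to attempt the quality-first call and fall back to the fast one when the budget runs out. The sketch below is not a Memind feature. `BudgetedRetrieval` and `deepOrFallback` are hypothetical names, and the two suppliers stand in for whatever retrieval calls the application makes. It uses only `java.util.concurrent` from the JDK.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Illustrative caller-side budget; not part of Memind. */
final class BudgetedRetrieval {
    private BudgetedRetrieval() {}

    /** Runs the deep call within the budget; on timeout or failure, uses the simple call. */
    static <T> T deepOrFallback(Supplier<T> deep, Supplier<T> simple, Duration budget) {
        try {
            return CompletableFuture.supplyAsync(deep)
                .orTimeout(budget.toMillis(), TimeUnit.MILLISECONDS)
                .join();
        } catch (CompletionException e) {
            // Timed out or failed. The deep call is not cancelled and may still
            // finish in the background; its result is simply ignored.
            return simple.get();
        }
    }

    public static void main(String[] args) {
        String context = deepOrFallback(
            () -> "deep context", () -> "simple context", Duration.ofSeconds(5));
        System.out.println(context);
    }
}
```

In an application, the suppliers could wrap calls such as `memory.retrieve(memoryId, query, RetrievalConfig.Strategy.DEEP).block()` and its SIMPLE counterpart.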

Retrieve by scope or category

Use a RetrievalRequest when you want to restrict retrieval scope. For user memory:

var request = RetrievalRequest.userMemory(
    memoryId,
    "What does the user prefer?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();

For agent memory:

var request = RetrievalRequest.agentMemory(
    memoryId,
    "What tool usage patterns should the agent remember?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();

You can also retrieve by memory categories.

var request = RetrievalRequest.byCategories(
    memoryId,
    "What reusable workflow should the agent follow?",
    Set.of(MemoryCategory.PLAYBOOK),
    RetrievalConfig.Strategy.DEEP
);

var result = memory.retrieve(request).block();

Use filters when you know the query should target a specific memory scope or category.

Debug with retrieval traces

Memory retrieval can be difficult to debug. Memind provides retrieval traces so developers can inspect what happened during retrieval. A trace can help answer:

which strategy was used
what query was executed
whether cache was used
which retrieval channels ran
whether keyword search returned candidates
whether temporal retrieval activated
whether graph assist changed the result
whether memory-thread assist changed the result
whether DEEP triggered query expansion
whether reranking was applied
why a specific item appeared in the final result

Use traces when retrieval feels incomplete, noisy, or surprising. Instead of treating memory as a black box, inspect the retrieval path and tune configuration based on evidence.

Best practices

Start simple:

Use SIMPLE as the default strategy.
Use DEEP only when the query is complex or quality matters more than latency.
Do not use DEEP for every chatbot turn unless latency on the order of seconds is acceptable for each turn.

Write focused retrieval queries:

Ask retrieval for the memory you need, not the whole user prompt.
Keep the query short enough to express intent clearly.
Preserve useful time expressions such as “last week”, “recently”, or “before the release”.

Use the result as context:

Use formattedResult() as the default prompt context format.
Include memory only when the result is not empty.
Let insights guide behavior, items support facts, and captions provide source context.

Debug before tuning:

Inspect retrieval traces before changing top-k or scoring settings.
First, check whether the missing information was extracted into memory at all.
Check Raw Data and Memory Items if retrieval cannot find expected context.

Choose the right memory scope:

Use user memory for user preferences, facts, and history.
Use agent memory for directives, tool experience, playbooks, and resolutions.
Use category filters when you know what type of memory the query needs.
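
The scope guidance above can be written down as a lookup so that the choice is explicit in code. This is a sketch only: `ScopeRouter`, `Scope`, and `Need` are hypothetical names. Of the `Need` constants, only `PLAYBOOK` is known to correspond to a `MemoryCategory` value shown on this page; the others simply name the kinds of memory listed above. It assumes Java 17 or later.

```java
/** Illustrative scope lookup; all types here are local stand-ins, not Memind API. */
final class ScopeRouter {
    enum Scope { USER, AGENT }

    /** Kinds of memory named in the guidance above; names are assumptions. */
    enum Need { PREFERENCE, FACT, HISTORY, DIRECTIVE, TOOL_EXPERIENCE, PLAYBOOK, RESOLUTION }

    private ScopeRouter() {}

    /** Maps the kind of memory a query needs to the scope that usually holds it. */
    static Scope scopeFor(Need need) {
        return switch (need) {
            case PREFERENCE, FACT, HISTORY -> Scope.USER;
            case DIRECTIVE, TOOL_EXPERIENCE, PLAYBOOK, RESOLUTION -> Scope.AGENT;
        };
    }

    public static void main(String[] args) {
        System.out.println(scopeFor(Need.PLAYBOOK)); // prints AGENT
    }
}
```

A `USER` result would then lead to `RetrievalRequest.userMemory(...)` and an `AGENT` result to `RetrievalRequest.agentMemory(...)`.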
