# Memory Retrieval

## Overview

Memory Retrieval is how your application reads useful context from Memind.

Memind provides two retrieval strategies:

| Strategy | Latency profile           | Best for                                                                                            |
| -------- | ------------------------- | --------------------------------------------------------------------------------------------------- |
| `SIMPLE` | Millisecond-level latency | Low-latency agents, chatbots, and frequent memory injection.                                        |
| `DEEP`   | Second-level latency      | Complex questions, higher-quality retrieval, and workflows that can trade latency for completeness. |

Use `SIMPLE` when memory should feel instant.

Use `DEEP` when memory should be more complete.

This page focuses on how to call retrieval APIs, choose a strategy, use the result as agent context, configure retrieval, and debug retrieval behavior.

For how retrieval works internally, see [Retrieval](/open-source/core-concepts/retrieval).

## When to retrieve memory

Retrieve memory when the agent needs context that may not be present in the current prompt.

Common moments include:

* before generating an assistant response
* before planning a task
* before calling tools
* when a user asks about past preferences, decisions, or project context
* when restoring context across sessions
* when building a prompt for a long-running agent
* when the agent needs prior tool experience or reusable playbooks

You do not need to retrieve memory for every request.

Skip retrieval when the current prompt already contains enough context, or when the query does not depend on previous memory.

## Choose a retrieval strategy

Choose the strategy based on latency and retrieval quality requirements.

| Need                                           | Recommended strategy |
| ---------------------------------------------- | -------------------- |
| Millisecond-level memory recall                | `SIMPLE`             |
| Low-latency chatbot responses                  | `SIMPLE`             |
| Memory retrieval before frequent agent actions | `SIMPLE`             |
| Direct fact or preference lookup               | `SIMPLE`             |
| Exact technical term or tool lookup            | `SIMPLE`             |
| Retrieval latency of seconds is acceptable     | `DEEP`               |
| Retrieval quality matters more than latency    | `DEEP`               |
| Need more complete evidence                    | `DEEP`               |
| Ambiguous or underspecified query              | `DEEP`               |
| Cross-session investigation                    | `DEEP`               |
| Project-level context reconstruction           | `DEEP`               |

In most applications, start with `SIMPLE`.

Use `DEEP` selectively for harder questions where missing context is more expensive than waiting longer.
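The table above can be folded into a small selection helper. The following is a minimal sketch, not part of Memind: the `Strategy` enum is a local stand-in for `RetrievalConfig.Strategy`, and `StrategyChooser`, the latency budget, the one-second threshold, and the `needsCompleteEvidence` flag are hypothetical names and values chosen for illustration.

```java
import java.time.Duration;

// Local stand-in for RetrievalConfig.Strategy so the sketch compiles on its own.
enum Strategy { SIMPLE, DEEP }

final class StrategyChooser {
    // Assumed threshold: DEEP is described as second-level latency, so a caller
    // with less than one second to spare stays on SIMPLE.
    static final Duration DEEP_MIN_BUDGET = Duration.ofSeconds(1);

    private StrategyChooser() {}

    /**
     * @param latencyBudget how long the caller can wait for retrieval
     * @param needsCompleteEvidence true for ambiguous, cross-session, or project-level queries
     */
    static Strategy choose(Duration latencyBudget, boolean needsCompleteEvidence) {
        boolean canAffordDeep = latencyBudget.compareTo(DEEP_MIN_BUDGET) >= 0;
        // DEEP is chosen only when the query needs it and the caller can wait for it.
        return (needsCompleteEvidence && canAffordDeep) ? Strategy.DEEP : Strategy.SIMPLE;
    }
}
```

With this shape, `SIMPLE` stays the default and `DEEP` becomes an explicit opt-in, which matches the recommendation to start with `SIMPLE`.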

## Retrieve with SIMPLE

Use `SIMPLE` for fast memory recall.

```java
// SIMPLE favors latency; block() waits for the retrieval result.
var result = memory.retrieve(
    memoryId,
    "What does this user prefer when writing technical docs?",
    RetrievalConfig.Strategy.SIMPLE
).block();
```

`SIMPLE` is designed for low-latency agents and chatbots. It can retrieve relevant insights, memory items, and Raw Data captions without using the heavier deep-retrieval path.

Use it for:

* normal chat turns
* frequent memory injection
* direct questions
* preference lookup
* recent context recall
* latency-sensitive agent loops

## Retrieve with DEEP

Use `DEEP` for quality-first retrieval.

```java
// DEEP favors completeness and may take seconds to return.
var result = memory.retrieve(
    memoryId,
    "What has changed in this project direction over the last few weeks?",
    RetrievalConfig.Strategy.DEEP
).block();
```

`DEEP` is designed for complex or ambiguous queries. It can use sufficiency checking, typed query expansion, graph/thread assist, optional reranking, and evidence-backed output.

Use it for:

* cross-session investigation
* project-level questions
* ambiguous user requests
* tasks that need stronger evidence
* situations where retrieval quality matters more than latency

## Use retrieval results as agent context

The easiest way to use retrieval output in an agent prompt is to call `formattedResult()`.

```java
String memoryContext = result.formattedResult();
```

The formatted result is designed for LLM context construction.

It may include:

```text
Insights
Items
Captions
```

Each section serves a different purpose.

| Section    | Role                                                                   |
| ---------- | ---------------------------------------------------------------------- |
| `Insights` | Higher-level understanding, stable preferences, learned patterns.      |
| `Items`    | Concrete memory facts, events, directives, playbooks, and resolutions. |
| `Captions` | Raw Data captions that provide source-level context.                   |

Use a guard when memory may be empty:

```java
var memoryContext = result.isEmpty()
    ? "No relevant memory found."
    : result.formattedResult();
```

Example prompt assembly:

```java
var prompt = """
Relevant memory:
%s

User request:
%s
""".formatted(memoryContext, userInput);
```

This prompt gives the agent both evidence (items and captions) and interpretation (insights).
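The guard and the prompt template can be combined into one helper. This is a minimal sketch, assuming Java 15 or later for text blocks: `PromptBuilder` and its parameters are hypothetical, the `formattedMemory` argument stands in for the value returned by `result.formattedResult()`, and a null or blank string is treated as empty here in place of `result.isEmpty()`.

```java
final class PromptBuilder {
    // Fallback text taken from the guard example in this section.
    static final String NO_MEMORY = "No relevant memory found.";

    private PromptBuilder() {}

    /** Builds the agent prompt, substituting a fallback line when no memory was retrieved. */
    static String build(String formattedMemory, String userInput) {
        String memoryContext = (formattedMemory == null || formattedMemory.isBlank())
            ? NO_MEMORY
            : formattedMemory.strip();
        // The text block keeps the same two-part layout as the example prompt.
        return """
            Relevant memory:
            %s

            User request:
            %s
            """.formatted(memoryContext, userInput);
    }
}
```

Keeping the fallback inside the helper means the prompt always has the same shape, whether or not retrieval returned anything.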

## Response format

Retrieval returns a `RetrievalResult`.

```text
RetrievalResult
  items
  insights
  rawData
  evidences
  strategy
  query
```

| Field       | Meaning                                                 |
| ----------- | ------------------------------------------------------- |
| `items`     | Ranked Memory Items returned by retrieval.              |
| `insights`  | Higher-level understanding from the Insight Tree.       |
| `rawData`   | Aggregated Raw Data captions behind retrieved items.    |
| `evidences` | Key evidence produced by deep retrieval when available. |
| `strategy`  | The retrieval strategy used.                            |
| `query`     | The effective query used for retrieval.                 |

A useful retrieval result usually combines multiple layers:

* `items` provide concrete facts
* `insights` provide interpretation
* `rawData` provides source-level context
* `evidences` provide supporting information for complex retrieval

## Configure retrieval

For simple use cases, pass a strategy directly:

```java
memory.retrieve(memoryId, query, RetrievalConfig.Strategy.SIMPLE).block();
```

For more control, build a `RetrievalRequest` with a custom `RetrievalConfig`.

```java
var request = RetrievalRequest.of(
    memoryId,
    "What should the agent remember about this project?",
    RetrievalConfig.simple()
);

var result = memory.retrieve(request).block();
```

Retrieval configuration is organized around three memory tiers.

| Tier   | Meaning                     |
| ------ | --------------------------- |
| Tier 1 | Insight retrieval.          |
| Tier 2 | Memory Item retrieval.      |
| Tier 3 | Raw Data caption retrieval. |

Use `RetrievalConfig.simple()` for low-latency retrieval configuration.

```java
var config = RetrievalConfig.simple()
    .withTier1(RetrievalConfig.TierConfig.enabled(5))   // Tier 1: insights
    .withTier2(RetrievalConfig.TierConfig.enabled(15))  // Tier 2: Memory Items
    .withTier3(RetrievalConfig.TierConfig.enabled(5));  // Tier 3: Raw Data captions
```

Use `RetrievalConfig.deep()` for quality-first retrieval configuration.

```java
import java.time.Duration;

var config = RetrievalConfig.deep()
    .withTimeout(Duration.ofSeconds(120));
```

Common configuration areas include:

| Area               | Controls                                                               |
| ------------------ | ---------------------------------------------------------------------- |
| Strategy           | Whether to use `SIMPLE` or `DEEP`.                                     |
| Tier limits        | How many insights, items, and Raw Data captions to retrieve.           |
| Fusion scoring     | How vector, keyword, temporal, graph, and thread signals are weighted. |
| Graph assist       | Whether graph-connected memory can be added.                           |
| Thread assist      | Whether long-running topic context can be added.                       |
| Temporal retrieval | Whether time-aware retrieval is used.                                  |
| Rerank             | Whether deep retrieval can rerank final candidates.                    |
| Timeout            | How long retrieval may run.                                            |
| Cache              | Whether repeated retrieval requests can reuse cached results.          |
| Trace              | Whether retrieval traces are collected for debugging.                  |

Start with the default configuration. Tune only when retrieval traces show a clear need.

## Retrieve by scope or category

Use a `RetrievalRequest` when you want to restrict retrieval scope.

For user memory:

```java
var request = RetrievalRequest.userMemory(
    memoryId,
    "What does the user prefer?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();
```

For agent memory:

```java
var request = RetrievalRequest.agentMemory(
    memoryId,
    "What tool usage patterns should the agent remember?",
    RetrievalConfig.Strategy.SIMPLE
);

var result = memory.retrieve(request).block();
```

You can also retrieve by memory categories.

```java
import java.util.Set;

// Restrict deep retrieval to playbook memory only.
var request = RetrievalRequest.byCategories(
    memoryId,
    "What reusable workflow should the agent follow?",
    Set.of(MemoryCategory.PLAYBOOK),
    RetrievalConfig.Strategy.DEEP
);

var result = memory.retrieve(request).block();
```

Use these scoped requests when you know the query should target a specific memory scope or category.
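One way to decide which scoped request to build is a small lookup from the kind of memory a query needs to a scope. This is an illustrative sketch only: `MemoryKind`, `Scope`, and `ScopeRouter` are hypothetical names, not Memind types. The mapping restates the scope guidance in Best practices (user memory for preferences, facts, and history; agent memory for directives, tool experience, playbooks, and resolutions).

```java
// Hypothetical labels for what a query is looking for; not Memind's MemoryCategory.
enum MemoryKind { PREFERENCE, FACT, HISTORY, DIRECTIVE, TOOL_EXPERIENCE, PLAYBOOK, RESOLUTION }

// Hypothetical scope marker; USER would lead to RetrievalRequest.userMemory(...),
// AGENT to RetrievalRequest.agentMemory(...).
enum Scope { USER, AGENT }

final class ScopeRouter {
    private ScopeRouter() {}

    /** Maps the kind of memory a query needs to the scope that is expected to hold it. */
    static Scope scopeFor(MemoryKind kind) {
        return switch (kind) {
            case PREFERENCE, FACT, HISTORY -> Scope.USER;
            case DIRECTIVE, TOOL_EXPERIENCE, PLAYBOOK, RESOLUTION -> Scope.AGENT;
        };
    }
}
```

The exhaustive `switch` means that adding a new kind without assigning it a scope fails at compile time, so the routing cannot silently fall through.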

## Debug with retrieval traces

Memory retrieval can be difficult to debug. Memind provides retrieval traces so developers can inspect what happened during retrieval.

A trace can help answer:

* which strategy was used
* what query was executed
* whether cache was used
* which retrieval channels ran
* whether keyword search returned candidates
* whether temporal retrieval activated
* whether graph assist changed the result
* whether thread assist changed the result
* whether `DEEP` triggered query expansion
* whether reranking was applied
* why a specific item appeared in the final result

Use traces when retrieval feels incomplete, noisy, or surprising.

Instead of treating memory as a black box, inspect the retrieval path and tune configuration based on evidence.

## Best practices

Start simple:

* Use `SIMPLE` as the default strategy.
* Use `DEEP` only when the query is complex or quality matters more than latency.
* Do not use `DEEP` for every chatbot turn unless latency is acceptable.

Write focused retrieval queries:

* Query for the specific memory you need instead of passing the whole user prompt.
* Keep the query short enough to express intent clearly.
* Preserve useful time expressions such as "last week", "recently", or "before the release".

Use the result as context:

* Use `formattedResult()` as the default prompt context format.
* Include memory only when the result is not empty.
* Let insights guide behavior, items support facts, and captions provide source context.

Debug before tuning:

* Inspect retrieval traces before changing top-k or scoring settings.
* First check whether the missing information was extracted at all.
* Check Raw Data and Memory Items if retrieval cannot find expected context.

Choose the right memory scope:

* Use user memory for user preferences, facts, and history.
* Use agent memory for directives, tool experience, playbooks, and resolutions.
* Use category filters when you know what type of memory the query needs.
