
# Multimodal Extraction

## Overview

Multimodal Extraction is how your application turns real-world input into Memind memory.

Memind can extract memory from conversations, streaming messages, documents, images, audio, tool calls, and custom raw content. These inputs enter the same memory construction pipeline and can become Raw Data, Memory Items, Item Graph relationships, Memory Threads, and Insight Tree updates.

This page focuses on the extraction APIs, supported content types, plugin setup, response format, and usage patterns.

If you want to run a complete example first, start with [Quickstart](/open-source/getting-started/quickstart).

For the internal memory construction model, see [Memory Construction](/open-source/core-concepts/memory-construction).

## Supported content types

Memind supports multiple content types through built-in conversation support and raw-data plugins.

| Content type       | Examples                                           | Support                         |
| ------------------ | -------------------------------------------------- | ------------------------------- |
| Conversations      | chat messages, agent turns, transcripts            | Built in                        |
| Streaming messages | live chatbot or agent messages                     | Built in with buffer and commit |
| Documents          | text, Markdown, HTML, CSV, PDF                     | Document raw-data plugin        |
| Images             | screenshots, diagrams, photos                      | Image raw-data plugin           |
| Audio              | recordings, transcripts, voice notes               | Audio raw-data plugin           |
| Tool calls         | tool input/output, test results, shell output      | Tool-call raw-data plugin       |
| Custom raw content | application-specific records, logs, domain objects | `RawContent` and plugin SPI     |

The key idea is that Memind does not treat memory as chat history alone.

Every input type is first normalized into Raw Data and then processed into structured, connected memory.

## Plugin setup

Conversation extraction is built into `memind-core`.

Document, image, audio, and tool-call extraction require the matching raw-data plugin dependency and runtime registration.

### Maven dependencies

Add the plugin you need.

```xml theme={null}
<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-document</artifactId>
  <version>${memind.version}</version>
</dependency>
```

```xml theme={null}
<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-image</artifactId>
  <version>${memind.version}</version>
</dependency>
```

```xml theme={null}
<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-audio</artifactId>
  <version>${memind.version}</version>
</dependency>
```

```xml theme={null}
<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-toolcall</artifactId>
  <version>${memind.version}</version>
</dependency>
```

### Runtime registration

Register the plugin on the Memind runtime.

```java theme={null}
var memory = Memory.builder()
    // other runtime components: chat client, store, vector, text search...
    .rawDataPlugin(new DocumentRawDataPlugin())
    .build();
```

You can register multiple raw-data plugins when your application accepts multiple content types.

```java theme={null}
var memory = Memory.builder()
    // other runtime components...
    .rawDataPlugin(new DocumentRawDataPlugin())
    .rawDataPlugin(new ImageRawDataPlugin())
    .rawDataPlugin(new AudioRawDataPlugin())
    .rawDataPlugin(new ToolCallRawDataPlugin())
    .build();
```

### Spring Boot starters

Spring Boot users can use the matching starters to auto-configure raw-data plugins.

| Content type | Spring Boot starter                      |
| ------------ | ---------------------------------------- |
| Documents    | `memind-plugin-rawdata-document-starter` |
| Images       | `memind-plugin-rawdata-image-starter`    |
| Audio        | `memind-plugin-rawdata-audio-starter`    |
| Tool calls   | `memind-plugin-rawdata-toolcall-starter` |

The snippets below show the extraction entry points and assume the required plugin is on the classpath and registered with the runtime.

## Extraction modes

Choose the extraction mode based on how content arrives in your application.

| Mode                   | API                     | Best for                                                  |
| ---------------------- | ----------------------- | --------------------------------------------------------- |
| Batch conversation     | `addMessages()`         | You already have a complete conversation segment.         |
| Streaming messages     | `addMessage()`          | Chatbot or agent messages arrive one at a time.           |
| Manual commit          | `commit()`              | You want to force buffered messages into memory.          |
| Raw content extraction | `extract()`             | Documents, images, audio, tool calls, and custom content. |
| Plugin helpers         | Plugin-specific helpers | Convenience APIs for supported content plugins.           |

Most applications use more than one mode.

For example, a chatbot may use `addMessage()` during the conversation, `commit()` at session end, and `extract()` when the user uploads a document or when the agent produces important tool output.
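The routing in that example can be sketched as a small dispatch over application events. This is a minimal sketch under stated assumptions: `AppEvent`, `ChatTurn`, `TranscriptImport`, `SessionEnd`, `FileUpload`, `ToolOutput`, and `ExtractionModeRouter` are hypothetical application types, not part of Memind. The method only returns the name of the entry point this page recommends, so it runs without a Memind runtime.

```java
import java.util.List;

// Hypothetical application events; none of these types come from Memind.
sealed interface AppEvent permits ChatTurn, TranscriptImport, SessionEnd, FileUpload, ToolOutput {}
record ChatTurn(String role, String text) implements AppEvent {}
record TranscriptImport(List<String> lines) implements AppEvent {}
record SessionEnd() implements AppEvent {}
record FileUpload(String fileName) implements AppEvent {}
record ToolOutput(String toolName, String output) implements AppEvent {}

final class ExtractionModeRouter {
    // Maps each event to the Memind entry point recommended in the table above.
    static String modeFor(AppEvent event) {
        if (event instanceof ChatTurn) return "addMessage";               // streaming turn
        if (event instanceof TranscriptImport) return "addMessages";      // complete segment
        if (event instanceof SessionEnd) return "commit";                 // flush the buffer
        if (event instanceof FileUpload) return "extract";                // document, image, audio
        if (event instanceof ToolOutput) return "ToolCallMemories.report"; // plugin helper
        throw new IllegalArgumentException("Unknown event: " + event);
    }
}
```

In a real application each branch would call the corresponding Memind method instead of returning its name.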

## Extract conversation messages

Use `addMessages()` when you already have a complete conversation segment.

This is useful for:

* examples and tests
* importing a transcript
* processing a finished conversation
* extracting memory from a known context window

```java theme={null}
var result = memory.addMessages(
    memoryId,
    messages,
    ExtractionConfig.defaults().withLanguage("Chinese")
).block();
```

Memind treats the message list as one extraction unit.

Depending on configuration, the pipeline can preserve the source as Raw Data, generate captions, extract Memory Items, update graph and thread projections, and schedule Insight Tree updates.

## Extract streaming messages

Use `addMessage()` when messages arrive one at a time.

This is the common mode for chatbots, coding agents, and long-running assistants.

```java theme={null}
memory.addMessage(memoryId, Message.user("Can you review this design?")).block();

memory.addMessage(memoryId, Message.assistant("The main issue is that the API boundary is unclear."))
    .block();
```

In streaming mode, Memind does not extract memory from each message as soon as it arrives.

Messages are first stored in the conversation buffer. Memind then decides when the accumulated context is ready to commit. When a commit boundary is reached, extraction is triggered. If the boundary is not ready yet, the call may complete without producing a new extraction result.

This avoids turning every chat turn into a disconnected memory item.
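The buffer-then-commit behavior can be modeled in a few lines to show why a single `addMessage()` call may produce no extraction. This is a simplified, hypothetical model: `ConversationBuffer` is not a Memind class, and it assumes a fixed message-count boundary, whereas this page does not say how Memind actually decides that accumulated context is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of buffer-and-commit; Memind's real boundary detection may differ.
final class ConversationBuffer {
    private final int boundarySize; // assumption: a simple message-count boundary
    private final List<String> pending = new ArrayList<>();

    ConversationBuffer(int boundarySize) {
        if (boundarySize < 1) throw new IllegalArgumentException("boundarySize must be >= 1");
        this.boundarySize = boundarySize;
    }

    // Like addMessage(): buffers the message and returns a batch only once the boundary is reached.
    Optional<List<String>> add(String message) {
        pending.add(message);
        if (pending.size() < boundarySize) {
            return Optional.empty(); // boundary not reached, so nothing is extracted yet
        }
        return Optional.of(drain());
    }

    // Like commit(): flushes whatever is buffered; the batch may be empty.
    List<String> commit() {
        return drain();
    }

    private List<String> drain() {
        List<String> batch = List.copyOf(pending);
        pending.clear();
        return batch;
    }
}
```

With a boundary of three, the first two `add()` calls return `Optional.empty()`, the third returns all three messages as one batch, and a later `commit()` flushes any remainder.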

## Commit buffered context

Use `commit()` to force the current conversation buffer into memory.

This is useful when:

* a session ends
* an agent run finishes
* the application is shutting down
* you want to make sure the latest buffered context is written

```java theme={null}
var result = memory.commit(memoryId).block();
```

For real-time agents, a common pattern is:

```java theme={null}
memory.addMessage(memoryId, userMessage).block();
memory.addMessage(memoryId, assistantMessage).block();

// At the end of the session or run:
memory.commit(memoryId).block();
```

If the buffer is empty, the commit returns an empty extraction result.
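To make sure the final buffer is written even when a run fails partway through, the commit can go in a `finally` block. The sketch below assumes a hypothetical synchronous interface, `SessionMemory`, standing in for the Memind calls; the examples on this page instead call `.block()` on each `addMessage()` and `commit()` result. `AgentSession` and `RecordingMemory` are also illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical synchronous stand-in for memory.addMessage(...).block() and memory.commit(...).block().
interface SessionMemory {
    void addMessage(String memoryId, String message);
    void commit(String memoryId);
}

final class AgentSession {
    // Runs one session and always commits the buffer, even if adding a message throws.
    static void run(SessionMemory memory, String memoryId, List<String> turns) {
        try {
            for (String turn : turns) {
                memory.addMessage(memoryId, turn);
            }
        } finally {
            memory.commit(memoryId); // flush buffered context at the end of the run
        }
    }
}

// Records calls so the pattern can be checked without a Memind runtime.
final class RecordingMemory implements SessionMemory {
    final List<String> calls = new ArrayList<>();
    public void addMessage(String memoryId, String message) { calls.add("add:" + message); }
    public void commit(String memoryId) { calls.add("commit:" + memoryId); }
}
```

Because an empty buffer simply yields an empty extraction result, committing unconditionally at the end of a run is safe.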

## Extract documents

Document extraction requires `memind-plugin-rawdata-document`.

Use the document plugin to extract memory from text files, Markdown, HTML, CSV, PDF, and other parser-supported document formats. The plugin parses the content, segments it into document-aware sections, generates captions, and sends the normalized content into Memind's memory construction pipeline.

```java theme={null}
var request = DocumentExtractionRequests.document(memoryId, documentContent);

var result = memory.extract(request).block();
```

`documentContent` is created with the document plugin's content model or parser flow. See the document plugin or Java SDK docs for construction examples.

Document extraction is useful for:

* project docs
* runbooks
* design documents
* meeting notes
* release notes
* support articles
* internal knowledge bases

Document captions are especially useful because they preserve source-level context behind extracted Memory Items.
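As a rough illustration of what "document-aware sections" means, the sketch below splits Markdown at ATX headings so that each section keeps its heading as context. It is purely illustrative: `Section` and `MarkdownSections` are hypothetical, and this page does not describe the document plugin's actual segmentation algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the document plugin's algorithm.
record Section(String heading, String body) {}

final class MarkdownSections {
    // Splits Markdown at "#" to "######" headings; text before the first heading gets an empty heading.
    static List<Section> split(String markdown) {
        List<Section> sections = new ArrayList<>();
        String heading = "";
        StringBuilder body = new StringBuilder();
        for (String line : markdown.split("\n", -1)) {
            if (line.matches("#{1,6}\\s+.*")) {
                addIfNotBlank(sections, heading, body);
                heading = line.replaceFirst("#{1,6}\\s+", "").trim();
                body.setLength(0);
            } else {
                body.append(line).append('\n');
            }
        }
        addIfNotBlank(sections, heading, body);
        return sections;
    }

    // Skips sections that have neither a heading nor any body text.
    private static void addIfNotBlank(List<Section> out, String heading, StringBuilder body) {
        String text = body.toString().strip();
        if (!heading.isEmpty() || !text.isEmpty()) {
            out.add(new Section(heading, text));
        }
    }
}
```

Keeping the heading with each section is what lets a caption say where a fact came from, rather than presenting an isolated sentence.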

## Extract images

Image extraction requires `memind-plugin-rawdata-image`.

Use the image plugin to extract memory from screenshots, diagrams, UI captures, whiteboards, photos, or other visual artifacts. The plugin can analyze image content, produce image semantics, generate captions, and pass the resulting representation into memory extraction.

```java theme={null}
var request = ImageExtractionRequests.image(memoryId, imageContent);

var result = memory.extract(request).block();
```

`imageContent` is created with the image plugin's content model or parser flow. See the image plugin or Java SDK docs for construction examples.

Image extraction is useful when visual context matters, such as:

* UI screenshots
* architecture diagrams
* error screenshots
* visual bug reports
* whiteboard notes
* product or design references

Rather than discarding visual context, Memind can convert images into source-level Raw Data that later memory retrieval can draw on.

## Extract audio

Audio extraction requires `memind-plugin-rawdata-audio`.

Use the audio plugin to extract memory from recordings, transcripts, voice notes, meetings, interviews, support calls, or agent sessions that include spoken context. Audio content can be transcribed, segmented, captioned, and processed into memory.

```java theme={null}
var request = AudioExtractionRequests.audio(memoryId, audioContent);

var result = memory.extract(request).block();
```

`audioContent` is created with the audio plugin's content model or parser flow. See the audio plugin or Java SDK docs for construction examples.

Audio extraction can preserve:

* transcript segments
* speaker or timing metadata when available
* summarized captions
* source references
* extracted facts, events, preferences, and decisions

This lets Memind turn spoken context into searchable and retrievable memory.
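To illustrate how speaker and timing metadata can survive segmentation, the sketch below merges consecutive transcript segments from the same speaker into one turn while keeping the original time span. `Segment` and `TranscriptTurns` are hypothetical types; the audio plugin's own content model and segmentation rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical transcript segment; not the audio plugin's content model.
record Segment(String speaker, double startSec, double endSec, String text) {}

final class TranscriptTurns {
    // Merges consecutive segments from the same speaker into one turn,
    // keeping the first segment's start time and the last segment's end time.
    static List<Segment> mergeBySpeaker(List<Segment> segments) {
        List<Segment> turns = new ArrayList<>();
        for (Segment seg : segments) {
            int last = turns.size() - 1;
            if (last >= 0 && turns.get(last).speaker().equals(seg.speaker())) {
                Segment prev = turns.get(last);
                turns.set(last, new Segment(prev.speaker(), prev.startSec(), seg.endSec(),
                        prev.text() + " " + seg.text()));
            } else {
                turns.add(seg);
            }
        }
        return turns;
    }
}
```

Each merged turn still carries who spoke and when, which is the kind of source reference an extracted decision can point back to.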

## Extract tool calls

Tool-call extraction requires `memind-plugin-rawdata-toolcall`.

Use the tool-call plugin to extract memory from agent tool usage. Tool calls are important for agents because they capture what the agent tried, what worked, what failed, and what should be reused later.

```java theme={null}
var result = ToolCallMemories.report(memory, memoryId, toolCalls).block();
```

`toolCalls` is a list of tool-call records containing tool name, input, output, status, duration, and related metadata.
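One way to collect those fields is to wrap each tool invocation so that its output or failure and its duration are captured together. In this sketch, `ToolCallRecord` and `ToolCallCapture` are hypothetical, and the `"SUCCESS"`/`"FAILED"` status labels are assumptions; the tool-call plugin defines its own record type, so the values would need to be mapped onto that type before calling `ToolCallMemories.report`.

```java
import java.time.Duration;
import java.util.Map;
import java.util.function.Function;

// Hypothetical shape for one tool call; the tool-call plugin defines its own record type.
record ToolCallRecord(String toolName, String input, String output, String status,
                      Duration duration, Map<String, String> metadata) {}

final class ToolCallCapture {
    // Runs a tool, timing it and capturing either its output or its failure message.
    static ToolCallRecord capture(String toolName, String input, Function<String, String> tool,
                                  Map<String, String> metadata) {
        long start = System.nanoTime();
        String output;
        String status;
        try {
            output = tool.apply(input);
            status = "SUCCESS"; // assumed status label
        } catch (RuntimeException e) {
            output = String.valueOf(e.getMessage()); // keep the failure: failed attempts are memory too
            status = "FAILED";  // assumed status label
        }
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        return new ToolCallRecord(toolName, input, output, status, elapsed, Map.copyOf(metadata));
    }
}
```

Capturing failures instead of dropping them matters here, because failed attempts and resolved problems are among the agent memory types listed below.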

Tool-call extraction is useful for:

* coding agents
* research agents
* automation agents
* test and deployment agents
* long-running task agents

Tool-call memory can help Memind extract agent-scoped memory such as:

* tool experience
* durable directives
* reusable playbooks
* resolved problems
* failed attempts
* successful workflows

This is one of the main ways Memind helps agents remember their own operating experience, not only user preferences.

## Extract custom raw content

Use `extract()` for custom content types.

Custom raw content is useful when your application has domain-specific records that are not normal chat messages, documents, images, audio, or tool calls.

Examples include:

* internal business events
* workflow records
* logs
* incident reports
* CRM records
* product analytics events
* domain-specific artifacts

```java theme={null}
var result = memory.extract(
    memoryId,
    rawContent,
    ExtractionConfig.defaults()
).block();
```

For advanced use cases, implement a custom `RawContent` type and a raw-data processor or plugin. This lets Memind normalize your content into Raw Data before extracting memory.
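The sketch below shows the general idea of normalizing a domain record into text plus source metadata before extraction. Memind's real `RawContent` SPI is not shown on this page, so `NormalizableContent` and its method names are hypothetical stand-ins, as is the `IncidentReport` record.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a custom content type; not Memind's RawContent SPI.
interface NormalizableContent {
    String contentType();
    String toText();                  // text the extraction pipeline would read
    Map<String, String> metadata();   // source context worth keeping alongside the text
}

record IncidentReport(String id, String service, String severity, Instant openedAt, String summary)
        implements NormalizableContent {

    public String contentType() { return "incident-report"; }

    // Renders the record as one readable sentence for extraction.
    public String toText() {
        return "Incident " + id + " (" + severity + ") on " + service + ": " + summary;
    }

    // Keeps identifiers and timestamps that plain text alone would lose.
    public Map<String, String> metadata() {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("incidentId", id);
        meta.put("service", service);
        meta.put("openedAt", openedAt.toString());
        return meta;
    }
}
```

Separating readable text from structured metadata keeps the source traceable once Memind turns the content into Raw Data and Memory Items.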

## Configure extraction

Most applications can start with the default extraction configuration.

Use custom configuration when you need to control language, chunking, insight behavior, thread behavior, or plugin-specific processing.

| Area         | Controls                                                         |
| ------------ | ---------------------------------------------------------------- |
| Language     | Extraction language and prompt behavior.                         |
| Chunking     | Conversation, document, audio, or content-specific segmentation. |
| Captions     | Raw Data caption generation.                                     |
| Memory Items | Item extraction behavior and category handling.                  |
| Insight      | Insight Tree construction and asynchronous insight scheduling.   |
| Threads      | Memory Thread derivation and enrichment.                         |
| Plugins      | Document, image, audio, tool-call, or custom content behavior.   |

Example:

```java theme={null}
var config = ExtractionConfig.defaults().withLanguage("Chinese");

var result = memory.addMessages(memoryId, messages, config).block();
```

Start with defaults, inspect the generated memory in Memind UI, then tune extraction behavior when needed.

## Response format

All extraction modes return an `ExtractionResult`.

```text theme={null}
ExtractionResult
  memoryId
  rawDataResult
  memoryItemResult
  insightResult
  status
  duration
  errorMessage
  insightPending
```

| Field              | Meaning                                                                           |
| ------------------ | --------------------------------------------------------------------------------- |
| `memoryId`         | The memory identity that received the content.                                    |
| `rawDataResult`    | Raw Data construction result.                                                     |
| `memoryItemResult` | Memory Item extraction result.                                                    |
| `insightResult`    | Insight construction result when available.                                       |
| `status`           | Extraction status such as `SUCCESS`, `PARTIAL_SUCCESS`, or `FAILED`.              |
| `duration`         | Total extraction duration.                                                        |
| `errorMessage`     | Failure or partial-success error message.                                         |
| `insightPending`   | Whether insight work has been scheduled asynchronously but has not completed yet. |

`insightPending` is important because Insight Tree construction may be asynchronous. A successful extraction can write Raw Data and Memory Items immediately while insight construction continues in the background.
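A caller typically branches on `status` and `insightPending`. The sketch uses a local, hypothetical mirror (`ResultSummary`, `ExtractionStatus`) of a few documented fields so that it runs on its own; in an application you would read these values from Memind's `ExtractionResult`, whose exact accessor names and status type are not shown on this page.

```java
// Local mirror of documented ExtractionResult fields, for illustration only.
enum ExtractionStatus { SUCCESS, PARTIAL_SUCCESS, FAILED }

record ResultSummary(String memoryId, ExtractionStatus status, String errorMessage,
                     boolean insightPending) {}

final class ResultHandling {
    // Produces a log line describing what the extraction did and what is still running.
    static String describe(ResultSummary r) {
        return switch (r.status()) {
            case FAILED -> "extraction failed for " + r.memoryId() + ": " + r.errorMessage();
            case PARTIAL_SUCCESS -> "partial extraction for " + r.memoryId() + ": " + r.errorMessage();
            case SUCCESS -> r.insightPending()
                    ? "stored for " + r.memoryId() + "; insight update still running"
                    : "stored for " + r.memoryId();
        };
    }
}
```

Treating `insightPending` as "not finished yet" rather than as an error avoids retrying an extraction whose Raw Data and Memory Items were already written.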

## Inspect extracted memory

Use Memind UI to inspect what extraction produced.

Useful inspection points include:

| View           | What to inspect                                                                          |
| -------------- | ---------------------------------------------------------------------------------------- |
| Buffers        | Pending and recent conversation context.                                                 |
| Raw Data       | Segments, captions, source references, metadata, and content type.                       |
| Memory Items   | Extracted facts, preferences, events, directives, playbooks, resolutions, and foresight. |
| Item Graph     | Entities, aliases, mentions, and item relationships.                                     |
| Memory Threads | Long-running topics, projects, workflows, incidents, and decisions.                      |
| Insight Tree   | Higher-level understanding built from extracted memory.                                  |
| Settings       | Runtime configuration that affects extraction behavior.                                  |

If extraction does not look right, inspect Raw Data first.

Raw Data captions usually show whether the source content was segmented and understood correctly before higher-level memory was extracted.

## Best practices

Use the extraction mode that matches how content arrives:

* Use `addMessages()` for complete conversation segments.
* Use `addMessage()` for real-time agents and chatbots.
* Use `commit()` at session end or before shutdown.
* Use `extract()` for documents, images, audio, tool calls, and custom raw content.
* Use plugin helpers when available.

Set up the right plugin:

* Conversation extraction works with `memind-core`.
* Document extraction needs the document raw-data plugin.
* Image extraction needs the image raw-data plugin.
* Audio extraction needs the audio raw-data plugin.
* Tool-call extraction needs the tool-call raw-data plugin.

Preserve source context:

* Keep useful metadata such as source client, content type, timestamps, file names, URLs, and tool names.
* Prefer source-aware content objects over plain text when the source matters.
* Inspect Raw Data captions before tuning item extraction.

Avoid fragmented memory:

* Do not treat each individual chat message as a complete extraction unit.
* Let streaming messages accumulate until there is enough context.
* Commit the final buffer when a run ends.

Tune gradually:

* Start with default extraction behavior.
* Review Raw Data, Memory Items, Threads, and Insights in Memind UI.
* Adjust language, chunking, plugin options, or insight behavior only when the output shows a real need.