Multimodal Extraction

Overview

Multimodal Extraction is how your application turns real-world input into Memind memory. Memind can extract memory from conversations, streaming messages, documents, images, audio, tool calls, and custom raw content. All of these inputs enter the same memory construction pipeline and can become Raw Data, Memory Items, Item Graph relationships, Memory Threads, and Insight Tree updates.

This page covers the extraction APIs, supported content types, plugin setup, response format, and usage patterns. To run a complete example first, start with Quickstart. For the internal memory construction model, see Memory Construction.

Supported content types

Memind supports multiple content types through built-in conversation support and raw-data plugins.

Content type	Examples	Support
Conversations	chat messages, agent turns, transcripts	Built in
Streaming messages	live chatbot or agent messages	Built in with buffer and commit
Documents	text, markdown, HTML, CSV, PDF	Document raw-data plugin
Images	screenshots, diagrams, photos	Image raw-data plugin
Audio	recordings, transcripts, voice notes	Audio raw-data plugin
Tool calls	tool input/output, test results, shell output	Tool-call raw-data plugin
Custom raw content	application-specific records, logs, domain objects	`RawContent` and plugin SPI

The key idea is that Memind does not treat memory as chat history alone. Every input type is first normalized into Raw Data, then processed into structured, connected memory.

Plugin setup

Conversation extraction is built into memind-core. Document, image, audio, and tool-call extraction require the matching raw-data plugin dependency and runtime registration.

Maven dependencies

Add the plugins for the content types you need.

<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-document</artifactId>
  <version>${memind.version}</version>
</dependency>

<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-image</artifactId>
  <version>${memind.version}</version>
</dependency>

<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-audio</artifactId>
  <version>${memind.version}</version>
</dependency>

<dependency>
  <groupId>com.openmemind.ai</groupId>
  <artifactId>memind-plugin-rawdata-toolcall</artifactId>
  <version>${memind.version}</version>
</dependency>

Runtime registration

var memory = Memory.builder()
    // other runtime components: chat client, store, vector, text search...
    .rawDataPlugin(new DocumentRawDataPlugin())
    .build();

You can register multiple raw-data plugins when your application accepts multiple content types.

var memory = Memory.builder()
    // other runtime components...
    .rawDataPlugin(new DocumentRawDataPlugin())
    .rawDataPlugin(new ImageRawDataPlugin())
    .rawDataPlugin(new AudioRawDataPlugin())
    .rawDataPlugin(new ToolCallRawDataPlugin())
    .build();
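Conceptually, registration maps each content type to a plugin that knows how to normalize it, so content of an unregistered type has no handler. The sketch below models only that idea, using hypothetical `ContentPlugin` and `PluginRegistry` types. It is not Memind's builder or raw-data plugin SPI, and the real runtime may report a missing plugin differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry, analogous to repeated .rawDataPlugin(...) calls.
// Illustration only; this is not Memind's builder.
final class PluginRegistry {
    private final Map<String, ContentPlugin> plugins = new LinkedHashMap<>();

    PluginRegistry register(ContentPlugin plugin) {
        plugins.put(plugin.contentType(), plugin);
        return this; // chainable, builder style
    }

    Optional<ContentPlugin> pluginFor(String contentType) {
        return Optional.ofNullable(plugins.get(contentType));
    }

    // Fail fast when content arrives for a type that nobody registered.
    String normalize(String contentType, String payload) {
        ContentPlugin plugin = pluginFor(contentType).orElseThrow(() ->
            new IllegalStateException("No raw-data plugin registered for: " + contentType));
        return plugin.normalize(payload);
    }

    public static void main(String[] args) {
        ContentPlugin documents = new ContentPlugin() {
            @Override public String contentType() { return "document"; }
            @Override public String normalize(String payload) { return payload.strip(); }
        };
        PluginRegistry registry = new PluginRegistry().register(documents);
        System.out.println(registry.normalize("document", "  # Runbook  ")); // # Runbook
        try {
            registry.normalize("image", "<bytes>");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No raw-data plugin registered for: image
        }
    }
}

// Hypothetical plugin contract: each plugin handles one content type.
// This is not Memind's raw-data plugin SPI.
interface ContentPlugin {
    String contentType();              // e.g. "document" or "image"
    String normalize(String payload);  // turn a source payload into raw text
}
```

Failing fast on an unknown content type surfaces a missing registration at the first extraction call instead of silently dropping content.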

Spring Boot starters

Spring Boot users can use the matching starters to auto-configure raw-data plugins.

Content type	Spring Boot starter
Documents	`memind-plugin-rawdata-document-starter`
Images	`memind-plugin-rawdata-image-starter`
Audio	`memind-plugin-rawdata-audio-starter`
Tool calls	`memind-plugin-rawdata-toolcall-starter`

The extraction examples in the following sections assume the required plugin is available, either registered manually or auto-configured by a starter.

Extraction modes

Choose the extraction mode based on how content arrives in your application.

Mode	API	Best for
Batch conversation	`addMessages()`	You already have a complete conversation segment.
Streaming messages	`addMessage()`	Chatbot or agent messages arrive one at a time.
Manual commit	`commit()`	You want to force buffered messages into memory.
Raw content extraction	`extract()`	Documents, images, audio, tool calls, and custom content.
Plugin helpers	Plugin-specific helpers such as `ToolCallMemories.report()`	Convenience APIs for supported content plugins.

Most applications use more than one mode. For example, a chatbot may use addMessage() during the conversation, commit() at session end, and extract() when the user uploads a document or when the agent produces important tool output.

Extract conversation messages

Use addMessages() when you already have a complete conversation segment. This is useful for:

examples and tests
importing a transcript
processing a finished conversation
extracting memory from a known context window

var result = memory.addMessages(
    memoryId,
    messages,
    ExtractionConfig.defaults().withLanguage("Chinese")
).block();

Memind treats the message list as one extraction unit. The pipeline can preserve the source as Raw Data, generate captions, extract Memory Items, update graph and thread projections, and schedule Insight Tree updates depending on configuration.

Extract streaming messages

Use addMessage() when messages arrive one at a time. This is the common mode for chatbots, coding agents, and long-running assistants.

memory.addMessage(memoryId, Message.user("Can you review this design?")).block();

memory.addMessage(memoryId, Message.assistant("The main issue is that the API boundary is unclear."))
    .block();

In streaming mode, Memind does not have to extract memory from each message as soon as it arrives. Messages are first stored in the conversation buffer, and Memind decides when the accumulated context is ready to commit. Reaching a commit boundary triggers extraction; before that, the call may complete without producing a new extraction result. This prevents every chat turn from becoming a disconnected memory item.
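A minimal model of the buffer-then-commit behavior, assuming a simple message-count boundary. Memind decides commit boundaries itself and this page does not document how, so the `ConversationBuffer` class and its threshold are hypothetical. The sketch only shows why an `addMessage()` call can finish without producing a segment to extract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical streaming buffer. Assumption: the commit boundary is a fixed
// message count; Memind's real boundary detection may work differently.
final class ConversationBuffer {
    private final int boundary;
    private final List<String> pending = new ArrayList<>();

    ConversationBuffer(int boundary) {
        if (boundary < 1) throw new IllegalArgumentException("boundary must be >= 1");
        this.boundary = boundary;
    }

    // Analogous to addMessage(): buffer the message, and return a segment to
    // extract only when the boundary is reached; otherwise return empty.
    Optional<List<String>> add(String message) {
        pending.add(message);
        if (pending.size() < boundary) {
            return Optional.empty();
        }
        List<String> segment = List.copyOf(pending);
        pending.clear();
        return Optional.of(segment);
    }

    int pendingSize() {
        return pending.size();
    }

    public static void main(String[] args) {
        var buffer = new ConversationBuffer(3);
        System.out.println(buffer.add("user: Can you review this design?").isPresent());      // false
        System.out.println(buffer.add("assistant: The API boundary is unclear.").isPresent()); // false
        System.out.println(buffer.add("user: Which module owns it?").isPresent());            // true
        System.out.println(buffer.pendingSize());                                              // 0
    }
}
```

Messages that never reach a boundary stay buffered, which is why an explicit commit at the end of a session or run matters.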

Commit buffered context

Use commit() to force the current conversation buffer into memory. This is useful when:

a session ends
an agent run finishes
the application is shutting down
you want to make sure the latest buffered context is written

var result = memory.commit(memoryId).block();

For real-time agents, a common pattern is:

memory.addMessage(memoryId, userMessage).block();
memory.addMessage(memoryId, assistantMessage).block();

// At the end of the session or run:
memory.commit(memoryId).block();

If the buffer is empty, the commit returns an empty extraction result.
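To make the end-of-session commit hard to forget, an application can tie it to a scope. The sketch below uses Java's try-with-resources with a hypothetical `MemorySession` wrapper. The `committer` callback stands in for a call such as `memory.commit(memoryId).block()`; the wrapper and callback are assumptions, not part of Memind's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical session wrapper: buffers messages and guarantees a final commit
// when the session closes, even if the agent run throws. The committer callback
// stands in for something like memory.commit(memoryId).block().
final class MemorySession implements AutoCloseable {
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> committer;
    private boolean closed;

    MemorySession(Consumer<List<String>> committer) {
        this.committer = committer;
    }

    void addMessage(String message) {
        if (closed) throw new IllegalStateException("session already closed");
        buffer.add(message);
    }

    // Hand over everything buffered so far. An empty buffer yields an empty
    // segment, analogous to commit() returning an empty extraction result.
    List<String> commit() {
        List<String> segment = List.copyOf(buffer);
        buffer.clear();
        committer.accept(segment);
        return segment;
    }

    @Override
    public void close() {
        if (!closed) {
            commit();
            closed = true;
        }
    }

    public static void main(String[] args) {
        List<List<String>> committed = new ArrayList<>();
        try (MemorySession session = new MemorySession(committed::add)) {
            session.addMessage("user: The deploy failed again.");
            session.addMessage("assistant: The migration step timed out.");
        } // close() commits the remaining buffer here
        System.out.println(committed.size()); // 1
    }
}
```

Because `close()` runs even when the body of the `try` throws, a failed agent run still flushes its buffered context.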

Extract documents

Document extraction requires memind-plugin-rawdata-document. Use the document plugin to extract memory from text files, Markdown, HTML, CSV, PDF, and other parser-supported document formats. The plugin parses the content, segments it into document-aware sections, generates captions, and sends the normalized content into Memind’s memory construction pipeline.

var request = DocumentExtractionRequests.document(memoryId, documentContent);

var result = memory.extract(request).block();

Create documentContent with the document plugin’s content model or parser flow. See the document plugin or Java SDK docs for construction examples.

Document extraction is useful for:

project docs
runbooks
design documents
meeting notes
release notes
support articles
internal knowledge bases

Document captions are especially useful because they preserve source-level context behind extracted Memory Items.
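The document plugin's segmentation algorithm is not described on this page. As a rough illustration of what "document-aware sections" can mean, the hypothetical `MarkdownSections.split` below cuts Markdown at ATX headings (`#` to `######`) and keeps the heading path with each section, so text extracted from a section still carries its place in the source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;

// One section of a document plus the heading path that locates it.
record Section(String headingPath, String body) {}

// Hypothetical illustration only; not the document plugin's algorithm.
final class MarkdownSections {
    private record Heading(int level, String title) {}

    static List<Section> split(String markdown) {
        List<Section> sections = new ArrayList<>();
        Deque<Heading> headings = new ArrayDeque<>(); // open headings, root first
        StringBuilder body = new StringBuilder();

        for (String line : markdown.split("\\R")) {
            int level = headingLevel(line);
            if (level == 0) {
                body.append(line).append('\n');
                continue;
            }
            flush(sections, headings, body);
            // A new heading closes every open heading at the same or deeper level.
            while (!headings.isEmpty() && headings.peekLast().level() >= level) {
                headings.removeLast();
            }
            headings.addLast(new Heading(level, line.substring(level).strip()));
        }
        flush(sections, headings, body);
        return sections;
    }

    // Returns 1-6 for ATX headings such as "## Title", otherwise 0.
    private static int headingLevel(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == '#') n++;
        boolean isHeading = n >= 1 && n <= 6 && n < line.length() && line.charAt(n) == ' ';
        return isHeading ? n : 0;
    }

    // Emit the buffered body (if any) under the current heading path.
    private static void flush(List<Section> out, Deque<Heading> headings, StringBuilder body) {
        String text = body.toString().strip();
        if (!text.isEmpty()) {
            String path = headings.stream().map(Heading::title).collect(Collectors.joining(" > "));
            out.add(new Section(path, text));
        }
        body.setLength(0);
    }
}
```

This sketch ignores fenced code blocks and Setext headings, so a `# comment` line inside a fence would be treated as a heading; a real parser handles those cases.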

Extract images

Image extraction requires memind-plugin-rawdata-image. Use the image plugin to extract memory from screenshots, diagrams, UI captures, whiteboards, photos, or other visual artifacts. The plugin can analyze image content, produce image semantics, generate captions, and pass the resulting representation into memory extraction.

var request = ImageExtractionRequests.image(memoryId, imageContent);

var result = memory.extract(request).block();

Create imageContent with the image plugin’s content model or parser flow. See the image plugin or Java SDK docs for construction examples.

Image extraction is useful when visual context matters, such as:

UI screenshots
architecture diagrams
error screenshots
visual bug reports
whiteboard notes
product or design references

Instead of discarding visual context, Memind can convert images into source-level Raw Data that supports later memory retrieval.

Extract audio

Audio extraction requires memind-plugin-rawdata-audio. Use the audio plugin to extract memory from recordings, transcripts, voice notes, meetings, interviews, support calls, or agent sessions that include spoken context. Audio content can be transcribed, segmented, captioned, and processed into memory.

var request = AudioExtractionRequests.audio(memoryId, audioContent);

var result = memory.extract(request).block();

Create audioContent with the audio plugin’s content model or parser flow. See the audio plugin or Java SDK docs for construction examples.

Audio extraction can preserve:

transcript segments
speaker or timing metadata when available
summarized captions
source references
extracted facts, events, preferences, and decisions

This lets Memind turn spoken context into searchable and retrievable memory.
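Timing and speaker metadata stay useful only if segmentation preserves them. Assuming time-ordered transcript segments with a speaker and start/end times (a hypothetical `TranscriptSegment` record, not the audio plugin's type), the sketch below groups consecutive segments into time-bounded chunks and records the time range and speakers of each chunk.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical transcript model; the audio plugin's real types may differ.
record TranscriptSegment(String speaker, long startMs, long endMs, String text) {}

// A chunk keeps timing and speaker metadata so extracted memory can point
// back to who said something and when.
record TranscriptChunk(long startMs, long endMs, Set<String> speakers, String text) {}

final class TranscriptChunker {
    // Group consecutive, time-ordered segments into chunks spanning at most
    // maxWindowMs. A single segment longer than the window becomes its own chunk.
    static List<TranscriptChunk> chunk(List<TranscriptSegment> segments, long maxWindowMs) {
        List<TranscriptChunk> chunks = new ArrayList<>();
        List<TranscriptSegment> current = new ArrayList<>();
        for (TranscriptSegment seg : segments) {
            if (!current.isEmpty() && seg.endMs() - current.get(0).startMs() > maxWindowMs) {
                chunks.add(toChunk(current));
                current.clear();
            }
            current.add(seg);
        }
        if (!current.isEmpty()) chunks.add(toChunk(current));
        return chunks;
    }

    private static TranscriptChunk toChunk(List<TranscriptSegment> segs) {
        Set<String> speakers = new LinkedHashSet<>();
        StringBuilder text = new StringBuilder();
        for (TranscriptSegment s : segs) {
            speakers.add(s.speaker());
            if (text.length() > 0) text.append('\n');
            text.append(s.speaker()).append(": ").append(s.text());
        }
        return new TranscriptChunk(segs.get(0).startMs(), segs.get(segs.size() - 1).endMs(),
            Set.copyOf(speakers), text.toString());
    }
}
```

The window is checked against the end of each incoming segment before it is added, so a chunk never spans more than `maxWindowMs` unless one segment is longer than the window by itself.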

Extract tool calls

Tool-call extraction requires memind-plugin-rawdata-toolcall. Use the tool-call plugin to extract memory from agent tool usage. Tool calls are important for agents because they capture what the agent tried, what worked, what failed, and what should be reused later.

var result = ToolCallMemories.report(memory, memoryId, toolCalls).block();

toolCalls is a list of tool-call records containing the tool name, input, output, status, duration, and related metadata.

Tool-call extraction is useful for:

coding agents
research agents
automation agents
test and deployment agents
long-running task agents
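The `toolCalls` list has to be collected somewhere in the agent loop. One way is to wrap each tool invocation, time it, and record success or failure. The `ToolCallRecord` and `ToolCallRecorder` types below are hypothetical and only mirror the fields described above; the tool-call plugin defines its own record type, so in a real integration these records would be converted before being reported.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical record shape based on the fields described above; the
// tool-call plugin's own types may differ.
record ToolCallRecord(String toolName, String input, String output,
                      String status, Duration duration) {}

final class ToolCallRecorder {
    private final List<ToolCallRecord> calls = new ArrayList<>();

    // Run a tool, capture its output or failure message, and time it.
    String run(String toolName, String input, Function<String, String> tool) {
        long start = System.nanoTime();
        try {
            String output = tool.apply(input);
            calls.add(new ToolCallRecord(toolName, input, output, "SUCCESS", elapsed(start)));
            return output;
        } catch (RuntimeException e) {
            // Record failures too, then rethrow so the agent still sees the error.
            calls.add(new ToolCallRecord(toolName, input, String.valueOf(e.getMessage()),
                "FAILED", elapsed(start)));
            throw e;
        }
    }

    // Snapshot of recorded calls, e.g. to convert and report at the end of a run.
    List<ToolCallRecord> drain() {
        List<ToolCallRecord> snapshot = List.copyOf(calls);
        calls.clear();
        return snapshot;
    }

    private static Duration elapsed(long startNanos) {
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }
}
```

Recording failures as well as successes matters here, since failed attempts are one of the kinds of agent memory this section describes.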

Tool-call memory can help Memind extract agent-scoped memory such as:

tool experience
durable directives
reusable playbooks
resolved problems
failed attempts
successful workflows

This is one of the main ways Memind helps agents remember their own operating experience, not only user preferences.

Extract custom raw content

Use extract() for custom content types. Custom raw content is useful when your application has domain-specific records that are not normal chat messages, documents, images, audio, or tool calls. Examples include:

internal business events
workflow records
logs
incident reports
CRM records
product analytics events
domain-specific artifacts

var result = memory.extract(
    memoryId,
    rawContent,
    ExtractionConfig.defaults()
).block();

For advanced use cases, implement a custom RawContent type and a raw-data processor or plugin. This lets Memind normalize your content into Raw Data before extracting memory.
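Memind's actual `RawContent` SPI is not reproduced here. Under that caveat, the sketch below shows the general shape of a custom content type: a hypothetical `NormalizableContent` interface and an `IncidentReport` record that renders its domain fields as readable text while keeping identifiers and timestamps as separate metadata.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a custom content type; not Memind's RawContent SPI.
// The point is the normalization shape: domain fields -> text + source metadata.
interface NormalizableContent {
    String contentType();
    String toText();
    Map<String, String> metadata();
}

record IncidentReport(String id, String service, String severity,
                      Instant openedAt, List<String> timeline, String resolution)
        implements NormalizableContent {

    @Override
    public String contentType() {
        return "incident-report";
    }

    // Render the record as labeled, ordered text that an extractor can read.
    @Override
    public String toText() {
        StringBuilder sb = new StringBuilder()
            .append("Incident ").append(id).append(" on ").append(service)
            .append(" (").append(severity).append(")\n");
        for (String entry : timeline) sb.append("- ").append(entry).append('\n');
        sb.append("Resolution: ").append(resolution);
        return sb.toString();
    }

    // Keep source context (ids, timestamps) as metadata instead of only in text.
    @Override
    public Map<String, String> metadata() {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("incidentId", id);
        meta.put("service", service);
        meta.put("openedAt", openedAt.toString());
        return meta;
    }
}
```

Keeping identifiers and timestamps in metadata follows the same idea as preserving source context for built-in content types.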

Configure extraction

Most applications can start with the default extraction configuration. Use custom configuration when you need to control language, chunking, insight behavior, thread behavior, or plugin-specific processing.

Area	Controls
Language	Extraction language and prompt behavior.
Chunking	Conversation, document, audio, or content-specific segmentation.
Captions	Raw Data caption generation.
Memory Items	Item extraction behavior and category handling.
Insight	Insight Tree construction and asynchronous insight scheduling.
Threads	Memory Thread derivation and enrichment.
Plugins	Document, image, audio, tool-call, or custom content behavior.

Example:

var config = ExtractionConfig.defaults().withLanguage("Chinese");

var result = memory.addMessages(memoryId, messages, config).block();

Start with defaults, inspect the generated memory in Memind UI, then tune extraction behavior when needed.
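`ExtractionConfig.defaults().withLanguage(...)` reads like the familiar immutable "wither" style, where each `with...` call returns a modified copy. Assuming `ExtractionConfig` behaves that way (this page does not say), one shared default can safely seed several per-call configurations. The record below is a hypothetical illustration; its fields and default values are invented.

```java
// Hypothetical immutable config in the "wither" style. Field names and default
// values are invented for this sketch; they are not ExtractionConfig's.
record SketchExtractionConfig(String language, int maxChunkChars, boolean asyncInsight) {

    static SketchExtractionConfig defaults() {
        return new SketchExtractionConfig("English", 4000, true);
    }

    // Each wither returns a new instance and leaves the receiver unchanged.
    SketchExtractionConfig withLanguage(String language) {
        return new SketchExtractionConfig(language, maxChunkChars, asyncInsight);
    }

    SketchExtractionConfig withMaxChunkChars(int maxChunkChars) {
        if (maxChunkChars <= 0) throw new IllegalArgumentException("maxChunkChars must be positive");
        return new SketchExtractionConfig(language, maxChunkChars, asyncInsight);
    }
}
```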

Response format

All extraction modes return an ExtractionResult.

ExtractionResult
  memoryId
  rawDataResult
  memoryItemResult
  insightResult
  status
  duration
  errorMessage
  insightPending

Field	Meaning
`memoryId`	The memory identity that received the content.
`rawDataResult`	Raw Data construction result.
`memoryItemResult`	Memory Item extraction result.
`insightResult`	Insight construction result when available.
`status`	Extraction status such as `SUCCESS`, `PARTIAL_SUCCESS`, or `FAILED`.
`duration`	Total extraction duration.
`errorMessage`	Failure or partial-success error message.
`insightPending`	Whether insight work has been scheduled asynchronously but has not completed yet.

insightPending is important because Insight Tree construction may be asynchronous. A successful extraction can write Raw Data and Memory Items immediately while insight construction continues in the background.
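Callers usually branch on `status` and `insightPending`. The sketch below mirrors a subset of the documented fields in a hypothetical `ResultView` record (not the real `ExtractionResult` class) to show one way of turning a result into a log-friendly summary.

```java
// Hypothetical mirror of a subset of the documented ExtractionResult fields,
// used only to show branching on status and insightPending.
enum ExtractionStatus { SUCCESS, PARTIAL_SUCCESS, FAILED }

record ResultView(String memoryId, ExtractionStatus status,
                  String errorMessage, boolean insightPending) {}

final class ResultHandling {
    static String describe(ResultView r) {
        return switch (r.status()) {
            case SUCCESS -> r.insightPending()
                ? "raw data and memory items stored; insight still building"
                : "raw data and memory items stored; no insight work pending";
            case PARTIAL_SUCCESS -> "partially stored: " + r.errorMessage();
            case FAILED -> "failed: " + r.errorMessage();
        };
    }
}
```

Treating `SUCCESS` with `insightPending = true` as complete for Raw Data and Memory Items, but not yet for insights, matches the asynchronous insight behavior described above.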

Inspect extracted memory

Use Memind UI to inspect what extraction produced. Useful inspection points include:

View	What to inspect
Buffers	Pending and recent conversation context.
Raw Data	Segments, captions, source references, metadata, and content type.
Memory Items	Extracted facts, preferences, events, directives, playbooks, resolutions, and foresight.
Item Graph	Entities, aliases, mentions, and item relationships.
Memory Threads	Long-running topics, projects, workflows, incidents, and decisions.
Insight Tree	Higher-level understanding built from extracted memory.
Settings	Runtime configuration that affects extraction behavior.

If extraction does not look right, inspect Raw Data first. Raw Data captions usually show whether the source content was segmented and understood correctly before higher-level memory was extracted.

Best practices

Use the extraction mode that matches how content arrives:

Use addMessages() for complete conversation segments.
Use addMessage() for real-time agents and chatbots.
Use commit() at session end or before shutdown.
Use extract() for documents, images, audio, tool calls, and custom raw content.
Use plugin helpers when available.

Set up the right plugin:

Conversation extraction works with memind-core.
Document extraction needs the document raw-data plugin.
Image extraction needs the image raw-data plugin.
Audio extraction needs the audio raw-data plugin.
Tool-call extraction needs the tool-call raw-data plugin.

Preserve source context:

Keep useful metadata such as source client, content type, timestamps, file names, URLs, and tool names.
Prefer source-aware content objects over plain text when the source matters.
Inspect Raw Data captions before tuning item extraction.

Avoid fragmented memory:

Do not force every single chat message into memory as a complete unit.
Let streaming messages accumulate until there is enough context.
Commit the final buffer when a run ends.

Tune gradually:

Start with default extraction behavior.
Review Raw Data, Memory Items, Threads, and Insights in Memind UI.
Adjust language, chunking, plugin options, or insight behavior only when the output shows a real need.
