Overview
Multimodal Extraction is how your application turns real-world input into Memind memory. Memind can extract memory from conversations, streaming messages, documents, images, audio, tool calls, and custom raw content. These inputs enter the same memory construction pipeline and can become Raw Data, Memory Items, Item Graph relationships, Memory Threads, and Insight Tree updates.
This page focuses on the extraction APIs, supported content types, plugin setup, response format, and usage patterns. If you want to run a complete example first, start with Quickstart. For the internal memory construction model, see Memory Construction.
Supported content types
Memind supports multiple content types through built-in conversation support and raw-data plugins.
| Content type | Examples | Support |
|---|---|---|
| Conversations | chat messages, agent turns, transcripts | Built in |
| Streaming messages | live chatbot or agent messages | Built in with buffer and commit |
| Documents | text, markdown, HTML, CSV, PDF | Document raw-data plugin |
| Images | screenshots, diagrams, photos | Image raw-data plugin |
| Audio | recordings, transcripts, voice notes | Audio raw-data plugin |
| Tool calls | tool input/output, test results, shell output | Tool-call raw-data plugin |
| Custom raw content | application-specific records, logs, domain objects | RawContent and plugin SPI |
Plugin setup
Conversation extraction is built into memind-core.
Document, image, audio, and tool-call extraction require the matching raw-data plugin dependency and runtime registration.
Maven dependencies
Add the plugin you need.
Runtime registration
Register the plugin on the Memind runtime.
Spring Boot starters
Spring Boot users can use the matching starters to auto-configure raw-data plugins.
| Content type | Spring Boot starter |
|---|---|
| Documents | memind-plugin-rawdata-document-starter |
| Images | memind-plugin-rawdata-image-starter |
| Audio | memind-plugin-rawdata-audio-starter |
| Tool calls | memind-plugin-rawdata-toolcall-starter |
Extraction modes
Choose the extraction mode based on how content arrives in your application.
| Mode | API | Best for |
|---|---|---|
| Batch conversation | addMessages() | You already have a complete conversation segment. |
| Streaming messages | addMessage() | Chatbot or agent messages arrive one at a time. |
| Manual commit | commit() | You want to force buffered messages into memory. |
| Raw content extraction | extract() | Documents, images, audio, tool calls, and custom content. |
| Plugin helpers | Plugin-specific helpers | Convenience APIs for supported content plugins. |
These modes can be combined. For example, an application might call addMessage() during the conversation, commit() at session end, and extract() when the user uploads a document or when the agent produces important tool output.
Extract conversation messages
Use addMessages() when you already have a complete conversation segment.
This is useful for:
- examples and tests
- importing a transcript
- processing a finished conversation
- extracting memory from a known context window
Extract streaming messages
Use addMessage() when messages arrive one at a time.
This is the common mode for chatbots, coding agents, and long-running assistants.
Commit buffered context
Use commit() to force the current conversation buffer into memory.
This is useful when:
- a session ends
- an agent run finishes
- the application is shutting down
- you want to make sure the latest buffered context is written
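The accumulate-then-commit behavior described for streaming messages can be pictured with a small, self-contained sketch. This is not Memind's implementation: ConversationBufferSketch, its flushThreshold parameter, and the message-count flush rule are all hypothetical, and the method names only mirror addMessage() and commit() for readability. Memind's actual buffering rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the accumulate-then-commit pattern.
// None of these types come from the Memind SDK.
final class ConversationBufferSketch {
    private final int flushThreshold; // assumed rule: flush after this many messages
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> committed = new ArrayList<>();

    ConversationBufferSketch(int flushThreshold) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
    }

    // Streaming path: buffer one message and flush only once enough
    // context has accumulated. Returns true if this call triggered a flush.
    boolean addMessage(String message) {
        buffer.add(message);
        if (buffer.size() >= flushThreshold) {
            commit();
            return true;
        }
        return false;
    }

    // Manual path: force whatever is buffered into one committed segment.
    // Returns the number of messages written (0 if the buffer was empty).
    int commit() {
        if (buffer.isEmpty()) {
            return 0;
        }
        int written = buffer.size();
        committed.add(List.copyOf(buffer));
        buffer.clear();
        return written;
    }

    int pending() {
        return buffer.size();
    }

    List<List<String>> committedSegments() {
        return List.copyOf(committed);
    }
}
```

In this sketch, a final commit() at session end writes the short tail of messages that never reached the threshold, which is the situation the list above describes.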
Extract documents
Document extraction requires memind-plugin-rawdata-document.
Use the document plugin to extract memory from text files, Markdown, HTML, CSV, PDF, and other parser-supported document formats. The plugin parses the content, segments it into document-aware sections, generates captions, and sends the normalized content into Memind’s memory construction pipeline.
documentContent is created with the document plugin’s content model or parser flow. See the document plugin or Java SDK docs for construction examples.
Document extraction is useful for:
- project docs
- runbooks
- design documents
- meeting notes
- release notes
- support articles
- internal knowledge bases
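To make "document-aware sections" more concrete, here is a minimal sketch that splits Markdown text at heading lines. Heading-based splitting is an assumption chosen for illustration, not the document plugin's actual segmentation algorithm, and MarkdownSectionSketch and Section are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: heading-based splitting is an assumption,
// not the algorithm used by the Memind document plugin.
final class MarkdownSectionSketch {
    record Section(String title, String body) {}

    static List<Section> split(String markdown) {
        List<Section> sections = new ArrayList<>();
        String title = "(preamble)"; // text before the first heading
        StringBuilder body = new StringBuilder();
        for (String line : markdown.split("\n", -1)) {
            if (line.startsWith("#")) {
                // A heading closes the previous section if it had any content.
                if (!body.toString().isBlank()) {
                    sections.add(new Section(title, body.toString().strip()));
                }
                title = line.replaceFirst("^#+\\s*", "");
                body.setLength(0);
            } else {
                body.append(line).append('\n');
            }
        }
        // Flush the last open section.
        if (!body.toString().isBlank()) {
            sections.add(new Section(title, body.toString().strip()));
        }
        return sections;
    }
}
```

The sketch drops headings that have no body and would misread a line starting with # inside a fenced code block, so a real parser needs more care than this.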
Extract images
Image extraction requires memind-plugin-rawdata-image.
Use the image plugin to extract memory from screenshots, diagrams, UI captures, whiteboards, photos, or other visual artifacts. The plugin can analyze image content, produce image semantics, generate captions, and pass the resulting representation into memory extraction.
imageContent is created with the image plugin’s content model or parser flow. See the image plugin or Java SDK docs for construction examples.
Image extraction is useful when visual context matters, such as:
- UI screenshots
- architecture diagrams
- error screenshots
- visual bug reports
- whiteboard notes
- product or design references
Extract audio
Audio extraction requires memind-plugin-rawdata-audio.
Use the audio plugin to extract memory from recordings, transcripts, voice notes, meetings, interviews, support calls, or agent sessions that include spoken context. Audio content can be transcribed, segmented, captioned, and processed into memory.
audioContent is created with the audio plugin’s content model or parser flow. See the audio plugin or Java SDK docs for construction examples.
Audio extraction can preserve:
- transcript segments
- speaker or timing metadata when available
- summarized captions
- source references
- extracted facts, events, preferences, and decisions
Extract tool calls
Tool-call extraction requires memind-plugin-rawdata-toolcall.
Use the tool-call plugin to extract memory from agent tool usage. Tool calls are important for agents because they capture what the agent tried, what worked, what failed, and what should be reused later.
toolCalls is a list of tool-call records containing tool name, input, output, status, duration, and related metadata.
Tool-call extraction is useful for:
- coding agents
- research agents
- automation agents
- test and deployment agents
- long-running task agents
Tool-call extraction can capture:
- tool experience
- durable directives
- reusable playbooks
- resolved problems
- failed attempts
- successful workflows
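The tool-call record described above (tool name, input, output, status, duration, and metadata) could be modeled roughly as follows. ToolCallSketch is a hypothetical shape, not the plugin's record type, and the "SUCCESS" and "FAILED" status strings are assumed values.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;

// Hypothetical shape only; the tool-call plugin defines its own record type.
record ToolCallSketch(
        String toolName,
        String input,
        String output,
        String status, // assumed values: "SUCCESS" or "FAILED"
        Duration duration,
        Map<String, String> metadata) {

    // Returns the tool name of every failed call, in call order.
    // A tool that failed twice appears twice.
    static List<String> failedTools(List<ToolCallSketch> calls) {
        return calls.stream()
                .filter(call -> "FAILED".equals(call.status()))
                .map(ToolCallSketch::toolName)
                .toList();
    }
}
```

Keeping status and duration as structured fields, rather than folding them into the output text, is what would let later processing tell failed attempts apart from successful workflows.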
Extract custom raw content
Use extract() for custom content types.
Custom raw content is useful when your application has domain-specific records that are not normal chat messages, documents, images, audio, or tool calls.
Examples include:
- internal business events
- workflow records
- logs
- incident reports
- CRM records
- product analytics events
- domain-specific artifacts
To extract custom content, define a RawContent type and a raw-data processor or plugin. This lets Memind normalize your content into Raw Data before extracting memory.
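As a rough picture of what normalizing a domain record might involve, the sketch below turns an incident report into text plus source metadata. RawContentSketch and IncidentReport are hypothetical stand-ins. Memind's RawContent type and plugin SPI have their own signatures, which this example does not attempt to reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins; Memind's RawContent type and plugin SPI
// define their own signatures.
interface RawContentSketch {
    String contentType();           // assumed discriminator for the content kind
    String toText();                // normalized text a processor could segment and caption
    Map<String, String> metadata(); // source context worth keeping
}

record IncidentReport(String id, String service, String summary, String resolution)
        implements RawContentSketch {

    @Override
    public String contentType() {
        return "incident-report";
    }

    @Override
    public String toText() {
        return "Incident " + id + " on " + service + ": " + summary
                + " Resolution: " + resolution;
    }

    @Override
    public Map<String, String> metadata() {
        // LinkedHashMap keeps a stable key order for display and logging.
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("incidentId", id);
        meta.put("service", service);
        return meta;
    }
}
```

Separating the normalized text from the metadata keeps identifiers such as the incident ID available as source context instead of burying them in prose.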
Configure extraction
Most applications can start with the default extraction configuration. Use custom configuration when you need to control language, chunking, insight behavior, thread behavior, or plugin-specific processing.
| Area | Controls |
|---|---|
| Language | Extraction language and prompt behavior. |
| Chunking | Conversation, document, audio, or content-specific segmentation. |
| Captions | Raw Data caption generation. |
| Memory Items | Item extraction behavior and category handling. |
| Insight | Insight Tree construction and asynchronous insight scheduling. |
| Threads | Memory Thread derivation and enrichment. |
| Plugins | Document, image, audio, tool-call, or custom content behavior. |
Response format
All extraction modes return an ExtractionResult.
| Field | Meaning |
|---|---|
| memoryId | The memory identity that received the content. |
| rawDataResult | Raw Data construction result. |
| memoryItemResult | Memory Item extraction result. |
| insightResult | Insight construction result when available. |
| status | Extraction status such as SUCCESS, PARTIAL_SUCCESS, or FAILED. |
| duration | Total extraction duration. |
| errorMessage | Failure or partial-success error message. |
| insightPending | Whether insight work has been scheduled asynchronously but has not completed yet. |
insightPending is important because Insight Tree construction may be asynchronous. A successful extraction can write Raw Data and Memory Items immediately while insight construction continues in the background.
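The interaction between status and insightPending can be sketched as follows. ExtractionResultSketch and StatusSketch are hypothetical mirrors of three of the documented fields. The real ExtractionResult type, its accessors, and its full set of status values may differ.

```java
// Hypothetical mirror of the documented fields; not the SDK's types.
enum StatusSketch { SUCCESS, PARTIAL_SUCCESS, FAILED }

record ExtractionResultSketch(StatusSketch status, boolean insightPending, String errorMessage) {

    // Builds a short log line. A SUCCESS result can still have insight
    // work running in the background, so the two flags are checked together.
    String describe() {
        return switch (status) {
            case SUCCESS -> insightPending
                    ? "written; insight construction still pending"
                    : "written; no insight work pending";
            case PARTIAL_SUCCESS -> "partially written: " + errorMessage;
            case FAILED -> "failed: " + errorMessage;
        };
    }
}
```

A caller that needs insight output would treat a successful result with insightPending set as "memory written, insight not yet available" rather than as an error.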
Inspect extracted memory
Use Memind UI to inspect what extraction produced. Useful inspection points include:
| View | What to inspect |
|---|---|
| Buffers | Pending and recent conversation context. |
| Raw Data | Segments, captions, source references, metadata, and content type. |
| Memory Items | Extracted facts, preferences, events, directives, playbooks, resolutions, and foresight. |
| Item Graph | Entities, aliases, mentions, and item relationships. |
| Memory Threads | Long-running topics, projects, workflows, incidents, and decisions. |
| Insight Tree | Higher-level understanding built from extracted memory. |
| Settings | Runtime configuration that affects extraction behavior. |
Best practices
Use the extraction mode that matches how content arrives:
- Use addMessages() for complete conversation segments.
- Use addMessage() for real-time agents and chatbots.
- Use commit() at session end or before shutdown.
- Use extract() for documents, images, audio, tool calls, and custom raw content.
- Use plugin helpers when available.
- Conversation extraction works with memind-core.
- Document extraction needs the document raw-data plugin.
- Image extraction needs the image raw-data plugin.
- Audio extraction needs the audio raw-data plugin.
- Tool-call extraction needs the tool-call raw-data plugin.
- Keep useful metadata such as source client, content type, timestamps, file names, URLs, and tool names.
- Prefer source-aware content objects over plain text when the source matters.
- Inspect Raw Data captions before tuning item extraction.
- Do not force every single chat message into memory as a complete unit.
- Let streaming messages accumulate until there is enough context.
- Commit the final buffer when a run ends.
- Start with default extraction behavior.
- Review Raw Data, Memory Items, Threads, and Insights in Memind UI.
- Adjust language, chunking, plugin options, or insight behavior only when the output shows a real need.

