Raw Data is Memind’s source-level semantic evidence layer.
Before Memind extracts structured Memory Items, connects knowledge through the Item Graph, builds Memory Threads, or consolidates Insights, it first preserves what was actually observed. Raw Data keeps source context in a form that can be inspected, searched, and referenced by later memory layers.
The key idea is simple:
Raw Data keeps memory grounded in source context.
Overview
Raw Data is not just archived input.
It is the source layer behind Memind’s structured memory system. Each Raw Data record preserves a source segment, its metadata, source references, timing information, and a generated caption.
That caption is important. It turns a raw source segment into a compact semantic context that can be searched, inspected, and used to understand the background behind extracted Memory Items.
A Memory Item tells Memind what durable memory was extracted. Raw Data tells Memind where that memory came from and what broader context shaped it.
Raw source context
-> topic-aware Raw Data segment
-> semantic caption
-> structured Memory Items
This is one of the main differences between Memind and memory systems that only retrieve isolated facts.
Why Raw Data exists
Many memory systems jump directly from conversation text to extracted facts.
That works for simple preferences, but it often loses the surrounding context. An agent may retrieve several correct items and still miss the larger situation that produced them.
For example, the agent might retrieve:
- the user uses Java 21
- the user prefers stable tools
- the user is migrating a service
Those facts are useful, but they are still fragments. They do not explain the project background, the decision path, the tradeoffs discussed, or why those facts are related.
Raw Data exists to preserve that missing layer.
It gives Memind:
- source context before higher-level memory is extracted
- traceability from Memory Items back to observed content
- searchable captions for source-level evidence
- a way to recover broader context behind isolated items
- an inspection layer for debugging memory quality
- a stable foundation for future processing and reprocessing
Raw Data makes memory explainable instead of opaque.
Why captions matter
Raw Data captions are not just summaries.
In Memind, source content can be segmented around meaningful context boundaries, such as a topic, workflow, decision, incident, or conversation shift. For conversation content, LLM-based segmentation can identify these semantic boundaries instead of relying only on arbitrary fixed-size chunks.
That means a Raw Data segment can represent a coherent slice of context.
The caption then becomes a compact semantic handle for that segment.
This matters because Memory Items are intentionally concise. They capture durable facts, preferences, events, directives, tool experience, playbooks, or resolutions. But an item alone may not carry the full background of the conversation that produced it.
Raw Data captions keep that background searchable.
Instead of retrieving only scattered items, Memind can also retrieve the source-level context behind those items:
Raw conversation context
-> semantic source segment
-> captioned context
-> extracted Memory Items
This helps agents understand not only what was remembered, but also why that memory exists, what surrounding context shaped it, and how multiple extracted items relate to the same source situation.
A plain memory store may return isolated facts. Memind can return facts with their source context.
Where Raw Data fits
Raw Data is created early in the memory construction flow.
Raw Input
-> Raw Data
-> Memory Items
-> Item Graph
-> Memory Threads
-> Insight Tree
Raw Data is the first durable memory layer. It sits between raw input and structured memory extraction.
Later layers depend on it:
| Layer | How it uses Raw Data |
|---|---|
| Memory Items | Extracts durable facts, preferences, events, directives, tool experience, playbooks, and resolutions from Raw Data. |
| Item Graph | Connects extracted items that keep source references back to Raw Data. |
| Memory Threads | Groups related items that were originally extracted from Raw Data. |
| Insight Tree | Consolidates higher-level understanding from items while preserving evidence through item and source references. |
| Retrieval | Can return Raw Data captions and source evidence alongside structured memory results. |
Raw Data is the bridge between what happened and what Memind remembers.
Processing flow
Raw Data processing turns raw input into source-level records.
At a high level, the flow is:
Raw Input
-> Content Resolution
-> Resource Preparation
-> Segmentation
-> Caption Generation
-> Vectorization
-> Persistence
-> Raw Data Result
Each step has a specific role.
| Step | Purpose |
|---|---|
| Content Resolution | Resolve content type, metadata, source client, extraction config, and content identity. |
| Resource Preparation | Parse files, fetch URLs, apply plugins, and normalize the source payload when needed. |
| Segmentation | Split source content into meaningful, inspectable units with boundaries and source timing. |
| Caption Generation | Generate semantic captions for source segments. |
| Vectorization | Embed captions and store vector IDs for source-level retrieval. |
| Persistence | Store Raw Data records with segments, captions, metadata, references, and timestamps. |
| Raw Data Result | Return Raw Data records and source segments to the next construction stage. |
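The stages above can be sketched as a single function that runs them in order. This is a minimal illustration, not Memind's implementation: `process_raw_input`, its parameters, and the stand-in splitter, captioner, and embedder are all hypothetical, and a real captioner or embedder would call a model.

```python
from typing import Callable


def process_raw_input(
    text: str,
    source_client: str,
    split: Callable[[str], list[str]],
    make_caption: Callable[[str], str],
    embed: Callable[[str], list[float]],
    store: list[dict],
    index: dict[str, list[float]],
) -> list[dict]:
    """Run one raw input through the stages in order and return the new records."""
    # Content Resolution: decide the content type and attach source information.
    resolved = {"contentType": "conversation", "sourceClient": source_client}

    # Resource Preparation: plain text needs no parsing or fetching, only normalizing.
    prepared = text.strip()

    records = []
    # Segmentation: delegate to the supplied splitter.
    for content in split(prepared):
        # Caption Generation: one caption per segment.
        caption = make_caption(content)
        # Vectorization: embed the caption, not the full content.
        vector_id = f"vec-{len(index)}"
        index[vector_id] = embed(caption)
        # Persistence: store the record with its segment, caption, and vector ID.
        record = {
            **resolved,
            "id": f"raw-{len(store)}",
            "segment": {"content": content},
            "caption": caption,
            "captionVectorId": vector_id,
        }
        store.append(record)
        records.append(record)
    # Raw Data Result: hand the records to the next construction stage.
    return records


store, index = [], {}
records = process_raw_input(
    "User asks about the Java 21 migration.\n\nUser prefers stable tools.",
    source_client="demo-client",
    split=lambda t: [p for p in t.split("\n\n") if p.strip()],  # stand-in splitter
    make_caption=lambda c: " ".join(c.split()[:6]),             # stand-in captioner
    embed=lambda c: [float(len(c)), float(len(c.split()))],     # stand-in embedder
    store=store,
    index=index,
)
```

Passing the splitter, captioner, and embedder as callables keeps the stage order visible while leaving each stage replaceable.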
For streaming conversation input, Memind may first buffer messages and seal a conversation segment before Raw Data processing begins.
Raw Data records
A Raw Data record stores source-level information about one processed segment.
Conceptually, it contains:
| Field | Meaning |
|---|---|
| id | Unique Raw Data segment identifier. |
| memoryId | The memory namespace this Raw Data belongs to. |
| contentType | The type of source content, such as conversation content. |
| sourceClient | The client or integration that produced the content. |
| contentId | A content fingerprint used for idempotency. |
| segment | The source segment, including content, boundary, and metadata. |
| caption | A compact semantic summary of the segment. |
| captionVectorId | The vector index ID for the caption embedding. |
| metadata | Additional source or application metadata. |
| resourceId | Optional reference to a stored resource. |
| mimeType | Optional MIME type for file or resource content. |
| createdAt | When the Raw Data record was created. |
| startTime | Source start time when available. |
| endTime | Source end time when available. |
You usually do not need to manipulate these fields directly. They explain what Memind preserves so later memory layers can remain traceable.
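As a reading aid, the conceptual fields can be mirrored in a small Python dataclass. This is an illustrative sketch only: the class name `RawDataRecord`, the snake_case attribute names, the chosen types, and which fields default to empty are assumptions, not Memind's actual model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RawDataRecord:
    """Hypothetical mirror of the conceptual Raw Data fields."""
    id: str
    memory_id: str                           # memoryId
    content_type: str                        # contentType
    source_client: str                       # sourceClient
    content_id: str                          # contentId, the idempotency fingerprint
    segment: dict                            # content, boundary, and metadata
    caption: str
    caption_vector_id: Optional[str] = None  # captionVectorId, set after vectorization
    metadata: dict = field(default_factory=dict)
    resource_id: Optional[str] = None        # only for stored resources
    mime_type: Optional[str] = None          # only for file or resource content
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    start_time: Optional[datetime] = None    # source timing, when available
    end_time: Optional[datetime] = None


record = RawDataRecord(
    id="raw-1",
    memory_id="demo-memory",
    content_type="conversation",
    source_client="demo-client",
    content_id="fingerprint-placeholder",
    segment={"content": "User asks about the Java 21 migration."},
    caption="Java 21 migration question",
)
```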
Segments
A segment is the source unit that Raw Data persists.
For conversation content, a segment may represent a range of messages. For other content types, it may represent a character range, parsed section, or processor-defined chunk.
A segment can include:
| Segment field | Meaning |
|---|---|
| content | The source text for this segment. |
| caption | The generated semantic summary for the segment. |
| boundary | The source boundary, such as message range or character range. |
| metadata | Segment-level metadata. |
Segments are important because Raw Data should preserve context at a useful granularity. A whole conversation or file may be too large to inspect or retrieve as one unit, while a single sentence may be too small to preserve meaning.
The goal is to keep source context coherent enough for later extraction, inspection, and retrieval.
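To make the idea of a message-range boundary concrete, here is a minimal sketch that groups conversation messages into segments. The `Segment` class, the `boundary` dictionary shape (with an exclusive `end`), and the `starts_new_segment` callback are hypothetical; the keyword check in the usage is only a stand-in for a semantic boundary detector such as an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Segment:
    content: str
    boundary: dict                 # hypothetical shape: message range, end exclusive
    caption: str = ""              # filled in later by caption generation
    metadata: dict = field(default_factory=dict)


def segment_messages(
    messages: list[str],
    starts_new_segment: Callable[[str, str], bool],
) -> list[Segment]:
    """Close a segment whenever the detector reports a shift between two messages."""
    segments = []
    start = 0
    for i in range(1, len(messages) + 1):
        at_end = i == len(messages)
        # Short-circuit on at_end so messages[i] is never read past the list.
        if at_end or starts_new_segment(messages[i - 1], messages[i]):
            segments.append(Segment(
                content="\n".join(messages[start:i]),
                boundary={"type": "message_range", "start": start, "end": i},
            ))
            start = i
    return segments


messages = [
    "How do I upgrade to Java 21?",
    "Start with the build plugin versions.",
    "New topic: the deploy failed last night.",
    "Which stage failed?",
]
segments = segment_messages(
    messages,
    starts_new_segment=lambda prev, cur: cur.lower().startswith("new topic"),
)
```

Each resulting segment covers one coherent run of messages, which is the granularity the caption and later extraction steps work on.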
Captions
Captions summarize Raw Data segments into compact semantic context.
A caption does not replace the original content. It gives the segment a searchable and human-readable representation.
Captions are useful because they:
- make source segments easier to browse in Memind UI
- provide compact source-level retrieval text
- reduce noise when searching Raw Data
- help LLMs understand the background behind extracted items
- preserve a semantic view of the original source segment
- provide text that can be embedded for vector search
This makes Raw Data more than a source archive.
With captions, Raw Data becomes a searchable context layer behind Memory Items.
Metadata and references
Raw Data can carry source metadata and references.
This is useful when memory comes from multiple applications, agents, files, URLs, or tools.
Common metadata and references include:
| Concept | Purpose |
|---|---|
| sourceClient | Identifies which client, integration, or application produced the content. |
| contentType | Describes how the content should be processed. |
| resourceId | Links Raw Data back to a stored resource. |
| mimeType | Preserves resource type information for files or external content. |
| startTime / endTime | Preserves the time range represented by the source segment. |
| metadata | Carries application-specific source information. |
These fields make Raw Data useful for filtering, inspection, debugging, and downstream processing.
Vector indexing
Raw Data can be indexed for semantic search.
Memind vectorizes Raw Data captions and stores the resulting vector IDs on Raw Data records. This lets source-level evidence participate in retrieval without making an embedding of the entire original content the only searchable representation.
Raw Data vector indexing is separate from Memory Item and Insight indexing.
| Indexed layer | What is embedded |
|---|---|
| Raw Data | Segment captions. |
| Memory Items | Structured memory item content. |
| Insights | Consolidated insight content. |
This separation lets Memind retrieve at different levels of abstraction: source evidence, structured memory, and higher-level understanding.
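Caption-level indexing can be illustrated with a toy example. The sketch below embeds captions as word counts and ranks records by cosine similarity; the word-count embedding stands in for a real embedding model, and `caption_index`, `search_raw_data`, the vector ID format, and the sample captions are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Stands in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


caption_index: dict[str, Counter] = {}  # vector ID -> caption embedding
records = [
    {"id": "raw-1", "caption": "Planning the Java 21 migration for the billing service"},
    {"id": "raw-2", "caption": "User prefers dark mode in the editor"},
]
for record in records:
    # Embed the caption only, then keep the vector ID on the record.
    vector_id = f"vec-{record['id']}"
    caption_index[vector_id] = embed(record["caption"])
    record["captionVectorId"] = vector_id


def search_raw_data(query: str, top_k: int = 1) -> list[dict]:
    """Rank Raw Data records by similarity between the query and their captions."""
    q = embed(query)
    ranked = sorted(
        records,
        key=lambda r: cosine(q, caption_index[r["captionVectorId"]]),
        reverse=True,
    )
    return ranked[:top_k]


best = search_raw_data("java migration")[0]
```

Because only the caption is embedded, the full segment content stays available for inspection without being the unit of similarity search.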
Idempotency
Raw Data processing uses content identity to avoid duplicate work.
Each raw input can produce a contentId, which acts as a fingerprint of the original content. Before processing, Memind can check whether the same content has already been stored for the same memory namespace.
If the content already exists, Memind can return the existing Raw Data instead of writing duplicate source records.
This is useful when:
- the same conversation batch is submitted more than once
- ingestion is retried after a failure
- integrations resend previously observed content
- applications want safer repeated writes
Idempotency helps keep the source layer clean.
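The check-before-write behavior can be sketched as follows. This is an assumption-laden illustration: the source does not say how `contentId` is computed, so the SHA-256 fingerprint here is hypothetical, as are the `RawDataStore` class and its `ingest` method.

```python
import hashlib


def content_id(content: str) -> str:
    """Hypothetical fingerprint: SHA-256 of the whitespace-trimmed content."""
    return hashlib.sha256(content.strip().encode("utf-8")).hexdigest()


class RawDataStore:
    """In-memory stand-in for a Raw Data store keyed by namespace and fingerprint."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], dict] = {}

    def ingest(self, memory_id: str, content: str) -> tuple[dict, bool]:
        """Return (record, created). Existing content is returned, not rewritten."""
        key = (memory_id, content_id(content))
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        record = {
            "id": f"raw-{len(self._by_key)}",
            "memoryId": memory_id,
            "contentId": key[1],
            "content": content,
        }
        self._by_key[key] = record
        return record, True

    def __len__(self) -> int:
        return len(self._by_key)


raw_store = RawDataStore()
first, created_first = raw_store.ingest("memory-a", "User is migrating a service.")
retry, created_retry = raw_store.ingest("memory-a", "User is migrating a service.")
other, created_other = raw_store.ingest("memory-b", "User is migrating a service.")
```

The lookup key combines the memory namespace with the fingerprint, so a retried write returns the existing record while the same content in a different namespace is stored separately.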
Relationship to Memory Items
Raw Data is the input evidence for Memory Item extraction.
After Raw Data is created, Memind extracts structured Memory Items from it. Those items can keep references back to the Raw Data record that produced them.
This relationship is important:
Raw Data
-> one or more Memory Items
-> graph, threads, insights, and retrieval context
A single Raw Data segment can produce multiple Memory Items. Some Raw Data may produce no durable items if there is nothing worth keeping. Either way, the source remains inspectable.
This also means an agent can reason with both levels:
- Memory Items provide concise durable facts.
- Raw Data captions provide the broader context behind those facts.
Together, they help Memind avoid returning memory as disconnected fragments.
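A small sketch shows how source references let concise items be returned together with their context. The `rawDataId` field name, the `group_by_source` helper, and the sample captions are hypothetical and serve only to illustrate the relationship.

```python
from collections import defaultdict

# Hypothetical Raw Data records, keyed by ID, with illustrative captions.
raw_data = {
    "raw-1": {"caption": "Discussion of moving the billing service to Java 21 with stable tooling"},
    "raw-2": {"caption": "Follow-up about editor settings"},
}

# Hypothetical Memory Items that keep a reference to their source record.
items = [
    {"text": "The user uses Java 21", "rawDataId": "raw-1"},
    {"text": "The user prefers stable tools", "rawDataId": "raw-1"},
    {"text": "The user is migrating a service", "rawDataId": "raw-1"},
    {"text": "The user prefers dark mode", "rawDataId": "raw-2"},
]


def group_by_source(items: list[dict], raw_data: dict[str, dict]) -> list[dict]:
    """Group item texts under the caption of the Raw Data record they came from."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in items:
        grouped[item["rawDataId"]].append(item["text"])
    return [
        {"rawDataId": raw_id, "caption": raw_data[raw_id]["caption"], "items": texts}
        for raw_id, texts in grouped.items()
    ]


context = group_by_source(items, raw_data)
```

Grouping by source shows that the three Java-related items belong to one situation rather than being unrelated fragments.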
Inspecting Raw Data
Memind UI lets developers inspect the Raw Data layer.
The Raw Data view is useful when you want to understand:
- what source content was ingested
- how content was segmented
- what captions were generated
- which source client produced the data
- what metadata was attached
- what time range the source segment represents
- whether downstream memory came from the expected source
- what broader context sits behind a Memory Item
This is especially helpful when tuning extraction behavior or debugging unexpected retrieval results.
Common use cases
Raw Data is useful in several development workflows:
| Use case | Why Raw Data helps |
|---|---|
| Debug extraction | Check whether a Memory Item came from the expected source context. |
| Audit memory quality | Inspect source segments when extracted memory looks wrong or incomplete. |
| Improve ingestion | See how conversations, files, or custom content are segmented and captioned. |
| Trace retrieval evidence | Understand which source segments support a retrieved context. |
| Recover background context | Retrieve the broader topic context behind concise Memory Items. |
| Extend content support | Validate custom parsers, resource fetchers, and Raw Data plugins. |
Design principle
Raw Data keeps Memind grounded.
Structured memory is useful only when it remains connected to the source context that produced it. Raw Data gives Memind a durable evidence layer, and captions make that evidence semantic, searchable, and usable by agents.
Memind does not only remember what was extracted.
It also preserves the context that made those memories meaningful.