Raw Data is Memind’s source-level semantic evidence layer. Before Memind extracts structured Memory Items, connects knowledge through the Item Graph, builds Memory Threads, or consolidates Insights, it first preserves what was actually observed. Raw Data keeps source context in a form that can be inspected, searched, and referenced by later memory layers. The key idea is simple:
Raw Data keeps memory grounded in source context.
[Figure: Raw Data flow]

Overview

Raw Data is not just archived input. It is the source layer behind Memind’s structured memory system. Each Raw Data record preserves a source segment, its metadata, source references, timing information, and a generated caption. That caption is important. It turns a raw source segment into a compact semantic context that can be searched, inspected, and used to understand the background behind extracted Memory Items. A Memory Item tells Memind what durable memory was extracted. Raw Data tells Memind where that memory came from and what broader context shaped it.
Raw source context
  -> topic-aware Raw Data segment
  -> semantic caption
  -> structured Memory Items
This is one of the main differences between Memind and memory systems that only retrieve isolated facts.

Why Raw Data exists

Many memory systems jump directly from conversation text to extracted facts. That works for simple preferences, but it often loses the surrounding context. An agent may retrieve several correct items, but still miss the larger situation that produced them. For example, the agent might retrieve:
  • the user uses Java 21
  • the user prefers stable tools
  • the user is migrating a service
Those facts are useful, but they are still fragments. They do not explain the project background, the decision path, the tradeoffs discussed, or why those facts are related. Raw Data exists to preserve that missing layer. It gives Memind:
  • source context before higher-level memory is extracted
  • traceability from Memory Items back to observed content
  • searchable captions for source-level evidence
  • a way to recover broader context behind isolated items
  • an inspection layer for debugging memory quality
  • a stable foundation for future processing and reprocessing
Raw Data makes memory explainable instead of opaque.

Why captions matter

Raw Data captions are not just summaries. In Memind, source content can be segmented around meaningful context boundaries, such as a topic, workflow, decision, incident, or conversation shift. For conversation content, LLM-based segmentation can identify these semantic boundaries instead of relying only on arbitrary fixed-size chunks. That means a Raw Data segment can represent a coherent slice of context. The caption then becomes a compact semantic handle for that segment.
This matters because Memory Items are intentionally concise. They capture durable facts, preferences, events, directives, tool experience, playbooks, or resolutions. But an item alone may not carry the full background of the conversation that produced it.
Raw Data captions keep that background searchable. Instead of retrieving only scattered items, Memind can also retrieve the source-level context behind those items:
Raw conversation context
  -> semantic source segment
  -> captioned context
  -> extracted Memory Items
This helps agents understand not only what was remembered, but also why that memory exists, what surrounding context shaped it, and how multiple extracted items relate to the same source situation. A plain memory store may return isolated facts. Memind can return facts with their source context.

Where Raw Data fits

Raw Data is created early in the memory construction flow.
Raw Input
  -> Raw Data
  -> Memory Items
  -> Item Graph
  -> Memory Threads
  -> Insight Tree
Raw Data is the first durable memory layer. It sits between raw input and structured memory extraction. Later layers depend on it:
Layer | How it uses Raw Data
Memory Items | Extracts durable facts, preferences, events, directives, tool experience, playbooks, and resolutions from Raw Data.
Item Graph | Connects extracted items that keep source references back to Raw Data.
Memory Threads | Groups related items that were originally extracted from Raw Data.
Insight Tree | Consolidates higher-level understanding from items while preserving evidence through item and source references.
Retrieval | Can return Raw Data captions and source evidence alongside structured memory results.
Raw Data is the bridge between what happened and what Memind remembers.

Processing flow

Raw Data processing turns raw input into source-level records. At a high level, the flow is:
Raw Input
  -> Content Resolution
  -> Resource Preparation
  -> Segmentation
  -> Caption Generation
  -> Vectorization
  -> Persistence
  -> Raw Data Result
Each step has a specific role.
Step | Purpose
Content Resolution | Resolve content type, metadata, source client, extraction config, and content identity.
Resource Preparation | Parse files, fetch URLs, apply plugins, and normalize the source payload when needed.
Segmentation | Split source content into meaningful, inspectable units with boundaries and source timing.
Caption Generation | Generate semantic captions for source segments.
Vectorization | Embed captions and store vector IDs for source-level retrieval.
Persistence | Store Raw Data records with segments, captions, metadata, references, and timestamps.
Raw Data Result | Return Raw Data records and source segments to the next construction stage.
For streaming conversation input, Memind may first buffer messages and seal a conversation segment before Raw Data processing begins.
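The stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not Memind's implementation: every function name is hypothetical, segmentation splits on blank lines instead of semantic boundaries, the caption is a simple truncation instead of an LLM call, the vector ID is faked, and Resource Preparation is omitted because the input is already plain text.

```python
import hashlib
from datetime import datetime, timezone


def resolve_content_id(text: str) -> str:
    """Content Resolution stand-in: fingerprint the input (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def segment(text: str) -> list[str]:
    """Segmentation stand-in: split on blank lines rather than semantic boundaries."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def generate_caption(segment_text: str, max_words: int = 8) -> str:
    """Caption Generation stand-in: keep the first few words instead of calling an LLM."""
    return " ".join(segment_text.split()[:max_words])


def vectorize(caption: str) -> str:
    """Vectorization stand-in: derive a fake vector ID instead of calling an embedder."""
    return "vec-" + hashlib.sha256(caption.encode("utf-8")).hexdigest()[:12]


def process(text: str, store: list[dict]) -> list[dict]:
    """Run the stages in order and return the new source-level records."""
    content_id = resolve_content_id(text)
    records = []
    for position, segment_text in enumerate(segment(text)):
        caption = generate_caption(segment_text)
        records.append({
            "id": f"{content_id[:8]}-{position}",
            "contentId": content_id,
            "segment": segment_text,
            "caption": caption,
            "captionVectorId": vectorize(caption),
            "createdAt": datetime.now(timezone.utc),
        })
    store.extend(records)  # Persistence
    return records         # Raw Data Result


store: list[dict] = []
result = process(
    "We are migrating the billing service.\n\nThe team prefers stable tools.",
    store,
)
```

Each blank-line-separated block becomes one record, and all records from the same input share one contentId.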

Raw Data records

A Raw Data record stores source-level information about one processed segment. Conceptually, it contains:
Field | Meaning
id | Unique Raw Data segment identifier.
memoryId | The memory namespace this Raw Data belongs to.
contentType | The type of source content, such as conversation content.
sourceClient | The client or integration that produced the content.
contentId | A content fingerprint used for idempotency.
segment | The source segment, including content, boundary, and metadata.
caption | A compact semantic summary of the segment.
captionVectorId | The vector index ID for the caption embedding.
metadata | Additional source or application metadata.
resourceId | Optional reference to a stored resource.
mimeType | Optional MIME type for file or resource content.
createdAt | When the Raw Data record was created.
startTime | Source start time when available.
endTime | Source end time when available.
You usually do not need to manipulate these fields directly. They explain what Memind preserves so later memory layers can remain traceable.
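As a reading aid, the conceptual fields could be modeled roughly as follows. The field names come from the table above, but the Python types, the defaults, and which fields are optional are assumptions made for this sketch, and the class names are hypothetical rather than Memind's actual types.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Segment:
    content: str
    boundary: dict[str, Any]  # for example a message range or character range
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class RawData:
    id: str
    memoryId: str
    contentType: str
    sourceClient: str
    contentId: str
    segment: Segment
    caption: str
    captionVectorId: Optional[str] = None  # assumed unset until the caption is embedded
    metadata: dict[str, Any] = field(default_factory=dict)
    resourceId: Optional[str] = None
    mimeType: Optional[str] = None
    createdAt: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    startTime: Optional[datetime] = None
    endTime: Optional[datetime] = None


# All values below are illustrative placeholders.
record = RawData(
    id="rd-1",
    memoryId="mem-demo",
    contentType="conversation",
    sourceClient="demo-client",
    contentId="fingerprint-placeholder",
    segment=Segment(
        content="user: We run Java 21 in production.",
        boundary={"startMessage": 0, "endMessage": 0},
    ),
    caption="User describes the production Java version.",
)
```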

Segments

A segment is the source unit that Raw Data persists. For conversation content, a segment may represent a range of messages. For other content types, it may represent a character range, parsed section, or processor-defined chunk. A segment can include:
Segment field | Meaning
content | The source text for this segment.
caption | The generated semantic summary for the segment.
boundary | The source boundary, such as message range or character range.
metadata | Segment-level metadata.
Segments are important because Raw Data should preserve context at a useful granularity. A whole conversation or file may be too large to inspect or retrieve as one unit, while a single sentence may be too small to preserve meaning. The goal is to keep source context coherent enough for later extraction, inspection, and retrieval.
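To make the two boundary kinds mentioned above concrete, here is a minimal sketch of how a message range and a character range might each select a slice of source content. The class names, field names, and the inclusive or exclusive index conventions are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MessageBoundary:
    start_message: int  # inclusive message index
    end_message: int    # inclusive message index


@dataclass(frozen=True)
class CharBoundary:
    start_char: int  # inclusive character offset
    end_char: int    # exclusive character offset


def slice_source(source: Union[list[str], str],
                 boundary: Union[MessageBoundary, CharBoundary]) -> str:
    """Return the source text covered by a boundary."""
    if isinstance(boundary, MessageBoundary):
        if not isinstance(source, list):
            raise TypeError("a message boundary needs a list of messages")
        return "\n".join(source[boundary.start_message:boundary.end_message + 1])
    if not isinstance(source, str):
        raise TypeError("a character boundary needs a string")
    return source[boundary.start_char:boundary.end_char]


messages = [
    "user: We are moving billing to Java 21.",
    "assistant: Which framework does it use?",
    "user: Unrelated, what is for lunch?",
]
# The first two messages share a topic, so they form one segment.
topic_segment = slice_source(messages, MessageBoundary(0, 1))

document = "Intro. Migration plan follows."
section = slice_source(document, CharBoundary(7, len(document)))
```

Keeping the boundary next to the content means a segment can always be traced back to the exact part of the source it covers.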

Captions

Captions summarize Raw Data segments into compact semantic context. A caption does not replace the original content. It gives the segment a searchable and human-readable representation. Captions are useful because they:
  • make source segments easier to browse in Memind UI
  • provide compact source-level retrieval text
  • reduce noise when searching Raw Data
  • help LLMs understand the background behind extracted items
  • preserve a semantic view of the original source segment
  • provide text that can be embedded for vector search
This makes Raw Data more than a source archive. With captions, Raw Data becomes a searchable context layer behind Memory Items.

Metadata and source references

Raw Data can carry source metadata and references. This is useful when memory comes from multiple applications, agents, files, URLs, or tools. Common metadata and references include:
Concept | Purpose
sourceClient | Identifies which client, integration, or application produced the content.
contentType | Describes how the content should be processed.
resourceId | Links Raw Data back to a stored resource.
mimeType | Preserves resource type information for files or external content.
startTime / endTime | Preserves the time range represented by the source segment.
metadata | Carries application-specific source information.
These fields make Raw Data useful for filtering, inspection, debugging, and downstream processing.
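As one illustration of filtering, the sketch below selects records by source client and by overlap with a time window. It operates on plain dictionaries and is not a Memind API: the function name and the sample records are hypothetical, and excluding records without timing information when a time filter is given is a design choice of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def utc(day: int, hour: int) -> datetime:
    """Helper for readable sample timestamps in January 2025."""
    return datetime(2025, 1, day, hour, tzinfo=timezone.utc)


records = [
    {"id": "rd-1", "sourceClient": "chat-app", "startTime": utc(10, 9), "endTime": utc(10, 10)},
    {"id": "rd-2", "sourceClient": "chat-app", "startTime": utc(12, 9), "endTime": utc(12, 10)},
    {"id": "rd-3", "sourceClient": "ide-plugin", "startTime": utc(10, 9), "endTime": utc(10, 11)},
    {"id": "rd-4", "sourceClient": "chat-app"},  # no source timing available
]


def filter_records(items: list[dict],
                   source_client: Optional[str] = None,
                   after: Optional[datetime] = None,
                   before: Optional[datetime] = None) -> list[dict]:
    """Keep records from the given client whose time range overlaps [after, before]."""
    selected = []
    for item in items:
        if source_client is not None and item["sourceClient"] != source_client:
            continue
        start, end = item.get("startTime"), item.get("endTime")
        # Records without timing are skipped whenever a time filter is applied.
        if after is not None and (end is None or end < after):
            continue
        if before is not None and (start is None or start > before):
            continue
        selected.append(item)
    return selected


chat_on_the_tenth = filter_records(
    records, source_client="chat-app", after=utc(10, 0), before=utc(10, 23)
)
```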

Vector indexing

Raw Data can be indexed for semantic search. Memind vectorizes Raw Data captions and stores the resulting vector IDs on Raw Data records. This gives source-level evidence a compact searchable representation, so retrieval does not depend solely on embedding the entire original content. Raw Data vector indexing is separate from Memory Item and Insight indexing.
Indexed layer | What is embedded
Raw Data | Segment captions.
Memory Items | Structured memory item content.
Insights | Consolidated insight content.
This separation lets Memind retrieve at different levels of abstraction: source evidence, structured memory, and higher-level understanding.
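The caption-indexing idea can be sketched with a toy in-memory index. This is a simplified stand-in, not Memind's vector store: the `CaptionIndex` class is hypothetical, and the "embedding" is a bag-of-words count compared by cosine similarity, where a real deployment would call an embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class CaptionIndex:
    """Hypothetical in-memory index keyed by generated vector IDs."""

    def __init__(self) -> None:
        self._vectors: dict[str, Counter] = {}

    def add(self, caption: str) -> str:
        vector_id = f"vec-{len(self._vectors)}"
        self._vectors[vector_id] = embed(caption)
        return vector_id

    def search(self, query: str, top_k: int = 1) -> list[str]:
        query_vector = embed(query)
        scored = sorted(
            ((cosine(query_vector, vector), vector_id)
             for vector_id, vector in self._vectors.items()),
            reverse=True,
        )
        return [vector_id for score, vector_id in scored[:top_k] if score > 0]


index = CaptionIndex()
raw_data = [
    {"id": "rd-1", "caption": "Team plans the billing service migration to Java 21"},
    {"id": "rd-2", "caption": "User asks about lunch options near the office"},
]
for record in raw_data:
    # Only the caption is embedded; the vector ID is stored on the record.
    record["captionVectorId"] = index.add(record["caption"])

hits = index.search("java migration")
matched = [r["id"] for r in raw_data if r["captionVectorId"] in hits]
```

Because the record keeps only the vector ID, the index can be rebuilt or swapped without touching the stored source content.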

Idempotency

Raw Data processing uses content identity to avoid duplicate work. Each raw input can produce a contentId, which acts as a fingerprint of the original content. Before processing, Memind can check whether the same content has already been stored for the same memory namespace. If the content already exists, Memind can return the existing Raw Data instead of writing duplicate source records. This is useful when:
  • the same conversation batch is submitted more than once
  • ingestion is retried after a failure
  • integrations resend previously observed content
  • applications want safer repeated writes
Idempotency helps keep the source layer clean.
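A minimal sketch of this check, assuming the fingerprint is a SHA-256 digest of the content (the source text only says "fingerprint", so the hash choice is an assumption) and using a hypothetical in-memory store keyed by memory namespace and contentId:

```python
import hashlib


def content_id(content: str) -> str:
    """Fingerprint the original content. SHA-256 is an assumed choice."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class RawDataStore:
    """Hypothetical store that skips writes for content it has already seen."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], dict] = {}
        self.writes = 0

    def ingest(self, memory_id: str, content: str) -> dict:
        key = (memory_id, content_id(content))
        existing = self._by_key.get(key)
        if existing is not None:
            # Same content in the same namespace: return the stored record.
            return existing
        record = {
            "id": f"rd-{len(self._by_key) + 1}",
            "memoryId": memory_id,
            "contentId": key[1],
            "content": content,
        }
        self._by_key[key] = record
        self.writes += 1
        return record


store = RawDataStore()
first = store.ingest("mem-a", "user: We are migrating the billing service.")
retry = store.ingest("mem-a", "user: We are migrating the billing service.")
other = store.ingest("mem-b", "user: We are migrating the billing service.")
```

The retry returns the original record without a second write, while the same content submitted under a different memory namespace is stored separately.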

Relationship to Memory Items

Raw Data is the input evidence for Memory Item extraction. After Raw Data is created, Memind extracts structured Memory Items from it. Those items can keep references back to the Raw Data record that produced them. This relationship is important:
Raw Data
  -> one or more Memory Items
  -> graph, threads, insights, and retrieval context
A single Raw Data segment can produce multiple Memory Items. Some Raw Data may produce no durable items if there is nothing worth keeping. Either way, the source remains inspectable. This also means an agent can reason with both levels:
  • Memory Items provide concise durable facts.
  • Raw Data captions provide the broader context behind those facts.
Together, they help Memind avoid returning memory as disconnected fragments.
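The two levels can be combined by following each item's source reference. The sketch below is illustrative only: the `rawDataId` field name, the sample data, and the grouping function are hypothetical and do not describe Memind's actual reference format.

```python
raw_data = {
    "rd-1": {"caption": "Team plans the billing service migration to Java 21"},
    "rd-2": {"caption": "User explains why they avoid pre-release tooling"},
}

memory_items = [
    {"id": "item-1", "content": "The user uses Java 21", "rawDataId": "rd-1"},
    {"id": "item-2", "content": "The user is migrating a service", "rawDataId": "rd-1"},
    {"id": "item-3", "content": "The user prefers stable tools", "rawDataId": "rd-2"},
]


def with_source_context(items: list[dict], sources: dict[str, dict]) -> dict[str, list[str]]:
    """Group item contents under the caption of the Raw Data record that produced them."""
    grouped: dict[str, list[str]] = {}
    for item in items:
        source = sources.get(item["rawDataId"])
        caption = source["caption"] if source else "(source not found)"
        grouped.setdefault(caption, []).append(item["content"])
    return grouped


context = with_source_context(memory_items, raw_data)
```

Grouping by caption shows that the first two items came from the same source situation, which the isolated facts alone would not reveal.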

Inspecting Raw Data

Memind UI lets developers inspect the Raw Data layer. The Raw Data view is useful when you want to understand:
  • what source content was ingested
  • how content was segmented
  • what captions were generated
  • which source client produced the data
  • what metadata was attached
  • what time range the source segment represents
  • whether downstream memory came from the expected source
  • what broader context sits behind a Memory Item
This is especially helpful when tuning extraction behavior or debugging unexpected retrieval results.

Common use cases

Raw Data is useful in several development workflows:
Use case | Why Raw Data helps
Debug extraction | Check whether a Memory Item came from the expected source context.
Audit memory quality | Inspect source segments when extracted memory looks wrong or incomplete.
Improve ingestion | See how conversations, files, or custom content are segmented and captioned.
Trace retrieval evidence | Understand which source segments support a retrieved context.
Recover background context | Retrieve the broader topic context behind concise Memory Items.
Extend content support | Validate custom parsers, resource fetchers, and Raw Data plugins.

Design principle

Raw Data keeps Memind grounded. Structured memory is useful only when it remains connected to the source context that produced it. Raw Data gives Memind a durable evidence layer, and captions make that evidence semantic, searchable, and usable by agents. Memind does not only remember what was extracted. It also preserves the context that made those memories meaningful.