Raw Data - Memind

Raw Data is Memind’s source-level semantic evidence layer. Before Memind extracts structured Memory Items, connects knowledge through the Item Graph, builds Memory Threads, or consolidates Insights, it first preserves what was actually observed. Raw Data keeps source context in a form that can be inspected, searched, and referenced by later memory layers. The key idea is simple:

Raw Data keeps memory grounded in source context.

Overview

Raw Data is not just archived input. It is the source layer behind Memind’s structured memory system. Each Raw Data record preserves a source segment, its metadata, source references, timing information, and a generated caption. That caption is important. It turns a raw source segment into a compact semantic context that can be searched, inspected, and used to understand the background behind extracted Memory Items. A Memory Item tells Memind what durable memory was extracted. Raw Data tells Memind where that memory came from and what broader context shaped it.

Raw source context
  -> topic-aware Raw Data segment
  -> semantic caption
  -> structured Memory Items

This is one of the main differences between Memind and memory systems that only retrieve isolated facts.

Why Raw Data exists

Many memory systems jump directly from conversation text to extracted facts. That works for simple preferences, but it often loses the surrounding context. An agent may retrieve several correct items, but still miss the larger situation that produced them. For example, the agent might retrieve:

the user uses Java 21
the user prefers stable tools
the user is migrating a service

Those facts are useful, but they are still fragments. They do not explain the project background, the decision path, the tradeoffs discussed, or why those facts are related. Raw Data exists to preserve that missing layer. It gives Memind:

source context before higher-level memory is extracted
traceability from Memory Items back to observed content
searchable captions for source-level evidence
a way to recover broader context behind isolated items
an inspection layer for debugging memory quality
a stable foundation for future processing and reprocessing

Raw Data makes memory explainable instead of opaque.

Why captions matter

Raw Data captions are not just summaries. In Memind, source content can be segmented around meaningful context boundaries, such as a topic, workflow, decision, incident, or conversation shift. For conversation content, LLM-based segmentation can identify these semantic boundaries instead of relying only on arbitrary fixed-size chunks. That means a Raw Data segment can represent a coherent slice of context, and its caption becomes a compact semantic handle for that segment.

This matters because Memory Items are intentionally concise. They capture durable facts, preferences, events, directives, tool experience, playbooks, or resolutions. But an item alone may not carry the full background of the conversation that produced it. Raw Data captions keep that background searchable. Instead of retrieving only scattered items, Memind can also retrieve the source-level context behind those items:

Raw conversation context
  -> semantic source segment
  -> captioned context
  -> extracted Memory Items

This helps agents understand not only what was remembered, but also why that memory exists, what surrounding context shaped it, and how multiple extracted items relate to the same source situation. A plain memory store may return isolated facts. Memind can return facts with their source context.
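The contrast between arbitrary fixed-size chunks and topic-aware segments can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Memind's segmenter: `fixed_size_chunks` and `split_at_boundaries` are hypothetical helpers, and the topic boundaries are supplied by hand where Memind would obtain them from LLM-based segmentation.

```python
from __future__ import annotations


def fixed_size_chunks(messages: list[str], size: int) -> list[list[str]]:
    # Arbitrary chunking: cut points depend only on position, not on meaning.
    if size <= 0:
        raise ValueError("size must be positive")
    return [messages[i:i + size] for i in range(0, len(messages), size)]


def split_at_boundaries(messages: list[str], boundaries: list[int]) -> list[list[str]]:
    # Semantic chunking: each boundary is the index where a new topic starts.
    # In Memind these would come from LLM-based segmentation (not shown here).
    if not messages:
        return []
    # Keep only interior cut points, deduplicated and in order.
    inner = sorted(b for b in set(boundaries) if 0 < b < len(messages))
    cuts = [0] + inner + [len(messages)]
    return [messages[a:b] for a, b in zip(cuts, cuts[1:])]


msgs = ["Should we move to Java 21?", "Yes, 21 is the target",
        "Prefer stable libraries", "Lunch plans?", "Pizza"]
print(fixed_size_chunks(msgs, 2))
print(split_at_boundaries(msgs, [3]))
```

With a chunk size of 2, the second chunk mixes the tail of the Java discussion with an unrelated topic, while the boundary-based split keeps each topic in one segment that a single caption can describe.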

Where Raw Data fits

Raw Data is created early in the memory construction flow.

Raw Input
  -> Raw Data
  -> Memory Items
  -> Item Graph
  -> Memory Threads
  -> Insight Tree

Raw Data is the first durable memory layer. It sits between raw input and structured memory extraction. Later layers depend on it:

Layer	How it uses Raw Data
Memory Items	Built by extracting durable facts, preferences, events, directives, tool experience, playbooks, and resolutions from Raw Data.
Item Graph	Connects extracted items that keep source references back to Raw Data.
Memory Threads	Groups related items that were originally extracted from Raw Data.
Insight Tree	Consolidates higher-level understanding from items while preserving evidence through item and source references.
Retrieval	Can return raw-data captions and source evidence alongside structured memory results.

Raw Data is the bridge between what happened and what Memind remembers.

Processing flow

Raw Data processing turns raw input into source-level records. At a high level, the flow is:

Raw Input
  -> Content Resolution
  -> Resource Preparation
  -> Segmentation
  -> Caption Generation
  -> Vectorization
  -> Persistence
  -> Raw Data Result

Each step has a specific role.

Step	Purpose
Content Resolution	Resolve content type, metadata, source client, extraction config, and content identity.
Resource Preparation	Parse files, fetch URLs, apply plugins, and normalize the source payload when needed.
Segmentation	Split source content into meaningful, inspectable units with boundaries and source timing.
Caption Generation	Generate semantic captions for source segments.
Vectorization	Embed captions and store vector IDs for source-level retrieval.
Persistence	Store Raw Data records with segments, captions, metadata, references, and timestamps.
Raw Data Result	Return Raw Data records and source segments to the next construction stage.
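The stage order in the table can be sketched as a chain of plain Python functions. This is a minimal sketch, not Memind's implementation: every function name is hypothetical, segmentation splits on blank lines, the caption is the first sentence of each segment, and the vector ID is a placeholder hash, where a real deployment would use LLM-based segmentation and captioning plus an embedding model. Resource Preparation is reduced to text normalization, and persistence is a dictionary.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawInput:
    memory_id: str
    content_type: str
    source_client: str
    payload: str


@dataclass
class Segment:
    content: str
    boundary: tuple[int, int]  # character range in the prepared text
    caption: str = ""
    caption_vector_id: Optional[str] = None


def resolve_content(raw: RawInput) -> str:
    # Content Resolution: fingerprint the payload to obtain a content identity.
    return hashlib.sha256(raw.payload.encode("utf-8")).hexdigest()


def prepare_resource(raw: RawInput) -> str:
    # Resource Preparation: normalize line endings and surrounding whitespace.
    return raw.payload.replace("\r\n", "\n").strip()


def segment(text: str) -> list[Segment]:
    # Segmentation: blank-line split as a stand-in for semantic boundaries.
    segments, offset = [], 0
    for block in text.split("\n\n"):
        if block.strip():
            segments.append(Segment(block, (offset, offset + len(block))))
        offset += len(block) + 2  # skip the "\n\n" separator
    return segments


def generate_caption(content: str) -> str:
    # Caption Generation: first sentence, truncated (an LLM would be used in practice).
    return content.split(". ")[0].rstrip(".")[:80]


def vectorize(caption: str) -> str:
    # Vectorization: placeholder vector ID instead of a real embedding call.
    return "vec-" + hashlib.sha1(caption.encode("utf-8")).hexdigest()[:12]


def process(raw: RawInput, store: dict) -> tuple[str, list[Segment]]:
    content_id = resolve_content(raw)
    segments = segment(prepare_resource(raw))
    for seg in segments:
        seg.caption = generate_caption(seg.content)
        seg.caption_vector_id = vectorize(seg.caption)
    store[(raw.memory_id, content_id)] = segments  # Persistence
    return content_id, segments                    # Raw Data Result
```

Keeping each stage a separate function mirrors the table: any stage, such as `generate_caption`, could be swapped for an LLM-backed version without touching the others.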

For streaming conversation input, Memind may first buffer messages and seal a conversation segment before Raw Data processing begins.
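A minimal sketch of buffering and sealing follows. The sealing rules are assumptions for illustration: this page does not say when Memind seals a conversation segment, so this hypothetical `ConversationBuffer` seals on either a message-count limit or an idle gap between messages, and records the sealed message range and time span that would later feed the segment's boundary and `startTime`/`endTime`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str
    text: str
    timestamp: float  # seconds since epoch


@dataclass
class SealedSegment:
    messages: list[Message]
    boundary: tuple[int, int]  # [start, end) message indexes in the stream
    start_time: float
    end_time: float


class ConversationBuffer:
    """Hypothetical buffer: seals on a message-count limit or an idle gap."""

    def __init__(self, max_messages: int = 20, max_gap_seconds: float = 1800.0):
        self.max_messages = max_messages
        self.max_gap_seconds = max_gap_seconds
        self._pending: list[Message] = []
        self._next_index = 0  # stream index of the first pending message

    def add(self, msg: Message) -> list[SealedSegment]:
        sealed = []
        # A long silence closes the open segment before the new message joins.
        if self._pending and msg.timestamp - self._pending[-1].timestamp > self.max_gap_seconds:
            sealed.append(self._seal())
        self._pending.append(msg)
        if len(self._pending) >= self.max_messages:
            sealed.append(self._seal())
        return sealed

    def flush(self) -> Optional[SealedSegment]:
        # Seal whatever is pending, e.g. when the session ends.
        return self._seal() if self._pending else None

    def _seal(self) -> SealedSegment:
        msgs, start = self._pending, self._next_index
        self._pending = []
        self._next_index += len(msgs)
        return SealedSegment(msgs, (start, start + len(msgs)),
                             msgs[0].timestamp, msgs[-1].timestamp)
```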

Raw Data records

A Raw Data record stores source-level information about one processed segment. Conceptually, it contains:

Field	Meaning
`id`	Unique Raw Data segment identifier.
`memoryId`	The memory namespace this Raw Data belongs to.
`contentType`	The type of source content, such as conversation content.
`sourceClient`	The client or integration that produced the content.
`contentId`	A content fingerprint used for idempotency.
`segment`	The source segment, including content, boundary, and metadata.
`caption`	A compact semantic summary of the segment.
`captionVectorId`	The vector index ID for the caption embedding.
`metadata`	Additional source or application metadata.
`resourceId`	Optional reference to a stored resource.
`mimeType`	Optional MIME type for file or resource content.
`createdAt`	When the Raw Data record was created.
`startTime`	Source start time when available.
`endTime`	Source end time when available.

You usually do not need to manipulate these fields directly. They are listed here to show what Memind preserves so that later memory layers remain traceable.
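For readers who prefer types to tables, the record can be written as Python dataclasses. This is an illustrative model, not Memind's actual schema or API: field names are snake_case versions of the documented fields, while the `boundary` dictionary shape, immutability (`frozen=True`), and the UTC default for `created_at` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Segment:
    content: str                 # source text for this segment
    caption: str                 # generated semantic summary
    boundary: dict[str, int]     # assumed shape, e.g. {"start": 0, "end": 4}
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class RawDataRecord:
    id: str                      # unique Raw Data segment identifier
    memory_id: str               # memory namespace
    content_type: str            # e.g. "conversation"
    source_client: str           # client or integration that produced the content
    content_id: str              # content fingerprint used for idempotency
    segment: Segment
    caption: str
    caption_vector_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    resource_id: Optional[str] = None
    mime_type: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
```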

Segments

A segment is the source unit that Raw Data persists. For conversation content, a segment may represent a range of messages. For other content types, it may represent a character range, parsed section, or processor-defined chunk. A segment can include:

Segment field	Meaning
`content`	The source text for this segment.
`caption`	The generated semantic summary for the segment.
`boundary`	The source boundary, such as message range or character range.
`metadata`	Segment-level metadata.

Segments are important because Raw Data should preserve context at a useful granularity. A whole conversation or file may be too large to inspect or retrieve as one unit, while a single sentence may be too small to preserve meaning. The goal is to keep source context coherent enough for later extraction, inspection, and retrieval.
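Boundaries are what make a segment traceable to its source. The sketch below assumes two hypothetical boundary types, a half-open message range and a half-open character range, plus a helper that recovers the original text a boundary points at. Memind's own boundary representation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MessageRange:
    start: int  # index of the first message (inclusive)
    end: int    # index after the last message (exclusive)


@dataclass(frozen=True)
class CharRange:
    start: int  # first character offset (inclusive)
    end: int    # offset after the last character (exclusive)


Boundary = Union[MessageRange, CharRange]


def source_text(boundary: Boundary, source: Union[list[str], str]) -> str:
    """Recover the original text a boundary points at (hypothetical helper)."""
    if isinstance(boundary, MessageRange):
        if not isinstance(source, list):
            raise TypeError("message ranges index into a list of messages")
        return "\n".join(source[boundary.start:boundary.end])
    if isinstance(boundary, CharRange):
        if not isinstance(source, str):
            raise TypeError("character ranges index into a string")
        return source[boundary.start:boundary.end]
    raise TypeError(f"unknown boundary type: {type(boundary).__name__}")
```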

Captions

Captions summarize Raw Data segments into compact semantic context. A caption does not replace the original content. It gives the segment a searchable and human-readable representation. Captions are useful because they:

make source segments easier to browse in Memind UI
provide compact source-level retrieval text
reduce noise when searching Raw Data
help LLMs understand the background behind extracted items
preserve a semantic view of the original source segment
provide text that can be embedded for vector search

This makes Raw Data more than a source archive. With captions, Raw Data becomes a searchable context layer behind Memory Items.

Metadata and source references

Raw Data can carry source metadata and references. This is useful when memory comes from multiple applications, agents, files, URLs, or tools. Common metadata and references include:

Concept	Purpose
`sourceClient`	Identifies which client, integration, or application produced the content.
`contentType`	Describes how the content should be processed.
`resourceId`	Links Raw Data back to a stored resource.
`mimeType`	Preserves resource type information for files or external content.
`startTime` / `endTime`	Preserves the time range represented by the source segment.
`metadata`	Carries application-specific source information.

These fields make Raw Data useful for filtering, inspection, debugging, and downstream processing.
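A hypothetical filter over Raw Data references might select by `sourceClient`, by overlap with a time window, and by exact metadata matches. `RawDataRef` and `filter_records` are illustrative names, and skipping records without timing when a window is given is a design assumption, not documented Memind behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable, Optional


@dataclass
class RawDataRef:
    id: str
    source_client: str
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_records(records: Iterable[RawDataRef],
                   source_client: Optional[str] = None,
                   window: Optional[tuple[datetime, datetime]] = None,
                   **metadata_equals: Any) -> list[RawDataRef]:
    out = []
    for r in records:
        if source_client is not None and r.source_client != source_client:
            continue
        if window is not None:
            # Records without timing cannot be placed in a window, so skip them.
            if r.start_time is None or r.end_time is None:
                continue
            lo, hi = window
            if r.end_time < lo or r.start_time > hi:  # no overlap with the window
                continue
        # Every requested metadata key must match exactly.
        if any(r.metadata.get(k) != v for k, v in metadata_equals.items()):
            continue
        out.append(r)
    return out
```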

Vector indexing

Raw Data can be indexed for semantic search. Memind vectorizes Raw Data captions and stores the resulting vector IDs on Raw Data records. This lets source-level evidence participate in retrieval through its compact captions, so an embedding of the full original content is not the only searchable representation. Raw Data vector indexing is separate from Memory Item and Insight indexing.

Indexed layer	What is embedded
Raw Data	Segment captions.
Memory Items	Structured memory item content.
Insights	Consolidated insight content.

This separation lets Memind retrieve at different levels of abstraction: source evidence, structured memory, and higher-level understanding.
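To make the per-layer separation concrete, here is a toy index that keeps one vector space per layer. The bag-of-words `embed` function is a deliberate stand-in for a real embedding model, and `LayeredIndex`, its layer names, and its methods are hypothetical. The point is only that a search against the Raw Data layer ranks captions and never returns Memory Items or Insights.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag of lowercase words.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LayeredIndex:
    """One vector space per layer, so captions never compete with items or insights."""

    LAYERS = ("raw_data", "memory_item", "insight")

    def __init__(self) -> None:
        self._vectors: dict[str, dict[str, Counter]] = {layer: {} for layer in self.LAYERS}

    def add(self, layer: str, vector_id: str, text: str) -> str:
        self._vectors[layer][vector_id] = embed(text)
        return vector_id  # the ID a record would keep, e.g. as its captionVectorId

    def search(self, layer: str, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(vid, cosine(q, v)) for vid, v in self._vectors[layer].items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```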

Idempotency

Raw Data processing uses content identity to avoid duplicate work. Each raw input can produce a `contentId`, which acts as a fingerprint of the original content. Before processing, Memind can check whether the same content has already been stored for the same memory namespace. If the content already exists, Memind can return the existing Raw Data instead of writing duplicate source records. This is useful when:

the same conversation batch is submitted more than once
ingestion is retried after a failure
integrations resend previously observed content
applications want safer repeated writes

Idempotency helps keep the source layer clean.
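A minimal idempotency sketch, assuming the fingerprint is a SHA-256 hash of a canonical JSON encoding of the content type and payload; this page does not say how Memind computes `contentId`. `RawDataStore` and its `ingest` method are hypothetical, and the lookup is scoped to the memory namespace as described above.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Callable


def content_id(content_type: str, payload: Any) -> str:
    # Canonical JSON so key order and whitespace do not change the fingerprint.
    canonical = json.dumps({"type": content_type, "payload": payload},
                           sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RawDataStore:
    """Hypothetical store keyed by (memory namespace, content fingerprint)."""

    def __init__(self) -> None:
        self._by_content: dict[tuple[str, str], list[str]] = {}
        self.writes = 0

    def ingest(self, memory_id: str, content_type: str, payload: Any,
               process: Callable[[Any], list[str]]) -> tuple[list[str], bool]:
        key = (memory_id, content_id(content_type, payload))
        existing = self._by_content.get(key)
        if existing is not None:
            return existing, False       # duplicate: reuse the stored Raw Data
        records = process(payload)       # segmentation, captioning, ... (elided)
        self._by_content[key] = records
        self.writes += len(records)
        return records, True
```

Because the fingerprint is taken over canonical JSON, resubmitting the same batch with keys in a different order still counts as a duplicate. In a concurrent deployment the lookup and the write would need to be atomic, for example through a unique constraint on (memory namespace, content fingerprint); this single-threaded sketch does not handle that.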

Relationship to Memory Items

Raw Data is the input evidence for Memory Item extraction. After Raw Data is created, Memind extracts structured Memory Items from it. Those items can keep references back to the Raw Data record that produced them. This relationship is important:

Raw Data
  -> one or more Memory Items
  -> graph, threads, insights, and retrieval context

A single Raw Data segment can produce multiple Memory Items. Some Raw Data may produce no durable items if there is nothing worth keeping. Either way, the source remains inspectable. This also means an agent can reason with both levels:

Memory Items provide concise durable facts.
Raw Data captions provide the broader context behind those facts.

Together, they help Memind avoid returning memory as disconnected fragments.
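The pairing of concise items with their source context can be sketched as grouping retrieved items under the caption of the Raw Data record they reference. `MemoryItem.raw_data_id`, `with_source_context`, and the output format are hypothetical; the example reuses the three fragments from earlier on this page with an invented caption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RawData:
    id: str
    caption: str


@dataclass(frozen=True)
class MemoryItem:
    id: str
    content: str
    raw_data_id: str  # hypothetical name for the source reference


def with_source_context(items: list[MemoryItem], raw_by_id: dict[str, RawData]) -> str:
    """Group retrieved items under the caption of the Raw Data they came from."""
    grouped: dict[str, list[str]] = {}
    for item in items:
        grouped.setdefault(item.raw_data_id, []).append(item.content)
    lines = []
    for raw_id, contents in grouped.items():
        raw = raw_by_id.get(raw_id)
        header = raw.caption if raw else f"(source {raw_id} not found)"
        lines.append(f"Context: {header}")
        lines.extend(f"  - {c}" for c in contents)
    return "\n".join(lines)


rd = RawData("rd-7", "Planning the billing service migration to Java 21; stability favored over new tools")
items = [MemoryItem("i1", "the user uses Java 21", "rd-7"),
         MemoryItem("i2", "the user prefers stable tools", "rd-7"),
         MemoryItem("i3", "the user is migrating a service", "rd-7")]
print(with_source_context(items, {"rd-7": rd}))
```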

Inspecting Raw Data

Memind UI lets developers inspect the Raw Data layer. The Raw Data view is useful when you want to understand:

what source content was ingested
how content was segmented
what captions were generated
which source client produced the data
what metadata was attached
what time range the source segment represents
whether downstream memory came from the expected source
what broader context sits behind a Memory Item

This is especially helpful when tuning extraction behavior or debugging unexpected retrieval results.

Common use cases

Raw Data is useful in several development workflows:

Use case	Why Raw Data helps
Debug extraction	Check whether a Memory Item came from the expected source context.
Audit memory quality	Inspect source segments when extracted memory looks wrong or incomplete.
Improve ingestion	See how conversations, files, or custom content are segmented and captioned.
Trace retrieval evidence	Understand which source segments support a retrieved context.
Recover background context	Retrieve the broader topic context behind concise Memory Items.
Extend content support	Validate custom parsers, resource fetchers, and Raw Data plugins.

Design principle

Raw Data keeps Memind grounded. Structured memory is useful only when it remains connected to the source context that produced it. Raw Data gives Memind a durable evidence layer, and captions make that evidence semantic, searchable, and usable by agents. Memind does not only remember what was extracted. It also preserves the context that made those memories meaningful.
