
How to Migrate from Simple to Structured Memory

Migrating from simple flat-text memory to structured memory with entities, metadata, and relationships is the most impactful upgrade you can make to an existing AI memory system. It enables metadata filtering, lifecycle management, entity-based retrieval, and cognitive scoring, none of which work on unstructured text blobs. The migration can run alongside your production system without downtime if you follow a staged approach.

Before You Start

You need read access to your current memory store and a clear understanding of what format your existing memories are in. Most simple memory systems store memories as plain text strings with minimal metadata (usually just a timestamp and a user identifier). Some include an embedding vector. You also need a target structured schema defined, which means you should complete the architecture design process first. Do not try to design the target schema and migrate at the same time, because discoveries during migration will force schema changes that require re-processing already-migrated memories.

Step-by-Step Migration

Step 1: Audit your current memory store.
Before migrating, understand what you have. Export a representative sample of your current memories (at least 100, ideally 500) and analyze them. What types of information do they contain? How long is the average memory? What metadata exists (timestamps, user IDs, source tags)? How many memories per user on average? Are there duplicates or near-duplicates? What fraction of memories are still relevant versus stale? This audit tells you what your extraction pipeline needs to handle and how much cleanup the migration should include. If 40% of your memories are duplicates of slightly different phrasings, your migration should include deduplication. If 30% reference time-sensitive information that is now outdated, your migration should include a staleness check. Document the audit findings, including example memories for each type you identify. You will use these examples to validate your extraction pipeline in Step 4.
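As a starting point, a minimal audit script might look like the sketch below. It assumes your export is a list of dicts with text and user_id keys (adjust to your actual format), and the near-duplicate check is a crude normalized-hash comparison, not a semantic one, so treat its duplicate count as a lower bound.

```python
import re
from collections import Counter
from statistics import mean

def normalize(text: str) -> str:
    # Lowercase and strip punctuation so trivially rephrased
    # duplicates collapse to the same key.
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()

def audit(memories: list[dict]) -> dict:
    lengths = [len(m["text"]) for m in memories]
    per_user = Counter(m["user_id"] for m in memories)
    dupes = Counter(normalize(m["text"]) for m in memories)
    duplicate_count = sum(n - 1 for n in dupes.values() if n > 1)
    return {
        "total": len(memories),
        "avg_length_chars": round(mean(lengths), 1),
        "avg_per_user": round(mean(per_user.values()), 1),
        "near_duplicates": duplicate_count,
        "duplicate_pct": round(100 * duplicate_count / len(memories), 1),
    }
```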
Step 2: Design your target schema.
Define the structured memory object you are migrating to. A production schema typically includes: content (the cleaned, normalized memory text), content_type (episodic, semantic, procedural, preference), entities (list of extracted entities with types: person, product, concept, organization), relationships (connections to other memories or entities), metadata including created_at, updated_at, last_accessed, access_count, confidence score, source (conversation, import, system), topic_category, and tenant_id. Also define which fields are required (content, content_type, tenant_id, created_at) and which are optional (entities may be empty for simple factual memories, relationships are populated during graph building). The schema should support your retrieval patterns: if you need to filter by topic_category, it must be an indexed field. If you need to sort by confidence, it must be a numeric field with a default value.
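One way to pin the schema down before writing any migration code is as a typed object. The sketch below expresses the fields above as a Python dataclass; the field names, the 0-to-10 confidence scale, and the defaults are illustrative assumptions and should match whatever your store actually indexes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StructuredMemory:
    # Required fields.
    content: str
    content_type: str          # episodic | semantic | procedural | preference
    tenant_id: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Optional fields with migration-friendly defaults.
    entities: list[dict] = field(default_factory=list)      # [{"name": ..., "type": ...}]
    relationships: list[str] = field(default_factory=list)  # IDs of related memories
    updated_at: Optional[datetime] = None
    last_accessed: Optional[datetime] = None
    access_count: int = 1
    confidence: float = 5.0    # neutral default, assuming a 0-10 scale
    source: str = "import"     # conversation | import | system
    topic_category: Optional[str] = None

    def __post_init__(self):
        # Backfill timestamps that do not exist in the source data.
        self.updated_at = self.updated_at or self.created_at
        self.last_accessed = self.last_accessed or self.created_at
```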
Step 3: Build the extraction pipeline.
The extraction pipeline transforms flat text memories into structured memory objects. It has four stages.

Content cleaning: normalize the text by fixing encoding issues, removing system-generated boilerplate, standardizing formatting, and splitting compound memories into individual facts (a single text blob that contains "User prefers dark mode. User is on enterprise plan. User had billing issue last month." should become three separate memories).

Entity extraction: use an LLM or NER model to identify entities in the memory content. For most applications, an LLM prompt that asks "Extract all named entities and their types from this text" works well for memories under 500 tokens. For high-volume migration, a fine-tuned NER model is faster and cheaper.

Classification: assign content_type and topic_category based on the memory content. A rule-based classifier works for content_type (memories containing "prefers" or "likes" are preferences, memories describing events with timestamps are episodic, memories stating facts are semantic). Topic classification may require an LLM call or a lightweight text classifier.

Metadata enrichment: assign default values for fields that do not exist in the source data: access_count defaults to 1, confidence defaults to 5.0 (neutral), and last_accessed defaults to created_at.
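A skeleton of the four stages might look like the sketch below, reusing the StructuredMemory class from the Step 2 sketch. Here extract_entities is a stub standing in for whatever LLM or NER call you choose, and the splitting and classification rules are the deliberately simple starting points described above.

```python
import re

def clean_and_split(raw: str) -> list[str]:
    # Stage 1: collapse whitespace, then split compound blobs into
    # individual facts. Sentence-boundary splitting is a crude heuristic;
    # swap in a real sentence splitter if your memories are messier.
    text = re.sub(r"\s+", " ", raw).strip()
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_entities(text: str) -> list[dict]:
    # Stage 2: placeholder for an LLM or NER call, e.g. a prompt like
    # "Extract all named entities and their types from this text."
    return []

def classify(text: str) -> str:
    # Stage 3: rule-based content_type assignment, per the rules above.
    lowered = text.lower()
    if "prefers" in lowered or "likes" in lowered:
        return "preference"
    if re.search(r"\b(yesterday|last (week|month|year)|in \d{4})\b", lowered):
        return "episodic"
    return "semantic"

def migrate_one(raw: str, tenant_id: str) -> list[StructuredMemory]:
    # Stage 4 (metadata enrichment) is handled by StructuredMemory defaults.
    return [
        StructuredMemory(
            content=fact,
            content_type=classify(fact),
            tenant_id=tenant_id,
            entities=extract_entities(fact),
        )
        for fact in clean_and_split(raw)
    ]
```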
Step 4: Run a test migration on a small subset.
Pick 200 to 500 memories from your audit sample and run them through the extraction pipeline. Manually review at least 50 of the results and check: is the content correctly cleaned and split? Are entities extracted accurately (no missed entities, no false entities)? Are content types and categories assigned correctly? Are metadata defaults reasonable? Identify systematic errors and fix them in the pipeline before proceeding. Common issues include: entity extraction hallucinating entities that are not in the text (tighten the extraction prompt to be more conservative), content splitting that breaks compound sentences incorrectly (adjust the splitting heuristic), and classification errors for ambiguous memories (add more classification rules or examples). Iterate on the pipeline until you see fewer than 10% errors on your test set. Perfect accuracy is not the goal because you can clean up individual errors later, but systematic errors that affect 30% of memories need to be fixed before the full migration.
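To make the review step concrete, a small tally script like the one below can track error rates by category as you work through the sample. The error categories mirror the common issues above, and the 10% threshold is the one from this step; the results format is an assumption about how you pair source memories with pipeline output.

```python
import random

ERROR_CATEGORIES = ["hallucinated_entity", "bad_split", "wrong_type", "other"]

def review_sample(results: list[dict], sample_size: int = 50) -> None:
    # Each item pairs a source memory with its pipeline output, e.g.
    # {"source": ..., "output": ...}. Verdicts are entered by hand.
    sample = random.sample(results, min(sample_size, len(results)))
    errors: dict[str, int] = {c: 0 for c in ERROR_CATEGORIES}
    for item in sample:
        print("SOURCE:", item["source"])
        print("OUTPUT:", item["output"])
        verdict = input(f"ok / {' / '.join(ERROR_CATEGORIES)}: ").strip()
        if verdict in errors:
            errors[verdict] += 1
    rate = 100 * sum(errors.values()) / len(sample)
    print(f"error rate: {rate:.1f}% by category: {errors}")
    print("below 10% threshold" if rate < 10 else "fix systematic errors first")
```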
Step 5: Execute the full migration.
Run all existing memories through the extraction pipeline in batches. Process memories in batches of 100 to 500, with progress tracking (log the batch number, success count, error count, and elapsed time). Store failed memories in a separate error queue for manual review rather than blocking the pipeline. Run the migration during low-traffic hours if your extraction pipeline uses the same resources as production queries. For large memory stores (100,000+ memories), the migration may take hours or days. Build it to be resumable: if the pipeline crashes at batch 450 of 1,000, it should resume from batch 450 without re-processing earlier batches. Write migrated memories to the new structured store alongside the old store, not as a replacement. Both stores should be active simultaneously until you validate that the migration is correct and complete.
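A resumable batch loop can be as simple as a checkpoint file recording the next batch to process. The sketch below assumes load_batch, transform, and write_structured exist in your codebase (hypothetical names for your store reader, the Step 3 pipeline, and the new-store writer); failures go to an error queue file rather than halting the run.

```python
import json
import time
from pathlib import Path

CHECKPOINT = Path("migration_checkpoint.json")
ERROR_QUEUE = Path("migration_errors.jsonl")

def run_migration(total_batches: int, batch_size: int = 250) -> None:
    start_batch = 0
    if CHECKPOINT.exists():
        # Resume from the batch after the last one that completed.
        start_batch = json.loads(CHECKPOINT.read_text())["next_batch"]
    for batch_no in range(start_batch, total_batches):
        t0 = time.monotonic()
        ok = failed = 0
        for raw in load_batch(batch_no, batch_size):    # your store's reader
            try:
                write_structured(transform(raw))        # pipeline from Step 3
                ok += 1
            except Exception as exc:
                # Park failures for manual review instead of blocking.
                with ERROR_QUEUE.open("a") as f:
                    f.write(json.dumps({"memory": raw, "error": str(exc)}) + "\n")
                failed += 1
        # Checkpoint only after the whole batch is done.
        CHECKPOINT.write_text(json.dumps({"next_batch": batch_no + 1}))
        print(f"batch {batch_no + 1}/{total_batches}: "
              f"{ok} ok, {failed} failed, {time.monotonic() - t0:.1f}s")
```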
Step 6: Switch the write path.
Update your application to write new memories in the structured format. This means updating your memory creation code to run new memories through the extraction pipeline before storage (entity extraction, classification, metadata assignment). The simplest approach is to use the same extraction pipeline you built for migration, adapted to process single memories in real-time rather than batches. New memories should be written to the new structured store only. At this point, your system is dual-reading (querying both old and new stores) and single-writing (new memories go to the structured store only). Retrieval should query the new store first and fall back to the old store for memories that have not been migrated. Over time, all retrieval will come from the new store as the old store becomes stale.
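The dual-read pattern can be a thin wrapper: query the new store first and top up from the old one when results fall short. One simple policy is sketched below; query_new_store and query_old_store are placeholders for your own retrieval calls, assumed to return dicts with stable id fields.

```python
def retrieve(query: str, tenant_id: str, k: int = 10) -> list[dict]:
    # Prefer the structured store; it supports entity and metadata filtering.
    results = query_new_store(query, tenant_id, limit=k)
    if len(results) < k:
        # Fall back to the old flat-text store for unmigrated memories,
        # skipping anything the new store already returned.
        seen = {r["id"] for r in results}
        for r in query_old_store(query, tenant_id, limit=k):
            if r["id"] not in seen and len(results) < k:
                results.append(r)
    return results
```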
Step 7: Validate and clean up.
After the migration is complete and new memories are flowing into the structured store, validate that retrieval quality has improved. Run the same queries against the new store and compare results to the old store. You should see: more relevant results due to entity-based retrieval picking up related memories, better ranking due to metadata-informed scoring, and equivalent or better recall (no relevant memories lost in migration). If validation passes, deprecate the old store by removing the fallback query path and eventually deleting the old data. Keep a backup of the old store for at least 30 days after deprecation in case you discover migration gaps that need correction.
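A lightweight way to run the comparison is to replay a fixed query set against both stores and measure recall against memory IDs you know to be relevant. The sketch below reuses the query_old_store and query_new_store placeholders from Step 6; the benchmark list of (query, relevant IDs) pairs is something you curate by hand from your audit sample.

```python
def compare_stores(benchmark: list[tuple[str, set[str]]], tenant_id: str) -> None:
    # Each benchmark entry is (query, set of memory IDs known to be relevant).
    for store_name, query_fn in [("old", query_old_store), ("new", query_new_store)]:
        hits = total = 0
        for query, relevant_ids in benchmark:
            returned = {r["id"] for r in query_fn(query, tenant_id, limit=10)}
            hits += len(returned & relevant_ids)
            total += len(relevant_ids)
        print(f"{store_name} store recall@10: {hits / total:.0%}")
```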

Migration Without Downtime

The staged approach described above allows migration without any application downtime. At every step, the existing system continues to function. The old store is not modified during migration. The write path switch is a configuration change that can be rolled back instantly. And the old store fallback ensures that no memories are lost during the transition period. The only visible change to users is improved retrieval quality, which is the whole point.

Adaptive Recall stores memories in a structured format from day one, with automatic entity extraction, knowledge graph building, and metadata management. Start structured and skip the migration entirely.
