
How to Extract Entities from Text with an LLM

LLM-based entity extraction uses a large language model to identify entities from unstructured text and return them as structured data. You write a prompt that describes what to extract, the model applies its understanding of language and context to identify entities, and you get back JSON that downstream systems can process. This approach handles domain-specific entity types without training data, making it the most flexible extraction method available in 2026.

Why Use an LLM for Entity Extraction

Traditional NER models recognize a fixed set of entity types: person, organization, location, date. If your application needs to extract "microservices," "API endpoints," "deployment environments," or "database tables," a pre-trained NER model will not find them. Fine-tuning a NER model requires 200 to 500 labeled examples per entity type, which means annotating text by hand before you can extract anything.

LLMs solve this cold-start problem. By describing your entity types in a prompt, you can extract domain-specific entities from the first document without any training data. The model leverages its understanding of language, context, and world knowledge to identify entities that match your descriptions. This makes LLM-based extraction the right choice when your entity types are specialized, when they evolve over time, or when you need to start extracting immediately without a labeling phase.

Step-by-Step Process

Step 1: Define your entity types.
Start by listing the categories of things you need to extract. Be specific about what each type includes and excludes. A vague type like "thing" will produce noisy results. A specific type like "cloud infrastructure service (AWS, GCP, or Azure services, not generic software)" gives the model clear boundaries. For most applications, 5 to 10 entity types cover the domain well. More types mean more specificity but also more confusion at the boundaries between types.
ENTITY_TYPES = {
    "Person": "Individual people referenced by name or role",
    "Organization": "Companies, teams, departments, open-source projects",
    "Technology": "Programming languages, frameworks, libraries, protocols",
    "Service": "Deployed services, APIs, microservices, third-party SaaS",
    "Infrastructure": "Databases, message queues, caches, cloud services",
    "Concept": "Architectural patterns, methodologies, standards"
}
Step 2: Design the extraction prompt.
The prompt should describe each entity type, specify the output format, and include instructions for handling edge cases. Ask for canonical names (full names, not abbreviations) and aliases (other names used in the text for the same entity). Specifying JSON output with a clear schema makes parsing reliable. Include 1 to 2 examples in the prompt if the entity types are unusual or ambiguous.
EXTRACT_PROMPT = """You are an entity extraction system. Extract all entities from the text below.

Entity types:
- Person: individual people referenced by name or role
- Organization: companies, teams, departments, open-source projects
- Technology: programming languages, frameworks, libraries, protocols
- Service: deployed services, APIs, microservices, third-party SaaS
- Infrastructure: databases, message queues, caches, cloud services
- Concept: architectural patterns, methodologies, standards

For each entity, return:
- name: the canonical full name (not abbreviations or pronouns)
- type: one of the types listed above
- aliases: list of other names used for this entity in the text
- confidence: 0.0 to 1.0 based on how clearly the entity is mentioned

Return a JSON array. Only include entities explicitly mentioned in the text.
Do not infer entities that are not stated.

Text:
{text}"""
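If your entity types are unusual, a short few-shot block can be appended before the final Text: section of the prompt. A minimal sketch; the example sentence and entities here are invented for illustration, not part of the prompt above:

FEW_SHOT = """Example text: "Maria migrated the billing-service from Postgres to DynamoDB."
Example output:
[
  {"name": "Maria", "type": "Person", "aliases": [], "confidence": 0.95},
  {"name": "billing-service", "type": "Service", "aliases": [], "confidence": 0.9},
  {"name": "PostgreSQL", "type": "Infrastructure", "aliases": ["Postgres"], "confidence": 0.9},
  {"name": "Amazon DynamoDB", "type": "Infrastructure", "aliases": ["DynamoDB"], "confidence": 0.9}
]"""

# Splice the example in ahead of the "Text:" marker, which appears once at the end.
EXTRACT_PROMPT = EXTRACT_PROMPT.replace("Text:", FEW_SHOT + "\n\nText:")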
Step 3: Chunk your input text.
LLMs process text in tokens, and longer inputs increase cost and reduce extraction quality as the model has more to track. Split documents into passages of 500 to 1,000 tokens. Use paragraph or section boundaries when available rather than splitting mid-sentence. Include 50 to 100 tokens of overlap between consecutive chunks so that entities mentioned at the boundary of one chunk are captured in the next.
def chunk_text(text, max_tokens=800, overlap_tokens=100):
    # Approximates tokens with whitespace-separated words, which is close
    # enough for chunk sizing.
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(' '.join(words[start:end]))
        if end >= len(words):
            break  # final chunk; stepping back for overlap would loop forever
        start = end - overlap_tokens  # overlap so boundary entities appear in both chunks
    return chunks
Step 4: Run extraction and parse results.
Send each chunk to the LLM with your extraction prompt and parse the JSON response. Handle cases where the LLM returns malformed JSON by wrapping the parse in error handling and retrying with a clarifying prompt. Collect all entities from all chunks into a single list for deduplication.
import json
from anthropic import Anthropic

client = Anthropic()

def extract_entities(text_chunk):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": EXTRACT_PROMPT.replace("{text}", text_chunk)
        }]
    )
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return []  # malformed JSON; see the retry sketch below

all_entities = []
for chunk in chunk_text(document_text):
    all_entities.extend(extract_entities(chunk))
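The except branch above silently drops a chunk when the model returns malformed JSON. A retry sketch under the same call shape; the clarifying suffix wording is an assumption, not a fixed API feature:

def extract_with_retry(text_chunk, max_attempts=2):
    # Retry with an explicit reminder when the first response fails to parse.
    prompt = EXTRACT_PROMPT.replace("{text}", text_chunk)
    for attempt in range(max_attempts):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            messages=[{"role": "user", "content": prompt}]
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            prompt += "\n\nReturn ONLY a valid JSON array, with no surrounding prose."
    return []  # give up on this chunk after max_attempts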
Step 5: Deduplicate and normalize.
The same entity will appear in multiple chunks, sometimes with different names. Merge duplicates using string similarity (SequenceMatcher or Levenshtein distance) and alias matching. When two entries have a similarity score above 0.85, merge them by keeping the longer canonical name and combining alias lists. For ambiguous cases, use the LLM to judge whether two names refer to the same entity.
from difflib import SequenceMatcher

def deduplicate(entities, threshold=0.85):
    canonical = {}
    for entity in entities:
        name_lower = entity["name"].lower().strip()
        merged = False
        for key, entry in canonical.items():
            sim = SequenceMatcher(None, name_lower, key).ratio()
            if sim >= threshold:
                entry["aliases"].update(entity.get("aliases", []))
                entry["aliases"].add(entity["name"])
                # Keep the longer name as canonical, per the merge rule above.
                if len(entity["name"]) > len(entry["name"]):
                    entry["name"] = entity["name"]
                merged = True
                break
        if not merged:
            canonical[name_lower] = {
                "name": entity["name"],
                "type": entity["type"],
                "aliases": set(entity.get("aliases", [])),
                "confidence": entity.get("confidence", 0.8)
            }
    return list(canonical.values())
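For near-miss pairs below the threshold, a direct LLM judgment works as a tiebreaker. A minimal sketch reusing the client from Step 4; the prompt wording is illustrative:

def same_entity(name_a, name_b):
    # Ask the model for a one-word verdict on whether two surface forms
    # refer to the same real-world entity.
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": f'Do "{name_a}" and "{name_b}" refer to the same '
                       f'real-world entity? Answer yes or no.'
        }]
    )
    return response.content[0].text.strip().lower().startswith("yes")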
Step 6: Validate against source text.
Sample 50 to 100 extracted entities and verify each one against the original text. Check for false positives (entities the LLM hallucinated), false negatives (entities present in the text that were missed), type errors (correct entity, wrong type), and normalization errors (wrong canonical name). Each error type points to a specific prompt improvement. Expect 75 to 85% F1 on the first pass and 90%+ after two or three prompt iterations.
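A small harness keeps this validation step repeatable. A sketch, assuming you record one judgment per sampled entity by hand: "tp" for a correct extraction, "fp" for a hallucinated or mistyped one, "fn" for a miss you noticed in the source text:

import random

def sample_for_review(entities, n=50):
    # Pull a random sample to check by hand against the source text.
    return random.sample(entities, min(n, len(entities)))

def f1_from_judgments(judgments):
    # judgments: list of "tp", "fp", and "fn" strings recorded during review.
    tp = judgments.count("tp")
    fp = judgments.count("fp")
    fn = judgments.count("fn")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0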
Cost optimization: For large document sets, run extraction on a representative sample first to validate your prompt before processing the full corpus. A sample of 50 documents is usually enough to identify prompt issues. This avoids spending hundreds of dollars on LLM calls with a prompt that produces poor results.
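A pilot run takes a few lines with the helpers from the earlier steps, assuming documents is a list of raw text strings:

import random

pilot_docs = random.sample(documents, min(50, len(documents)))
pilot_entities = []
for doc in pilot_docs:
    for chunk in chunk_text(doc):
        pilot_entities.extend(extract_entities(chunk))
# Review deduplicate(pilot_entities) by hand before committing to the full corpus.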

Production Considerations

In production, add rate limiting to avoid hitting API throttles, implement retry logic with exponential backoff for transient failures, and store raw LLM responses alongside parsed entities so you can re-parse if your schema changes. Cache results so that re-processing the same document does not incur additional LLM costs. Track extraction quality over time by periodically sampling and validating results, because prompt performance can drift as document characteristics change.
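A minimal sketch of the backoff and caching pieces wrapped around the Step 4 call; the in-memory dict and the broad exception catch are stand-ins you would replace with your own storage and your client's transient error types:

import hashlib
import time

_cache = {}  # swap for Redis or a database table in a real deployment

def extract_cached(text_chunk, max_attempts=5):
    key = hashlib.sha256(text_chunk.encode()).hexdigest()
    if key in _cache:
        return _cache[key]  # re-processing the same chunk costs nothing
    for attempt in range(max_attempts):
        try:
            entities = extract_entities(text_chunk)
            _cache[key] = entities
            return entities
        except Exception:  # narrow this to your client's transient errors
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("extraction failed after retries")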

Adaptive Recall handles LLM-based entity extraction automatically during memory storage. When you store a memory through the MCP tools, entities are extracted using a tuned extraction pipeline, deduplicated against the existing entity inventory, and added to the knowledge graph. The extraction quality improves as the entity inventory grows because disambiguation becomes more reliable with more context about known entities.

Skip the extraction pipeline. Adaptive Recall extracts entities automatically when you store memories, building your knowledge graph with zero configuration.
