How to Handle Entities Spanning Multiple Sentences
The Problem
Consider this passage: "The checkout service is our highest-traffic application. It handles over 50,000 transactions per hour during peak periods. To manage this load, it relies on Redis for session caching and PostgreSQL for persistent storage."
Sentence-level extraction finds the following: sentence 1 has "checkout service" (entity); sentence 2 has "it" (pronoun, no entity); sentence 3 has "it" (pronoun), "Redis" (entity), and "PostgreSQL" (entity). The relationships (checkout service depends on Redis, checkout service depends on PostgreSQL) span sentences 1 and 3. The subject is named only in sentence 1, and sentences 2 and 3 refer back to it with the pronoun "it".
Without cross-sentence processing, you extract the entities but miss the relationships. The graph has nodes for checkout service, Redis, and PostgreSQL, but no edges connecting them. This is one of the most common sources of missing relationships in knowledge graphs built from text.
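The following minimal Python sketch makes the failure concrete by running a sentence-level extractor on the passage above. The gazetteer `KNOWN_ENTITIES`, the naive regex sentence splitter, and the single "relies on" rule are illustrative assumptions that stand in for a real NER model and relation extractor. The sketch only shows that a rule scoped to one sentence cannot recover a subject that appears in that sentence as "it".

```python
import re

# Hypothetical gazetteer standing in for an NER model's entity output.
KNOWN_ENTITIES = ["checkout service", "Redis", "PostgreSQL"]

PASSAGE = (
    "The checkout service is our highest-traffic application. "
    "It handles over 50,000 transactions per hour during peak periods. "
    "To manage this load, it relies on Redis for session caching "
    "and PostgreSQL for persistent storage."
)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def entities_in(fragment):
    """Known entity names appearing verbatim (case-insensitive), in text order."""
    lowered = fragment.lower()
    found = [(lowered.find(e.lower()), e) for e in KNOWN_ENTITIES]
    return [e for pos, e in sorted(found) if pos >= 0]

def extract_dependencies(sentence):
    """Toy rule: '<subject> relies on <objects>' inside ONE sentence.
    The subject must be a named entity; a pronoun subject yields no edge."""
    match = re.search(r"\brelies on\b", sentence)
    if not match:
        return []
    subjects = entities_in(sentence[:match.start()])
    if not subjects:
        return []  # subject was "it": the relationship is silently dropped
    objects = entities_in(sentence[match.end():])
    return [(subjects[-1], "depends_on", obj) for obj in objects]

sentences = split_sentences(PASSAGE)
nodes = {e for s in sentences for e in entities_in(s)}
edges = [edge for s in sentences for edge in extract_dependencies(s)]
print(sorted(nodes))  # all three entities are found
print(edges)          # but no edges: the subject of "relies on" is a pronoun
```

All three entities become nodes, but `edges` comes back empty because the only sentence containing "relies on" names its subject with a pronoun.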
Multi-Sentence Passage Processing
The simplest fix is to process text in passages of 5 to 10 sentences rather than one sentence at a time. This gives the extraction system enough context to see both the entity definition (sentence 1) and the relationship (sentence 3) in the same input. LLMs handle this naturally. Given the full passage, they resolve "it" in sentence 3 to "checkout service" from sentence 1.
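The sketch below shows one way to form passages in the 5-to-10-sentence range. It assumes a naive regex sentence splitter; a production pipeline would use a proper segmenter. It divides the sentences into the fewest passages of at most 10 sentences and sizes them as evenly as possible. `split_sentences` and `group_sentences` are hypothetical helper names, and even sizing is one reasonable choice among several.

```python
import math
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def group_sentences(sentences, max_size=10):
    """Split sentences into the fewest passages of at most max_size sentences,
    sized as evenly as possible.

    With max_size=10, any input of 5 or more sentences yields passages of
    5 to 10 sentences; shorter inputs become a single passage.
    """
    if not sentences:
        return []
    n_passages = math.ceil(len(sentences) / max_size)
    base, extra = divmod(len(sentences), n_passages)
    passages, start = [], 0
    for i in range(n_passages):
        size = base + (1 if i < extra else 0)  # first `extra` passages get one more
        passages.append(" ".join(sentences[start:start + size]))
        start += size
    return passages

text = " ".join(f"Sentence {i} of the document." for i in range(23))
passages = group_sentences(split_sentences(text))
print([len(split_sentences(p)) for p in passages])  # [8, 8, 7]
```

With this approach, 23 sentences become passages of 8, 8, and 7 sentences. The alternative of 10, 10, and a 3-sentence remainder would leave the last passage with too little context.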
For NER models that process sentence by sentence, run coreference resolution on the passage first. Replace pronouns with their antecedents, then extract entities from the resolved text. This transforms the passage into: "The checkout service is our highest-traffic application. The checkout service handles over 50,000 transactions per hour during peak periods. To manage this load, the checkout service relies on Redis for session caching and PostgreSQL for persistent storage." Now sentence-level extraction finds the relationships because the entity name appears explicitly in every sentence.
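The substitution step might look like the sketch below. It assumes a coreference tool has already grouped mentions into clusters of (start, end) character spans and treats the earliest span as the antecedent. The cluster format, the small `PRONOUNS` set, and the helper names are hypothetical, so adapt them to whatever your coreference tool actually returns.

```python
# Illustrative subset; possessives such as "its" would need different handling.
PRONOUNS = {"it", "they", "them"}

def match_case(antecedent, mention):
    """Capitalize the antecedent if the pronoun started a sentence; lowercase a
    leading article ("The" -> "the") mid-sentence, leaving proper nouns alone."""
    if mention[:1].isupper():
        return antecedent[:1].upper() + antecedent[1:]
    first, _, rest = antecedent.partition(" ")
    if first.lower() in {"the", "a", "an"}:
        return first.lower() + (" " + rest if rest else "")
    return antecedent

def resolve_coreferences(text, clusters):
    """Replace pronoun mentions with the first (antecedent) mention of their cluster.

    Assumed input format: each cluster is a list of (start, end) character
    spans into `text`; the earliest span is treated as the antecedent.
    """
    replacements = []
    for cluster in clusters:
        spans = sorted(cluster)
        ant_start, ant_end = spans[0]
        antecedent = text[ant_start:ant_end]
        for start, end in spans[1:]:
            mention = text[start:end]
            if mention.lower() not in PRONOUNS:
                continue  # leave named or nominal mentions untouched
            replacements.append((start, end, match_case(antecedent, mention)))
    # Apply from the end of the text backwards so earlier offsets stay valid.
    for start, end, replacement in sorted(replacements, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

PASSAGE = (
    "The checkout service is our highest-traffic application. "
    "It handles over 50,000 transactions per hour during peak periods. "
    "To manage this load, it relies on Redis for session caching "
    "and PostgreSQL for persistent storage."
)
it_1 = PASSAGE.index("It handles")
it_2 = PASSAGE.index("it relies")
clusters = [[(0, len("The checkout service")), (it_1, it_1 + 2), (it_2, it_2 + 2)]]
resolved = resolve_coreferences(PASSAGE, clusters)
print(resolved)
```

Running it on the example passage reproduces the resolved text above, with "checkout service" named explicitly in all three sentences.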
Chunking Strategy
When splitting documents into passages for extraction, use paragraph boundaries rather than sentence counts. Most paragraphs cover one coherent topic along with its entities and relationships. Splitting mid-paragraph can put the subject of a relationship in one chunk and the object in another, which makes the relationship invisible to the extractor.
When paragraphs are very long (more than 1,000 tokens), split them at sentence boundaries while keeping at least 5 sentences together. Overlap consecutive chunks by 2 to 3 sentences so that a relationship stated near a split point still appears whole in at least one chunk.
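Here is a sketch of the long-paragraph rule. It assumes whitespace-separated words as a rough stand-in for model tokens and uses a naive regex sentence splitter. The window of 8 sentences and the overlap of 2 are hypothetical defaults chosen to satisfy the "at least 5 sentences" and "2 to 3 sentence overlap" guidance. `split_long_paragraph` is an illustrative name, not an existing API.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def count_tokens(text):
    """Whitespace word count: a rough stand-in for a real tokenizer."""
    return len(text.split())

def split_long_paragraph(paragraph, max_tokens=1000, window=8, overlap=2):
    """Split a paragraph longer than max_tokens into overlapping sentence windows.

    Every window holds exactly `window` sentences (assumed >= 5), and
    consecutive windows share at least `overlap` sentences.
    """
    if not 0 < overlap < window:
        raise ValueError("overlap must be between 1 and window - 1")
    if count_tokens(paragraph) <= max_tokens:
        return [paragraph]
    sentences = split_sentences(paragraph)
    if len(sentences) <= window:
        return [" ".join(sentences)]  # too few sentences to split further
    step = window - overlap
    starts = list(range(0, len(sentences) - window, step))
    # The final window is anchored to the last sentence; it may overlap its
    # predecessor by more than `overlap` sentences, never fewer.
    starts.append(len(sentences) - window)
    return [" ".join(sentences[s:s + window]) for s in starts]

# Hypothetical 20-sentence paragraph with 60 words per sentence (1,200 words).
paragraph = " ".join(f"S{i} " + " ".join(["word"] * 58) + " end." for i in range(20))
chunks = split_long_paragraph(paragraph)
print([c.split()[0] for c in chunks])  # ['S0', 'S6', 'S12']
```

In this example, the final window (sentences 12 to 19) overlaps the previous one (sentences 6 to 13) by exactly 2 sentences.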
When paragraphs are very short (fewer than 100 tokens), combine consecutive paragraphs until the combined chunk reaches at least 300 tokens. Short passages lack enough context for reliable entity disambiguation and relationship extraction.
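The following sketch implements the paragraph-first strategy, including the rule for very short paragraphs. It assumes blank lines mark paragraph boundaries, approximates tokens with whitespace-separated words, and uses hypothetical helper names. A short paragraph opens a buffer, and that buffer absorbs the paragraphs that follow it, short or not, until it reaches 300 tokens.

```python
import re

def count_tokens(text):
    """Whitespace word count: a rough stand-in for a real tokenizer."""
    return len(text.split())

def split_paragraphs(document):
    """Paragraph boundaries are assumed to be one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

def merge_short_paragraphs(paragraphs, short_tokens=100, target_tokens=300):
    """Group paragraphs into chunks for extraction.

    A paragraph of at least short_tokens stands alone. A shorter one opens a
    buffer that absorbs the following paragraphs until it holds at least
    target_tokens. A trailing buffer that never reaches the target is
    emitted as-is.
    """
    chunks, buffer = [], []
    for para in paragraphs:
        if not buffer and count_tokens(para) >= short_tokens:
            chunks.append(para)
            continue
        buffer.append(para)
        if count_tokens("\n\n".join(buffer)) >= target_tokens:
            chunks.append("\n\n".join(buffer))
            buffer = []
    if buffer:
        chunks.append("\n\n".join(buffer))
    return chunks

def make_paragraph(n_words, word):
    """Build a dummy paragraph of n_words copies of `word` (demo only)."""
    return " ".join([word] * n_words)

doc = "\n\n".join([make_paragraph(50, "a"), make_paragraph(50, "b"),
                   make_paragraph(250, "c"), make_paragraph(400, "d"),
                   make_paragraph(40, "e")])
chunks = merge_short_paragraphs(split_paragraphs(doc))
print([count_tokens(c) for c in chunks])  # [350, 400, 40]
```

The two 50-token paragraphs absorb the following 250-token one to reach 350 tokens. The 400-token paragraph stands alone. The trailing 40-token paragraph is emitted as-is because nothing follows it. Paragraphs longer than 1,000 tokens would still need the overlapping sentence-window split described above.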
Discourse-Level Entities
Some entities are defined through progressive elaboration across an entire section. "The new authentication system" might be introduced in a heading, its components described across three paragraphs, and its relationships to other systems stated in a fourth paragraph. Passage-level extraction captures relationships within each paragraph, but the full picture requires aggregating entities and relationships across the entire section.
For these cases, extract at the passage level for relationship accuracy, then aggregate and deduplicate at the section or document level for completeness. The entity "authentication system" might be extracted multiple times from different passages with different relationship fragments. The deduplication step merges these into a single node with all its connections.
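The aggregation step can be sketched as merging passage-level triples under a canonical entity key. The normalization here lowercases each name and drops a leading article, and a hand-written alias map handles the remaining variants. This is a deliberately simple, hypothetical stand-in for real entity resolution, which would usually need fuzzier matching. The triples and the `depends_on` relation are illustrative.

```python
from collections import defaultdict

ARTICLES = {"the", "a", "an"}

def canonical(name, aliases=None):
    """Merge key for an entity mention: lowercase, collapse whitespace,
    drop a leading article, then apply an optional alias map."""
    words = name.lower().split()
    if words and words[0] in ARTICLES:
        words = words[1:]
    key = " ".join(words)
    return (aliases or {}).get(key, key)

def aggregate(passage_triples, aliases=None):
    """Merge (subject, relation, object) triples from many passages into one
    graph: one node per canonical name, duplicate edges collapsed, and every
    surface form recorded for each node."""
    mentions = defaultdict(set)  # canonical name -> surface forms seen
    edges = set()
    for triples in passage_triples:
        for subj, rel, obj in triples:
            s, o = canonical(subj, aliases), canonical(obj, aliases)
            mentions[s].add(subj)
            mentions[o].add(obj)
            edges.add((s, rel, o))
    return dict(mentions), edges

# Hypothetical fragments extracted from three paragraphs of one section.
passages = [
    [("The new authentication system", "depends_on", "Redis")],
    [("authentication system", "depends_on", "PostgreSQL")],
    [("checkout service", "depends_on", "the authentication system"),
     ("The authentication system", "depends_on", "Redis")],
]
nodes, edges = aggregate(passages, aliases={"new authentication system": "authentication system"})
print(sorted(nodes))
print(sorted(edges))
```

The four surface forms of the authentication system collapse into one node that carries all of its edges. The Redis edge, extracted from two different passages, is stored once.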
Adaptive Recall processes memories as complete passages rather than splitting them into sentences, which naturally captures cross-sentence relationships. When longer documents are stored, the extraction pipeline maintains context across passages so that entity references are resolved and relationships are captured regardless of where they appear in the text.
Store memories of any length. Adaptive Recall extracts entities and relationships with full cross-sentence context.