
How to Adapt AI Responses Based on User History

Adapting AI responses based on user history means retrieving what the system knows about a user before generating each response, then using that knowledge to adjust tone, depth, examples, and references so the output matches what that specific user finds most useful. The implementation involves assembling context from memory, injecting it into the AI prompt, and closing the feedback loop so adaptation improves with every interaction.

Before You Start

Response adaptation is the final step in the personalization pipeline. It requires user preferences already stored in a memory layer, a retrieval mechanism that can find relevant preferences quickly (within your latency budget), and an LLM API that accepts dynamic system prompts or context injection. If you have not built the preference capture layer yet, start with the preference engine guide and cross-session learning guide first.

The key design decision is how much of the context window to allocate to personalization. Too little, and the adaptation is too subtle to matter. Too much, and you crowd out space for the user's actual query and the AI's working context. A reasonable starting budget is 300-500 tokens for preference injection, scaling up to 800-1000 tokens if you include episodic memory references. These numbers assume a model with at least a 32K token context window.
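To make the budget concrete, here is a minimal sketch of trimming retrieved context until it fits. The constants and the estimateTokens helper are illustrative assumptions, not part of any specific API (a rough estimateTokens is sketched in Step 1 below); the same idea works with whatever token counter your stack provides.

const PREFERENCE_BUDGET = 500;  // tokens reserved for preference injection
const EPISODIC_BUDGET = 1000;   // upper bound when episodic references are included

// Drop the lowest-relevance episodic memories until the context fits.
// Assumes memories arrive sorted most-relevant first.
function trimToBudget(preferences, episodicMemories, budget) {
  const trimmed = [...episodicMemories];
  while (trimmed.length > 0 && estimateTokens(preferences, trimmed) > budget) {
    trimmed.pop();
  }
  return trimmed;
}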

Step-by-Step Implementation

Step 1: Build the Context Assembly Pipeline.
Before every AI generation, run a context assembly step that gathers the user's relevant preferences, recent interaction memories, and any episodic context that relates to the current query. The assembly pipeline takes the user's ID and current message as input and returns a structured context block ready for prompt injection.

The assembly pipeline should run in parallel with any other pre-processing (embedding generation, intent classification) to minimize latency. If your memory store supports batch queries, retrieve preferences and episodic memories in a single round trip rather than multiple sequential queries.

async function assembleUserContext(userId, currentMessage) {
  // Run retrievals in parallel
  const [preferences, episodicMemories, recentHistory] = await Promise.all([
    // Get preferences relevant to this message
    recallPreferences(userId, currentMessage, { limit: 15, minConfidence: 0.4 }),
    // Get past interactions related to this topic
    recallEpisodic(userId, currentMessage, { limit: 5, minConfidence: 0.5 }),
    // Get the user's most recent session summary
    getRecentSummary(userId, { sessions: 1 })
  ]);

  return {
    preferences: formatPreferences(preferences),
    episodic: formatEpisodic(episodicMemories),
    recentContext: formatSummary(recentHistory),
    tokenCount: estimateTokens(preferences, episodicMemories, recentHistory)
  };
}
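The helpers formatPreferences, formatSummary, and estimateTokens above are left to your implementation. For estimateTokens, a rough heuristic of about four characters per token works for English text; it is not exact, but it is close enough for budget checks. A sketch:

// Rough token estimate: ~4 characters per token for English text.
// Swap in a real tokenizer if you need precision.
function estimateTokens(...blocks) {
  const text = blocks
    .flat()
    .map(b => (typeof b === 'string' ? b : JSON.stringify(b)))
    .join(' ');
  return Math.ceil(text.length / 4);
}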
Step 2: Design the Adaptive System Prompt.
Create a system prompt template with designated slots for user-specific context. The base prompt contains your application's core instructions and personality. The adaptive sections contain user preferences, behavioral guidance, and historical context. Keep the sections clearly separated so you can measure the impact of each on response quality.
function buildAdaptivePrompt(basePrompt, userContext) {
  let adaptiveBlock = '';

  // Communication preferences (early: they shape the entire response)
  if (userContext.preferences.communication) {
    adaptiveBlock += '\n## Communication Style for This User\n';
    adaptiveBlock += userContext.preferences.communication;
  }

  // Domain context
  if (userContext.preferences.domain) {
    adaptiveBlock += '\n## User Technical Context\n';
    adaptiveBlock += userContext.preferences.domain;
  }

  // Episodic memory (past relevant interactions)
  if (userContext.episodic && userContext.episodic.length > 0) {
    adaptiveBlock += '\n## Relevant Past Interactions\n';
    adaptiveBlock += userContext.episodic;
  }

  // Negative preferences (last: closest to the user's message)
  if (userContext.preferences.negative) {
    adaptiveBlock += '\n## Avoid These\n';
    adaptiveBlock += userContext.preferences.negative;
  }

  return basePrompt + adaptiveBlock;
}

The order of sections in the adaptive block matters. Place negative preferences (things to avoid) last, closest to the user's message, because LLMs attend more strongly to content near the end of the context. Place communication style preferences early because they shape the entire response. Place episodic memories in the middle where they provide useful context without dominating the response.

Step 3: Implement Tone and Depth Adaptation.
The most immediately noticeable form of adaptation is adjusting how the AI communicates. Convert stored preferences into specific behavioral instructions that the AI can follow consistently. Vague instructions like "be more casual" produce inconsistent results. Specific instructions like "use contractions, short sentences, and skip formal greetings" produce reliable adaptation.
function formatCommunicationPrefs(prefs) {
  const instructions = [];

  if (prefs.tone === 'casual') {
    instructions.push('Use contractions, short sentences, and conversational language.');
    instructions.push('Skip formal greetings and sign-offs.');
  } else if (prefs.tone === 'formal') {
    instructions.push('Use complete sentences without contractions.');
    instructions.push('Maintain professional language throughout.');
  } else if (prefs.tone === 'technical') {
    instructions.push('Use precise technical terminology without simplification.');
    instructions.push('Assume familiarity with standard concepts in the domain.');
  }

  if (prefs.detail_level <= 3) {
    instructions.push('Keep responses concise. Lead with the answer, add detail only if asked.');
    instructions.push('Code examples should be minimal and focused.');
  } else if (prefs.detail_level >= 8) {
    instructions.push('Provide thorough explanations with context and reasoning.');
    instructions.push('Include complete, runnable code examples with comments.');
  }

  if (prefs.prefers_examples) {
    instructions.push('Lead with concrete examples before abstract explanations.');
  }

  return instructions.join('\n');
}
Step 4: Add Historical Reference Injection.
When the user's current query relates to something they have done before, include that context so the AI can reference it naturally. This is the feature that makes users feel genuinely remembered. The AI says "you used JWT tokens for authentication in your last project, should we follow the same approach?" instead of starting from scratch.

Historical references should be specific enough to be useful but brief enough to not consume excessive context. Format each episodic memory as a one-to-two sentence summary with the date and topic. Let the AI decide whether and how to reference it in its response rather than forcing it to mention every retrieved memory.

function formatEpisodic(memories) {
  if (!memories || memories.length === 0) return '';

  const formatted = memories.map(m => {
    const date = new Date(m.timestamp).toLocaleDateString();
    return `[${date}] ${m.summary}`;
  });

  return 'The user has previously worked on related topics:\n' +
    formatted.join('\n') +
    '\nReference these naturally if relevant, but do not force references.';
}
Step 5: Build the Feedback Loop.
After the AI generates an adapted response, capture whether the adaptation was successful. Three signals matter: acceptance (user proceeds with the response without correction), correction (user modifies the AI's approach, indicating a preference mismatch), and explicit feedback (user comments on the personalization itself). Feed each signal back into the preference store.

Corrections are the most valuable signal because they indicate exactly what went wrong. If the AI used casual tone and the user said "be more professional," that is a direct correction of the tone preference. If the AI referenced a past project and the user said "that is outdated, we switched to a different approach," that is a correction of an episodic memory that needs updating. Build correction detection into your post-response processing pipeline.

async function processAdaptationFeedback(userId, response, userReply) {
  // Detect corrections. Match on word boundaries: plain substring checks
  // would flag "no" inside "know" or "too" inside "tool".
  const correctionSignals = [
    'no', 'not like that', 'instead', 'actually',
    'too', 'more', 'less', 'stop', 'don\'t'
  ];
  const reply = userReply.toLowerCase();
  const hasCorrection = correctionSignals.some(
    s => new RegExp(`\\b${s}\\b`).test(reply)
  );

  if (hasCorrection) {
    // Extract what the correction implies
    const correctionAnalysis = await analyzeCorrection(response, userReply);
    if (correctionAnalysis.preferenceUpdate) {
      await updatePreference(userId, correctionAnalysis.preferenceUpdate);
    }
  } else {
    // No correction implies the adaptation was acceptable.
    // Reinforce the preferences that drove this response.
    await reinforcePreferences(userId, response.appliedPreferences);
  }
}
Step 6: Implement Adaptation Guardrails.
Guardrails prevent the adaptation system from making the experience worse. Implement three types. Staleness guards prevent preferences older than a threshold (such as 90 days without reinforcement) from being applied with full weight. Confidence guards prevent low-confidence preferences from making visible changes to the response. Context budget guards ensure that personalization context never exceeds a fixed percentage of the total context window.
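A minimal sketch of the three guards, assuming each stored preference carries a confidence score and a lastReinforcedAt timestamp (the field names and thresholds here are illustrative, not prescribed):

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Staleness guard: down-weight preferences not reinforced in 90 days.
function applyStalenessGuard(pref, now = Date.now()) {
  const ageDays = (now - pref.lastReinforcedAt) / MS_PER_DAY;
  return { ...pref, weight: ageDays > 90 ? 0.5 : 1.0 };
}

// Confidence guard: keep only preferences confident enough to act on.
function applyConfidenceGuard(prefs, minConfidence = 0.6) {
  return prefs.filter(p => p.confidence >= minConfidence);
}

// Context budget guard: personalization capped at a fixed share of the window.
function withinContextBudget(adaptiveBlock, contextWindow, maxShare = 0.05) {
  return estimateTokens(adaptiveBlock) <= Math.floor(contextWindow * maxShare);
}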

The most important guardrail is the fallback: if the personalization pipeline fails (memory store timeout, malformed preferences, excessive latency), the AI should generate a response using only its base prompt. A generic response is always better than a delayed or crashed response. Design the adaptation pipeline as an enhancement layer that degrades gracefully, not as a required component that blocks generation.
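One way to implement that fallback is to race the assembly pipeline against a timeout and treat any failure as "no personalization," as in this sketch (the 200 ms budget is an assumption; tune it to your latency target):

async function assembleUserContextSafe(userId, currentMessage, timeoutMs = 200) {
  try {
    // Slow memory lookups never block generation: whichever settles first wins.
    return await Promise.race([
      assembleUserContext(userId, currentMessage),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error('personalization timeout')), timeoutMs)
      )
    ]);
  } catch (err) {
    // Any failure degrades to the base prompt; log it and move on.
    console.warn('Personalization skipped:', err.message);
    return null; // caller falls back to basePrompt when context is null
  }
}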

Measuring Adaptation Quality

Run A/B tests between adapted and non-adapted responses for the same users. The adapted group should show lower correction rates, faster task completion, higher satisfaction scores (if you collect them), and higher return rates. If the adapted group does not outperform the control group, the adaptation is adding complexity without adding value, and you should simplify the pipeline or improve the preference capture.
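A lightweight way to run that test is a deterministic hash split, so each user stays in the same group across sessions. A sketch, where recordMetric stands in for whatever analytics sink you use:

const crypto = require('crypto');

// Deterministic assignment: the same user always lands in the same group.
function assignGroup(userId) {
  const hash = crypto.createHash('sha256').update(userId).digest();
  return hash[0] % 2 === 0 ? 'adapted' : 'control';
}

// Log per-response outcomes so correction rate and completion rate
// can be compared between groups.
function logAdaptationOutcome(userId, outcome) {
  recordMetric({
    group: assignGroup(userId),
    correction: outcome.hasCorrection,  // from processAdaptationFeedback
    completed: outcome.taskCompleted,
    timestamp: Date.now()
  });
}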

Adaptive Recall provides the memory retrieval and cognitive scoring that powers response adaptation. Store observations, retrieve relevant context, and let your AI learn what each user needs.
