Memory Decay Models: From Ebbinghaus to FadeMem

Memory decay models define how quickly stored information loses accessibility over time. The choice of decay model directly affects how your AI memory system balances retaining useful knowledge against cleaning up stale data. Four models dominate the field: Ebbinghaus power-law decay, simple exponential decay, ACT-R's base-level learning equation, and FadeMem's adaptive forgetting. Each makes different assumptions and produces different forgetting behavior.

Ebbinghaus and Power-Law Decay

Hermann Ebbinghaus published the first quantitative study of memory decay in 1885. By memorizing lists of nonsense syllables and testing his own recall at intervals ranging from minutes to weeks, he discovered that retention drops rapidly in the first hours after learning and then levels off into a long, gradual decline. Later researchers identified this curve as a power function: retention at time t equals a constant times t raised to a negative exponent.

The power-law model captures something important about memory that simpler models miss. The rate of forgetting slows down over time. A memory that survives the first day without reinforcement has a better chance of surviving the second day, and a memory that survives the first week has an even better chance of surviving the second week. This produces the characteristic long tail where old memories retain a small but nonzero accessibility for a very long time.

For AI systems, power-law decay means that recently stored memories that are never retrieved will fade quickly, clearing space in the retrieval index. But memories that persist past the initial rapid-decay period remain accessible at low activation for an extended time, giving them a chance to be retrieved and reinforced if they become relevant again. This is more forgiving than exponential models, which drop memories faster and more uniformly.
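As a minimal sketch of this behavior, the snippet below computes power-law retention as a scale factor times elapsed time raised to a negative exponent. The function name and the constants a and b are illustrative placeholders, not fitted values from Ebbinghaus's data.

    # Illustrative power-law decay: retention(t) = a * t^(-b).
    def power_law_retention(days_since_encoding: float, a: float = 1.0, b: float = 0.5) -> float:
        t = max(days_since_encoding, 1.0)  # clamp tiny ages so t^(-b) stays bounded
        return a * t ** (-b)

    # The long tail: retention after 1 day, 1 week, 1 month, and 1 year.
    for days in (1, 7, 30, 365):
        print(days, round(power_law_retention(days), 3))

With these placeholder parameters, retention falls from 1.0 on day one to roughly 0.38 after a week, yet still sits near 0.05 after a year rather than vanishing entirely.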

Simple Exponential Decay

Exponential decay is the most common model in software systems because it is simple to implement and reason about. Activation at time t equals the initial activation times e raised to negative lambda t, where lambda is the decay rate constant. This produces a smooth curve where a fixed percentage of remaining activation is lost in each time period.

The advantage of exponential decay is predictability. A memory with a half-life of 7 days will always have half its current activation after 7 days, regardless of its current level. This makes capacity planning straightforward: you can predict exactly when memories will reach the forgetting threshold and estimate storage turnover rates.
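A minimal sketch of this calculation, assuming a 7-day half-life as in the example above; the function name and time units are illustrative.

    import math

    # Illustrative exponential decay: activation(t) = A0 * exp(-lambda * t),
    # where lambda is derived from the half-life.
    def exponential_activation(initial: float, days_elapsed: float, half_life_days: float = 7.0) -> float:
        decay_rate = math.log(2) / half_life_days
        return initial * math.exp(-decay_rate * days_elapsed)

    print(exponential_activation(1.0, 7))   # ~0.5 after one half-life
    print(exponential_activation(1.0, 14))  # ~0.25 after two half-lives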

The disadvantage is that exponential decay does not match observed memory behavior. It drops too quickly for old memories, effectively creating a hard cutoff where everything older than a certain age is gone. Human memory experiments consistently show power-law decay, not exponential, and the practical difference matters. Exponential decay would forget your childhood address within a few years of moving out. Power-law decay preserves it at low activation for decades, which matches reality.
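To make the practical difference concrete, here is a small self-contained comparison using the same illustrative parameters as the sketches above (a 7-day half-life and a power-law exponent of 0.5; these are placeholders, not calibrated constants).

    import math

    def power_law(t_days: float, b: float = 0.5) -> float:
        return max(t_days, 1.0) ** (-b)

    def exponential(t_days: float, half_life_days: float = 7.0) -> float:
        return math.exp(-math.log(2) / half_life_days * t_days)

    # Retention after 1 day, 1 month, and 1 year under each model.
    for days in (1, 30, 365):
        print(days, round(power_law(days), 3), f"{exponential(days):.1e}")

After a year the exponential curve is effectively zero (on the order of 10^-16 with a 7-day half-life), while the power law still reports a few percent of the original activation.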

For AI systems, exponential decay is appropriate when you want aggressive, predictable cleanup with a clear expiration timeline. Time-sensitive data like news, stock prices, or session state benefits from exponential decay because old values are definitively outdated. For general-purpose memory where old knowledge might remain relevant, power-law decay produces better results.

ACT-R Base-Level Learning

ACT-R's base-level learning equation extends the Ebbinghaus model by incorporating multiple access events. Instead of treating memory as a single creation event followed by decay, ACT-R computes activation as a function of the entire access history. Each time a memory is retrieved, it adds a new contribution to the activation sum, weighted by recency with power-law decay.

The equation computes activation B as the natural log of the sum, over all access events, of (t - ti) raised to the power negative d, where ti is the time of each access and d is the decay parameter. This means a memory accessed once decays according to the standard power law, but a memory accessed five times decays much more slowly because each access contributes its own decaying activation term. The sum of five decaying terms is always larger than any single term, and each new access adds a fresh term that starts at the top of its own rapid-decay curve.
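A minimal sketch of this computation; the function name and time units are illustrative, and the small clamp on age is an implementation convenience rather than part of the equation.

    import math

    # Base-level activation: B = ln( sum over accesses of (now - t_i)^(-d) ).
    # access_times and now share a unit (days here); d is the decay parameter.
    def base_level_activation(access_times: list[float], now: float, d: float = 0.5) -> float:
        total = 0.0
        for t_i in access_times:
            age = max(now - t_i, 0.001)  # avoid a zero or negative age
            total += age ** (-d)
        return math.log(total)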

This model captures the spaced repetition effect. A memory accessed at regular intervals builds cumulative activation that resists decay far more effectively than the same number of accesses clustered together. Five accesses spread over a month produce stronger long-term retention than five accesses in a single day, because each spaced access occurs after some decay has happened, demonstrating that the memory survived a period of disuse.
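Reusing the sketch above, a quick comparison illustrates the effect: five accesses spread over a month versus five packed into a single day, both evaluated at day 60 (all numbers are illustrative).

    spaced    = [0, 7, 14, 21, 28]        # five accesses over a month
    clustered = [0, 0.1, 0.2, 0.3, 0.4]   # five accesses in a single day

    print(base_level_activation(spaced, now=60))     # higher activation
    print(base_level_activation(clustered, now=60))  # lower activation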

For AI memory systems, ACT-R's model is the most sophisticated and accurate of the four. It naturally handles the common case where some memories are used heavily and should persist while others are stored once and should fade. The computational overhead is modest because activation can be precomputed and updated incrementally when a memory is accessed. Adaptive Recall uses this model for all activation calculations.

FadeMem: Adaptive Forgetting

FadeMem, introduced by researchers studying long-context language models, takes a different approach to forgetting. Instead of modeling individual memory items, FadeMem applies a forgetting mechanism to the attention weights in transformer architectures. Older tokens in the context window receive exponentially decaying attention weights, which means the model naturally focuses on recent content while gradually losing access to older content.

The key innovation in FadeMem is that the decay rate is adaptive. Instead of a fixed decay parameter, the model learns to adjust the forgetting rate based on the importance of the information. Content that is frequently referenced in subsequent text maintains higher attention weights, while content that is mentioned once and never referenced again decays faster. This produces behavior similar to ACT-R's access-frequency effect but implemented directly in the neural network architecture.
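FadeMem's published formulation is not reproduced here; as a rough, hedged illustration of the general idea of recency-weighted attention, the sketch below applies an exponential penalty to attention scores based on how far back each token sits, then renormalizes. The fixed decay_rate stands in for the adaptive, learned rate described above.

    import numpy as np

    # Rough illustration only: penalize attention scores by token distance,
    # then renormalize. Not FadeMem's actual architecture.
    def recency_decayed_attention(scores: np.ndarray, decay_rate: float = 0.05) -> np.ndarray:
        seq_len = scores.shape[-1]
        positions = np.arange(seq_len)
        # distance[i, j]: how far token j lies behind query position i
        distance = np.maximum(positions[:, None] - positions[None, :], 0)
        weights = np.exp(scores - decay_rate * distance)
        return weights / weights.sum(axis=-1, keepdims=True)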

FadeMem is designed for a different problem than the memory lifecycle models discussed above. It operates within a single inference context rather than across a persistent memory store. However, its approach to adaptive decay rates is influential. The idea that different memories should decay at different rates based on demonstrated importance is central to how Adaptive Recall implements importance-based retention adjustments.

Comparing the Models

Each model has distinct characteristics that make it suited to different applications.

Exponential decay provides the most aggressive and predictable forgetting. Use it when information has a clear expiration timeline and old data is definitively wrong rather than potentially useful. Session state, real-time event data, and time-sensitive caches all fit this model well.

Power-law decay provides gentler forgetting with a long tail. Use it when old information might become relevant again and you want the system to retain a faint memory of past knowledge. General-purpose memory stores, research assistants, and knowledge bases benefit from this model.

ACT-R base-level learning provides the most nuanced forgetting by incorporating the full access history. Use it when you want the system to automatically distinguish between frequently used knowledge and one-time observations. This is the best general-purpose model for persistent AI memory because it rewards demonstrated utility.

FadeMem provides context-window forgetting for transformer architectures. Use it when you are managing attention within a single inference call rather than persisting memories across sessions.

Choosing a Decay Rate

Regardless of which model you choose, the decay rate parameter controls how aggressively the system forgets. Higher values mean faster forgetting; lower values mean slower forgetting. The right value depends on how quickly information becomes outdated in your domain.

Customer support systems where product updates happen monthly benefit from aggressive decay (d = 0.6 to 0.8 in ACT-R terms). Research or legal systems where knowledge remains relevant for years benefit from conservative decay (d = 0.3 to 0.4). General-purpose assistants work well with the ACT-R default of d = 0.5, which produces a forgetting curve calibrated against human memory data.
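As one way to encode this guidance, a simple preset table (the names are illustrative; the values come from the ranges above):

    # Illustrative decay-parameter presets drawn from the guidance above.
    DECAY_PRESETS = {
        "customer_support": 0.7,    # aggressive: monthly product updates (0.6-0.8)
        "research_or_legal": 0.35,  # conservative: multi-year relevance (0.3-0.4)
        "general_assistant": 0.5,   # ACT-R default
    }

    d = DECAY_PRESETS["general_assistant"]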

ACT-R decay with importance-based retention, tuned for your domain. Build memory that ages gracefully.

Get Started Free