Base-Level Activation Explained for Developers
The Intuition Behind the Equation
Think about how your own memory works. If someone asks you about a meeting you attended yesterday, you recall it easily. If they ask about a meeting from three months ago, you struggle unless it was particularly important or you have thought about it since. If you attended a recurring weekly meeting, the accumulated familiarity makes it easier to recall than a one-off meeting from the same time period. Base-level activation captures exactly these two effects: recency (how long ago) and frequency (how many times).
The equation computes a single activation value from the complete history of when a memory was accessed. Recent accesses contribute more than old ones (recency), and more accesses contribute more than fewer accesses (frequency). The relationship is not linear in either direction. Instead, it follows a power law, which means the first few accesses have the biggest impact and additional accesses provide diminishing returns. Similarly, the most recent access dominates, and older accesses fade according to how much time has passed.
The Equation
For a memory with n accesses at times t1, t2, through tn, the base-level activation B at the current time t is:
B(t) = ln( sum from i=1 to n of (t - ti)^(-d) )
Breaking this down:
- (t - ti) is the age of the i-th access in seconds.
- d is the decay parameter, typically 0.5; each contribution decays as age raised to the power -d.
- The sum aggregates contributions from all accesses.
- ln() compresses the range with a natural logarithm.
Each access contributes a term (t - ti)^(-d) to the sum. This term is large when the access is recent (small age) and small when the access is old (large age). With d = 0.5, an access from 1 second ago contributes 1.0, an access from 100 seconds ago contributes 0.1, an access from 10,000 seconds ago contributes 0.01, and an access from 1,000,000 seconds ago (about 11.5 days) contributes 0.001. The contributions decay as a power of age, not exponentially.
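In code, the equation is nearly a one-liner. The sketch below (function name and signature are illustrative, not any particular library's API) computes B(t) from a list of access timestamps:

```python
import math

def base_level_activation(access_times, now, d=0.5):
    """B(t) = ln(sum over i of (now - t_i)^(-d))."""
    return math.log(sum((now - t) ** -d for t in access_times))

# Single access 100 seconds ago: contribution 100^(-0.5) = 0.1
print(base_level_activation([0.0], now=100.0))  # ln(0.1) ≈ -2.303
```

Note that the function is undefined if any access happens at exactly the current time (age zero); real implementations clamp ages to a small minimum.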
Power Law vs Exponential Decay
Many engineers instinctively reach for exponential decay when they want something to fade over time. Time-to-live values, cache expiration, and rate limiter windows all use exponential models because they are simple and well-understood. But human memory does not follow exponential decay, and using an exponential model for memory retrieval produces unnatural behavior.
The critical difference is the tail. Exponential decay drops to effectively zero after a few time constants. A memory with a 24-hour exponential half-life has less than 0.1% of its original activation after 10 days. It is gone. Power-law decay has a long tail: the same memory retains about 3% of its activation after 10 days and about 1% after 100 days. That 1% might seem trivial, but it means the memory is still accessible if other factors (spreading activation, high confidence) contribute additional activation.
This long tail matches how human memory actually works. You can recall your childhood phone number decades later even though you have not used it in years. You can recognize an old colleague's face even though you have not seen them in a decade. These are memories with very low base-level activation that are nevertheless accessible because the retrieval cue (seeing the face, being asked for the number) provides enough additional activation through spreading and contextual priming.
| Time Since Access | Power-Law (d=0.5) | Exponential (lambda=0.001) |
|---|---|---|
| 1 hour (3600s) | 0.0167 | 0.027 |
| 1 day (86400s) | 0.0034 | ~3e-38 |
| 1 week (604800s) | 0.0013 | ~0 |
| 1 month (2.6M s) | 0.00062 | ~0 |
The exponential function drops to effectively zero within a day, while the power law maintains meaningful values for weeks. For a retrieval system, this means exponential decay requires manual intervention (refreshing timestamps, setting explicit expiration policies) to keep useful memories alive, while power-law decay lets natural usage patterns determine what stays accessible.
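The table values can be reproduced with a few lines. The two decay functions below are a minimal sketch, with d and lambda taken from the table above:

```python
import math

def power_law(age_seconds, d=0.5):
    # Contribution of a single access of the given age
    return age_seconds ** -d

def exponential(age_seconds, lam=0.001):
    # Conventional exponential decay for comparison
    return math.exp(-lam * age_seconds)

for label, age in [("1 hour", 3600), ("1 day", 86400), ("1 week", 604800)]:
    print(f"{label}: power-law {power_law(age):.4f}, exponential {exponential(age):.2e}")
```

Past the first day the exponential column underflows toward zero while the power-law column stays within an order of magnitude of its one-day value.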
How Frequency Interacts with Recency
The equation sums contributions from all accesses, which means frequency matters alongside recency. Consider two memories: Memory A was accessed once, yesterday. Memory B was accessed ten times over the past month, most recently a week ago. Which has higher activation?
Memory A has a single contribution from an access about 86,400 seconds ago: (86400)^(-0.5) = 0.0034. Its activation is ln(0.0034) = -5.68.
Memory B has ten contributions spread across a month. Even though its most recent access is older than Memory A's, the accumulated contributions from ten accesses sum to approximately 0.012. Its activation is ln(0.012) = -4.42.
Memory B has higher activation despite being less recently accessed, because the accumulated frequency outweighs the recency advantage of Memory A. This matches intuition: a piece of knowledge you have used ten times is more accessible than something you encountered once yesterday. The equation captures this automatically without needing separate "recency" and "frequency" parameters.
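The comparison can be checked numerically. The access schedule for Memory B below is hypothetical (ten accesses, evenly spaced from one week to about a month ago); the exact sum depends on the schedule, but any reasonable spread keeps B above A:

```python
import math

d = 0.5
DAY = 86400

# Memory A: one access, one day ago
a = math.log((1 * DAY) ** -d)  # ln(0.0034) ≈ -5.68

# Memory B: ten accesses over the past month, most recent a week ago
# (hypothetical schedule: every 3 days, from 7 to 34 days ago)
ages_b = [(7 + 3 * i) * DAY for i in range(10)]
b = math.log(sum(age ** -d for age in ages_b))

print(f"A: {a:.2f}, B: {b:.2f}")  # B wins despite its older most-recent access
```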
Practical Behavior at Different Time Scales
Understanding how activation behaves at different time scales helps you predict and tune retrieval behavior:
- Seconds to minutes: Very high activation. Memories accessed in the current session are almost certainly retrieved. This models the "just saw it" effect where information from seconds ago is trivially accessible.
- Hours: Moderate activation. Memories from earlier today are accessible but starting to compete with other candidates. This is where recency starts to differentiate between "used this morning" and "used last week."
- Days to weeks: Lower activation, but still above threshold for memories with enough frequency. Daily-use patterns create stable activation at this scale. Memories used once are fading but not gone.
- Weeks to months: Low activation. Only frequently accessed or recently reinforced memories remain above threshold. This is where the long tail of power-law decay matters most, keeping important knowledge accessible while letting trivia fade.
Normalization for Retrieval Scoring
Raw base-level activation values typically fall between -8 and +2 (usually negative in practice) and are not directly comparable to vector similarity scores (0 to 1). To blend activation with similarity in a combined retrieval score, pass the raw value through a sigmoid function:
```python
import math

def normalize(bla):
    # Sigmoid: maps (-inf, +inf) to (0, 1), preserving relative order
    return 1.0 / (1.0 + math.exp(-bla))
```

This maps the full range of activation values to (0, 1) while preserving relative ordering. A memory with activation -5 maps to about 0.007, activation -2 maps to about 0.12, activation 0 maps to 0.5, and activation +2 maps to about 0.88. The sigmoid is steep around 0, which means the most meaningful distinctions (between moderately accessible and highly accessible memories) get the most resolution in the normalized range.
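Blending the normalized activation with a similarity score might then look like the sketch below; the weights and the linear blend are assumptions for illustration, not a prescribed scoring formula:

```python
import math

def normalize(bla):
    return 1.0 / (1.0 + math.exp(-bla))

def retrieval_score(similarity, bla, w_sim=0.7, w_act=0.3):
    # Hypothetical weighted blend of cosine similarity and normalized activation
    return w_sim * similarity + w_act * normalize(bla)

# A close semantic match with moderate activation
print(retrieval_score(similarity=0.82, bla=-2.0))
```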
What Base-Level Activation Does Not Capture
Base-level activation handles recency and frequency but not contextual relevance. A memory that is highly activated (frequently accessed, recently used) might still be irrelevant to the current query. That is why ACT-R uses spreading activation as a second component: it boosts memories that are contextually related to the current query, regardless of their access history. The combination of base-level activation (is this memory generally accessible?) and spreading activation (is this memory relevant right now?) produces retrieval behavior that matches human performance far better than either component alone.
Adaptive Recall computes base-level activation on every retrieval call. Recent, frequently used, and well-connected memories surface first.
Try It Free