ADR-006: Multi-Factor Scoring Engine #

Status: Proposed

Date: 2026-04-09

Context #

Reverie retrieval currently uses a 4-factor scoring pipeline in ChunkStore::hybrid_search() (crates/reverie-store/src/backends/sqlite_vec.rs). The Chunk data model (crates/reverie-store/src/chunk.rs) carries 4 additional neuroscience-grounded fields that are stored but not yet wired into search ranking. This ADR documents the full 8-factor scoring architecture — what’s live, what’s parked, and the formulas.

The 8 Factors #

Group A — Search-time (live in sqlite_vec.rs) #

| # | Factor | Source | Formula | Status |
| --- | --- | --- | --- | --- |
| 1 | BM25 text relevance | FTS5 bm25() function | SQLite-native; scores are negative (lower = better) | Live |
| 2 | Vector similarity | sqlite-vec cosine distance | bge-large-en-v1.5 embeddings, 1024-dim | Live (placeholder when no embedder) |
| 3 | RRF fusion | Reciprocal Rank Fusion | score(d) = Σ 1/(k + rank_i(d)) with k=30 | Live |
| 4 | Time-decay boost | updated_at timestamp | decay = 1.0 + 0.3 * exp(-age_days / 30.0) | Live |

Group B — Lifecycle metadata (stored, not yet in ranking) #

| # | Factor | Field | Range | Neuroscience model | Status |
| --- | --- | --- | --- | --- | --- |
| 5 | Synaptic strength | strength: f32 | 0.0–1.0 | SHY (Tononi/Cirelli); decays during dream downscale | Stored, write-path parked |
| 6 | Depth score | depth_score: u8 | 1–3 | 1=episodic (hippocampal), 2=intermediate, 3=semantic (neocortical) | Stored, defaults to 2 |
| 7 | Session spread | session_spread: u32 | 1–∞ | Cross-session reactivation count (Hebbian co-activation) | Stored, defaults to 1 |
| 8 | Stability | stability: f32 | 0.0–∞ | Ebbinghaus S parameter; higher = slower forgetting | Stored, defaults to 1.0 |

Supporting fields (inputs to the 8 factors, not independent factors) #

| Field | Type | Role |
| --- | --- | --- |
| staleness_score | f32 | Computed: time_since_access * decay_rate_per_kind; input to prune decisions |
| signal_score | f32 | Computed: access_frequency * revision_count * kind_weight; input to promote/demote |
| access_count | u32 | Raw access counter; feeds signal_score and session_spread |
| revision_count | u32 | Upsert counter; feeds signal_score |
| consolidation_status | enum | Staged → Consolidated → Archived; lifecycle state, not a ranking factor |
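The two computed fields above are plain products, which the following sketch spells out. It assumes time is measured in days and that access_frequency means accesses per day of chunk age; `ChunkKind`, its variants, and every numeric rate or weight are hypothetical placeholders, not values from the Reverie codebase.

```rust
/// Hypothetical chunk kinds; the real enum in chunk.rs may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChunkKind {
    Note,
    Code,
}

impl ChunkKind {
    /// Assumed per-kind decay rate (per day); illustrative values only.
    fn decay_rate(self) -> f32 {
        match self {
            ChunkKind::Note => 0.10,
            ChunkKind::Code => 0.05,
        }
    }

    /// Assumed per-kind weight; illustrative values only.
    fn weight(self) -> f32 {
        match self {
            ChunkKind::Note => 1.0,
            ChunkKind::Code => 1.5,
        }
    }
}

/// staleness_score = time_since_access * decay_rate_per_kind
pub fn staleness_score(days_since_access: f32, kind: ChunkKind) -> f32 {
    days_since_access.max(0.0) * kind.decay_rate()
}

/// signal_score = access_frequency * revision_count * kind_weight
/// access_frequency is assumed to be accesses per day, with age floored at one day.
pub fn signal_score(access_count: u32, age_days: f32, revision_count: u32, kind: ChunkKind) -> f32 {
    let access_frequency = access_count as f32 / age_days.max(1.0);
    access_frequency * revision_count as f32 * kind.weight()
}
```

Because both scores are products, a zero in any input (for example revision_count = 0) zeroes the whole score; whether the real implementation guards against that is not stated in this ADR.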

Scoring Pipeline #

Current (v0.2) #

query
  ├─ FTS5 BM25 → top-100 by text relevance
  ├─ sqlite-vec → top-100 by vector distance

  └─ RRF fusion (k=30)

       └─ time-decay boost

            └─ final ranked list, truncated to k

// RRF: merge two ranked lists
for (rank, id) in fts_ranked.iter().enumerate() {
    *scores.entry(id).or_insert(0.0) += 1.0 / (RRF_K + (rank + 1) as f32);
}
for (rank, id) in vec_ranked.iter().enumerate() {
    *scores.entry(id).or_insert(0.0) += 1.0 / (RRF_K + (rank + 1) as f32);
}

// Time-decay: recent → 1.3x boost, old → 1.0x (no penalty)
let decay = 1.0 + RECENCY_BOOST * (-age_days / TAU).exp();
let final_score = rrf_score * decay;
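
The fragment above omits its surrounding types and constants. A self-contained sketch of the same two steps follows; the constant values come from the Group A table, while the function names, the u64 id type, and the treatment of chunks with unknown age (no boost) are assumptions made for illustration.

```rust
use std::collections::HashMap;

const RRF_K: f32 = 30.0; // k from the Group A table
const RECENCY_BOOST: f32 = 0.3;
const TAU: f32 = 30.0; // days

/// Fuse two ranked id lists (best first) with Reciprocal Rank Fusion.
pub fn rrf_fuse(fts_ranked: &[u64], vec_ranked: &[u64]) -> HashMap<u64, f32> {
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for list in [fts_ranked, vec_ranked] {
        for (rank, id) in list.iter().enumerate() {
            // enumerate() is 0-based; the RRF formula uses 1-based ranks.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (RRF_K + (rank + 1) as f32);
        }
    }
    scores
}

/// Recent chunks approach a 1.3x boost; old chunks approach 1.0x (no penalty).
pub fn time_decay(age_days: f32) -> f32 {
    1.0 + RECENCY_BOOST * (-age_days.max(0.0) / TAU).exp()
}

/// Apply the decay boost, sort by final score (descending), keep `limit` results.
pub fn rank(
    scores: &HashMap<u64, f32>,
    age_days: &HashMap<u64, f32>,
    limit: usize,
) -> Vec<(u64, f32)> {
    let mut out: Vec<(u64, f32)> = scores
        .iter()
        .map(|(id, s)| {
            // Assumption: a chunk with unknown age gets no recency boost.
            let age = age_days.get(id).copied().unwrap_or(f32::INFINITY);
            (*id, s * time_decay(age))
        })
        .collect();
    // Ties are broken by id so the order is deterministic.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out.truncate(limit);
    out
}
```

A chunk that appears in only one of the two lists still receives a score from that list alone, which is what lets RRF merge candidates that one retriever missed.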

Target (v0.4+) #

query
  ├─ FTS5 BM25 → top-100
  ├─ sqlite-vec → top-100

  └─ RRF fusion (k=30)

       └─ time-decay boost (factor 4)

            └─ lifecycle re-rank:
                 score *= strength          (factor 5: SHY decay)
                 score *= depth_weight(d)   (factor 6: depth bonus)
                 score *= log(1 + spread)   (factor 7: cross-session signal)
                 score *= stability_decay() (factor 8: Ebbinghaus curve)

                 └─ final ranked list

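The lifecycle re-rank multipliers are designed to be neutral (equal to 1.0) when every field holds its default value, for example depth_score = 2 and session_spread = 1. Newly ingested chunks therefore score identically to today’s pipeline; only after dream cycles modify these fields does the lifecycle re-rank diverge from baseline.

Note that log(1 + spread) as written evaluates to ln 2 ≈ 0.69 at the default spread of 1, so factor 7 needs a normalization (for example dividing by ln 2) to meet this goal.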
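
One way the neutrality property could be realized is sketched below. Several parts are assumptions rather than decisions recorded in this ADR: the default strength of 1.0, the `depth_weight` values, and the division by ln 2 that makes a spread of 1 neutral. Because stability_decay() is not defined here, the sketch takes the stability multiplier as a caller-supplied value.

```rust
/// Lifecycle fields from the Group B table (stability is handled by the caller).
#[derive(Clone, Copy, Debug)]
pub struct Lifecycle {
    pub strength: f32,       // 0.0..=1.0
    pub depth_score: u8,     // 1..=3
    pub session_spread: u32, // >= 1
}

impl Default for Lifecycle {
    fn default() -> Self {
        // depth_score and session_spread defaults come from the ADR;
        // strength = 1.0 is an assumption (the ADR does not state it).
        Lifecycle { strength: 1.0, depth_score: 2, session_spread: 1 }
    }
}

/// Hypothetical depth bonus, chosen so the default depth (2) maps to 1.0.
fn depth_weight(depth: u8) -> f32 {
    match depth {
        1 => 0.9,
        3 => 1.1,
        _ => 1.0,
    }
}

/// ln(1 + spread) divided by ln 2, so that spread = 1 yields a multiplier of 1.0.
fn spread_weight(spread: u32) -> f32 {
    (1.0 + spread.max(1) as f32).ln() / std::f32::consts::LN_2
}

/// Multiply a fused, time-decayed score by the lifecycle factors.
pub fn lifecycle_rerank(base: f32, lc: &Lifecycle, stability_factor: f32) -> f32 {
    base
        * lc.strength.clamp(0.0, 1.0)
        * depth_weight(lc.depth_score)
        * spread_weight(lc.session_spread)
        * stability_factor
}
```

With `Lifecycle::default()` and a stability factor of 1.0, the function returns the base score unchanged, which is the behaviour the paragraph above asks for.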

Dream Cycle Interactions #

Each dream phase reads and writes different factors:

| Dream phase | Reads | Writes |
| --- | --- | --- |
| Scan | Identifies candidates by consolidation_status=Staged | |
| Consolidate | session_spread, age | strength (replay delta: max * recency * ln(1+peers)) |
| Downscale | strength | strength (global SHY decay, todo!() in v0.2) |
| Prune | staleness_score, revision_count, duplicate_count | Soft-delete (sets deleted_at) |
| Place | consolidation_status, activity score | consolidation_status (Staged→Consolidated) |
| Promote | signal_score, depth_score | canonical_layer, depth_score |
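
The Scan and Place rows both hinge on the consolidation_status lifecycle. A minimal sketch of that state machine follows; the type name `ConsolidationStatus`, the `next` method, and the `(id, status)` tuple representation are hypothetical and only illustrate the Staged → Consolidated → Archived ordering.

```rust
/// Lifecycle states listed in the supporting-fields table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConsolidationStatus {
    Staged,
    Consolidated,
    Archived,
}

impl ConsolidationStatus {
    /// Next state in the lifecycle, or None once a chunk is archived.
    pub fn next(self) -> Option<Self> {
        match self {
            ConsolidationStatus::Staged => Some(ConsolidationStatus::Consolidated),
            ConsolidationStatus::Consolidated => Some(ConsolidationStatus::Archived),
            ConsolidationStatus::Archived => None,
        }
    }
}

/// Scan: collect the ids of chunks that are still Staged, preserving input order.
pub fn scan_candidates(chunks: &[(u64, ConsolidationStatus)]) -> Vec<u64> {
    chunks
        .iter()
        .filter(|(_, status)| *status == ConsolidationStatus::Staged)
        .map(|(id, _)| *id)
        .collect()
}
```

Modelling the transition as a method that returns `Option` keeps an illegal step (for example Archived back to Staged) from being expressed by accident.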

Consolidate formula (live in crates/reverie-dream/src/phases/consolidate.rs) #

/// delta = max * (1 / (1 + age_h)) * ln(1 + peer_count), clamped to [0, max]
pub fn compute_replay_delta(peer_count: usize, age_hours: f64, max_delta: f64) -> f64 {
    let recency = 1.0 / (1.0 + age_hours.max(0.0));
    let peer_factor = (1.0 + peer_count as f64).ln();
    (max_delta * recency * peer_factor).clamp(0.0, max_delta)
}

Default max_strength_delta = 0.1, min_strength_floor = 0.01.
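
To make the formula concrete, the sketch below repeats compute_replay_delta so it is self-contained and adds a hypothetical write step. How the delta is applied to strength, and how min_strength_floor is used, are assumptions: here the new strength is simply clamped to [floor, 1.0]. The constant and function names other than compute_replay_delta are illustrative.

```rust
const MAX_STRENGTH_DELTA: f64 = 0.1;
const MIN_STRENGTH_FLOOR: f64 = 0.01;

/// Same formula as in consolidate.rs, repeated for a self-contained example.
pub fn compute_replay_delta(peer_count: usize, age_hours: f64, max_delta: f64) -> f64 {
    let recency = 1.0 / (1.0 + age_hours.max(0.0));
    let peer_factor = (1.0 + peer_count as f64).ln();
    (max_delta * recency * peer_factor).clamp(0.0, max_delta)
}

/// Hypothetical write path: add the replay delta, keep strength in [floor, 1.0].
pub fn apply_replay(strength: f64, peer_count: usize, age_hours: f64) -> f64 {
    let delta = compute_replay_delta(peer_count, age_hours, MAX_STRENGTH_DELTA);
    (strength + delta).clamp(MIN_STRENGTH_FLOOR, 1.0)
}
```

Three cases show the shape of the curve. With no peers the delta is 0 because ln(1) = 0. With 3 peers at 1 hour old it is 0.1 * 0.5 * ln(4) ≈ 0.069. With 10 peers at age 0 the raw value 0.1 * ln(11) ≈ 0.24 exceeds the cap and is clamped to 0.1.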

Neuroscience Mapping #

| Biological concept | System factor | Mechanism |
| --- | --- | --- |
| SHY (Synaptic Homeostasis Hypothesis) | strength | Global downscale during dream; strong traces survive, weak decay |
| Ebbinghaus forgetting curve | stability | S parameter controls decay rate; higher S = flatter curve |
| Systems consolidation | depth_score | Hippocampal (episodic, depth=1) → neocortical (semantic, depth=3) over repeated access |
| Hebbian learning | session_spread | "Neurons that fire together wire together": cross-session reactivation strengthens traces |
| Sharp-wave ripples | Consolidate phase | Fast-forward replay of recent patterns, strengthening co-activated traces |
| CLS (Complementary Learning Systems) | Interleaved pairs | Mix new and old patterns to prevent catastrophic interference |
| Behavioral tagging | importance_tag | High-salience events get persistent tags that resist forgetting |
| Reconsolidation (Nader) | Place phase | Retrieved memories become labile and must be restabilized; the act of re-placing is the restabilization |
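
The Ebbinghaus row can be illustrated with the commonly used retention model R(t) = exp(-t / S). The sketch below assumes t is measured in days and guards against S = 0, which the 0.0–∞ range permits. The function name is hypothetical, and this ADR does not confirm that stability_decay() takes this form.

```rust
/// Ebbinghaus-style retention R(t) = exp(-t / S).
/// Higher stability S gives a flatter curve, i.e. slower forgetting.
pub fn retention(elapsed_days: f32, stability: f32) -> f32 {
    // Avoid division by zero when stability is 0.0.
    let s = stability.max(f32::EPSILON);
    (-elapsed_days.max(0.0) / s).exp()
}
```

At t = 0 retention is 1.0 for any S. After 10 days a chunk with S = 5 retains exp(-2) ≈ 0.135, while one with S = 1 retains exp(-10), which is effectively zero.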

Consequences #

Positive:

Negative:

Migration path: Factors 5-8 can be enabled incrementally behind feature flags. Each factor is an independent multiplier; enabling one doesn’t require the others.
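
A minimal sketch of the incremental rollout described above follows. It assumes runtime flags held in a struct; this ADR does not say whether the real flags would be Cargo features or configuration. `LifecycleFlags`, `Multipliers`, and `apply_enabled` are hypothetical names.

```rust
/// Hypothetical per-factor switches; all off by default.
#[derive(Clone, Copy, Debug, Default)]
pub struct LifecycleFlags {
    pub strength: bool,
    pub depth: bool,
    pub spread: bool,
    pub stability: bool,
}

/// Precomputed multipliers for factors 5 to 8.
#[derive(Clone, Copy, Debug)]
pub struct Multipliers {
    pub strength: f32,
    pub depth: f32,
    pub spread: f32,
    pub stability: f32,
}

/// Apply only the enabled factors; a disabled factor contributes 1.0.
pub fn apply_enabled(base: f32, m: &Multipliers, flags: LifecycleFlags) -> f32 {
    let gate = |on: bool, value: f32| if on { value } else { 1.0 };
    base
        * gate(flags.strength, m.strength)
        * gate(flags.depth, m.depth)
        * gate(flags.spread, m.spread)
        * gate(flags.stability, m.stability)
}
```

With `LifecycleFlags::default()` the score passes through unchanged, so the v0.2 ranking stays the baseline until a flag is turned on.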