Source: Search engineering practitioner literature; production patterns across Coveo, Vespa, Algolia, custom implementations

Classification — Fusion pattern combining per-path scores with tunable weights, typically after normalization.

Intent

Combine scores from multiple retrieval methods using explicit per-method weights, supporting per-query-type tuning and integration with learned ranking models that need calibrated combined scores.

Motivating Problem

RRF (prior entry) is robust but doesn't use score magnitudes. When the underlying retrieval methods produce useful score information — not just rank order, but how confident each method is — score-based fusion can outperform RRF. The challenge is normalizing scores across methods with different scales, and choosing weights that produce good results. Weighted hybrid scoring handles this through explicit normalization and tunable weights, at the cost of more configuration than RRF requires.

How It Works

Score normalization. Each retrieval method's scores are normalized to a common scale, typically [0, 1]. Min-max normalization (per-query, using the min and max scores in the candidate set) is common; z-score normalization (using the candidate set's mean and standard deviation) is an alternative that yields comparable but unbounded values rather than a fixed [0, 1] range. Normalization is per-query rather than global because score distributions vary substantially across queries.
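The two normalization schemes can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the dict-of-scores input shape, and the handling of tied scores are all assumptions made for the example.

```python
from statistics import mean, pstdev


def min_max_normalize(scores):
    """Map one query's raw scores onto [0, 1] using that query's own min and max."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every candidate ties, treat them all as top-scoring.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def z_score_normalize(scores):
    """Center one query's scores on their mean and scale by their standard deviation."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Assumption: with no spread, every candidate sits exactly at the mean.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


# Hypothetical raw lexical scores for a single query.
bm25 = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}
print(min_max_normalize(bm25))  # {'doc_a': 1.0, 'doc_b': 0.5, 'doc_c': 0.0}
```

Both functions take the candidate set for one query at a time, which is what makes the normalization per-query. Z-scores can be negative, so a pipeline that needs values in [0, 1] would have to rescale them in a further step.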

Weighted combination. The fused score for a document is a weighted sum of its normalized scores from each method: score = w_lexical * lexical_norm + w_dense * dense_norm + w_sparse * sparse_norm. Weights are typically tuned via evaluation against labeled data, A/B testing, or learning-to-rank approaches that treat the per-method scores as features.
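A short sketch of the weighted sum, assuming scores have already been normalized to [0, 1]. The `fuse` function, the method names, and the weight values are illustrative assumptions, not tuned or recommended settings.

```python
def fuse(normalized, weights):
    """Weighted sum of per-method normalized scores, returned best-first.

    normalized: {method: {doc_id: score in [0, 1]}}
    weights:    {method: weight}
    """
    fused = {}
    for method, scores in normalized.items():
        w = weights[method]
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical normalized scores for one query from three retrieval methods.
normalized = {
    "lexical": {"doc_a": 1.0, "doc_b": 0.5, "doc_c": 0.0},
    "dense":   {"doc_a": 0.0, "doc_b": 1.0, "doc_c": 0.5},
    "sparse":  {"doc_a": 0.5, "doc_b": 0.0, "doc_c": 1.0},
}
weights = {"lexical": 0.5, "dense": 0.25, "sparse": 0.25}  # illustrative only

ranking = fuse(normalized, weights)
print(ranking)  # [('doc_a', 0.625), ('doc_b', 0.5), ('doc_c', 0.375)]
```

With lexical weighted at 0.5, doc_a ranks first even though the dense method scored it lowest. Shifting weight toward dense would promote doc_b instead, which is the lever that tuning adjusts.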

Per-query-type weighting. Different query types call for different fusion weights. Navigational queries (for example, "SKU-12345") typically weight lexical higher; informational and discovery queries weight dense higher; conversational queries weight dense higher and rely more on reranking. Production systems may keep multiple weight profiles, selected by query routing (Section E).
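Weight profiles keyed by query type might look like the sketch below. The profile values are placeholders, and `classify_query` is a deliberately crude, hypothetical stand-in for a real query router; it is not the routing approach the catalog describes elsewhere.

```python
import re

# Illustrative profiles only; real values would come from evaluation or A/B tests.
WEIGHT_PROFILES = {
    "navigational":   {"lexical": 0.7, "dense": 0.2, "sparse": 0.1},
    "informational":  {"lexical": 0.2, "dense": 0.6, "sparse": 0.2},
    "conversational": {"lexical": 0.1, "dense": 0.8, "sparse": 0.1},
}


def classify_query(query):
    """Toy router (hypothetical rules): ID-like tokens are navigational,
    long queries are conversational, everything else is informational."""
    text = query.strip()
    if re.fullmatch(r"[A-Za-z]+-\d+", text):
        return "navigational"
    if len(text.split()) >= 6:
        return "conversational"
    return "informational"


def weights_for(query):
    """Look up the fusion weights for the profile this query routes to."""
    return WEIGHT_PROFILES[classify_query(query)]


print(classify_query("SKU-12345"))           # navigational
print(weights_for("wireless headphones"))    # {'lexical': 0.2, 'dense': 0.6, 'sparse': 0.2}
```

Keeping profiles in a plain lookup table separates the routing decision from the fusion arithmetic, so each can be tuned or replaced independently. The reranking stage that conversational queries rely on is outside the scope of this sketch.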

Documents not in all lists. A document may appear in some retrieval methods' candidate sets but not others. The pattern handles this with an explicit decision. A missing score can be treated as zero, so the document gets no contribution from that method but isn't penalized further, or as a small penalty, so the document is slightly disfavored relative to documents that every method returned. The choice affects ranking and should be tuned.
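The two missing-score policies can be compared with a single parameter. In this sketch `fuse_with_missing` and `missing_score` are hypothetical names, and expressing the "small penalty" as a negative fill value is one possible interpretation, chosen here as an assumption.

```python
def fuse_with_missing(normalized, weights, missing_score=0.0):
    """Weighted sum over the union of candidates.

    missing_score is substituted when a method did not return a document:
    0.0 means "no contribution"; a small negative value acts as a penalty.
    """
    all_docs = set().union(*(scores.keys() for scores in normalized.values()))
    return {
        doc: sum(
            w * normalized.get(method, {}).get(doc, missing_score)
            for method, w in weights.items()
        )
        for doc in all_docs
    }


# Hypothetical case: doc_a is lexical-only, doc_c is dense-only, doc_b is in both.
normalized = {
    "lexical": {"doc_a": 1.0, "doc_b": 0.5},
    "dense":   {"doc_b": 1.0, "doc_c": 0.5},
}
weights = {"lexical": 0.5, "dense": 0.5}

as_zero = fuse_with_missing(normalized, weights)
as_penalty = fuse_with_missing(normalized, weights, missing_score=-0.25)
print(as_zero["doc_a"], as_penalty["doc_a"])  # 0.5 0.375
```

Under the zero policy a document missing from one list still keeps its full contribution from the other. Under the penalty policy doc_a and doc_c both drop while doc_b, which every method returned, is unchanged at 0.75.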

Comparison with RRF. Weighted hybrid scoring can outperform RRF when: (1) the per-method scores carry useful magnitude information beyond rank order; (2) labeled training data is available for tuning weights; (3) per-query-type tuning is feasible. RRF tends to outperform weighted hybrid when the underlying scores are poorly calibrated, when training data is limited, or when simplicity and operational robustness matter more than peak optimization.

When to Use It

Mature search deployments with labeled training data and evaluation infrastructure. Per-query-type optimization where different query types deserve different weights. Cases where the underlying retrieval methods produce well-calibrated, meaningful scores. Integration with learning-to-rank pipelines where per-method scores feed downstream LTR models.

Alternatives — RRF (prior entry) for simpler deployments or when tuning data isn't available. Learning-to-rank over per-method scores when sufficient training data and infrastructure support the more complex approach.

Sources

Search engineering practitioner literature on hybrid retrieval tuning
Vespa rank profile documentation (supports weighted multi-phase ranking natively)
Coveo machine learning ranking documentation