RelevantSearch.AI
Pattern · Volume 05 · Section E --- Click models and counterfactual evaluation · Updated May 2026

Click models for bias correction (PBM, Cascade, DBN)

Source: Craswell et al., "An Experimental Comparison of Click Position-Bias Models" (WSDM 2008); Chapelle and Zhang, "Dynamic Bayesian Network Click Model" (WWW 2009); Chuklin, Markov, de Rijke, Click Models for Web Search (2015)

Classification — Probabilistic models of user click behavior that separate relevance signal from presentation bias.

Intent

Model the probability that a user clicks a result as a function of the result's relevance and its position (and other presentation features), so that observed clicks can be decomposed into the underlying relevance signal and the bias from how the result was presented.

Motivating Problem

On web search, users click position 1 about 30–40% of the time and position 10 maybe 2–3% of the time, and much of that gap persists whatever the relevance of the documents shown there, simply because users look at the top of the list first. Below the top few positions, the position effect can dwarf real relevance differences, so treating raw click-through rate as a relevance signal mistakes position bias for relevance. The result is a feedback loop: ranking models trained on raw click data learn to promote documents that were already ranked high, ignoring the relevance signal underneath. Click models address this by explicitly modeling the position effect (and other biases) so the relevance signal can be extracted.

How It Works

Position-Based Model (PBM). The simplest useful click model: P(click | rank, query, doc) = P(examine | rank) × P(click | examine, query, doc). The examination probability depends only on rank; the click-given-examination probability depends only on query-document relevance. Given per-position examination probabilities estimated from production data, dividing a document's observed click-through rate at a position by that position's examination probability recovers the relevance signal. PBM is widely used as a baseline, and more sophisticated models extend it.
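The PBM decomposition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production estimator: the function names (`pbm_click_prob`, `debias_ctr`), the `(doc, rank, clicked)` log format, and the examination probabilities in the example are all hypothetical.

```python
from collections import defaultdict


def pbm_click_prob(examine_prob, attractiveness):
    """P(click) = P(examine | rank) * P(click | examine, query, doc)."""
    return examine_prob * attractiveness


def debias_ctr(impressions, examine_by_rank):
    """Estimate per-document attractiveness from (doc, rank, clicked) logs.

    Divides each document's total clicks by the total examination
    probability it received across its impressions.
    """
    clicks = defaultdict(float)
    exposure = defaultdict(float)
    for doc, rank, clicked in impressions:
        clicks[doc] += clicked
        exposure[doc] += examine_by_rank[rank]
    return {doc: clicks[doc] / exposure[doc] for doc in clicks if exposure[doc] > 0}


# Assumed examination probabilities (illustrative values only).
examine = {1: 1.0, 2: 0.5}
log = (
    [("a", 1, 1)] * 2 + [("a", 1, 0)] * 2   # raw CTR 0.50 at rank 1
    + [("b", 2, 1)] * 1 + [("b", 2, 0)] * 3  # raw CTR 0.25 at rank 2
)
estimates = debias_ctr(log, examine)  # {"a": 0.5, "b": 0.5}
```

In this toy log the raw click-through rates differ by a factor of two, but after dividing by exposure both documents receive the same estimate. With noisy data the ratio can exceed 1, so real pipelines usually clip or smooth it.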

Cascade Model. Models the user as scanning the result list from the top: they examine position 1; if it looks relevant, they click and stop; if not, they move to position 2 and repeat. Lower positions get less examination because a click higher up ends the session before the user reaches them. In its original form the model allows at most one click per session, so it is more accurate than PBM for navigational queries (where users typically click the first satisfying result and stop) but less accurate for queries where users explore multiple results.
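The cascade walk can be written as a single pass over the list. The sketch below assumes each result has a known per-result click probability `r` (the list `relevances` and the function name `cascade_probs` are hypothetical) and returns the implied examination and click probabilities per position.

```python
def cascade_probs(relevances):
    """Return (examine, click) probabilities per position under the cascade model.

    relevances[i] is the probability that the result at position i+1 is
    clicked once the user examines it.
    """
    examine, click = [], []
    reach = 1.0  # probability the user gets as far as the current position
    for r in relevances:
        examine.append(reach)
        click.append(reach * r)
        reach *= 1.0 - r  # the user moves on only if this result was not clicked
    return examine, click


examine, click = cascade_probs([0.5, 0.5, 0.5])
# examine == [1.0, 0.5, 0.25], click == [0.5, 0.25, 0.125]
```

Three equally relevant results receive very different click probabilities purely because of where they sit, which is the position effect the model is meant to explain.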

Dynamic Bayesian Network (DBN). Chapelle and Zhang's model uses separate hidden variables at each position for three events: the user examined the result, the user perceived the result as relevant (which drives the click), and the user was satisfied by the clicked document (which drives whether they stop). Separating perceived relevance from satisfaction captures users who click a result that looked relevant, find that it does not fully satisfy them, and continue down the list. DBN covers a wider range of user behavior than PBM or Cascade, at the cost of more parameters to estimate and more data needed for stable estimates.
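A simplified forward pass shows how the three DBN variables interact. This sketch assumes known per-result attractiveness and satisfaction probabilities and a single continuation ("perseverance") parameter `gamma`; the function name `dbn_probs` and the example values are hypothetical, and the parameter estimation that the full model needs is omitted.

```python
def dbn_probs(attract, satisfy, gamma=0.9):
    """Return (examine, click) probabilities per position under a simplified DBN.

    attract[i]: probability the result is clicked when examined.
    satisfy[i]: probability the user is satisfied after clicking it.
    gamma:      probability an unsatisfied user continues to the next result.
    """
    examine, click = [], []
    reach = 1.0  # probability the user examines the current position
    for a, s in zip(attract, satisfy):
        examine.append(reach)
        click.append(reach * a)
        # The user stops if satisfied (probability a * s); otherwise
        # they continue to the next position with probability gamma.
        reach *= gamma * (1.0 - a * s)
    return examine, click


# Every click satisfies and users never abandon: behaves like the cascade model.
cascade_like = dbn_probs([0.5, 0.5], [1.0, 1.0], gamma=1.0)  # ([1.0, 0.5], [0.5, 0.25])
# Clicks never satisfy: users keep examining, so multiple clicks are possible.
exploring = dbn_probs([0.5, 0.5], [0.0, 0.0], gamma=1.0)     # ([1.0, 1.0], [0.5, 0.5])
```

The two calls differ only in the satisfaction probabilities, which is the extra degree of freedom DBN adds over the cascade model.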

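Estimating model parameters. The models have parameters (per-position examination probabilities, per-query-document relevances, and so on) that must be estimated from production click logs. Because examination and perceived relevance are never observed directly, the standard approach is expectation-maximization (EM). The E-step computes, for each logged impression, the posterior probability that the result was examined and that it was perceived as relevant, given the click outcome and the current parameters; the M-step re-estimates the examination and relevance parameters from those posteriors. EM converges to a local optimum of the likelihood, so estimates of both relevances and biases should be checked for stability across initializations and data samples.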
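For PBM specifically, the EM loop is short enough to sketch in full. The version below is a minimal illustration, assuming a log of `(query_doc, rank, clicked)` tuples; the function name `fit_pbm_em`, the fixed iteration count, and the uniform initialization are assumptions of this sketch rather than anything prescribed by the sources.

```python
from collections import defaultdict


def fit_pbm_em(log, n_iter=50, init=0.5, eps=1e-12):
    """Fit a Position-Based Model with EM.

    log: iterable of (query_doc, rank, clicked) with clicked in {0, 1}.
    Returns (theta, alpha): examination probability per rank and
    attractiveness per query-document pair.
    """
    log = list(log)
    theta = {rank: init for _, rank, _ in log}
    alpha = {qd: init for qd, _, _ in log}
    for _ in range(n_iter):
        exam_sum, attr_sum = defaultdict(float), defaultdict(float)
        rank_cnt, qd_cnt = defaultdict(int), defaultdict(int)
        for qd, rank, clicked in log:
            t, a = theta[rank], alpha[qd]
            if clicked:
                # A click implies the result was both examined and attractive.
                p_exam, p_attr = 1.0, 1.0
            else:
                # E-step: posterior of each hidden variable given no click.
                no_click = max(1.0 - t * a, eps)
                p_exam = t * (1.0 - a) / no_click
                p_attr = a * (1.0 - t) / no_click
            exam_sum[rank] += p_exam
            attr_sum[qd] += p_attr
            rank_cnt[rank] += 1
            qd_cnt[qd] += 1
        # M-step: new parameters are the average posteriors.
        theta = {r: exam_sum[r] / rank_cnt[r] for r in rank_cnt}
        alpha = {q: attr_sum[q] / qd_cnt[q] for q in qd_cnt}
    return theta, alpha
```

Only the product of examination and attractiveness is observed, so the split between `theta` and `alpha` is informative only when the same query-document pairs have been shown at more than one rank, for instance through result randomization.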

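Inverse Propensity Scoring (IPS). Once per-position examination probabilities (propensities) are estimated, each observed click can be weighted by the inverse of the propensity at the position where it occurred. A click at position 10 (low examination probability) counts for more than a click at position 1 (high examination probability), because for every click observed at position 10 there were many impressions in which the result was never examined and so had no chance of being clicked. With correct propensities the weighted clicks are unbiased in expectation, although small propensities produce large weights and high variance, which is why weights are often clipped. Joachims and colleagues applied IPS to learning to rank from click logs, and the same weighting lets production click data drive offline counterfactual evaluation.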
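As a rough sketch of IPS-weighted counterfactual evaluation, the function below scores a candidate ranking by the propensity-weighted rank it assigns to documents that were clicked in the logs (lower is better), in the spirit of Joachims et al. The function name `ips_rank_loss`, the session format, the single global `new_rank` mapping, and the clipping floor are simplifying assumptions for illustration.

```python
def ips_rank_loss(sessions, propensity, new_rank, clip=0.01):
    """Average IPS-weighted rank a candidate ranking gives to clicked documents.

    sessions:   list of sessions, each a list of (doc, logged_rank) pairs
                for the results that were clicked.
    propensity: logged rank -> estimated examination probability.
    new_rank:   doc -> rank under the candidate ranking being evaluated.
    clip:       floor on propensities to limit the variance of the weights.
    """
    if not sessions:
        return 0.0
    total = 0.0
    for clicks in sessions:
        for doc, logged_rank in clicks:
            weight = 1.0 / max(propensity[logged_rank], clip)
            total += new_rank[doc] * weight
    return total / len(sessions)


propensity = {1: 1.0, 2: 0.5}             # assumed examination probabilities
sessions = [[("a", 1)], [("b", 2)]]       # one click on "a" at rank 1, one on "b" at rank 2
keeps_order = ips_rank_loss(sessions, propensity, {"a": 1, "b": 2})  # 2.5
swaps_order = ips_rank_loss(sessions, propensity, {"a": 2, "b": 1})  # 2.0
```

Counting raw clicks would rate the two documents equally, but the click on "b" happened at a position that is examined only half as often, so the weighted estimate prefers the candidate that promotes "b".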

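Limitations. Click models capture some biases but not all. Trust bias (users click highly ranked results because they trust the engine's ordering, even when the result is a poor match), presentation bias (results with rich snippets get more clicks), and brand bias (users favor sources they recognize) are not addressed by simple position-bias correction. Other click models such as UBM (User Browsing Model) and DCM (Dependent Click Model) relax the examination assumptions instead: UBM conditions examination on the distance from the previous click, and DCM extends the cascade to sessions with multiple clicks. Both add parameters and complexity. Because every model is an approximation of user behavior, production teams calibrate models against ground-truth experiments where available.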

When to Use It

Production search teams using implicit signals (Section C) at scale who need to correct for position and other biases. Offline log replay evaluation (Chapter 2) where click prediction is needed for systems that didn't generate the original logs. Counterfactual evaluation: "what would users have done if we'd shown them this candidate ranking instead?"

Alternatives — explicit judgment-based evaluation when implicit signals' biases are hard to handle. Online evaluation (A/B testing or interleaving) that doesn't need counterfactual reasoning because actual user behavior is observed. Click models are essential when implicit signals are the primary evaluation source; they're less needed when explicit judgments or online tests are available.

Sources
  • Craswell et al., "An Experimental Comparison of Click Position-Bias Models" (WSDM 2008)
  • Chapelle and Zhang, "Dynamic Bayesian Network Click Model" (WWW 2009)
  • Chuklin, Markov, de Rijke, Click Models for Web Search (Morgan & Claypool, 2015)
  • Joachims, Swaminathan, Schnabel, "Unbiased Learning-to-Rank with Biased Feedback" (WSDM 2017)
