Source: Production methodology at major search teams; query log analytics literature; data engineering patterns for analytical workloads
Classification — The structural pattern for capturing, enriching, and querying production search events for operational analysis.
Capture production search events with sufficient detail and enrichment to support all downstream operational analyses — zero-result investigation, regression detection, A/B test evaluation, query trend analysis — without requiring re-instrumentation each time a new view is needed.
Search teams without proper query logging are operating on speculation. The team doesn't know which queries failed, which results users clicked, what reformulations users made, or whether the system's metrics this week match last week's. The investment in query log infrastructure is foundational; every other operational pattern depends on it being in place. The patterns documented here capture what to log, how to enrich it, and the standard views that mature teams build on top.
Event schema. The query log captures one event per query. Core fields: event_id (unique identifier); user_id (where available; respect privacy constraints); session_id; query_text (as the user typed it); query_normalized (after tokenization for matching purposes, useful for aggregation); timestamp; locale; device_type; result_count (number of documents returned); top_result_ids (IDs of the top N results, typically 10 or 20); query understanding output (intent class, confidence, extracted entities, expanded terms); ranking output (final ordered list with scores). Subsequent events link to the query event by ID: click events (which result was clicked, at what position, after what time); conversion events (purchase, add-to-cart, signup, whatever the downstream business action is); reformulation events (subsequent queries in the same session).
Enrichment. Raw events are enriched post-capture for analytical convenience. Standard enrichments: intent classification (already in raw events typically, but may be re-computed for retrospective analysis with newer models); session linking (group events by session_id with appropriate session timeout logic); reformulation detection (queries within a session that occur shortly after a previous query and share some terms are reformulations); outcome attribution (did the session lead to a conversion? at which query in the session?); query class binning (zero-result, low-CTR, high-CTR, navigational, informational). The enrichment runs as a batch job over the previous day's events; results are stored alongside raw events in the analytics warehouse.
Sampling. Production search at scale produces too many events for naive analysis. Patterns: log all events but sample 1–5% for ad-hoc analytics (the sample is statistically representative for aggregate measures); log all events for specific event types where sampling distorts analysis (e.g., conversion events should not be sampled because they're rare); retain different data for different durations (raw events for 30–90 days; aggregated stats indefinitely). The sampling and retention strategy balances storage cost with analytical capability.
Standard views. Mature operations teams build standard views that drive routine work. Daily zero-result query report (top 100 zero-result queries by frequency, with intent classification). Weekly low-CTR query report (queries with high impressions and low CTR, sorted by total impressions × (1 − CTR)). Reformulation chains (sessions where users issued multiple queries, surfacing patterns of search difficulty). Query trend report (which queries are growing in frequency week-over-week). Metric dashboards (NDCG, CTR, zero-result rate, latency over time). Each view is a query against the warehouse that runs daily or on demand.
Joining with other data. Query logs are most powerful when joined with other data. Product catalog (the documents being searched): which categories of documents are getting zero-result queries? User profile data (where privacy permits): how do different user segments search differently? Conversion data: which queries lead to high-value outcomes, and which lead to abandonment? The joins are routine warehouse operations; the analytical power comes from the combinations.
Privacy considerations. Query log data includes user-typed text and behavior; this is privacy-sensitive. Production discipline: anonymize where possible (hash user IDs; truncate timestamps to coarser granularity for long-term retention); enforce access controls (not everyone needs raw query access); apply retention policies (delete or aggregate raw data after a defined period); document data flows for compliance review (GDPR in Europe, CCPA in California, sector-specific regulations like HIPAA for healthcare). The patterns vary by jurisdiction and industry; the discipline is treating query log data as the sensitive data it is.
Infrastructure choices. The pipeline typically uses: a streaming layer (Kafka, Pub/Sub) for low-latency event capture; a data warehouse (BigQuery, Snowflake, Redshift) or analytical database (ClickHouse, Druid) for analytical queries; a BI / dashboarding tool (Looker, Tableau, Metabase, custom) for routine views. The choices depend on the team's broader data infrastructure; search-specific custom builds are rarely justified when the broader product analytics infrastructure can be extended.
Every production search system benefits from query log analytics. The investment is foundational; without it, no other operational pattern works. Teams without query logs should treat building them as priority infrastructure before attempting other operations work.
Alternatives — none for serious operational practice. Some teams operate without logs and rely on user complaints or sampled manual review; this works at very small scale but doesn't scale and produces less reliable signals than log-based analysis.
- Production methodology writings at search teams (Etsy, Wayfair, Spotify, Algolia case studies)
- Search analytics literature (Croft, Metzler, Strohman; Manning et al. ch. 8)
- Data warehousing / streaming infrastructure documentation (Kafka, BigQuery, Snowflake, ClickHouse)
Schema / config
\-- Production search event schema (data warehouse)
\-- Captures one row per query event; linked tables for clicks and
conversions
CREATE TABLE search_events (
event_id STRING NOT NULL, \-- unique per event
timestamp TIMESTAMP NOT NULL,
user_id STRING, \-- nullable (anonymous sessions)
session_id STRING NOT NULL,
query_text STRING NOT NULL, \-- as user typed
query_normalized STRING, \-- after analyzer chain
locale STRING,
device_type STRING,
result_count INT64 NOT NULL,
top_result_ids ARRAY<STRING>, \-- top 20 result IDs in order
\-- Query understanding output
intent_class STRING, \-- nav/info/trans/conv
intent_confidence FLOAT64,
extracted_entities ARRAY<STRUCT<type STRING, value STRING,
span_start INT64, span_end INT64>>,
expanded_terms ARRAY<STRING>, \-- post-synonym expansion
\-- Ranking output
ranked_results ARRAY<STRUCT<doc_id STRING, position INT64, score
FLOAT64>>,
\-- Latency
query_latency_ms INT64,
\-- Experiment
experiment_id STRING, \-- A/B test bucket assignment
experiment_variant STRING \-- control / treatment_a / ...
)
PARTITION BY DATE(timestamp)
CLUSTER BY intent_class, user_id;
CREATE TABLE click_events (
click_id STRING NOT NULL,
event_id STRING NOT NULL, \-- joins to search_events
timestamp TIMESTAMP NOT NULL,
doc_id STRING NOT NULL,
position INT64 NOT NULL, \-- position clicked (1-indexed)
dwell_time_ms INT64 \-- if measurable
)
PARTITION BY DATE(timestamp);
CREATE TABLE conversion_events (
conversion_id STRING NOT NULL,
event_id STRING, \-- attributed search event (may be null)
timestamp TIMESTAMP NOT NULL,
user_id STRING,
session_id STRING NOT NULL,
conv_type STRING, \-- \'purchase\' / \'signup\' / etc.
conv_value NUMERIC \-- $ amount or other measure
)
PARTITION BY DATE(timestamp);
Code
\-- Daily zero-result query report
\-- Run this every morning; investigate the top 20-50 by frequency
WITH yesterday_events AS (
SELECT *
FROM search_events
WHERE DATE(timestamp) = CURRENT_DATE() - 1
)
SELECT
query_normalized,
ANY_VALUE(query_text) AS sample_text, \-- one example of original
casing/typing
COUNT(*) AS query_count,
COUNT(DISTINCT session_id) AS distinct_sessions,
ANY_VALUE(intent_class) AS dominant_intent,
\-- Aggregated entity signals
ARRAY_AGG(DISTINCT e.type IGNORE NULLS) AS entity_types_seen
FROM yesterday_events
LEFT JOIN UNNEST(extracted_entities) AS e
WHERE result_count = 0
GROUP BY query_normalized
ORDER BY query_count DESC
LIMIT 100;
\-- Weekly low-CTR query report
\-- Joins search events with clicks; finds high-impression / low-CTR
queries
WITH last_week_searches AS (
SELECT *
FROM search_events
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7
DAY)
AND result_count > 0
),
clicks_per_query AS (
SELECT
s.query_normalized,
COUNT(DISTINCT s.event_id) AS impressions,
COUNT(c.click_id) AS clicks,
COUNT(DISTINCT CASE WHEN c.position <= 3 THEN c.event_id END) AS
clicks_top3
FROM last_week_searches s
LEFT JOIN click_events c ON c.event_id = s.event_id
GROUP BY s.query_normalized
)
SELECT
query_normalized,
impressions,
clicks,
ROUND(SAFE_DIVIDE(clicks, impressions), 4) AS ctr,
\-- Score: total impressions weighted by (1 - CTR) - prioritizes
high-impression, low-CTR
ROUND(impressions * (1 - SAFE_DIVIDE(clicks, impressions)), 0) AS
investigation_priority
FROM clicks_per_query
WHERE impressions >= 100 \-- minimum frequency to investigate
ORDER BY investigation_priority DESC
LIMIT 100;