RelevantSearch.AI
Pattern · Volume 01 · Section G: Performance and caching patterns · Updated May 2026

Query result caching strategies

Source: Established pattern in production search; documented across Elasticsearch / OpenSearch (request cache, query cache), Solr (filter cache, query result cache), Coveo (result cache), Vespa, Algolia

Classification — Patterns for caching query results, intermediate computations, and analyzer outputs.

Intent

Reduce query latency and infrastructure cost by caching results at appropriate granularity: full result sets, intermediate retrieval candidates, filter results, analyzer outputs, embedding computations.

Motivating Problem

Production search traffic is highly repetitive. A small head of popular queries can drive 30–60% of total query volume, and the same query produces the same results until the underlying corpus (or the ranking configuration) changes. Recomputing these results from scratch on every request wastes compute. Caching repeated results reduces both per-query latency (an in-process cache hit can return in microseconds, versus milliseconds for full retrieval) and infrastructure cost (a popular query's results are computed once and served many times).
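A back-of-the-envelope model makes the latency and cost argument concrete. Assume a cache hit rate h, a hit cost t_hit, a miss cost t_miss and a total query rate Q; these are illustrative symbols, not measurements. Note that h cannot exceed the share of traffic that repeats, because the first request for a query (and the first request after each invalidation) is always a miss.

```latex
\mathbb{E}[t] = h\,t_{\text{hit}} + (1-h)\,t_{\text{miss}}
\qquad
\text{load}_{\text{backend}} = (1-h)\,Q

\text{e.g. } h = 0.4,\ t_{\text{hit}} = 0.05\,\text{ms},\ t_{\text{miss}} = 50\,\text{ms}:\quad
\mathbb{E}[t] = 0.4(0.05) + 0.6(50) = 30.02\,\text{ms}
```

With these hypothetical numbers the backend sees only 60% of traffic, and mean latency is dominated by the miss term, so raising the hit rate matters far more than making hits faster.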

How It Works

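Result-level caching. The full ranked result list for a query is cached. The cache key includes the normalized query string and every parameter that affects results: filters, sort order, the requested page window, and user or session context if results are personalized. Entries must be invalidated when the corpus changes. Determining exactly which entries a document change affects is hard (an updated document can start matching queries whose cached lists never contained it), so many implementations invalidate coarsely: Solr's query result cache is discarded when a commit opens a new searcher, and Elasticsearch's shard request cache is invalidated whenever a refresh picks up changed documents. The pattern works well for non-personalized queries over stable corpora; personalization and frequent updates reduce hit rates.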
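Key construction and coarse invalidation for a result-level cache can be sketched in Python. This is a minimal sketch, not any engine's implementation: the names `result_cache_key` and `ResultCache`, the case/whitespace normalization, and the choice to flush everything on a corpus update are illustrative assumptions.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Any, Callable, Optional


def result_cache_key(query: str, filters: dict[str, Any], sort: str,
                     user_segment: Optional[str] = None) -> str:
    """Build a deterministic key from everything that affects the ranked list."""
    normalized = " ".join(query.lower().split())  # case-fold, collapse whitespace
    payload = {
        "q": normalized,
        "filters": filters,       # sort_keys=True below makes dict order irrelevant
        "sort": sort,
        "segment": user_segment,  # None for non-personalized requests
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class ResultCache:
    """Bounded LRU cache of ranked result lists (lists of document IDs)."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._entries: OrderedDict[str, list[str]] = OrderedDict()
        self._max = max_entries
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], list[str]]) -> list[str]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        results = compute()                 # full retrieval on a miss
        self._entries[key] = results
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return results

    def invalidate_all(self) -> None:
        # Coarse invalidation on any corpus update. Dropping only entries that
        # contain a changed document is not enough: an edited or new document
        # may now match queries whose cached lists never contained it.
        self._entries.clear()
```

Case-folding inside the key is only safe if the engine's analysis is itself case-insensitive for the searched fields; otherwise normalize less aggressively.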

Filter result caching. Filter clauses (e.g., "status:active AND category:shoes") often repeat across many different free-text queries. Because filters do not contribute to scoring, the set of documents a filter matches is independent of the text query. Caching that document set (typically as a bitset or roaring bitmap) lets subsequent queries intersect it with their text-query matches cheaply instead of re-evaluating the filter. Solr's filter cache is the canonical implementation; Elasticsearch's node query cache plays a similar role for clauses used in filter context.
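A toy version of filter caching, assuming a small in-memory corpus with integer document IDs. Python integers stand in for the bitsets or roaring bitmaps a real engine would use; `FilterCache` and `search` are hypothetical names, and the substring title match is a placeholder for real text retrieval.

```python
Doc = dict[str, str]


class FilterCache:
    """Caches the document set matching each filter clause as an int bitset.

    Bit i is set when document i matches the clause.
    """

    def __init__(self, docs: list[Doc]) -> None:
        self.docs = docs
        self._cache: dict[tuple[str, str], int] = {}
        self.computations = 0  # how many clauses were evaluated from scratch

    def bitset_for(self, field: str, value: str) -> int:
        key = (field, value)
        if key not in self._cache:
            self.computations += 1
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if doc.get(field) == value:
                    bits |= 1 << doc_id
            self._cache[key] = bits
        return self._cache[key]

    def combined(self, clauses: list[tuple[str, str]]) -> int:
        """AND together cached clause bitsets, e.g. status:active AND category:shoes."""
        bits = (1 << len(self.docs)) - 1  # start with every document
        for field, value in clauses:
            bits &= self.bitset_for(field, value)
        return bits


def search(fc: FilterCache, text: str, clauses: list[tuple[str, str]]) -> list[int]:
    """Return IDs of documents that pass all filters and contain `text` in the title."""
    allowed = fc.combined(clauses)
    # Only documents allowed by the cached filter bitset are checked against the text.
    return [doc_id for doc_id, doc in enumerate(fc.docs)
            if (allowed >> doc_id) & 1 and text in doc["title"].lower()]
```

A second query with a different text term but the same filter clauses reuses both cached bitsets, so only the text matching is recomputed.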

Analyzer and tokenization caching. Analyzers (tokenization, normalization, stemming) produce deterministic output for a given input and analyzer configuration. Caching the analyzed form of common terms avoids redundant analyzer runs. The saving per operation is small, but it adds up at high query volume.
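Because per-term analysis is a pure function of the input and the analyzer configuration, memoizing it is straightforward. The sketch below uses Python's `functools.lru_cache`; the regex tokenizer and the plural-stripping rule are toy stand-ins for a real analyzer chain, and the cache would need to be cleared (`analyze_term.cache_clear()`) whenever the analyzer configuration changes.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=100_000)
def analyze_term(term: str) -> str:
    """Normalize one token. Deterministic, so its output is safe to memoize."""
    t = term.lower()
    # Toy plural stripping standing in for a real stemmer.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def analyze(text: str) -> list[str]:
    # Split into word tokens, then analyze each token through the cache.
    return [analyze_term(tok) for tok in re.findall(r"\w+", text)]
```

`analyze_term.cache_info()` reports hits and misses, which is a cheap way to check whether the cache is earning its memory at a given query mix.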

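Embedding caching. In dense retrieval (Section B), query embeddings are generated at query time. Caching embeddings for common queries avoids redundant embedding-model calls, and embedding generation can be a substantial fraction of dense retrieval latency. Cached vectors are valid only for the model that produced them: when the embedding model changes, cached query embeddings no longer live in the same vector space as the index and must not be reused, so the model version belongs in the cache key (or the cache is flushed on model rollout). While the model is stable, entries can be kept for long periods.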
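A minimal sketch of an embedding cache keyed on (model version, query). `toy_embed` is a hypothetical stand-in for a real embedding-model call, and the only normalization applied is whitespace collapsing, on the assumption that the model may be case-sensitive.

```python
import hashlib
from typing import Callable, Sequence

Embedding = tuple[float, ...]


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model call (deterministic hash)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]


class EmbeddingCache:
    """Memoizes query embeddings per (model version, normalized query)."""

    def __init__(self, model_version: str,
                 embed_fn: Callable[[str], Sequence[float]]) -> None:
        self.model_version = model_version
        self._embed_fn = embed_fn
        self._store: dict[tuple[str, str], Embedding] = {}
        self.model_calls = 0

    def embed(self, query: str) -> Embedding:
        normalized = " ".join(query.split())  # whitespace only; case is preserved
        key = (self.model_version, normalized)
        if key not in self._store:
            self.model_calls += 1
            self._store[key] = tuple(self._embed_fn(normalized))
        return self._store[key]

    def set_model(self, model_version: str,
                  embed_fn: Callable[[str], Sequence[float]]) -> None:
        # Keys already include the model version, so vectors from the old model
        # can never be returned; clearing simply frees their memory.
        self.model_version = model_version
        self._embed_fn = embed_fn
        self._store.clear()
```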

Cache invalidation challenges. The fundamental difficulty of caching is knowing when cached data is stale. Approaches: TTL-based (entries expire after a fixed time; simple, but stale data can be served until expiry); event-based (entries are invalidated on corpus updates; more precise, but requires invalidation infrastructure); version-based (the cache key includes a corpus version, so bumping the version after an update makes every older entry unreachable; works well for batch update patterns). Production systems commonly combine TTLs with event-based invalidation for important changes.
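The three invalidation styles can coexist in one structure. This sketch assumes a single-process cache with an injectable clock (for testing); `VersionedTTLCache` and its methods are illustrative names, and a real deployment would drive `invalidate()` from an update event stream and `bump_version()` from the batch indexing job.

```python
import time
from collections.abc import Callable, Hashable
from typing import Optional


class VersionedTTLCache:
    """Combines TTL, version-based and event-based invalidation.

    - TTL: entries expire ttl_seconds after being stored.
    - Version-based: the corpus version is part of every internal key,
      so bump_version() makes all older entries unreachable.
    - Event-based: invalidate() drops one entry immediately.
    get() returns None on a miss, so None itself should not be cached.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.corpus_version = 0
        self._store: dict[tuple[int, Hashable], tuple[float, object]] = {}

    def get(self, key: Hashable) -> Optional[object]:
        full_key = (self.corpus_version, key)
        entry = self._store.get(full_key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:   # TTL expiry
            del self._store[full_key]
            return None
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._store[(self.corpus_version, key)] = (self.clock() + self.ttl, value)

    def invalidate(self, key: Hashable) -> None:
        self._store.pop((self.corpus_version, key), None)  # event-based

    def bump_version(self) -> None:
        self.corpus_version += 1
        # Older-version entries are unreachable now; a real system might let
        # them age out via TTL/LRU. This sketch purges them eagerly.
        self._store = {k: v for k, v in self._store.items()
                       if k[0] == self.corpus_version}
```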

Cache architecture. In-process caches (each search node holds its own cache) are simple, but entries are duplicated across nodes and each node's cache starts cold after a restart or deploy. Shared caches (Redis, Memcached) avoid duplication but add a network round trip to every lookup. Hybrid designs (a small in-process L1 cache in front of a larger shared L2 cache) balance the trade-offs. The right architecture depends on cluster size and query patterns.
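A hybrid L1/L2 lookup path, sketched under assumptions: `SharedCache` is a hypothetical two-method interface standing in for a Redis or Memcached client, and `DictSharedCache` simulates it in memory so the example runs without a server. It does not reproduce any client library's API.

```python
from collections import OrderedDict
from typing import Callable, Optional, Protocol


class SharedCache(Protocol):
    """Minimal interface a shared L2 store might expose (hypothetical)."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class DictSharedCache:
    """In-memory stand-in for the network-backed L2 so the sketch runs locally."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1  # each read would be a network round trip in production
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class TwoTierCache:
    """Small per-node LRU (L1) in front of a shared store (L2)."""

    def __init__(self, l2: SharedCache, l1_max_entries: int = 1_000) -> None:
        self.l1: OrderedDict[str, bytes] = OrderedDict()
        self.l1_max = l1_max_entries
        self.l2 = l2

    def _l1_put(self, key: str, value: bytes) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_max:
            self.l1.popitem(last=False)  # evict least recently used

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        if key in self.l1:              # L1: in-process, no network hop
            self.l1.move_to_end(key)
            return self.l1[key]
        value = self.l2.get(key)        # L2: shared across nodes
        if value is None:
            value = compute()           # miss in both tiers: full retrieval
            self.l2.set(key, value)
        self._l1_put(key, value)        # warm this node's L1
        return value
```

One design consequence: a write to L2 does not reach other nodes' L1 copies, so L1 entries are usually kept few and short-lived to bound staleness.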

When to Use It

Production search above modest scale where query repetition produces meaningful cache hit rates. Use cases with stable corpora where invalidation overhead is manageable. Latency-sensitive applications where cache hits provide noticeable user experience improvements. Cost-sensitive deployments where infrastructure savings from caching are material.

Alternatives — no caching for highly personalized retrieval where cache hit rates are low. No caching for high-write-rate corpora where invalidation overhead exceeds caching benefits. Selective caching (filter caches only; result caches only for non-personalized queries) for mixed workloads.

Sources
  • Elasticsearch / OpenSearch request cache and query cache documentation
  • Solr filter cache and query result cache documentation
  • Coveo result cache documentation
  • Algolia caching documentation
