Source: OpenAI, Cohere, Voyage AI, BGE embedding model documentation; production methodology in modern RAG and semantic search deployments

Classification — Patterns for selecting embedding models, designing what to embed, and managing embeddings as the schema evolves.

Intent

Choose the embedding model, content representation, and field structure so that the resulting vectors support effective retrieval across the diverse queries the system handles.

Motivating Problem

Embedding a document is more than running text through a model. What text to embed, at what granularity, with which model, in how many fields — each decision affects retrieval quality. Defaults (embed the body, one vector field, off-the-shelf model) work but often miss substantial quality gains available through deliberate design. The patterns documented here capture the production discipline that emerges from teams who have invested in embedding strategy.

How It Works

What to embed. The simplest choice is to embed the full document text. This works, but long or multi-topic documents suffer dilution: the single embedding represents an average of every concept in the document. A curated representation usually does better. Options include title-only embedding for navigational matching; title plus key attributes (brand, category, top 5 features) for focused matching; an LLM-generated summary for documents whose original text is verbose or structurally complex; and multi-vector schemas (separate title_vec and body_vec) that serve different match modes.
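As an illustration of the "title + key attributes" option, the sketch below assembles the text that would be sent to the embedding model in place of the full body. The Product shape, its field names, and the "Brand:" / "Category:" labels are assumptions made for this example, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    # Hypothetical document shape; field names are illustrative only.
    title: str
    brand: str
    category: str
    features: list[str] = field(default_factory=list)
    body: str = ""


def curated_text(doc: Product, max_features: int = 5) -> str:
    """Build the curated text to embed: title, key attributes, top features."""
    parts = [doc.title, f"Brand: {doc.brand}", f"Category: {doc.category}"]
    top = doc.features[:max_features]  # keep only the leading features
    if top:
        parts.append("Features: " + "; ".join(top))
    # The long body is deliberately left out to avoid diluting the vector.
    return "\n".join(parts)


doc = Product(
    title="Trail Runner 2",
    brand="Acme",
    category="Running shoes",
    features=["waterproof", "wide fit", "rock plate", "6 mm drop",
              "recycled upper", "reflective"],
    body="(long marketing copy omitted)",
)
print(curated_text(doc))
```

The sixth feature is dropped by the top-5 cut. Whether feature order reflects importance is a property of the source data that this sketch takes for granted.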

Multi-vector schemas. A multi-vector schema stores several vector fields per document, each capturing a different aspect. A common pattern: title_vec for navigational matches (typically 768-dim, focused on title content); body_vec for descriptive matches (1024-dim, broader context); summary_vec for high-level semantic matches (generated from an LLM summary). Queries match against multiple vector fields with different weights; the LTR model (Volume 4 Section B) learns the appropriate weighting per query class. Multi-vector schemas substantially outperform single-vector schemas on diverse query distributions.
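A minimal sketch of weighted multi-field scoring follows. The weights here are fixed by hand purely for illustration; in the pattern described above they would come from the LTR model per query class. The function names and the toy low-dimensional vectors are assumptions, and the query is assumed to have been embedded once per field so that dimensions match field by field.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def multi_vector_score(query_vecs, doc_vecs, weights):
    """Weighted sum of per-field cosine similarities.

    query_vecs and doc_vecs map field name -> vector. Fields missing on
    either side are skipped rather than treated as an error.
    """
    return sum(
        w * cosine(query_vecs[f], doc_vecs[f])
        for f, w in weights.items()
        if f in query_vecs and f in doc_vecs
    )


# Toy vectors (real fields would be 768-dim, 1024-dim, and so on).
query = {"title_vec": [1.0, 0.0], "body_vec": [0.0, 1.0, 0.0]}
doc_a = {"title_vec": [1.0, 0.0], "body_vec": [1.0, 0.0, 0.0]}  # title match
doc_b = {"title_vec": [0.0, 1.0], "body_vec": [0.0, 1.0, 0.0]}  # body match

navigational = {"title_vec": 0.8, "body_vec": 0.2}  # hand-set, illustrative
descriptive = {"title_vec": 0.2, "body_vec": 0.8}   # hand-set, illustrative

for name, w in [("navigational", navigational), ("descriptive", descriptive)]:
    print(name, multi_vector_score(query, doc_a, w),
          multi_vector_score(query, doc_b, w))
```

Under the navigational weights doc_a ranks first, and under the descriptive weights doc_b does. That flip is the behavior the per-query-class weighting is meant to produce.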

Embedding model selection. General-purpose models cover many cases: OpenAI text-embedding-3-large (3072 dims by default, reducible through the API's dimensions parameter; strong general quality, commercial API); BGE-large (1024 dims, open source, self-hostable); Voyage AI voyage-3 (1024 dims, high quality on RAG tasks, commercial API); Cohere embed v3 (1024 dims, multilingual). The MTEB leaderboard provides comparative quality data; the BEIR benchmark provides retrieval-specific evaluation. Production evaluation on the actual workload (Volume 5 Section B) is essential before committing.
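One way to compare candidates on the actual workload is to run each model's retrieval over the same judged queries and compare a retrieval metric. The sketch below uses mean recall@k as one possible metric. The model labels, query ids, and document ids are placeholder data invented for the example and do not describe any real model's results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall(runs, judgments, k=10):
    """runs: query -> ranked doc ids; judgments: query -> relevant doc ids."""
    scores = [recall_at_k(runs.get(q, []), rel, k)
              for q, rel in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Placeholder judgment list and per-model retrieval output.
judgments = {"q1": {"d1", "d2"}, "q2": {"d3"}}
runs_by_model = {
    "model_a": {"q1": ["d1", "d9", "d2"], "q2": ["d8", "d3"]},
    "model_b": {"q1": ["d9", "d1"], "q2": ["d7", "d8"]},
}

scores = {name: mean_recall(runs, judgments, k=2)
          for name, runs in runs_by_model.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Public benchmark rankings and workload rankings can disagree, which is why the same harness should be run on production queries before committing.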

Domain fine-tuning. Off-the-shelf embedding models work, but domain-tuned models typically improve quality by 5–15% on workload-specific tasks. The training process: collect labeled relevance triples (query, positive document, hard negative document) from production logs or judgment lists; fine-tune the base model with a contrastive loss; evaluate the tuned model on held-out evaluation queries. The investment is non-trivial but produces durable quality gains; production teams with mature search practice typically make it.
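To make the training objective concrete, here is a sketch of one common contrastive loss, an InfoNCE-style loss over a (query, positive, hard negatives) example. The text above does not say which contrastive loss is used, so this particular form and the temperature value are assumptions. A real fine-tuning run would compute this inside a training framework over batches, not in plain Python.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """-log softmax of the positive's similarity against positive + negatives.

    temperature=0.05 is an illustrative value, not a recommendation.
    """
    pos = cosine(query, positive) / temperature
    logits = [pos] + [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos


# Low loss: the positive is close to the query, the hard negative is far.
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# High loss: the hard negative is closer to the query than the positive.
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(good, bad)
```

Minimizing this loss pulls the positive toward the query and pushes hard negatives away, which is why the quality of the mined hard negatives matters so much to the result.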

LLM-generated content for embedding. Embedding quality depends on the quality of the input text. Documents with verbose or poorly structured content produce lower-quality embeddings; documents with clean, focused content produce higher-quality embeddings. The pattern: use an LLM at index time to generate a clean summary or representation, then embed the LLM output rather than the raw document. The LLM acts as a content normalizer, so the embedding model receives high-quality input. Production deployments routinely combine LLM-generated representations with raw-content fields, providing multiple embedding paths.
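The index-time flow can be sketched as a small function that takes the summarizer and the embedder as injected callables. Both callables below are trivial stand-ins written for this example (a first-sentence "summary" and a two-number "embedding"); a real pipeline would replace them with calls to an LLM and an embedding model. The field names body_vec, summary_text, and summary_vec are assumptions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def build_embedding_fields(
    raw_text: str,
    summarize: Callable[[str], str],
    embed: Callable[[str], Vector],
) -> dict:
    """Produce both embedding paths: raw content and LLM-normalized content."""
    fields = {"body_vec": embed(raw_text)}
    summary = summarize(raw_text).strip()
    if summary:  # if the summarizer returns nothing, keep only the raw path
        fields["summary_text"] = summary  # stored for debugging and audits
        fields["summary_vec"] = embed(summary)
    return fields


def fake_summarize(text: str) -> str:
    # Stand-in for an LLM call: just returns the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


def fake_embed(text: str) -> Vector:
    # Stand-in for an embedding model: NOT a meaningful embedding.
    return [float(len(text)), float(len(text.split()))]


fields = build_embedding_fields(
    "First sentence. Second sentence with more words.",
    fake_summarize,
    fake_embed,
)
print(fields)
```

Storing the generated summary next to its vector makes it possible to inspect what was actually embedded when a retrieval result looks wrong.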

Embedding model versioning. The embedding model is a schema-level decision: vectors produced by different models are not comparable, so changing models means re-embedding the entire corpus. Production discipline: treat embedding model selection as a deliberate schema decision; record the model version in the index metadata; plan model upgrades as substantial migrations (covered in Section F); maintain compatibility windows during transitions by running two models in parallel during migration. This discipline prevents accidental incompatibilities.
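A sketch of recording the model version with the index and routing queries during a parallel-run migration is shown below. The EmbeddingSpec record, the field names, and the model and version strings are all hypothetical; the point is only that the query-side spec must equal the spec stored for the field being searched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    # Hypothetical metadata record stored alongside each vector field.
    model: str
    version: str
    dimensions: int


def field_for(query_spec: EmbeddingSpec, index_fields: dict) -> str:
    """Return the vector field built with exactly the query's model spec."""
    for name, spec in index_fields.items():
        if spec == query_spec:
            return name
    raise LookupError(f"no vector field was built with {query_spec}")


old_spec = EmbeddingSpec("model-a", "1", 1024)  # placeholder names
new_spec = EmbeddingSpec("model-b", "1", 1024)  # placeholder names

# During the compatibility window the index carries one field per model.
index_fields = {"body_vec_v1": old_spec, "body_vec_v2": new_spec}

print(field_for(new_spec, index_fields))

try:
    field_for(EmbeddingSpec("model-c", "1", 1024), index_fields)
except LookupError as err:
    print("rejected:", err)
```

Matching on the full spec rather than on dimensions alone matters here: two models can share a dimension count and still produce incompatible vector spaces.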

Cost management. Embedding cost, whether API fees or inference compute, accumulates at index time. Strategies: batch processing (several providers discount asynchronous batch jobs, in some cases by around 50%; check current pricing); selective embedding (embed only content above a length threshold; for very short content such as product titles, embedding may add little over lexical matching); incremental updates (re-embed only when content changes substantively, not on every metadata update); fine-tuned smaller models (a tuned smaller model can match an off-the-shelf larger model at lower per-document cost). Production deployments typically combine several of these strategies to keep embedding costs manageable.
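Selective embedding and incremental updates can be sketched with a content hash and a length threshold. The 20-character threshold, the whitespace and case normalization, and the function names are assumptions for illustration. A hash only detects that the text changed at all, so "substantive" change is approximated here by ignoring whitespace and case differences.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash of the text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_embedding(doc_id, text, stored_hashes, min_chars=20):
    """True only for long-enough content whose normalized text has changed."""
    if len(text.strip()) < min_chars:  # selective embedding: skip short text
        return False
    return stored_hashes.get(doc_id) != content_hash(text)


def batches(items, size):
    """Yield consecutive slices of at most `size` items for batch submission."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


stored = {"a": content_hash("Hello   World, this is a long enough body.")}

print(needs_embedding("a", "hello world, this is a long enough body.", stored))
print(needs_embedding("a", "Hello World, this body was rewritten.", stored))
print(needs_embedding("b", "Shoe", stored))
print(list(batches([1, 2, 3, 4, 5], 2)))
```

Because the hash is computed over the embedded text only, metadata-only updates never trigger a re-embed as long as metadata is kept out of that text.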

Quality monitoring. Production embedding pipelines need quality monitoring. Track: embedding generation rate (how many documents per hour are being embedded); embedding generation failures (API errors, timeouts); embedding quality metrics on held-out evaluation queries (NDCG@K from vector retrieval, tracked over time). Sudden changes in any of these signals indicate a problem in the embedding pipeline that needs investigation. The monitoring discipline is part of operations (Volume 6, planned).
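For the quality metric, NDCG@K on a held-out query can be computed as in the sketch below. It uses the linear-gain form of DCG (the exponential-gain variant, 2^rel - 1, is also common). The regression check and its 0.02 tolerance are illustrative assumptions, not a recommended alert threshold.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k):
    """relevance maps doc id -> graded relevance; unjudged docs count as 0."""
    gains = [relevance.get(d, 0.0) for d in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def regressed(current, baseline, tolerance=0.02):
    """Flag a drop below the baseline by more than the tolerance."""
    return current < baseline - tolerance


relevance = {"d1": 3, "d2": 2, "d3": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], relevance, 3)
reversed_order = ndcg_at_k(["d3", "d2", "d1"], relevance, 3)
print(perfect, reversed_order)
print(regressed(0.70, 0.75), regressed(0.74, 0.75))
```

Averaging this value over the held-out query set on each pipeline run gives the time series to watch alongside throughput and failure counts.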

When to Use It

Any production search system using vector retrieval. RAG pipelines. Hybrid retrieval systems where vector is one component. Cross-modal search where embeddings bridge modalities (Section G). The embedding strategy decisions affect quality substantially; deliberate design beats defaults.

Alternatives — lexical-only retrieval (Volume 1 Section A) for cases where semantic matching isn't needed. Pre-computed similarity tables for very small corpora where ANN structures aren't justified. Most modern production search uses vector embeddings somewhere; the strategy is how they're used.

Sources

OpenAI embedding documentation (platform.openai.com)
Cohere embed documentation (docs.cohere.com)
Voyage AI embed documentation
BGE / FlagEmbedding documentation (huggingface.co/BAAI)
MTEB leaderboard (huggingface.co/spaces/mteb/leaderboard)
BEIR benchmark (github.com/beir-cellar/beir)

Production embedding strategies and multi-vector schemas