Query Embedding: Converting Search Text to Vectors
This article explains how we convert search queries into vector embeddings for clustering and matching.
The Problem: Comparing Queries Semantically
After fetching queries from multiple sources, we need to compare them:
- "mini pc" vs "small computer" - Same meaning, different words
- "thin client" vs "zero client" - Related but distinct concepts
- "fanless pc" vs "silent computer" - Similar intent, different phrasing
Text comparison (string matching, edit distance) fails here. We need semantic similarity - comparing meaning, not characters.
The Solution: Sentence Embeddings
We use a sentence transformer model (see Embedding Strategy) to convert each query into a fixed-dimensional vector:
Input: "mini pc for office"
Output: Vector of numbers representing semantic meaning
Benefit: Queries with similar meaning have similar vectors (high cosine similarity).
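To make this concrete, here is a minimal sketch using the Sentence Transformers library. The checkpoint name is illustrative only; see the Embedding Strategy article for the model this pipeline actually uses.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; the real model choice is covered in Embedding Strategy.
model = SentenceTransformer("all-MiniLM-L6-v2")

a, b = model.encode(["mini pc", "small computer"])
print(util.cos_sim(a, b))  # high cosine similarity despite zero word overlap
```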
Step 1: The Embedding Process - Load and Prepare Queries
1.1 Load Combined Queries
Purpose: Load query text from the combined query dataset.
Input: Combined queries JSON from query fetching.
Output: List of normalized query strings ready for embedding.
Process:
- Load combined query file
- Extract query text from each entry
- Normalize text (lowercase, trim, collapse whitespace)
- Remove duplicates
- Prepare for batch embedding
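A possible implementation of this step, assuming each entry in the combined file stores its text under a "query" key (the actual schema may differ):

```python
import json
import re

def normalize(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()

def load_queries(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    seen, queries = set(), []
    for entry in entries:
        q = normalize(entry["query"])  # assumed key name
        if q and q not in seen:        # drop duplicates, keep first occurrence
            seen.add(q)
            queries.append(q)
    return queries
```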
1.2 Incremental Embedding Strategy
Purpose: Optimize embedding runs by caching previous results (see incremental embedding).
Process:
- Load previously cached embeddings
- Compare current queries to cached list
- Identify new and modified queries
- Embed only new/changed queries
- Merge with cached embeddings
- Save updated embeddings
Benefit: Subsequent runs are significantly faster (high cache hit rate)
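A sketch of the caching logic. File names are illustrative, and embed_fn stands for whatever encodes a list of strings (e.g. the model's encode method). Because entries are keyed by normalized query text, a modified query simply shows up as a new key:

```python
import json
import numpy as np

def incremental_embed(queries, embed_fn,
                      cache_npy="query_embeddings.npy",
                      cache_json="query_list.json"):
    """Embed only queries missing from the cache, then merge and save."""
    cache = {}
    try:
        with open(cache_json, encoding="utf-8") as f:
            cached_queries = json.load(f)
        cache = dict(zip(cached_queries, np.load(cache_npy)))
    except FileNotFoundError:
        pass  # cold cache: everything below gets embedded

    new = [q for q in queries if q not in cache]
    if new:
        cache.update(zip(new, embed_fn(new)))

    matrix = np.stack([cache[q] for q in queries])  # row order follows queries
    np.save(cache_npy, matrix)
    with open(cache_json, "w", encoding="utf-8") as f:
        json.dump(queries, f, ensure_ascii=False)
    return matrix
```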
Step 2: Embed Queries Using Model
2.1 Load Embedding Model
Purpose: Initialize sentence transformer model for semantic encoding.
Model: Pre-trained sentence transformer (multi-language support)
Output dimensions: Fixed vector size for all queries
Process:
- Load pre-trained model from cache or download
- Move to available hardware (CPU or GPU)
- Set to inference mode
- Initialize for batch encoding
Efficiency: Model loaded once and reused for all batches
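One-time setup might look like the following; the multilingual checkpoint and device logic are assumptions, not the pipeline's exact configuration:

```python
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"  # prefer GPU when present
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2",
                            device=device)
model.eval()  # inference mode: no gradient bookkeeping
```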
2.2 Batch Embedding
Purpose: Convert query text to vectors efficiently.
Process:
- Group queries into batches
- Encode each batch with model
- Combine embeddings in order
- Maintain dimensional consistency
Batch size: Configurable (balances throughput and memory)
Hardware: Supports both CPU and GPU acceleration
Efficiency: Processes batches sequentially to manage memory
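With Sentence Transformers, batching is handled inside the encode call; the batch size of 64 below is illustrative:

```python
# queries: normalized strings from step 1; model: loaded in step 2.1
embeddings = model.encode(
    queries,
    batch_size=64,            # tune for available memory
    show_progress_bar=True,
    convert_to_numpy=True,
)
print(embeddings.shape)       # (number of queries, embedding dimension)
```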
2.3 Save Embeddings
Purpose: Persist embeddings for downstream use.
Output formats:
Vector array: Binary format for fast numerical operations
- Efficient storage and access
- Optimized for vector operations
Query list: JSON text format for human reference
- Maps array indices to query text
- Enables debugging and verification
Separation: Different formats for different access patterns
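A sketch of the save step with illustrative file names. The two files are written together so that index i in the array always corresponds to index i in the query list:

```python
import json
import numpy as np

np.save("query_embeddings.npy", embeddings.astype(np.float32))
with open("query_list.json", "w", encoding="utf-8") as f:
    json.dump(queries, f, ensure_ascii=False, indent=2)
```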
Step 3: Quality Assurance
3.1 Handling Duplicates
Queries may have slight variations:
- "mini pc" vs "Mini PC" (case difference)
- "mini pc" vs "  mini  pc " (whitespace difference)
Solution: Normalize before embedding (lowercase, trim, collapse whitespace)
Benefit: Identical queries get identical embeddings
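The normalize helper from step 1.1 already guarantees this:

```python
import re

def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip().lower()

# Case and whitespace variants collapse to one canonical string.
assert normalize("Mini PC") == normalize("  mini  pc ") == "mini pc"
```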
3.2 Handling Typos
Typos create different embeddings than correct spelling:
- Model is trained on correctly spelled text
- Minor typos usually still produce embeddings close to the correct spelling
Benefit: Robust to minor typos
Limitation: Major typos may not match well
3.3 Handling Multi-Language
Model supports multiple languages natively:
- English: "mini pc"
- French: "mini ordinateur"
- Hindi: "मिनी पीसी"
Benefit: Cross-language similarity works
Limitation: Training data skews toward English, so quality varies across languages
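Cross-language similarity can be checked directly; the multilingual checkpoint below is an assumption:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
en, fr, hi = model.encode(["mini pc", "mini ordinateur", "मिनी पीसी"])
# Translations are expected to score well above unrelated query pairs.
print(util.cos_sim(en, fr), util.cos_sim(en, hi))
```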
Step 4: Integration with Pipeline
4.1 Query Clustering
Input: Query embeddings + engagement weights
Process: Compute similarity, group similar queries
Output: Query clusters
See: Query Clustering
4.2 Product Matching
Input: Query embeddings + product embeddings
Process: Find most relevant product for each query cluster
Output: Query-to-product mappings
See: Product Matching
4.3 Related Searches
Input: Query embeddings
Process: Find similar queries for each query
Output: Related search suggestions
See: Related Searches
4.4 Search Service
Input: Live query text from user
Process: Embed query, find similar products/queries
Output: Search results and suggestions
See: Search Service Architecture
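A hypothetical end-to-end lookup against the artifacts saved in step 2.3 (file names are illustrative; model is the loaded sentence transformer):

```python
import json
import numpy as np

query_vecs = np.load("query_embeddings.npy")
with open("query_list.json", encoding="utf-8") as f:
    query_texts = json.load(f)

def related(live_query: str, model, k: int = 5) -> list[str]:
    """Return the k cached queries most similar to a live query."""
    v = model.encode([live_query], convert_to_numpy=True)[0]
    v = v / np.linalg.norm(v)  # cosine similarity = dot of unit vectors
    mat = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    top = np.argsort(mat @ v)[::-1][:k]
    return [query_texts[i] for i in top]
```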
Performance Characteristics
First Run (Cold Cache)
Activity: Embedding all queries from scratch
Time: Varies by dataset size and hardware
Output: Complete embedding matrix
Subsequent Runs (Warm Cache)
Activity: Embedding only new/changed queries
Processing: Subset of total (new additions)
Speed: Significantly faster due to cache reuse
Efficiency: High cache hit rate
Memory Profile
Model: Loaded once in memory
Embeddings: Proportional to number of queries and dimensions
Batch processing: Temporary space for batch operations
Peak usage: Depends on batch size and dataset
Storage Format
Vector Array
Format: Binary numerical array
Structure: N queries × D dimensions
Data type: 32-bit floating point numbers
Access: Memory-mapped for efficiency
Use case: Fast vector operations (similarity, clustering)
Query List
Format: JSON text array
Structure: Simple list of query strings
Purpose: Map array index to query text
Use case: Human debugging and inspection
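Reading the pair back for inspection (illustrative file names); mmap_mode keeps the array on disk and pages rows in on demand:

```python
import json
import numpy as np

vecs = np.load("query_embeddings.npy", mmap_mode="r")
with open("query_list.json", encoding="utf-8") as f:
    texts = json.load(f)

i = 0  # any row index
print(texts[i], vecs[i][:5])  # the query text and its first few components
```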
References
Technical Concepts
- Word Embedding — Wikipedia
- Semantic Similarity — Wikipedia
- Cosine Similarity — Wikipedia
- Incremental Learning — Wikipedia
Libraries and Tools
- Sentence Transformers — Embedding library
- NumPy — Array computing
Related Articles
- SEO Embedding Strategy — Model selection
- Query Fetching — Source queries
- Source Data Embedding — Product embeddings
- Query Clustering — Clustering with embeddings
- Product Matching — Matching queries to products