Embedding Strategy: all-mpnet-base-v2 for Semantic Search
This article explains how we use the all-mpnet-base-v2 sentence transformer model to power semantic search across our SEO pipeline.
The Problem: Matching Queries to Products
When users search for "mini pc" or "small computer," they mean the same thing despite sharing no keywords. Traditional keyword matching fails here. We need semantic embeddings: dense vector representations that capture meaning, not just words.
Our SEO pipeline processes 65,000+ search queries and needs to:
- Cluster similar queries together for query pages
- Match queries to products based on meaning
- Find related searches
- Expand phrase-to-filter mappings
The Model: all-mpnet-base-v2
We use all-mpnet-base-v2, a sentence transformer model that:
- Outputs 768-dimensional vectors
- Is trained on 1 billion+ sentence pairs
- Handles input up to 384 word pieces (longer text is truncated)
The model maps text to a dense vector space where semantically similar text has high cosine similarity.
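To make the similarity measure concrete, here is a toy sketch of cosine similarity using hand-picked 3-d vectors as stand-ins for real 768-d model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d stand-ins for real 768-d embeddings.
mini_pc = np.array([0.9, 0.1, 0.2])
small_computer = np.array([0.8, 0.2, 0.3])
garden_hose = np.array([0.1, 0.9, 0.1])

print(cosine_similarity(mini_pc, small_computer))  # high (~0.98)
print(cosine_similarity(mini_pc, garden_hose))     # low  (~0.24)
```

With real embeddings the same property holds: "mini pc" and "small computer" land close together even though they share no words.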
How We Use It
1. Embedding Queries
We embed all search queries into 768-dimensional vectors:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(queries)
Each query becomes a vector of 768 float32 values (3,072 bytes). For 65,000 queries, that's ~190 MB of embeddings.
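The storage math works out as follows:

```python
# 768 float32 values per query, 4 bytes each
bytes_per_query = 768 * 4                    # 3,072 bytes
total_mib = 65_000 * bytes_per_query / 2**20
print(f"{total_mib:.0f} MiB")                # ~190 MiB
```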
2. Clustering Queries
We cluster queries by similarity:
- Select top 1,000 queries by traffic as cluster centers
- Compute cosine similarity between all 65K queries and these 1K centers
- Queries with similarity ≥ 0.85 to a center join that center's cluster
- Unclustered queries become singleton clusters
This creates query pages where each page represents a cluster of similar searches.
See Query Clustering Algorithm for details.
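The thresholded assignment step can be sketched as follows. This is a minimal illustration on toy 2-d data, not the production implementation; `cluster_queries` and the tiny vectors are made up for this example:

```python
import numpy as np

SIM_THRESHOLD = 0.85

def cluster_queries(embeddings: np.ndarray, center_idx: list) -> list:
    """Assign each query to its most similar center if similarity >= threshold,
    otherwise mark it -1 (a singleton cluster)."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centers = normed[center_idx]                  # (k, d)
    sims = normed @ centers.T                     # (n, k) cosine similarities
    best = sims.argmax(axis=1)                    # nearest center per query
    best_sim = sims[np.arange(len(sims)), best]
    return [center_idx[b] if s >= SIM_THRESHOLD else -1
            for b, s in zip(best, best_sim)]

# Toy example: 4 queries in 2-d; centers are queries 0 and 2.
emb = np.array([[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.6, 0.6]])
print(cluster_queries(emb, [0, 2]))  # → [0, 0, 2, -1]
```

Query 3 sits between both centers (similarity ~0.71 to each), so it falls below the 0.85 threshold and becomes a singleton.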
3. Matching Products
We embed product names and features, then match queries to products using cosine similarity. Products with highest similarity to the query appear on that query page.
See Product Matching Algorithm for details.
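The matching step reduces to a normalized matrix product followed by a top-k selection. A hedged sketch on toy data (`top_products` is illustrative, not a function from our codebase):

```python
import numpy as np

def top_products(query_emb: np.ndarray, product_embs: np.ndarray, k: int = 2):
    """Return indices of the k products most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    sims = p @ q                        # cosine similarity per product
    return np.argsort(sims)[::-1][:k]   # highest first

products = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query = np.array([0.9, 0.1])
print(top_products(query, products, k=2))  # → [0 1]
```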
4. Expanding Phrase Mappings
We use embeddings to find similar phrases for filter extraction. For example, if "mini pc" maps to a size filter, we can find that "small computer" should too.
See Phrase Mapping Expansion for details.
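The expansion idea can be sketched like this (toy 2-d embeddings; in production the vectors come from all-mpnet-base-v2 and the threshold may differ):

```python
import numpy as np

def expand_phrase(seed: str, phrases: dict, threshold: float = 0.85) -> list:
    """Find phrases whose embedding is close enough to the seed phrase's
    embedding to inherit its filter mapping."""
    s = phrases[seed] / np.linalg.norm(phrases[seed])
    matches = []
    for phrase, vec in phrases.items():
        if phrase == seed:
            continue
        sim = float(np.dot(s, vec / np.linalg.norm(vec)))
        if sim >= threshold:
            matches.append(phrase)
    return matches

phrases = {
    "mini pc":        np.array([0.9, 0.1]),
    "small computer": np.array([0.85, 0.15]),
    "gaming chair":   np.array([0.1, 0.9]),
}
print(expand_phrase("mini pc", phrases))  # → ['small computer']
```

Here "small computer" clears the threshold and inherits the size filter, while "gaming chair" does not.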
Storage: NumPy Arrays
We store embeddings as NumPy .npy files:
import numpy as np
# Save
np.save("embeddings.npy", embeddings)
# Load (memory-mapped)
embeddings = np.load("embeddings.npy", mmap_mode='r')
Why NumPy?
- Fast loading with memory mapping
- Native array operations for similarity computation
- Compact binary format (~190 MB for 65K queries)
Incremental Embedding
We don't re-embed all queries every time. Our incremental embedding:
- Loads existing embeddings
- Compares query keys (text + metadata)
- Embeds only new/changed queries
- Appends to existing array
This reduces processing time significantly when only a few queries change.
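The incremental logic can be sketched like this. It is a simplified illustration: `embed_fn` stands in for `model.encode`, the key scheme here is just the query text, and `fake_embed` is a stub for demonstration:

```python
import numpy as np

def incremental_embed(queries, cached_keys, cached_embs, embed_fn):
    """Embed only queries whose key isn't already cached, then append."""
    known = set(cached_keys)
    new_queries = [q for q in queries if q not in known]
    if new_queries:
        new_embs = embed_fn(new_queries)
        cached_embs = np.vstack([cached_embs, new_embs])
        cached_keys = list(cached_keys) + new_queries
    return cached_keys, cached_embs

# Stub embedder for illustration: deterministic 2-d vectors.
def fake_embed(texts):
    return np.array([[len(t), 1.0] for t in texts])

keys, embs = incremental_embed(
    ["mini pc", "small computer"],   # current query set
    ["mini pc"],                     # previously embedded keys
    fake_embed(["mini pc"]),         # cached embedding array
    fake_embed,
)
print(keys)        # ['mini pc', 'small computer']
print(embs.shape)  # (2, 2)
```

Only "small computer" is embedded on this run; "mini pc" is reused from the cache.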
Multi-Server Architecture
Embeddings are used across multiple servers:
- Batch Pipeline: Generates embeddings from queries and products
- Search Service: Loads embeddings for related search API
- Main Web Server: Uses embeddings for product matching
Storage varies by server:
- NumPy files: Batch pipeline (fast local access)
- Valkey RediSearch: Search service (vector similarity search)
See Search Service Architecture for Valkey integration details.
Integration with SEO Pipeline
The embeddings power multiple pipeline steps:
- Step 0: Embed Source Data - Products, parts, articles
- Step 3b: Embed Queries - All search queries
- Step 4: Expand Phrase Mappings - Find similar phrases
- Step 5: Cluster Queries - Group into query pages
- Step 6: Match Products - Query-product matching
- Step 8: Generate Related Searches - Find related queries
See SEO Pipeline Overview for the complete flow.
References
Model Documentation
- all-mpnet-base-v2 Model Card - Hugging Face
- Sentence Transformers Documentation - Official docs
- Sentence Transformers Paper - Reimers & Gurevych, 2019
Technical Concepts
- Word Embeddings - Wikipedia
- Cosine Similarity - Wikipedia
- NumPy Memory-Mapped Files - NumPy docs
Related Articles
- SEO Pipeline Overview - Complete pipeline architecture
- Query Clustering Algorithm - How we cluster 65K queries
- Search Service Architecture - Valkey RediSearch integration
- Product Matching Algorithm - Semantic matching
- Phrase Mapping Expansion - Using embeddings for filters
Summary
We use all-mpnet-base-v2 to convert text into 768-dimensional vectors that capture semantic meaning. These embeddings power our entire SEO pipeline:
- Query clustering: Group 65K queries into pages using a 0.85 similarity threshold
- Product matching: Match queries to products semantically
- Related searches: Find similar queries for navigation
- Phrase expansion: Discover new filter phrases
Embeddings are stored as NumPy arrays for fast loading and processed incrementally to avoid re-embedding unchanged data. The same embeddings are used across multiple servers via NumPy files and Valkey RediSearch.