Embedding Strategy: all-mpnet-base-v2 for Semantic Search
This article explains how we use the all-mpnet-base-v2 sentence transformer model to power semantic search across our SEO pipeline.
The Problem: Matching Queries to Products
When users search for "mini pc" or "small computer," they mean the same thing despite sharing no keywords. Traditional keyword matching fails here. We need semantic embeddings: dense vector representations that capture meaning, not just words.
Our SEO pipeline processes 65,000+ search queries and needs to:
- Cluster similar queries together for query pages
- Match queries to products based on meaning
- Find related searches
- Expand phrase-to-filter mappings
The Model: all-mpnet-base-v2
We use all-mpnet-base-v2, a sentence transformer model that:
- Outputs 768-dimensional vectors
- Is trained on 1 billion+ sentence pairs
- Handles input up to 384 word pieces (longer text is truncated)
The model maps text to a dense vector space where semantically similar text has high cosine similarity.
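To make the similarity measure concrete, here is a toy sketch of cosine similarity using hand-picked 3-d vectors as stand-ins for real 768-d model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d stand-ins for real 768-d embeddings.
mini_pc = np.array([0.9, 0.1, 0.2])
small_computer = np.array([0.8, 0.2, 0.3])
garden_hose = np.array([0.1, 0.9, 0.1])

print(cosine_similarity(mini_pc, small_computer))  # high (~0.98)
print(cosine_similarity(mini_pc, garden_hose))     # low  (~0.24)
```

With real embeddings the same property holds: "mini pc" and "small computer" land close together even though they share no words.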
How We Use It
1. Embedding Queries
We embed all search queries into 768-dimensional vectors:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(queries)
Each query becomes a vector of 768 float32 values (3,072 bytes). For 65,000 queries, that's ~190 MB of embeddings.
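The storage math works out as follows:

```python
# 768 float32 values per query, 4 bytes each
bytes_per_query = 768 * 4                    # 3,072 bytes
total_mib = 65_000 * bytes_per_query / 2**20
print(f"{total_mib:.0f} MiB")                # ~190 MiB
```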
2. Clustering Queries
We cluster queries by similarity:
- Select top 1,000 queries by traffic as cluster centers
- Compute cosine similarity between all 65K queries and these 1K centers
- Queries with similarity ≥ 0.85 to a center join that center's cluster
- Unclustered queries become singleton clusters
This creates query pages where each page represents a cluster of similar searches.
See Query Clustering Algorithm for details.
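The thresholded assignment step can be sketched as follows. This is a minimal illustration on toy 2-d data, not the production implementation; `cluster_queries` and the tiny vectors are made up for this example:

```python
import numpy as np

SIM_THRESHOLD = 0.85

def cluster_queries(embeddings: np.ndarray, center_idx: list) -> list:
    """Assign each query to its most similar center if similarity >= threshold,
    otherwise mark it -1 (a singleton cluster)."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centers = normed[center_idx]                  # (k, d)
    sims = normed @ centers.T                     # (n, k) cosine similarities
    best = sims.argmax(axis=1)                    # nearest center per query
    best_sim = sims[np.arange(len(sims)), best]
    return [center_idx[b] if s >= SIM_THRESHOLD else -1
            for b, s in zip(best, best_sim)]

# Toy example: 4 queries in 2-d; centers are queries 0 and 2.
emb = np.array([[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.6, 0.6]])
print(cluster_queries(emb, [0, 2]))  # → [0, 0, 2, -1]
```

Query 3 sits between both centers (similarity ~0.71 to each), so it falls below the 0.85 threshold and becomes a singleton.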
3. Matching Products
We embed product names and features, then match queries to products using cosine similarity. Products with highest similarity to the query appear on that query page.
See Product Matching Algorithm for details.
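The matching step reduces to a normalized matrix product followed by a top-k selection. A hedged sketch on toy data (`top_products` is illustrative, not a function from our codebase):

```python
import numpy as np

def top_products(query_emb: np.ndarray, product_embs: np.ndarray, k: int = 2):
    """Return indices of the k products most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = product_embs / np.linalg.norm(product_embs, axis=1, keepdims=True)
    sims = p @ q                        # cosine similarity per product
    return np.argsort(sims)[::-1][:k]   # highest first

products = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query = np.array([0.9, 0.1])
print(top_products(query, products, k=2))  # → [0 1]
```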
4. Expanding Phrase Mappings
We use embeddings to find similar phrases for filter extraction. For example, if "mini pc" maps to a size filter, we can find that "small computer" should too.
See Phrase Mapping Expansion for details.
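The expansion idea can be sketched like this (toy 2-d embeddings; in production the vectors come from all-mpnet-base-v2 and the threshold may differ):

```python
import numpy as np

def expand_phrase(seed: str, phrases: dict, threshold: float = 0.85) -> list:
    """Find phrases whose embedding is close enough to the seed phrase's
    embedding to inherit its filter mapping."""
    s = phrases[seed] / np.linalg.norm(phrases[seed])
    matches = []
    for phrase, vec in phrases.items():
        if phrase == seed:
            continue
        sim = float(np.dot(s, vec / np.linalg.norm(vec)))
        if sim >= threshold:
            matches.append(phrase)
    return matches

phrases = {
    "mini pc":        np.array([0.9, 0.1]),
    "small computer": np.array([0.85, 0.15]),
    "gaming chair":   np.array([0.1, 0.9]),
}
print(expand_phrase("mini pc", phrases))  # → ['small computer']
```

Here "small computer" clears the threshold and inherits the size filter, while "gaming chair" does not.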
Storage: NumPy Arrays
We store embeddings as NumPy .npy files:
import numpy as np
# Save
np.save("embeddings.npy", embeddings)
# Load (memory-mapped)
embeddings = np.load("embeddings.npy", mmap_mode='r')
Why NumPy?
- Fast loading with memory mapping
- Native array operations for similarity computation
- Compact binary format (~190 MB for 65K queries)
Incremental Embedding
We don't re-embed all queries every time. Our incremental embedding:
- Loads existing embeddings
- Compares query keys (text + metadata)
- Embeds only new/changed queries
- Appends to existing array
This reduces processing time significantly when only a few queries change.
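The incremental logic can be sketched like this. It is a simplified illustration: `embed_fn` stands in for `model.encode`, the key scheme here is just the query text, and `fake_embed` is a stub for demonstration:

```python
import numpy as np

def incremental_embed(queries, cached_keys, cached_embs, embed_fn):
    """Embed only queries whose key isn't already cached, then append."""
    known = set(cached_keys)
    new_queries = [q for q in queries if q not in known]
    if new_queries:
        new_embs = embed_fn(new_queries)
        cached_embs = np.vstack([cached_embs, new_embs])
        cached_keys = list(cached_keys) + new_queries
    return cached_keys, cached_embs

# Stub embedder for illustration: deterministic 2-d vectors.
def fake_embed(texts):
    return np.array([[len(t), 1.0] for t in texts])

keys, embs = incremental_embed(
    ["mini pc", "small computer"],   # current query set
    ["mini pc"],                     # previously embedded keys
    fake_embed(["mini pc"]),         # cached embedding array
    fake_embed,
)
print(keys)        # ['mini pc', 'small computer']
print(embs.shape)  # (2, 2)
```

Only "small computer" is embedded on this run; "mini pc" is reused from the cache.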
Multi-Server Architecture
Embeddings are used across multiple servers:
- Batch Pipeline: Generates embeddings from queries and products
- Search Service: Loads embeddings for related search API
- Main Web Server: Uses embeddings for product matching
Storage varies by server:
- NumPy files: Batch pipeline (fast local access)
- Valkey RediSearch: Search service (vector similarity search)
See Search Service Architecture for Valkey integration details.
Integration with SEO Pipeline
The embeddings power multiple pipeline steps:
- Step 0: Embed Source Data - Products, parts, articles
- Step 3b: Embed Queries - All search queries
- Step 4: Expand Phrase Mappings - Find similar phrases
- Step 5: Cluster Queries - Group into query pages
- Step 6: Match Products - Query-product matching
- Step 8: Generate Related Searches - Find related queries
See SEO Pipeline Overview for the complete flow.
References
Model Documentation
- all-mpnet-base-v2 Model Card - Hugging Face
- Sentence Transformers Documentation - Official docs
- Sentence Transformers Paper - Reimers & Gurevych, 2019
Technical Concepts
- Word Embeddings - Wikipedia
- Cosine Similarity - Wikipedia
- NumPy Memory-Mapped Files - NumPy docs
Related Articles
- SEO Pipeline Overview - Complete pipeline architecture
- Query Clustering Algorithm - How we cluster 65K queries
- Search Service Architecture - Valkey RediSearch integration
- Product Matching Algorithm - Semantic matching
- Phrase Mapping Expansion - Using embeddings for filters
Summary
We use all-mpnet-base-v2 to convert text into 768-dimensional vectors that capture semantic meaning. These embeddings power our entire SEO pipeline:
- Query clustering: Group 65K queries into pages using a 0.85 similarity threshold
- Product matching: Match queries to products semantically
- Related searches: Find similar queries for navigation
- Phrase expansion: Discover new filter phrases
Embeddings are stored as NumPy arrays for fast loading and processed incrementally to avoid re-embedding unchanged data. The same embeddings are used across multiple servers via NumPy files and Valkey RediSearch.