Query Embedding: Converting Search Text to Vectors

This article explains how we convert search queries into vector embeddings for clustering and matching.

The Problem: Comparing Queries Semantically

After fetching queries from multiple sources, we need to compare them:

  • "mini pc" vs "small computer" - Same meaning, different words

  • "thin client" vs "zero client" - Related but distinct concepts

  • "fanless pc" vs "silent computer" - Similar intent, different phrasing

Text comparison (string matching, edit distance) fails here. We need semantic similarity - comparing meaning, not characters.

The Solution: Sentence Embeddings

We use a sentence transformer model (see Embedding Strategy) to convert each query into a fixed-dimensional vector:

Input: "mini pc for office"

Output: Vector of numbers representing semantic meaning

Benefit: Queries with similar meaning have similar vectors (high cosine similarity).
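To make this concrete, here is a minimal sketch using the sentence-transformers library. The all-MiniLM-L6-v2 checkpoint is an illustrative stand-in; the article does not name the production model (see Embedding Strategy):

```python
# Minimal sketch, assuming the sentence-transformers library.
# The model name is illustrative, not the pipeline's actual checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Semantically similar queries land close together in vector space.
a = model.encode("mini pc")
b = model.encode("small computer")
c = model.encode("coffee maker")

print(util.cos_sim(a, b))  # high cosine similarity (same intent)
print(util.cos_sim(a, c))  # low cosine similarity (unrelated)
```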

Step 1: The Embedding Process - Load and Prepare Queries

1.1 Load Combined Queries

Purpose: Load query text from the combined query dataset.

Input: Combined queries JSON from query fetching.

Output: List of normalized query strings ready for embedding.

Process:

  1. Load combined query file
  2. Extract query text from each entry
  3. Normalize text (lowercase, trim, collapse whitespace)
  4. Remove duplicates
  5. Prepare for batch embedding
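A minimal sketch of these five steps, assuming the combined file is a JSON array of objects with a "query" field (the actual schema may differ):

```python
import json
import re

def load_queries(path: str) -> list[str]:
    """Load, normalize, and deduplicate query text.

    Assumes each entry is an object with a "query" field; adjust
    to the actual schema of the combined query dataset.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)

    seen, queries = set(), []
    for entry in entries:
        # Lowercase, trim, and collapse internal whitespace.
        text = re.sub(r"\s+", " ", entry["query"].strip().lower())
        if text and text not in seen:  # drop duplicates, keep order
            seen.add(text)
            queries.append(text)
    return queries
```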

1.2 Incremental Embedding Strategy

Purpose: Optimize embedding by caching previous results (see Incremental Embedding).

Process:

  1. Load previously cached embeddings
  2. Compare current queries to cached list
  3. Identify new and modified queries
  4. Embed only new/changed queries
  5. Merge with cached embeddings
  6. Save updated embeddings

Benefit: Subsequent runs are significantly faster (high cache hit rate)
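A sketch of the cache diff described above; the file names query_embeddings.npy and query_list.json are illustrative, not the pipeline's actual paths:

```python
import json
import numpy as np

def incremental_embed(model, queries, emb_path="query_embeddings.npy",
                      list_path="query_list.json"):
    # Load the cache from the previous run; a missing file means cold start.
    try:
        with open(list_path, encoding="utf-8") as f:
            cached_queries = json.load(f)
        cache = dict(zip(cached_queries, np.load(emb_path)))
    except FileNotFoundError:
        cache = {}

    # Embed only queries absent from the cache.
    new = [q for q in queries if q not in cache]
    if new:
        cache.update(zip(new, model.encode(new)))

    # Rebuild both outputs in the current query order and persist them.
    vectors = np.stack([cache[q] for q in queries])
    np.save(emb_path, vectors)
    with open(list_path, "w", encoding="utf-8") as f:
        json.dump(queries, f, ensure_ascii=False)
    return vectors
```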

Step 2: Embed Queries Using Model

2.1 Load Embedding Model

Purpose: Initialize sentence transformer model for semantic encoding.

Model: Pre-trained sentence transformer (multi-language support)

Output dimensions: Fixed vector size for all queries

Process:

  1. Load pre-trained model from cache or download
  2. Move to available hardware (CPU or GPU)
  3. Set to inference mode
  4. Initialize for batch encoding

Efficiency: Model loaded once and reused for all batches
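A sketch of this one-time setup, assuming PyTorch and sentence-transformers; the model name is again illustrative:

```python
import torch
from sentence_transformers import SentenceTransformer

# Pick GPU when available, fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Loads from the local cache if present, otherwise downloads once.
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
model.eval()  # inference mode; no gradients needed
```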

2.2 Batch Embedding

Purpose: Convert query text to vectors efficiently.

Process:

  1. Group queries into batches
  2. Encode each batch with model
  3. Combine embeddings in order
  4. Maintain dimensional consistency

Batch size: Configurable (balances throughput and memory)

Hardware: Supports both CPU and GPU acceleration

Efficiency: Processes batches sequentially to manage memory
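An explicit version of the batching loop (in practice, SentenceTransformer.encode also accepts a batch_size argument and batches internally; the value 64 below is illustrative):

```python
import numpy as np

def embed_in_batches(model, queries, batch_size=64):
    """Encode queries batch by batch so memory stays bounded."""
    chunks = []
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        chunks.append(model.encode(batch))  # (len(batch), D) array
    return np.vstack(chunks)  # (N, D); row order matches input order
```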

2.3 Save Embeddings

Purpose: Persist embeddings for downstream use.

Output formats:

Vector array: Binary format for fast numerical operations

  • Efficient storage and access

  • Optimized for vector operations

Query list: JSON text format for human reference

  • Maps array indices to query text

  • Enables debugging and verification

Separation: Different formats for different access patterns
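A sketch of the two-file layout, with illustrative file names; row i of the binary array corresponds to entry i of the JSON list:

```python
import json
import numpy as np

def save_embeddings(queries, vectors,
                    emb_path="query_embeddings.npy",
                    list_path="query_list.json"):
    """Persist vectors as binary and query text as JSON."""
    # Binary array for fast numerical access.
    np.save(emb_path, vectors.astype(np.float32))
    # Human-readable index-to-query mapping for debugging.
    with open(list_path, "w", encoding="utf-8") as f:
        json.dump(queries, f, ensure_ascii=False, indent=2)
```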

Step 3: Quality Assurance

3.1 Handling Duplicates

Queries may have slight variations:

  • "mini pc" vs "Mini PC" (case difference)

  • "mini pc" vs "mini pc" (whitespace difference)

Solution: Normalize before embedding (lowercase, trim, collapse whitespace)

Benefit: Identical queries get identical embeddings

3.2 Handling Typos

Typos produce embeddings that differ from the correctly spelled query:

  • The model is trained mostly on correctly spelled text

  • Minor typos usually still yield embeddings close to the correct spelling

Benefit: Robust to minor typos

Limitation: Severely garbled queries may not match well

3.3 Handling Multi-Language

Model supports multiple languages natively:

  • English: "mini pc"

  • French: "mini ordinateur"

  • Hindi: "मिनी पीसी"

Benefit: Semantically equivalent queries in different languages map to nearby vectors

Limitation: Quality is strongest for English, which dominates the training data
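A sketch of cross-language matching; the multilingual checkpoint name is an illustrative assumption:

```python
# Illustrative multilingual model; not necessarily the pipeline's choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

en = model.encode("mini pc")
fr = model.encode("mini ordinateur")
hi = model.encode("मिनी पीसी")

# Equivalent queries should score high across languages.
print(util.cos_sim(en, fr))
print(util.cos_sim(en, hi))
```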

Step 4: Integration with Pipeline

4.1 Query Clustering

Input: Query embeddings + engagement weights

Process: Compute similarity, group similar queries

Output: Query clusters

See: Query Clustering

4.2 Product Matching

Input: Query embeddings + product embeddings

Process: Find most relevant product for each query cluster

Output: Query-to-product mappings

See: Product Matching

4.3 Related Searches

Input: Query embeddings

Process: Find similar queries for each query

Output: Related search suggestions

See: Related Searches

4.4 Search Service

Input: Live query text from user

Process: Embed query, find similar products/queries

Output: Search results and suggestions

See: Search Service Architecture
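A sketch of the live-query path, assuming product embeddings are pre-computed and L2-normalized (all names here are illustrative):

```python
import numpy as np

def search(model, query_text, product_vectors, top_k=5):
    """Embed a live query and rank products by cosine similarity.

    product_vectors: (P, D) float32 array, rows L2-normalized.
    """
    q = model.encode(query_text)
    q = q / np.linalg.norm(q)          # normalize so dot product = cosine
    scores = product_vectors @ q       # similarity score per product
    return np.argsort(-scores)[:top_k] # indices of the best matches
```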

Performance Characteristics

First Run (Cold Cache)

Activity: Embedding all queries from scratch

Time: Varies by dataset size and hardware

Output: Complete embedding matrix

Subsequent Runs (Warm Cache)

Activity: Embedding only new/changed queries

Processing: Subset of total (new additions)

Speed: Significantly faster due to cache reuse

Efficiency: High cache hit rate

Memory Profile

Model: Loaded once in memory

Embeddings: Proportional to number of queries and dimensions

Batch processing: Temporary space for batch operations

Peak usage: Depends on batch size and dataset

Storage Format

Vector Array

Format: Binary numerical array

Structure: N queries × D dimensions

Data type: 32-bit floating point numbers

Access: Memory-mapped for efficiency

Use case: Fast vector operations (similarity, clustering)
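A sketch of memory-mapped access, so a large matrix is paged in from disk on demand rather than loaded whole (file name illustrative):

```python
import numpy as np

# mmap_mode="r" maps the file read-only; rows are read lazily.
vectors = np.load("query_embeddings.npy", mmap_mode="r")
print(vectors.shape, vectors.dtype)  # (N, D), float32
```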

Query List

Format: JSON text array

Structure: Simple list of query strings

Purpose: Map array index to query text

Use case: Human debugging and inspection
