# Search Service Architecture: Standalone Flask Application with Valkey
This article explains how our search service runs as a standalone Flask application on a separate server, using Valkey (a fork of Redis) for vector search, caching, and autocomplete.
## The Problem: Search Performance and Scalability
Search operations are computationally expensive:
- **Filter extraction**: Match 2,500+ phrases against every query
- **Related searches**: Compute similarity across 65K queries
- **Autocomplete**: Prefix matching on 65K queries
- **Product filtering**: Filter 5K+ products by multiple criteria

Running these operations on the main web server causes:
- **Slow page loads**: Search blocks other requests
- **Memory pressure**: Embeddings have a large memory footprint (65K queries × 768 float32 dimensions is roughly 200 MB of raw vectors alone)
- **CPU contention**: Similarity computation is CPU-intensive
- **Scaling difficulty**: Search can't be scaled independently

We need a dedicated search service that can scale independently.
## The Solution: Standalone Search Service
We run a separate Flask application on a dedicated server:
```
Main Web Server
↓ HTTP API calls
Search Service
↓ Valkey queries
Valkey Server
```
This architecture provides:
- **Independent scaling**: Scale the search service without affecting the main web server
- **Resource isolation**: Search operations don't impact the main web server
- **Caching**: Valkey caches results for fast repeated queries
- **High availability**: The search service can restart without affecting the main web server
## Search Service Components
### 1. Filter Extraction API
- **Endpoint**: `/api/extract_filters`
- **Purpose**: Extract structured filters from natural language queries
- **Example**: `GET /api/extract_filters?q=mini+pc+16gb+ram`

**Response:**
```json
{
  "Form Factor": "Mini PC",
  "Main Memory": "16"
}
```
**Implementation:**
- Load phrase-to-filter mappings from Valkey or JSON
- Match phrases using word-boundary regular expressions
- Return structured filters

**Caching**: Phrase mappings are cached in Valkey (30-day TTL)
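The matching step can be illustrated with a minimal sketch. The mapping table, the function names, and the longest-phrase-first tie-break are assumptions made for illustration, not the service's actual code; only the word-boundary regex technique comes from the description above.

```python
import re

# Hypothetical phrase-to-filter mappings; the real service loads 2,500+ of
# these from Valkey or JSON.
PHRASE_MAPPINGS = {
    "mini pc": ("Form Factor", "Mini PC"),
    "16gb": ("Main Memory", "16"),
    "8gb": ("Main Memory", "8"),
}


def compile_patterns(mappings):
    """Compile one case-insensitive word-boundary pattern per phrase."""
    # Longest phrases first, so the more specific phrase wins for a filter.
    phrases = sorted(mappings, key=len, reverse=True)
    return [
        (re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE), mappings[phrase])
        for phrase in phrases
    ]


def extract_filters(query, patterns):
    """Return {filter_name: value} for every mapped phrase found in the query."""
    filters = {}
    for pattern, (name, value) in patterns:
        if pattern.search(query):
            filters.setdefault(name, value)  # keep the first (longest) match
    return filters


PATTERNS = compile_patterns(PHRASE_MAPPINGS)
print(extract_filters("mini pc 16gb ram", PATTERNS))
```

The `\b` anchors are what stop `8gb` from matching inside `128gb`. Compiling the patterns once at startup avoids recompiling thousands of expressions on every request.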
### 2. Related Searches API
- **Endpoint**: `/api/related`
- **Purpose**: Find semantically similar queries using vector search
- **Example**:

```
POST /api/related
{
  "query": "mini pc",
  "limit": 10
}
```
**Response:**
```json
{
  "related": [
    {"query": "small computer", "similarity": 0.92},
    {"query": "compact desktop", "similarity": 0.89},
    {"query": "mini pc 8gb", "similarity": 0.87}
  ]
}
```
**Implementation:**
- Embed the query using all-mpnet-base-v2
- Query Valkey RediSearch for nearest neighbors
- Return the top N results sorted by similarity

**Caching**: Results are cached in Valkey (7-day TTL)
### 3. Autocomplete API
- **Endpoint**: `/api/autocomplete`
- **Purpose**: Suggest queries as the user types
- **Example**: `GET /api/autocomplete?q=mini+p&limit=5`

**Response:**
```json
{
  "suggestions": [
    "mini pc",
    "mini pc 16gb",
    "mini pc 8gb ram",
    "mini pc fanless",
    "mini pc windows 11"
  ]
}
```
**Implementation:**
- Query Valkey RediSearch with prefix matching
- Rank by popularity (impression + click score)
- Return the top N suggestions

**Caching**: The autocomplete index is stored in Valkey and updated daily
### 4. Popular Queries API
- **Endpoint**: `/api/popular`
- **Purpose**: Get the most popular queries
- **Example**: `GET /api/popular?limit=10`

**Response:**
```json
{
  "queries": [
    "mini pc",
    "thin client",
    "industrial pc",
    "all in one pc"
  ]
}
```
**Implementation:**
- Load queries from Valkey or JSON
- Sort by traffic score (impressions + clicks)
- Return the top N queries

**Caching**: Popular queries are cached in Valkey (30-day TTL)
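The ranking step can be sketched in a few lines. The traffic figures, the `traffic_score` and `top_queries` names, and the unweighted sum are illustrative assumptions; the article states only that the score combines impressions and clicks.

```python
import heapq

# Hypothetical traffic counts; the real values come from Valkey or JSON.
TRAFFIC = {
    "mini pc": {"impressions": 5000, "clicks": 400},
    "thin client": {"impressions": 3200, "clicks": 250},
    "industrial pc": {"impressions": 2100, "clicks": 310},
    "all in one pc": {"impressions": 1800, "clicks": 90},
}


def traffic_score(stats):
    """Unweighted sum of impressions and clicks (assumed weighting)."""
    return stats["impressions"] + stats["clicks"]


def top_queries(traffic, limit):
    """Return the `limit` highest-scoring queries, best first."""
    return heapq.nlargest(limit, traffic, key=lambda q: traffic_score(traffic[q]))


print(top_queries(TRAFFIC, limit=3))
```

`heapq.nlargest` avoids sorting the full list when only a handful of the queries are needed.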
## Valkey Integration
Valkey is a fork of Redis that provides:
- **Vector search**: RediSearch module for similarity search
- **Caching**: Fast in-memory key-value store
- **Autocomplete**: Prefix matching with sorted sets
- **Persistence**: AOF (Append-Only File) for durability
### Vector Search with RediSearch
We use Valkey's RediSearch module for vector similarity search:

**Index Creation:**
```python
from redis.commands.search.field import NumericField, TextField, VectorField

client.ft("queries_idx").create_index([
    VectorField("embedding", "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 768,  # output size of all-mpnet-base-v2
        "DISTANCE_METRIC": "COSINE"
    }),
    TextField("query"),
    NumericField("score")
])
```
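**Vector Search:**
```python
import numpy as np
from redis.commands.search.query import Query

# The index stores FLOAT32 vectors, so cast before serializing.
query_embedding = model.encode(query).astype(np.float32)
results = client.ft("queries_idx").search(
    # Alias the KNN distance as "distance" so it does not clash with the
    # indexed numeric "score" field.
    Query("*=>[KNN 10 @embedding $vec AS distance]")
    .sort_by("distance")
    .return_fields("query", "distance")
    .dialect(2),
    query_params={"vec": query_embedding.tobytes()}
)
```
This returns the 10 nearest neighbors, ordered by cosine distance (1 - cosine similarity), so the closest matches come first.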
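With the COSINE metric, the value attached to each KNN result is a distance, so it has to be converted before it is reported as a similarity. The pure-Python sketch below shows what a brute-force (FLAT) search computes conceptually. The toy 3-dimensional vectors and the function names are assumptions for illustration; the real index uses 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vec, indexed, k):
    """Brute-force KNN: score every stored vector, return the k closest."""
    scored = [
        (text, 1.0 - cosine_similarity(query_vec, vec))  # cosine distance
        for text, vec in indexed.items()
    ]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:k]


# Toy vectors standing in for real embeddings.
INDEXED = {
    "small computer": [0.9, 0.1, 0.0],
    "compact desktop": [0.8, 0.3, 0.1],
    "office chair": [0.0, 0.2, 0.9],
}

for text, distance in nearest_neighbors([1.0, 0.1, 0.0], INDEXED, k=2):
    print(f"{text}: similarity={1.0 - distance:.2f}")
```

A FLAT index scans every vector like this, which is exact and usually acceptable at tens of thousands of entries; approximate index types trade some accuracy for speed on much larger collections.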
### Caching Strategy
We cache multiple data types in Valkey:

**Phrase Mappings (30-day TTL):**
```python
client.setex(
    "seo:phrase_mappings",
    30 * 24 * 3600,
    json.dumps(phrase_mappings)
)
```
**Related Searches (7-day TTL):**
```python
cache_key = f"related:{query_hash}"
client.setex(cache_key, 7 * 24 * 3600, json.dumps(results))
```
**Popular Queries (30-day TTL):**
```python
client.setex(
    "seo:popular_queries",
    30 * 24 * 3600,
    json.dumps(popular_queries)
)
```
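**Autocomplete Index (updated daily):**
```python
for query, score in queries:
    # Register the query under every prefix of its text
    for end in range(1, len(query) + 1):
        client.zadd(f"autocomplete:{query[:end]}", {query: score})
```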
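The snippets above show only the write side of each cache. A minimal cache-aside sketch for the related-searches cache follows. The key normalization, the SHA-256 hash, the helper names, and the in-memory `FakeClient` stand-in are all assumptions for illustration; the article does not specify how `query_hash` is derived.

```python
import hashlib
import json

RELATED_TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL, as described above


def related_cache_key(query):
    """Build a stable key; lowercasing and whitespace collapsing are assumed."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"related:{digest}"


def get_related_cached(client, query, compute):
    """Return cached results if present, otherwise compute and store them."""
    key = related_cache_key(query)
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    results = compute(query)
    client.setex(key, RELATED_TTL_SECONDS, json.dumps(results))
    return results


class FakeClient:
    """In-memory stand-in exposing only get/setex, for demonstration."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def setex(self, key, ttl, value):
        self.store[key] = value  # the TTL is ignored in this stand-in


calls = []


def compute_related(query):
    """Placeholder for the embedding + KNN lookup."""
    calls.append(query)
    return [{"query": "small computer", "similarity": 0.92}]


client = FakeClient()
first = get_related_cached(client, "Mini  PC", compute_related)
second = get_related_cached(client, "mini pc", compute_related)
print(first == second, len(calls))
```

Normalizing before hashing lets trivially different spellings of the same query share one cache entry, so the expensive embedding step runs once.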
### Autocomplete with Sorted Sets
We use Valkey sorted sets for autocomplete:

**Index Structure:**
```
autocomplete:m → ["mini pc": 5000, "mini computer": 3000]
autocomplete:mi → ["mini pc": 5000, "mini computer": 3000]
autocomplete:min → ["mini pc": 5000, "mini computer": 3000]
autocomplete:mini → ["mini pc": 5000, "mini computer": 3000]
```
**Prefix Lookup:**
```python
prefix = "mini"
results = client.zrevrange(f"autocomplete:{prefix}", 0, 9, withscores=True)
```
This returns the top 10 queries starting with "mini", sorted by score in descending order.
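The prefix index can be modelled without Valkey to make the structure concrete. In this sketch plain dictionaries stand in for sorted sets; the function names and the `max_prefix_len` cap are assumptions for illustration and do not appear in the original design.

```python
from collections import defaultdict


def build_prefix_index(scored_queries, max_prefix_len=10):
    """Map each prefix to {query: score}, mirroring one sorted set per prefix."""
    index = defaultdict(dict)
    for query, score in scored_queries:
        # Cap the prefix length (an assumed limit) to bound the key count.
        for end in range(1, min(len(query), max_prefix_len) + 1):
            index[query[:end]][query] = score
    return index


def suggest(index, prefix, limit=10):
    """Return up to `limit` queries for a prefix, highest score first."""
    members = index.get(prefix, {})
    ranked = sorted(members.items(), key=lambda item: item[1], reverse=True)
    return [query for query, _ in ranked[:limit]]


INDEX = build_prefix_index([("mini pc", 5000), ("mini computer", 3000)])
print(suggest(INDEX, "mini"))
print(suggest(INDEX, "mini c"))
```

Each query is stored once per prefix, so the index trades memory for a single-key lookup at request time. Capping the prefix length is one way to bound that cost.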
## API Communication
The main web server calls the search service via HTTP:
### Filter Extraction
```python
from app.shared.filter_service import extract_filters_from_query

filters = extract_filters_from_query("mini pc 16gb ram")
# Internally calls: GET SEARCH_SERVICE_URL/api/extract_filters?q=...
```
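### Related Searches
```python
import requests

response = requests.post(
    f"{SEARCH_SERVICE_URL}/api/related",  # base URL comes from configuration
    json={"query": "mini pc", "limit": 10},
    timeout=2
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
related = response.json()["related"]
```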
### Autocomplete
```python
response = requests.get(
    f"{SEARCH_SERVICE_URL}/api/autocomplete",  # base URL comes from configuration
    params={"q": "mini p", "limit": 5},
    timeout=1
)
response.raise_for_status()
suggestions = response.json()["suggestions"]
```
### Error Handling and Fallbacks
The main web server handles search service failures gracefully:
```python
try:
    filters = extract_filters_from_query(query)
except Exception as e:
    logger.error(f"Search service failed: {e}")
    filters = {}  # Fallback to empty filters
```
This ensures the main web server continues functioning even if the search service is down.
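The same fallback can live inside the client helper itself, so that callers never see an exception. The sketch below is a hypothetical wrapper, not the real `extract_filters_from_query`. The base URL, the function names, and the injectable `fetch` parameter are assumptions, and it uses the standard library's `urllib` rather than `requests` so that it is self-contained.

```python
import json
import logging
import urllib.parse
import urllib.request

logger = logging.getLogger(__name__)

SEARCH_SERVICE_URL = "http://search.internal:5000"  # hypothetical address


def http_get_json(url, timeout):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_filters_safe(query, fetch=http_get_json, timeout=2.0):
    """Call the filter endpoint; return {} if the service is unavailable."""
    params = urllib.parse.urlencode({"q": query})
    url = f"{SEARCH_SERVICE_URL}/api/extract_filters?{params}"
    try:
        return fetch(url, timeout)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts; ValueError covers
        # a malformed JSON body.
        logger.exception("Search service failed for query %r", query)
        return {}
```

Passing `fetch` as a parameter keeps the fallback path testable without a running service: a stub that raises `ConnectionError` should produce `{}`.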
### Network Configuration
All servers are on a private network:
- **Main web server**: Can access search service and Valkey
- **Search service**: Can access Valkey
- **Valkey**: Only accessible from main web server and search service

Neither the search service nor Valkey is accessible from outside the private network.
## Integration with SEO Pipeline
The search service integrates with the SEO pipeline:
### Step 11: Migrate to Valkey
The SEO pipeline loads data into Valkey:
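```python
import hashlib

# Load query embeddings (`scores` is assumed to be aligned with `queries`)
for query, embedding, score in zip(queries, embeddings, scores):
    # Stable per-query key; the exact hashing scheme is an assumption
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    client.hset(f"query:{query_hash}", mapping={
        "query": query,
        "embedding": embedding.tobytes(),
        "score": score
    })
# Create RediSearch index
client.ft("queries_idx").create_index([...])
```
See Valkey Migration for details.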
### Query Logging
The search service logs queries for the SEO pipeline:
```python
import json
from datetime import datetime, timezone

log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "query": query,
    "filters_extracted": filters,
    "results_count": len(results)
}
# Append one JSON object per line (JSON Lines)
with open(SEO_LIVE_QUERIES_LOG, "a", encoding="utf-8") as f:
    f.write(json.dumps(log_entry) + "\n")
```
These logs feed back into Step 1d: Fetch Live Queries.
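How Step 1d consumes the log is not described here. As a rough illustration of why one JSON object per line is convenient, the sketch below counts queries from such a log. The function name, the lowercasing, and the decision to skip malformed lines are assumptions.

```python
import json
from collections import Counter


def count_logged_queries(lines):
    """Count occurrences of each query in JSON Lines log output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        query = entry.get("query")
        if query:
            counts[query.lower()] += 1
    return counts


SAMPLE_LOG = [
    '{"timestamp": "2024-01-01T00:00:00+00:00", "query": "mini pc", "results_count": 12}',
    '{"timestamp": "2024-01-01T00:01:00+00:00", "query": "Mini PC", "results_count": 12}',
    '{"timestamp": "2024-01-01T00:02:00+00:00", "query": "thin client", "results_count": 4}',
    '{"timestamp": "2024-01-01T00:03:00+00:00", "que',
]

print(count_logged_queries(SAMPLE_LOG).most_common(2))
```

Because each line is an independent record, a truncated final line from an interrupted write costs one entry rather than invalidating the whole file.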
## References
### Technical Concepts
- Valkey - Official website
- Redis - Official website (the project Valkey was forked from)
- RediSearch - Vector search module
- Cosine Similarity - Wikipedia
### Related Articles
- Filter Extraction - How filters are extracted
- Related Search Generation - How related searches are generated
- Embedding Strategy - How embeddings are generated
- SEO Pipeline Overview - Complete pipeline architecture
- Valkey Migration - Loading data into Valkey
## Summary
Our search service runs as a standalone Flask application on a separate server:

**Architecture:**
- Standalone Flask app on a dedicated server
- Valkey (a Redis fork) for caching and vector search
- HTTP API for communication with the main web server

**APIs:**
- `/api/extract_filters` - Extract filters from queries
- `/api/related` - Find similar queries (vector search)
- `/api/autocomplete` - Suggest queries (prefix matching)
- `/api/popular` - Get popular queries

**Valkey Features:**
- Vector search (RediSearch module)
- Caching (30-day TTL for phrase mappings and popular queries, 7-day TTL for related searches)
- Autocomplete (sorted sets)
- Persistence (AOF)

**Benefits:**
- Independent scaling
- Resource isolation
- High performance (Valkey caching)
- Fault tolerance (graceful degradation)

This architecture enables fast, scalable search while keeping the main web server responsive.