Search Service Architecture: Standalone Flask Application with Valkey

This article explains how our search service runs as a standalone Flask application on a separate server, using Valkey (Redis fork) for high-performance vector search, caching, and autocomplete.

The Problem: Search Performance and Scalability

Search operations are computationally expensive:

Filter extraction: Match 2,500+ phrases against every query
Related searches: Compute similarity across 65K queries
Autocomplete: Prefix matching on 65K queries
Product filtering: Filter 5K+ products by multiple criteria

Running these operations on the main web server causes:

Slow page loads: Search blocks other requests
Memory pressure: Embeddings for 65K queries (768 float32 values each, about 200 MB) plus the embedding model must stay resident in memory
CPU contention: Similarity computation is CPU-intensive
Scaling difficulty: Can't scale search independently

We need a dedicated search service that can scale independently.

The Solution: Standalone Search Service

We run a separate Flask application on a dedicated server:

Main Web Server
    ↓ HTTP API calls
Search Service
    ↓ Valkey queries
Valkey Server

This architecture provides:

Independent scaling: Scale search service without affecting main web server
Resource isolation: Search operations don't impact main web server
Caching: Valkey caches results for fast repeated queries
Fault isolation: The search service can restart without affecting the main web server

Search Service Components

1. Filter Extraction API

Endpoint: /api/extract_filters
Purpose: Extract structured filters from natural language queries
Example:

GET /api/extract_filters?q=mini+pc+16gb+ram

Response:
{
  "Form Factor": "Mini PC",
  "Main Memory": "16"
}

Implementation:

Load phrase-to-filter mappings from Valkey or JSON
Match phrases using word boundary regex
Return structured filters

Caching: Phrase mappings cached in Valkey (30-day TTL)
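A minimal sketch of the word-boundary matching described above. `PHRASE_MAPPINGS`, `compile_patterns`, and `extract_filters` are hypothetical names, and the three-entry mapping stands in for the 2,500+ phrases loaded from Valkey or JSON. Trying longer phrases first lets a specific phrase such as "16gb ram" claim a filter before a shorter overlapping phrase.

```python
import re

# Hypothetical sample; the real mapping holds 2,500+ phrases from Valkey or JSON.
PHRASE_MAPPINGS = {
    "mini pc": {"Form Factor": "Mini PC"},
    "16gb ram": {"Main Memory": "16"},
    "16gb": {"Main Memory": "16"},
}


def compile_patterns(mappings):
    """Compile one case-insensitive, word-bounded regex per phrase, longest first."""
    phrases = sorted(mappings, key=len, reverse=True)
    return [
        (re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE), mappings[phrase])
        for phrase in phrases
    ]


def extract_filters(query, patterns):
    """Return {filter name: value} for every phrase found as whole words in the query."""
    filters = {}
    for pattern, phrase_filters in patterns:
        if pattern.search(query):
            for name, value in phrase_filters.items():
                filters.setdefault(name, value)  # the longer phrase's value wins
    return filters


patterns = compile_patterns(PHRASE_MAPPINGS)
print(extract_filters("mini pc 16gb ram", patterns))
```

The `\b` anchors keep "16gb" from matching inside a token like "16gbx", and compiling the patterns once at startup avoids recompiling thousands of regexes on every request.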

2. Related Searches API

Endpoint: /api/related
Purpose: Find semantically similar queries using vector search
Example:

POST /api/related
{
  "query": "mini pc",
  "limit": 10
}

Response:
{
  "related": [
    {"query": "small computer", "similarity": 0.92},
    {"query": "compact desktop", "similarity": 0.89},
    {"query": "mini pc 8gb", "similarity": 0.87}
  ]
}

Implementation:

Embed query using all-mpnet-base-v2
Query Valkey RediSearch for nearest neighbors
Return top N results sorted by similarity

Caching: Results cached in Valkey (7-day TTL)
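For intuition, here is what an exact nearest-neighbor search computes, as a pure-Python sketch: cosine similarity between the query vector and every stored vector, keeping the top N. It is an illustration, not our service code; `related_queries` is a hypothetical name, and the toy 2-D vectors stand in for 768-dimensional all-mpnet-base-v2 embeddings. A FLAT vector index performs this same brute-force comparison natively.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def related_queries(query_vec, corpus, limit=10, exclude=None):
    """Exact top-`limit` search over (query text, vector) pairs, in /api/related shape."""
    candidates = (
        (cosine_similarity(query_vec, vec), text)
        for text, vec in corpus
        if text != exclude  # e.g. skip the query the user actually typed
    )
    top = heapq.nlargest(limit, candidates)
    return {"related": [{"query": text, "similarity": round(sim, 2)} for sim, text in top]}


corpus = [
    ("small computer", [1.0, 0.0]),
    ("thin client", [0.0, 1.0]),
    ("compact desktop", [1.0, 1.0]),
]
print(related_queries([1.0, 0.0], corpus, limit=2))
```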

3. Autocomplete API

Endpoint: /api/autocomplete
Purpose: Suggest queries as user types
Example:

GET /api/autocomplete?q=mini+p&limit=5

Response:
{
  "suggestions": [
    "mini pc",
    "mini pc 16gb",
    "mini pc 8gb ram",
    "mini pc fanless",
    "mini pc windows 11"
  ]
}

Implementation:

Query Valkey RediSearch with prefix matching
Rank by popularity (impression + click score)
Return top N suggestions

Caching: Autocomplete index in Valkey (updated daily)

4. Popular Queries API

Endpoint: /api/popular
Purpose: Get most popular queries
Example:

GET /api/popular?limit=10

Response:
{
  "queries": [
    "mini pc",
    "thin client",
    "industrial pc",
    "all in one pc"
  ]
}

Implementation:

Load queries from Valkey or JSON
Sort by traffic score (impressions + clicks)
Return top N queries

Caching: Popular queries cached in Valkey (30-day TTL)
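A sketch of the ranking step, assuming each stored entry carries hypothetical `impressions` and `clicks` fields and that the traffic score is their plain sum, as the description above suggests. Ties are broken alphabetically so the output is deterministic.

```python
def top_popular(entries, limit=10):
    """Return the `limit` queries with the highest impressions + clicks, in /api/popular shape."""
    ranked = sorted(
        entries,
        # Negate the score for descending order; the query text breaks ties
        key=lambda e: (-(e["impressions"] + e["clicks"]), e["query"]),
    )
    return {"queries": [e["query"] for e in ranked[:limit]]}


entries = [
    {"query": "thin client", "impressions": 300, "clicks": 20},
    {"query": "mini pc", "impressions": 900, "clicks": 100},
    {"query": "industrial pc", "impressions": 200, "clicks": 50},
]
print(top_popular(entries, limit=2))
```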

Valkey Integration

Valkey is a Redis fork that provides:

Vector search: RediSearch module for similarity search
Caching: Fast in-memory key-value store
Autocomplete: Prefix matching with sorted sets
Persistence: AOF (Append-Only File) for durability

Vector Search with RediSearch

We use the RediSearch module, loaded into Valkey, for vector similarity search:

Index Creation:

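from redis.commands.search.field import NumericField, TextField, VectorField

client.ft("queries_idx").create_index([
    VectorField("embedding", "FLAT", {   # FLAT = exact brute-force KNN
        "TYPE": "FLOAT32",
        "DIM": 768,                      # all-mpnet-base-v2 embedding size
        "DISTANCE_METRIC": "COSINE"
    }),
    TextField("query"),
    NumericField("score")                # stored query score, not the KNN distance
])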

Vector Search:

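from redis.commands.search.query import Query

query_embedding = model.encode(query).astype("float32")  # vector field is FLOAT32
results = client.ft("queries_idx").search(
    # Alias the KNN distance as "distance" so it doesn't shadow the stored "score" field
    Query("*=>[KNN 10 @embedding $vec AS distance]")
    .sort_by("distance")              # ascending: closest matches first
    .return_fields("query", "distance")
    .dialect(2),
    query_params={"vec": query_embedding.tobytes()}
)
related = [
    {"query": doc.query, "similarity": 1 - float(doc.distance)}
    for doc in results.docs
]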

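This returns the 10 nearest neighbors. With the COSINE metric, the KNN value is a cosine distance (1 - cosine similarity), so smaller means more similar; a similarity score like the one in the /api/related response can be recovered as 1 - distance.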
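Both the `embedding` hash field and the `$vec` query parameter are raw byte strings of DIM consecutive 32-bit floats. The sketch below packs and unpacks that layout with the standard `struct` module, assuming little-endian byte order (what numpy's `tobytes()` produces on x86 and ARM servers); `to_float32_blob` and `from_float32_blob` are hypothetical helper names.

```python
import struct

DIM = 768  # all-mpnet-base-v2 output size; must match the index's "DIM"


def to_float32_blob(vector):
    """Pack a vector as DIM little-endian float32 values (4 bytes each)."""
    if len(vector) != DIM:
        raise ValueError(f"expected {DIM} dimensions, got {len(vector)}")
    return struct.pack(f"<{DIM}f", *vector)


def from_float32_blob(blob):
    """Unpack bytes read back from a hash's embedding field."""
    if len(blob) != DIM * 4:
        raise ValueError(f"expected {DIM * 4} bytes, got {len(blob)}")
    return list(struct.unpack(f"<{DIM}f", blob))


blob = to_float32_blob([0.5] * DIM)
print(len(blob))  # bytes per stored embedding
```

At 768 x 4 = 3,072 bytes per vector, 65K queries need about 200 MB of raw vector data before index overhead, which is the memory pressure described at the top of this article.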

Caching Strategy

We cache multiple data types in Valkey:

Phrase Mappings (30-day TTL):

client.setex(
    "seo:phrase_mappings",
    30 * 24 * 3600,
    json.dumps(phrase_mappings)
)

Related Searches (7-day TTL):

cache_key = f"related:{query_hash}"
client.setex(cache_key, 7 * 24 * 3600, json.dumps(results))

Popular Queries (30-day TTL):

client.setex(
    "seo:popular_queries",
    30 * 24 * 3600,
    json.dumps(popular_queries)
)

Autocomplete Index (updated daily):

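for query, score in queries:
    # Add the query to the sorted set of every prefix it starts with
    for i in range(1, len(query) + 1):
        client.zadd(f"autocomplete:{query[:i]}", {query: score})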
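The 30-day keys above can be read through a cache-aside helper: try Valkey, fall back to the JSON source on a miss or an error, then repopulate Valkey with `setex`. This is a sketch under assumptions: `load_cached_json` is a hypothetical name, `fallback_path` is a hypothetical JSON export of the same data, and `client` is any redis-py-compatible client such as `redis.Redis`.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

THIRTY_DAYS = 30 * 24 * 3600  # same TTL as the phrase-mapping and popular-query keys


def load_cached_json(client, key, fallback_path, ttl=THIRTY_DAYS):
    """Read-through cache: Valkey first, then a JSON file, then repopulate Valkey."""
    try:
        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)  # json.loads accepts both str and bytes
    except Exception as exc:  # e.g. redis.exceptions.ConnectionError
        logger.warning("Valkey read failed for %s: %s", key, exc)

    data = json.loads(Path(fallback_path).read_text(encoding="utf-8"))
    try:
        client.setex(key, ttl, json.dumps(data))
    except Exception as exc:  # serving from disk still works if Valkey is down
        logger.warning("Valkey write failed for %s: %s", key, exc)
    return data
```

Catching broad exceptions here is deliberate: a cache outage should degrade to slower file reads, not to failed requests.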

Autocomplete with Sorted Sets

We use Valkey sorted sets for autocomplete:

Index Structure:

autocomplete:m     → {"mini pc": 5000, "mini computer": 3000}
autocomplete:mi    → {"mini pc": 5000, "mini computer": 3000}
autocomplete:min   → {"mini pc": 5000, "mini computer": 3000}
autocomplete:mini  → {"mini pc": 5000, "mini computer": 3000}

Prefix Lookup:

prefix = "mini"
results = client.zrevrange(f"autocomplete:{prefix}", 0, 9, withscores=True)

This returns the top 10 queries starting with "mini", sorted by score.
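To make the prefix layout concrete, here is an in-memory model of the `autocomplete:{prefix}` sorted sets using plain dictionaries. It is a sketch rather than our indexer: `build_prefix_index` and `autocomplete` are hypothetical names, and `max_prefix_len` and `top_k` are assumed tuning knobs that bound how many prefix keys each query creates and how many members each key keeps.

```python
from collections import defaultdict


def build_prefix_index(queries, max_prefix_len=20, top_k=10):
    """Map each prefix of each query to its top_k (query, score) pairs, best first."""
    buckets = defaultdict(dict)
    for query, score in queries:
        normalized = query.lower().strip()
        # One bucket per prefix, mirroring autocomplete:m, autocomplete:mi, ...
        # Prefixes longer than max_prefix_len get no bucket (no suggestions).
        for i in range(1, min(len(normalized), max_prefix_len) + 1):
            members = buckets[normalized[:i]]
            members[normalized] = max(score, members.get(normalized, score))
    # Keep the highest scores per prefix, like ZREVRANGE key 0 top_k-1
    return {
        prefix: sorted(members.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        for prefix, members in buckets.items()
    }


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` suggestions for what the user has typed so far."""
    return [query for query, _ in index.get(prefix.lower().lstrip(), [])[:limit]]


index = build_prefix_index([("mini pc", 5000), ("Mini Computer", 3000), ("minecraft pc", 100)])
print(autocomplete(index, "mini"))
```

Without such caps, a query of length L is written to L sorted sets, so index size grows with total query length. In Valkey the equivalent trim is `ZREMRANGEBYRANK key 0 -(top_k + 1)` after each `ZADD`, which keeps only the top_k highest-scored members.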

API Communication

The main web server calls the search service via HTTP:

Filter Extraction

from app.shared.filter_service import extract_filters_from_query

filters = extract_filters_from_query("mini pc 16gb ram")
# Internally calls: GET SEARCH_SERVICE_URL/api/extract_filters?q=...

Autocomplete

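import requests

# SEARCH_SERVICE_URL is the search service's base URL from configuration
response = requests.get(
    f"{SEARCH_SERVICE_URL}/api/autocomplete",
    params={"q": "mini p", "limit": 5},
    timeout=1  # seconds; a slow search service must not stall page rendering
)
response.raise_for_status()
suggestions = response.json()["suggestions"]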

Error Handling and Fallbacks

The main web server handles search service failures gracefully:

try:
    filters = extract_filters_from_query(query)
except Exception as e:
    logger.error(f"Search service failed: {e}")
    filters = {}  # Fallback to empty filters

This ensures the main web server continues functioning even if the search service is down.
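The same fallback idea can be wrapped in a small client helper for the related-searches call. This sketch uses the standard library's `urllib` instead of `requests` so it has no dependencies; `fetch_related` and the default base URL are hypothetical. Any connection error, HTTP error status, timeout, or malformed JSON yields an empty list, so the page simply renders without related searches.

```python
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)

# Hypothetical base URL; the real value comes from SEARCH_SERVICE_URL in config.
DEFAULT_SEARCH_SERVICE_URL = "http://search.internal:5000"


def fetch_related(query, limit=10, base_url=DEFAULT_SEARCH_SERVICE_URL, timeout=1.0):
    """POST /api/related and return its "related" list, or [] if anything fails."""
    request = urllib.request.Request(
        f"{base_url}/api/related",
        data=json.dumps({"query": query, "limit": limit}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("related", [])
    except (OSError, ValueError) as exc:
        # URLError, HTTPError and timeouts are OSError subclasses; bad JSON is a ValueError
        logger.warning("Related searches unavailable for %r: %s", query, exc)
        return []
```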


### Network Configuration

All servers are on a private network:

- **Main web server**: Can access search service and Valkey

- **Search service**: Can access Valkey

- **Valkey**: Only accessible from main web server and search service

No external access to search service or Valkey.

## Integration with SEO Pipeline

The search service integrates with the SEO pipeline:

### Step 11: Migrate to Valkey

The SEO pipeline loads data into Valkey:

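```python
import hashlib

# Load query embeddings; queries, embeddings and scores come from earlier pipeline steps
for query, embedding, score in zip(queries, embeddings, scores):
    # Assumed key scheme: a stable hash of the query text
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    client.hset(f"query:{query_hash}", mapping={
        "query": query,
        "embedding": embedding.astype("float32").tobytes(),  # must match FLOAT32 / DIM 768
        "score": score
    })

# Create RediSearch index
client.ft("queries_idx").create_index([...])
```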

See Valkey Migration for details.

Query Logging

The search service logs queries for the SEO pipeline:

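import json
from datetime import datetime, timezone

log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "query": query,
    "filters_extracted": filters,
    "results_count": len(results)
}
# One JSON object per line (JSON Lines), appended as each query is served
with open(SEO_LIVE_QUERIES_LOG, "a", encoding="utf-8") as f:
    f.write(json.dumps(log_entry) + "\n")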

These logs feed back into Step 1d: Fetch Live Queries.
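As a hypothetical example of the consuming side (not the actual Step 1d code), the JSON-lines log can be aggregated into per-query counts plus a count of queries that returned no results. `summarize_query_log` is an illustrative name; malformed lines, such as a partially written final line, are skipped.

```python
import json
from collections import Counter


def summarize_query_log(log_path):
    """Count logged queries and zero-result queries from a JSON-lines log."""
    counts, zero_results = Counter(), Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or partially written line
            if not isinstance(entry, dict):
                continue
            query = entry.get("query", "").strip().lower()
            if not query:
                continue
            counts[query] += 1
            if entry.get("results_count") == 0:
                zero_results[query] += 1
    return counts, zero_results
```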

References

Technical Concepts

Valkey - Official website
Redis - Official website (the project Valkey was forked from)
RediSearch - Vector search module
Cosine Similarity - Wikipedia

Related Articles

Filter Extraction - How filters are extracted
Related Search Generation - How related searches are generated
Embedding Strategy - How embeddings are generated
SEO Pipeline Overview - Complete pipeline architecture
Valkey Migration - Loading data into Valkey

Summary

Our search service runs as a standalone Flask application on a separate server:

Architecture:

Standalone Flask app on dedicated server
Valkey (Redis fork) for caching and vector search
HTTP API for communication with main web server

APIs:

/api/extract_filters - Extract filters from queries
/api/related - Find similar queries (vector search)
/api/autocomplete - Suggest queries (prefix matching)
/api/popular - Get popular queries

Valkey Features:

Vector search (RediSearch module)
Caching (30-day TTL for phrase mappings)
Autocomplete (sorted sets)
Persistence (AOF)

Benefits:

Independent scaling
Resource isolation
High performance (Valkey caching)
Fault tolerance (graceful degradation)

This architecture enables fast, scalable search while keeping the main web server responsive.
