Search Service Architecture: Standalone Flask Application with Valkey

This article explains how our search service runs as a standalone Flask application on a separate server, using Valkey (Redis fork) for high-performance vector search, caching, and autocomplete.

The Problem: Search Performance and Scalability

Search operations are computationally expensive:

  • Filter extraction: Match 2,500+ phrases against every query

  • Related searches: Compute similarity across 65K queries

  • Autocomplete: Prefix matching on 65K queries

  • Product filtering: Filter 5K+ products by multiple criteria

Running these operations on the main web server causes:

  • Slow page loads: Search blocks other requests

  • Memory pressure: Embeddings are memory-hungry; 65K vectors of 768 FLOAT32 dimensions take roughly 200 MB before the embedding model itself is loaded

  • CPU contention: Similarity computation is CPU-intensive

  • Scaling difficulty: Can't scale search independently

We need a dedicated search service that can scale independently.

The Solution: Standalone Search Service

We run a separate Flask application on a dedicated server:

Main Web Server
    ↓ HTTP API calls
Search Service
    ↓ Valkey queries
Valkey Server

This architecture provides:

  • Independent scaling: Scale search service without affecting main web server

  • Resource isolation: Search operations don't impact main web server

  • Caching: Valkey caches results for fast repeated queries

  • High availability: Search service can restart without affecting main web server

Search Service Components

1. Filter Extraction API

  • Endpoint: /api/extract_filters

  • Purpose: Extract structured filters from natural language queries

  • Example:

GET /api/extract_filters?q=mini+pc+16gb+ram

Response:
{
  "Form Factor": "Mini PC",
  "Main Memory": "16"
}

Implementation:

Caching: Phrase mappings cached in Valkey (30-day TTL)
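
The matching step can be sketched as follows. This is a minimal illustration, not the service's actual code: the shape of the phrase mappings (phrase to a filter name and value) and the `extract_filters` helper are assumptions made for the example.

```python
import re

# Hypothetical phrase mappings: phrase -> (filter name, filter value).
# The real mappings (2,500+ phrases) are cached in Valkey; this shape is assumed.
PHRASE_MAPPINGS = {
    "mini pc": ("Form Factor", "Mini PC"),
    "16gb ram": ("Main Memory", "16"),
    "16gb": ("Main Memory", "16"),
}


def extract_filters(query, phrase_mappings):
    """Return {filter name: value} for every mapped phrase found in the query."""
    text = " ".join(query.lower().split())  # normalize case and whitespace
    filters = {}
    # Try longer phrases first so "16gb ram" is considered before "16gb".
    for phrase in sorted(phrase_mappings, key=len, reverse=True):
        # Match only on word boundaries to avoid hits inside other words.
        if re.search(rf"(?<!\w){re.escape(phrase)}(?!\w)", text):
            name, value = phrase_mappings[phrase]
            filters.setdefault(name, value)  # keep the first (longest) match
    return filters


print(extract_filters("mini pc 16gb ram", PHRASE_MAPPINGS))
```

Checking longer phrases first keeps a specific phrase from being shadowed by a shorter one that maps to the same filter.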

2. Related Searches API

  • Endpoint: /api/related

  • Purpose: Find semantically similar queries using vector search

  • Example:

POST /api/related
{
  "query": "mini pc",
  "limit": 10
}

Response:
{
  "related": [
    {"query": "small computer", "similarity": 0.92},
    {"query": "compact desktop", "similarity": 0.89},
    {"query": "mini pc 8gb", "similarity": 0.87}
  ]
}

Implementation:

  • Embed query using all-mpnet-base-v2

  • Query Valkey RediSearch for nearest neighbors

  • Return top N results sorted by similarity

Caching: Results cached in Valkey (7-day TTL)
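
A cache-aside lookup for this endpoint might look like the sketch below. The key scheme (a SHA-256 of the normalized query) and the `compute_related` callable are assumptions for illustration; the article only states that results are cached under a query hash with a 7-day TTL.

```python
import hashlib
import json

RELATED_TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL, as stated above


def related_cache_key(query):
    """Build a cache key from the normalized query (hashing scheme is assumed)."""
    normalized = " ".join(query.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"related:{digest}"


def get_related(client, query, compute_related):
    """Return cached related searches, computing and caching them on a miss.

    `client` is any object with Valkey-style get/setex methods;
    `compute_related` is a hypothetical callable that runs the vector search.
    """
    key = related_cache_key(query)
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip embedding and vector search
    results = compute_related(query)
    client.setex(key, RELATED_TTL_SECONDS, json.dumps(results))
    return results
```

Normalizing before hashing means "Mini  PC" and "mini pc" share one cache entry, so the embedding model runs once for both.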

3. Autocomplete API

  • Endpoint: /api/autocomplete

  • Purpose: Suggest queries as user types

  • Example:

GET /api/autocomplete?q=mini+p&limit=5

Response:
{
  "suggestions": [
    "mini pc",
    "mini pc 16gb",
    "mini pc 8gb ram",
    "mini pc fanless",
    "mini pc windows 11"
  ]
}

Implementation:

  • Look up the typed prefix in the Valkey autocomplete index (sorted sets keyed by prefix)

  • Rank by popularity (impression + click score)

  • Return top N suggestions

Caching: Autocomplete index in Valkey (updated daily)

4. Popular Queries API

  • Endpoint: /api/popular

  • Purpose: Get most popular queries

  • Example:

GET /api/popular?limit=10

Response:
{
  "queries": [
    "mini pc",
    "thin client",
    "industrial pc",
    "all in one pc"
  ]
}

Implementation:

  • Load queries from Valkey or JSON

  • Sort by traffic score (impressions + clicks)

  • Return top N queries

Caching: Popular queries cached in Valkey (30-day TTL)
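
The ranking step can be sketched in a few lines. The per-query stats dictionary and its `impressions` and `clicks` field names are assumptions for the example; the equal weighting follows the "impressions + clicks" score described above.

```python
def top_queries(stats, limit=10):
    """Return the `limit` most popular queries by impressions + clicks."""
    ranked = sorted(
        stats.items(),
        key=lambda item: item[1]["impressions"] + item[1]["clicks"],
        reverse=True,
    )
    return [query for query, _ in ranked[:limit]]


# Made-up numbers, for illustration only.
example_stats = {
    "thin client": {"impressions": 900, "clicks": 80},
    "mini pc": {"impressions": 1200, "clicks": 150},
    "industrial pc": {"impressions": 700, "clicks": 60},
}
print(top_queries(example_stats, limit=2))  # -> ['mini pc', 'thin client']
```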

Valkey Integration

Valkey is a Redis fork that provides:

  • Vector search: RediSearch module for similarity search

  • Caching: Fast in-memory key-value store

  • Autocomplete: Prefix matching with sorted sets

  • Persistence: AOF (Append-Only File) for durability

Vector Search with RediSearch

We use Valkey's RediSearch module for vector similarity search:

Index Creation:

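# Import path shown for redis-py (assumed client library)
from redis.commands.search.field import NumericField, TextField, VectorField

# FLAT = exact (brute-force) search; DIM 768 matches all-mpnet-base-v2 embeddings
client.ft("queries_idx").create_index([
    VectorField("embedding", "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 768,
        "DISTANCE_METRIC": "COSINE"
    }),
    TextField("query"),
    NumericField("score")
])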

Vector Search:

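import numpy as np
from redis.commands.search.query import Query  # redis-py import path (assumed client library)

# The index stores FLOAT32 vectors, so encode the query vector the same way
query_embedding = model.encode(query).astype(np.float32)
results = client.ft("queries_idx").search(
    # Alias the KNN distance as "distance" so it does not collide with the "score" schema field
    Query("*=>[KNN 10 @embedding $vec AS distance]")
    .sort_by("distance")
    .return_fields("query", "distance")
    .dialect(2),
    query_params={"vec": query_embedding.tobytes()}
)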

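This returns the 10 nearest neighbors. The value attached to each result is the cosine distance (smaller means closer), so sorting in ascending order puts the most similar query first; cosine similarity is 1 minus that distance.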
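
To produce the `similarity` values shown in the /api/related response, the distances from a KNN search need to be converted. The helper below is a sketch under the assumption that the service reports similarity as 1 minus cosine distance; `to_related_response` and its input shape (pairs of query text and distance) are hypothetical.

```python
def to_related_response(neighbors, limit=10):
    """Convert (query, cosine_distance) pairs into an /api/related-style payload."""
    related = [
        # Search results may deliver field values as strings, hence float().
        {"query": query, "similarity": round(1.0 - float(distance), 2)}
        for query, distance in neighbors
    ]
    # Highest similarity first, i.e. smallest distance first.
    related.sort(key=lambda item: item["similarity"], reverse=True)
    return {"related": related[:limit]}


print(to_related_response([("compact desktop", "0.11"), ("small computer", "0.08")]))
```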

Caching Strategy

We cache multiple data types in Valkey:

Phrase Mappings (30-day TTL):

client.setex(
    "seo:phrase_mappings",
    30 * 24 * 3600,
    json.dumps(phrase_mappings)
)

Related Searches (7-day TTL):

cache_key = f"related:{query_hash}"
client.setex(cache_key, 7 * 24 * 3600, json.dumps(results))

Popular Queries (30-day TTL):

client.setex(
    "seo:popular_queries",
    30 * 24 * 3600,
    json.dumps(popular_queries)
)

Autocomplete Index (updated daily):

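for query, score in queries:
    # Add the query to the sorted set of each of its prefixes
    for i in range(1, len(query) + 1):
        client.zadd(f"autocomplete:{query[:i]}", {query: score})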

Autocomplete with Sorted Sets

We use Valkey sorted sets for autocomplete:

Index Structure:

autocomplete:m     → ["mini pc": 5000, "mini computer": 3000]
autocomplete:mi    → ["mini pc": 5000, "mini computer": 3000]
autocomplete:min   → ["mini pc": 5000, "mini computer": 3000]
autocomplete:mini  → ["mini pc": 5000, "mini computer": 3000]

Prefix Lookup:

prefix = "mini"
results = client.zrevrange(f"autocomplete:{prefix}", 0, 9, withscores=True)

This returns the top 10 queries starting with "mini", sorted by score.
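
Building and querying the per-prefix index can be sketched as below. The helper names (`prefix_keys`, `index_query`, `suggest`) are hypothetical, and `client` is assumed to be any object exposing Valkey-style `zadd` and `zrevrange` methods.

```python
def prefix_keys(query, namespace="autocomplete"):
    """Return one sorted-set key per prefix of the query."""
    text = query.lower().strip()
    return [f"{namespace}:{text[:i]}" for i in range(1, len(text) + 1)]


def index_query(client, query, score):
    """Add the query to the sorted set of each of its prefixes."""
    for key in prefix_keys(query):
        client.zadd(key, {query: score})


def suggest(client, prefix, limit=10):
    """Return up to `limit` queries for the prefix, highest score first."""
    return client.zrevrange(f"autocomplete:{prefix.lower()}", 0, limit - 1)


print(prefix_keys("mini"))
```

Storing one sorted set per prefix trades memory for lookup speed: each query is written once per character, but a suggestion request is a single ZREVRANGE call.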

API Communication

The main web server calls the search service via HTTP:

Filter Extraction

from app.shared.filter_service import extract_filters_from_query

filters = extract_filters_from_query("mini pc 16gb ram")
# Internally calls: GET SEARCH_SERVICE_URL/api/extract_filters?q=...

Related Searches

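import requests

# SEARCH_SERVICE_URL is the configured base URL of the search service
response = requests.post(
    f"{SEARCH_SERVICE_URL}/api/related",
    json={"query": "mini pc", "limit": 10},
    timeout=2
)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
related = response.json()["related"]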

Autocomplete

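response = requests.get(
    f"{SEARCH_SERVICE_URL}/api/autocomplete",
    params={"q": "mini p", "limit": 5},
    timeout=1  # keep the timeout short: suggestions are only useful while typing
)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
suggestions = response.json()["suggestions"]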

Error Handling and Fallbacks

The main web server handles search service failures gracefully:

try:
    filters = extract_filters_from_query(query)
except Exception as e:
    logger.error(f"Search service failed: {e}")
    filters = {}  # Fallback to empty filters

This ensures the main web server continues functioning even if the search service is down.
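
The same pattern could be applied to the other endpoints through a small wrapper. This is a sketch, not the project's code: `call_with_fallback` is a hypothetical helper, and the choice of fallback value (empty filters, empty suggestion list, and so on) is left to the caller.

```python
import logging

logger = logging.getLogger(__name__)


def call_with_fallback(func, fallback, *args, **kwargs):
    """Call func(*args, **kwargs); on any failure, log it and return fallback."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Search service call failed; using fallback")
        return fallback
```

With this helper, the filter lookup above would read `filters = call_with_fallback(extract_filters_from_query, {}, query)`.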


Network Configuration

All servers are on a private network:

  • Main web server: Can access search service and Valkey

  • Search service: Can access Valkey

  • Valkey: Only accessible from main web server and search service

No external access to search service or Valkey.

Integration with SEO Pipeline

The search service integrates with the SEO pipeline:

Step 11: Migrate to Valkey

The SEO pipeline loads data into Valkey:

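# Load query embeddings: one hash per query under the "query:" prefix.
# `scores` (per-query traffic scores) is an assumed name.
for query, embedding, score in zip(queries, embeddings, scores):
    # hash_query is a hypothetical helper standing in for the pipeline's hashing scheme
    query_hash = hash_query(query)
    client.hset(f"query:{query_hash}", mapping={
        "query": query,
        # Store as FLOAT32 bytes to match the index definition
        "embedding": embedding.astype("float32").tobytes(),
        "score": score
    })

# Create RediSearch index
client.ft("queries_idx").create_index([...])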

See Valkey Migration for details.

Query Logging

The search service logs queries for the SEO pipeline:

log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "query": query,
    "filters_extracted": filters,
    "results_count": len(results)
}
with open(SEO_LIVE_QUERIES_LOG, "a") as f:
    f.write(json.dumps(log_entry) + "\n")

These logs feed back into Step 1d: Fetch Live Queries.
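
Because each entry is written as one JSON object per line, the log can be read back with a simple line-by-line parser. The function below is a hedged sketch of how a consumer might aggregate it; `count_logged_queries` is a hypothetical name, and how Step 1d actually processes the log is not described here.

```python
import json
from collections import Counter


def count_logged_queries(log_path):
    """Count how often each query appears in a JSON Lines query log."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written or corrupt lines
            query = entry.get("query") if isinstance(entry, dict) else None
            if query:
                counts[query] += 1
    return counts
```

Skipping unparseable lines keeps the reader robust if the service is appending to the file while it is being read.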

Summary

Our search service runs as a standalone Flask application on a separate server:

Architecture:

  • Standalone Flask app on dedicated server

  • Valkey (Redis fork) for caching and vector search

  • HTTP API for communication with main web server

APIs:

  • /api/extract_filters - Extract filters from queries

  • /api/related - Find similar queries (vector search)

  • /api/autocomplete - Suggest queries (prefix matching)

  • /api/popular - Get popular queries

Valkey Features:

  • Vector search (RediSearch module)

  • Caching (30-day TTL for phrase mappings and popular queries, 7-day TTL for related searches)

  • Autocomplete (sorted sets)

  • Persistence (AOF)

Benefits:

  • Independent scaling

  • Resource isolation

  • High performance (Valkey caching)

  • Fault tolerance (graceful degradation)

This architecture enables fast, scalable search while keeping the main web server responsive.

