# Multi-Server Architecture: Separation of Concerns

This article explains how our infrastructure uses specialized servers to achieve scalability, reliability, and performance through separation of concerns.
## The Problem: Monolithic Server Limitations

Running everything on one server causes:

- **Resource contention:** Web server competes with search service for CPU/memory
- **Scaling difficulty:** Can't scale components independently
- **Single point of failure:** One server down = entire system down
- **Cost inefficiency:** Can't optimize instance types per workload
- **Deployment risk:** Deploying one component affects all others

We need separation of concerns across multiple servers.
## The Solution: Specialized Server Roles

We run specialized servers, each optimized for its workload:

```mermaid
graph TB
    subgraph prod[Production ERP Server]
        erp["Internal Management<br/>HR, Finance, Manufacturing<br/>24/7 uptime<br/>Shared storage & email"]
    end
    subgraph web[Public Website Server]
        www["Public Website<br/>Product catalog, blog<br/>24/7 uptime<br/>Behind CDN"]
    end
    subgraph dev[Development/Staging Server]
        devenv["Dev & Staging<br/>Email campaigns<br/>Powers off nights/weekends"]
    end
    subgraph search[Search Service Server]
        searchsvc["Vector Search<br/>Filter extraction<br/>Related searches<br/>24/7 uptime"]
    end
    subgraph cache[Cache Server - Valkey]
        valkey["RediSearch indexes<br/>Query embeddings<br/>Autocomplete<br/>24/7 uptime"]
    end
    www --> searchsvc
    searchsvc --> valkey
```

## Why Separate Servers?
### 1. Resource Isolation

Each server runs one workload type:

**Web Server:** Optimized for HTTP request handling

- High network bandwidth
- Moderate CPU
- Moderate memory
- Fast disk for caching

**Search Service:** Optimized for vector operations

- High CPU (cosine similarity)
- High memory (embedding cache)
- Low disk I/O

**Cache Server:** Optimized for memory access

- Very high memory
- Low CPU
- Fast network

**Benefit:** No resource contention between workloads.
### 2. Independent Scaling

We can scale each component independently:

- High traffic? Add more web servers behind the load balancer.
- Slow search? Upgrade search server CPU or add replicas.
- Cache misses? Increase cache server memory.

**Benefit:** Scale only what needs scaling, not everything.
### 3. Fault Isolation

Failures are contained:

- **Search service down?** Website still serves cached results.
- **Cache server down?** Search service computes without cache (slower but functional).
- **Web server down?** Internal ERP unaffected.

**Benefit:** Partial failures don't cascade to the entire system.
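The cache-server fallback described above can be sketched in a few lines. This is a minimal illustration, not our actual code: `CacheDown` and the dict-backed `FakeCache` are hypothetical stand-ins for a real Valkey client and its connection errors.

```python
class CacheDown(Exception):
    """Raised by the cache client when the cache server is unreachable."""

class FakeCache:
    """Dict-backed stand-in for a Valkey client; can simulate an outage."""
    def __init__(self, available=True):
        self.available = available
        self.store = {}

    def get(self, key):
        if not self.available:
            raise CacheDown()
        return self.store.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheDown()
        self.store[key] = value

def search(query, cache, compute_results):
    """Serve from cache when possible; compute directly if the cache is down."""
    try:
        cached = cache.get(query)
        if cached is not None:
            return cached
        results = compute_results(query)
        cache.set(query, results)
        return results
    except CacheDown:
        # Slower but functional: the search service keeps working without cache.
        return compute_results(query)
```

The point of the pattern is that a cache outage degrades latency, not availability: the `except` branch turns a dependency failure into a slower code path instead of an error page.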
### 4. Deployment Safety

We deploy through environments:

1. **Development** → test changes with debug tools
2. **Staging** → test in a production-like environment
3. **Production** → deploy with confidence

**Benefit:** Catch issues before they reach users.
### 5. Cost Optimization

We optimize costs per server:

- **Development server:** Powers off nights/weekends (50% cost reduction)
- **IPv6-only servers:** No elastic IP costs ($5/month savings per instance)
- **Right-sized instances:** Pay only for resources needed per workload

**Benefit:** Lower infrastructure costs without sacrificing performance.
## Data Flow Between Servers

Understanding how data flows helps explain why separation matters:

### User Request Flow

**Product Page Request:**

```mermaid
sequenceDiagram
    participant User
    participant CDN
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    User->>CDN: Request product page
    CDN->>CDN: Check cache
    alt Cache hit
        CDN->>User: Serve cached page
    else Cache miss
        CDN->>Web: Forward request
        Web->>Search: Get related products
        Search->>Cache: Query embeddings
        Cache->>Search: Return results
        Search->>Web: Related products
        Web->>CDN: Rendered page
        CDN->>CDN: Cache response
        CDN->>User: Serve page
    end
```

**Search Query Flow:**
```mermaid
sequenceDiagram
    participant User
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    User->>Web: Submit search query
    Web->>Search: Forward query
    Search->>Search: Extract filters
    Search->>Search: Compute embedding
    Search->>Cache: Vector search
    Cache->>Search: Ranked results
    Search->>Web: Product matches
    Web->>User: Render results
```

**Benefit:** Each server does what it's optimized for.
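The "compute embedding" and "vector search" steps in the flow above boil down to a cosine-similarity ranking. A minimal sketch, assuming embeddings are NumPy float arrays (the real service adds filter extraction, caching, and the RediSearch index; `top_k` is an illustrative name):

```python
import numpy as np

def top_k(query_vec, product_vecs, k=3):
    """Rank products by cosine similarity to the query embedding.

    query_vec: (d,) array; product_vecs: (n, d) array of product embeddings.
    Returns indices of the k best matches, best first.
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    p = product_vecs / np.linalg.norm(product_vecs, axis=1, keepdims=True)
    scores = p @ q  # one similarity score per product
    return np.argsort(scores)[::-1][:k]
```

Because the scores are a single matrix-vector product, this is exactly the CPU- and memory-heavy workload the search server is sized for.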
## Storage Strategy Per Server

Different servers use different storage strategies:

### Web Server

**S3-backed static files:** Images, CSS, JS served from S3

- High availability
- No local disk usage
- CDN-friendly

**Local cache:** Nginx caches rendered pages

- Fast repeated access
- Automatic invalidation

### Search Service

**NumPy arrays:** Product embeddings stored as binary arrays

- Fast vector operations
- Memory-mapped for efficiency
- 65K products × 768 dimensions ≈ 190 MB

**JSON files:** Filter mappings, phrase tables

- Human-readable
- Easy to update
- Small size (<10 MB)
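The memory-mapping mentioned above means the embeddings file is paged in from disk on demand rather than read into RAM at startup. A sketch assuming float32 embeddings saved with `np.save` (the small stand-in file here is for illustration; the array shape is scaled down):

```python
import os
import tempfile

import numpy as np

N_PRODUCTS, DIM = 65_000, 768

# float32 embeddings: 65,000 × 768 × 4 bytes ≈ 190 MB, matching the figure above.
size_mb = N_PRODUCTS * DIM * 4 / (1024 ** 2)

# Demonstrate memory-mapping on a small stand-in file: mmap_mode="r" gives a
# read-only view backed by the file, so only the rows actually touched are read.
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, np.ones((100, DIM), dtype=np.float32))
embeddings = np.load(path, mmap_mode="r")
row = np.asarray(embeddings[42])  # only the touched pages are paged in
```

This keeps service startup fast and lets the OS page cache manage memory, which matters when the same server also holds the model and per-query working buffers.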
### Cache Server

**Valkey (Redis-compatible):** Query embeddings, popular queries, autocomplete

- In-memory for speed
- RediSearch for vector indexes
- Persistence for durability

**Why different storage?**

- **NumPy:** Optimized for vector math (cosine similarity)
- **JSON:** Optimized for human editing (filter rules)
- **Valkey:** Optimized for key-value lookups (query cache)
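Storing query embeddings in a key-value cache means converting float arrays to bytes and back. One common approach (our exact wire format may differ) is `tobytes()` on write and `frombuffer()` on read; the function names here are illustrative:

```python
import numpy as np

def to_cache_value(embedding: np.ndarray) -> bytes:
    """Serialize an embedding for storage under a cache key."""
    return embedding.astype(np.float32).tobytes()

def from_cache_value(raw: bytes, dim: int = 768) -> np.ndarray:
    """Deserialize a cached embedding back into a float32 vector."""
    vec = np.frombuffer(raw, dtype=np.float32)
    assert vec.shape == (dim,), "cached value has unexpected dimensionality"
    return vec
```

With a real Valkey client these would pair with plain `SET`/`GET` calls (or a RediSearch vector field); the fixed float32/768-dim format is what makes the cached values compatible with the NumPy arrays on the search server.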
## CDN Architecture

The public website sits behind a Content Delivery Network:

### What Gets Cached

**Static assets (long TTL):**

- Images, CSS, JavaScript
- Fonts, icons
- Cached for 1 year

**Product pages (medium TTL):**

- Product descriptions
- Specifications
- Cached for 1 hour

**Query pages (short TTL):**

- Search results
- Filter combinations
- Cached for 5 minutes

**Not cached:**

- User-specific content
- API endpoints
- Dynamic search
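The TTL tiers above translate to `Cache-Control` headers the origin sends with each response; the CDN honors them per object. A sketch of the mapping (tier names and the `cache_control` helper are illustrative, not our actual middleware):

```python
# Seconds per TTL tier, matching the policy described above.
TTL_SECONDS = {
    "static": 365 * 24 * 3600,  # images, CSS, JS: 1 year
    "product": 3600,            # product pages: 1 hour
    "query": 300,               # search/filter pages: 5 minutes
}

def cache_control(tier: str) -> str:
    """Return the Cache-Control header value for a content tier."""
    if tier not in TTL_SECONDS:
        # User-specific content, API endpoints, dynamic search: never cached.
        return "no-store"
    return f"public, max-age={TTL_SECONDS[tier]}"
```

Keeping the policy at the origin (rather than in CDN rules alone) means staging and production behave the same way, which is exactly what the deployment pipeline below relies on.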
### Why CDN Matters

- **Global latency:** Edge locations serve content closer to users
- **Origin protection:** CDN absorbs traffic spikes, protecting the origin server
- **DDoS mitigation:** CDN filters malicious traffic before it reaches the origin
- **Cost reduction:** Fewer origin requests = lower compute costs
## Deployment Pipeline

We deploy through environments to catch issues early:

### Development Environment

**Purpose:** Rapid iteration with debug tools

**Features:**

- Auto-reload on code changes
- Detailed error pages
- Debug toolbar
- No caching

**Benefit:** Fast feedback loop for developers.

### Staging Environment

**Purpose:** Production-like testing

**Features:**

- Same configuration as production
- Same server setup (Gunicorn, Nginx)
- Same caching behavior
- Isolated from production data

**Benefit:** Catch production-specific issues before deployment.

### Production Environment

**Purpose:** Serve real users

**Features:**

- Optimized for performance
- Full caching enabled
- Monitoring and alerting
- Auto-restart on crashes

**Benefit:** Stable, reliable service.
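The per-environment differences above can be captured as a small settings table keyed by an environment variable. A sketch only: the `APP_ENV` variable, `Settings` fields, and values are illustrative, not our actual configuration.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    debug: bool
    cache_enabled: bool
    auto_reload: bool

SETTINGS = {
    "development": Settings(debug=True, cache_enabled=False, auto_reload=True),
    # Staging deliberately mirrors production except for the data it touches.
    "staging": Settings(debug=False, cache_enabled=True, auto_reload=False),
    "production": Settings(debug=False, cache_enabled=True, auto_reload=False),
}

def load_settings() -> Settings:
    """Select settings for the current environment (defaulting to development)."""
    env = os.environ.get("APP_ENV", "development")
    return SETTINGS[env]
```

Making staging and production entries identical in code is what gives staging its value: any behavioral difference between the two must come from data or infrastructure, not configuration drift.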
## References

### Technical Concepts

- Separation of Concerns - Wikipedia
- Content Delivery Network (CDN) - Wikipedia
- Horizontal Scaling - Wikipedia
- Cosine Similarity - Wikipedia

### AWS Services

- CloudFront - AWS CDN documentation
- S3 - AWS object storage documentation
- DynamoDB - AWS NoSQL database documentation

### Related Articles

- Search Service Architecture - Standalone search service details
- SEO Embedding Strategy - Why we use all-mpnet-base-v2
- SEO Product Matching - How vector search works
- Translation System - Multi-language support architecture
## Summary

Our multi-server architecture separates concerns across specialized servers:

**Production ERP Server:**

- Internal management system
- 24/7 uptime
- Shared storage and email

**Public Website Server:**

- Customer-facing website
- 24/7 uptime
- Behind CDN with S3-backed static files

**Development/Staging Server:**

- Safe testing environments
- Email campaign processing
- Cost-optimized (powers off nights/weekends)

**Search Service Server:**

- Vector similarity search
- Filter extraction
- NumPy-backed embeddings

**Cache Server (Valkey):**

- Query embeddings cache
- RediSearch vector indexes
- Popular queries ranking

**Key Benefits:**

- ✅ **Resource isolation:** No contention between workloads
- ✅ **Independent scaling:** Scale only what needs scaling
- ✅ **Fault isolation:** Failures don't cascade
- ✅ **Deployment safety:** Test before production
- ✅ **Cost optimization:** Right-sized instances, power schedules
- ✅ **Performance:** CDN caching, specialized storage strategies

This architecture enables us to serve millions of requests while maintaining high availability, low latency, and manageable costs.