Multi-Server Architecture: Separation of Concerns

This article explains how our infrastructure uses specialized servers to achieve scalability, reliability, and performance through separation of concerns.

The Problem: Monolithic Server Limitations

Running everything on one server causes:

  • Resource contention: Web server competes with search service for CPU/memory

  • Scaling difficulty: Can't scale components independently

  • Single point of failure: One server down = entire system down

  • Cost inefficiency: Can't optimize instance types per workload

  • Deployment risk: Deploying one component affects all others

We need separation of concerns across multiple servers.

The Solution: Specialized Server Roles

We run specialized servers, each optimized for its workload:

graph TB
    subgraph prod[Production ERP Server]
        erp[Internal Management<br/>HR, Finance, Manufacturing<br/>24/7 uptime<br/>Shared storage & email]
    end
    subgraph web[Public Website Server]
        www[Public Website<br/>Product catalog, blog<br/>24/7 uptime<br/>Behind CDN]
    end
    subgraph dev[Development/Staging Server]
        devenv[Dev & Staging<br/>Email campaigns<br/>Powers off nights/weekends]
    end
    subgraph search[Search Service Server]
        searchsvc[Vector Search<br/>Filter extraction<br/>Related searches<br/>24/7 uptime]
    end
    subgraph cache[Cache Server - Valkey]
        valkey[RediSearch indexes<br/>Query embeddings<br/>Autocomplete<br/>24/7 uptime]
    end
    www --> searchsvc
    searchsvc --> valkey

Why Separate Servers?

1. Resource Isolation

Each server runs one workload type:

Web Server: Optimized for HTTP request handling

  • High network bandwidth

  • Moderate CPU

  • Moderate memory

  • Fast disk for caching

Search Service: Optimized for vector operations

  • High CPU (cosine similarity)

  • High memory (embedding cache)

  • Low disk I/O

Cache Server: Optimized for memory access

  • Very high memory

  • Low CPU

  • Fast network

Benefit: No resource contention between workloads.
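The "high CPU (cosine similarity)" line is the hot path that justifies a dedicated search server. A minimal sketch of that workload, assuming an L2-normalization setup (function and variable names are illustrative, not from our codebase):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k rows of `embeddings` most similar to `query`.

    With L2-normalized rows, cosine similarity reduces to one
    matrix-vector product -- the CPU-bound core of vector search.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = emb @ q                      # one cosine score per product
    return np.argsort(scores)[::-1][:k]  # best-first indices

# Example: 1000 fake product embeddings, 768-dim as in our index.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 768)).astype(np.float32)
query = embeddings[42] + 0.01 * rng.standard_normal(768).astype(np.float32)
print(top_k_cosine(query, embeddings)[0])  # row 42 ranks first
```

On a full catalog this matrix product touches every embedding per query, which is why the search server trades disk I/O for CPU and memory.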

2. Independent Scaling

We can scale each component independently:

  • High traffic? Add more web servers behind load balancer.

  • Slow search? Upgrade search server CPU or add replicas.

  • Cache misses? Increase cache server memory.

Benefit: Scale only what needs scaling, not everything.

3. Fault Isolation

Failures are contained:

Search service down? Website still serves cached results.

Cache server down? Search service computes without cache (slower but functional).

Web server down? Internal ERP unaffected.

Benefit: Partial failures don't cascade to entire system.
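The "cache server down" degradation above is just a try/fall-back around the cache client. A sketch of the pattern, with an invented stub standing in for the real Valkey client:

```python
def search_with_fallback(query, cache, compute_results):
    """Serve from cache when possible; degrade to direct computation
    when the cache server is unreachable, so search stays functional."""
    try:
        hit = cache.get(query)
        if hit is not None:
            return hit
    except ConnectionError:
        # Cache server down: slower path, but no cascading failure.
        pass
    return compute_results(query)

# Stub that simulates a cache outage.
class DownCache:
    def get(self, key):
        raise ConnectionError("cache server unreachable")

print(search_with_fallback("m8 bolt", DownCache(), lambda q: [f"results for {q}"]))
# → ['results for m8 bolt']
```

The same shape works for the CDN-to-web-server hop: any layer that can answer from its own cache shields the layers behind it.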

4. Deployment Safety

We deploy through environments:

Development → Test changes with debug tools

Staging → Test in production-like environment

Production → Deploy with confidence

Benefit: Catch issues before they reach users.

5. Cost Optimization

We optimize costs per server:

  • Development server: Powers off nights/weekends (50% cost reduction)

  • IPv6-only servers: No Elastic IP costs ($5/month savings per instance)

  • Right-sized instances: Pay only for resources needed per workload

Benefit: Lower infrastructure costs without sacrificing performance.

Data Flow Between Servers

Understanding how data flows helps explain why separation matters:

User Request Flow

Product Page Request:

sequenceDiagram
    participant User
    participant CDN
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    
    User->>CDN: Request product page
    CDN->>CDN: Check cache
    alt Cache hit
        CDN->>User: Serve cached page
    else Cache miss
        CDN->>Web: Forward request
        Web->>Search: Get related products
        Search->>Cache: Query embeddings
        Cache->>Search: Return results
        Search->>Web: Related products
        Web->>CDN: Rendered page
        CDN->>CDN: Cache response
        CDN->>User: Serve page
    end

Search Query Flow:

sequenceDiagram
    participant User
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    
    User->>Web: Submit search query
    Web->>Search: Forward query
    Search->>Search: Extract filters
    Search->>Search: Compute embedding
    Search->>Cache: Vector search
    Cache->>Search: Ranked results
    Search->>Web: Product matches
    Web->>User: Render results

Benefit: Each server does what it's optimized for.
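The "extract filters" step in the search flow can be as simple as matching known attribute values in the raw query before embedding what remains. A hypothetical sketch (the filter vocabulary here is invented; the real mappings live in the JSON files described below):

```python
import re

# Hypothetical filter vocabulary for illustration only.
FILTERS = {"material": ["steel", "brass", "nylon"],
           "finish": ["zinc", "black oxide"]}

def extract_filters(query: str):
    """Split a raw query into structured filters plus the residual
    free text that goes on to embedding + vector search."""
    found, residual = {}, query.lower()
    for field, values in FILTERS.items():
        for v in values:
            if re.search(rf"\b{re.escape(v)}\b", residual):
                found[field] = v
                residual = residual.replace(v, " ")
    return found, " ".join(residual.split())

print(extract_filters("zinc steel hex bolt"))
# → ({'material': 'steel', 'finish': 'zinc'}, 'hex bolt')
```

Pulling filters out first keeps the embedding focused on the descriptive part of the query, so vector search and structured filtering each do what they are best at.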

Storage Strategy Per Server

Different servers use different storage strategies:

Web Server

S3-backed static files: Images, CSS, JS served from S3

  • High availability

  • No local disk usage

  • CDN-friendly

Local cache: Nginx caches rendered pages

  • Fast repeated access

  • Automatic invalidation

Search Service

NumPy arrays: Product embeddings stored as binary arrays

  • Fast vector operations

  • Memory-mapped for efficiency

  • 65K products × 768 dimensions (float32) ≈ 190 MB
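The size arithmetic works out as 65,000 × 768 × 4 bytes ≈ 190 MiB at float32, and memory-mapping means pages are loaded on demand rather than all at startup. A sketch of the pattern (the file path is illustrative):

```python
import numpy as np
import os, tempfile

N_PRODUCTS, DIM = 65_000, 768

# Write the embedding matrix once (zeros as a stand-in here),
# then memory-map it read-only for serving.
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, np.zeros((N_PRODUCTS, DIM), dtype=np.float32))

emb = np.load(path, mmap_mode="r")    # np.memmap, paged in on demand
print(emb.shape, emb.nbytes / 2**20)  # (65000, 768), ~190.4 MiB
```

Because the map is read-only, multiple worker processes can share the same physical pages, which keeps per-worker memory flat.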

JSON files: Filter mappings, phrase tables

  • Human-readable

  • Easy to update

  • Small size (<10 MB)

Cache Server

Valkey (Redis-compatible): Query embeddings, popular queries, autocomplete

  • In-memory for speed

  • RediSearch for vector indexes

  • Persistence for durability

Why different storage?

  • NumPy: Optimized for vector math (cosine similarity)

  • JSON: Optimized for human editing (filter rules)

  • Valkey: Optimized for key-value lookups (query cache)
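The key-value role Valkey plays for the query cache is set-with-expiry plus get, in the style of Redis SETEX. A dependency-free sketch of that behavior (in production this is a Valkey client call over the network, not a local dict):

```python
import time

class TTLCache:
    """Toy in-process stand-in for the Valkey query cache:
    set-with-expiry and get, nothing more."""
    def __init__(self):
        self._store = {}

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # lazy expiry, as Redis-style caches do
            return None
        return value

cache = TTLCache()
cache.setex("q:m8 bolt", 300, [0.12, -0.07])  # cache a query embedding
print(cache.get("q:m8 bolt"))  # → [0.12, -0.07]
```

Expiring cached query embeddings bounds memory on the cache server while keeping recently popular queries hot.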

CDN Architecture

The public website sits behind a Content Delivery Network:

What Gets Cached

Static assets (long TTL):

  • Images, CSS, JavaScript

  • Fonts, icons

  • Cached for 1 year

Product pages (medium TTL):

  • Product descriptions

  • Specifications

  • Cached for 1 hour

Query pages (short TTL):

  • Search results

  • Filter combinations

  • Cached for 5 minutes

Not cached:

  • User-specific content

  • API endpoints

  • Dynamic search
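The TTL tiers above translate directly into Cache-Control headers emitted by the origin, which the CDN honors. A sketch, assuming a per-tier policy table (the tier names are illustrative):

```python
# TTLs per content tier, mirroring the policy above.
TTL = {
    "static": 365 * 24 * 3600,  # images, CSS, JS: 1 year
    "product": 3600,            # product pages: 1 hour
    "query": 300,               # search-result pages: 5 minutes
}

def cache_control(tier: str) -> str:
    """Build the Cache-Control header for a content tier; anything
    unknown (user-specific content, API endpoints) is uncacheable."""
    ttl = TTL.get(tier)
    if ttl is None:
        return "no-store"
    return f"public, max-age={ttl}"

print(cache_control("product"))  # → public, max-age=3600
print(cache_control("api"))      # → no-store
```

Keeping the policy in one table at the origin means a TTL change rolls out without touching CDN configuration.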

Why CDN Matters

  • Global latency: Edge locations serve content closer to users

  • Origin protection: CDN absorbs traffic spikes, protects origin server

  • DDoS mitigation: CDN filters malicious traffic before it reaches origin

  • Cost reduction: Fewer origin requests = lower compute costs

Deployment Pipeline

We deploy through environments to catch issues early:

Development Environment

Purpose: Rapid iteration with debug tools

Features:

  • Auto-reload on code changes

  • Detailed error pages

  • Debug toolbar

  • No caching

Benefit: Fast feedback loop for developers.

Staging Environment

Purpose: Production-like testing

Features:

  • Same configuration as production

  • Same server setup (Gunicorn, Nginx)

  • Same caching behavior

  • Isolated from production data

Benefit: Catch production-specific issues before deployment.

Production Environment

Purpose: Serve real users

Features:

  • Optimized for performance

  • Full caching enabled

  • Monitoring and alerting

  • Auto-restart on crashes

Benefit: Stable, reliable service.

References

AWS Services

  • CloudFront - AWS CDN documentation

  • S3 - AWS object storage documentation

  • DynamoDB - AWS NoSQL database documentation

Summary

Our multi-server architecture separates concerns across specialized servers:

Production ERP Server:

  • Internal management system

  • 24/7 uptime

  • Shared storage and email

Public Website Server:

  • Customer-facing website

  • 24/7 uptime

  • Behind CDN with S3-backed static files

Development/Staging Server:

  • Safe testing environments

  • Email campaign processing

  • Cost-optimized (powers off nights/weekends)

Search Service Server:

  • Vector similarity search

  • Filter extraction

  • NumPy-backed embeddings

Cache Server (Valkey):

  • Query embeddings cache

  • RediSearch vector indexes

  • Popular queries ranking

Key Benefits:

  • Resource isolation: No contention between workloads

  • Independent scaling: Scale only what needs scaling

  • Fault isolation: Failures don't cascade

  • Deployment safety: Test before production

  • Cost optimization: Right-sized instances, power schedules

  • Performance: CDN caching, specialized storage strategies

This architecture enables us to serve millions of requests while maintaining high availability, low latency, and manageable costs.

