# Multi-Server Architecture: Separation of Concerns

This article explains how our infrastructure uses specialized servers to achieve scalability, reliability, and performance through separation of concerns.
## The Problem: Monolithic Server Limitations

Running everything on one server causes:

- **Resource contention:** Web server competes with search service for CPU/memory
- **Scaling difficulty:** Can't scale components independently
- **Single point of failure:** One server down = entire system down
- **Cost inefficiency:** Can't optimize instance types per workload
- **Deployment risk:** Deploying one component affects all others

We need separation of concerns across multiple servers.
## The Solution: Specialized Server Roles

We run specialized servers, each optimized for its workload:

```mermaid
graph TB
    subgraph prod[Production ERP Server]
        erp["Internal Management<br/>HR, Finance, Manufacturing<br/>24/7 uptime<br/>Shared storage & email"]
    end
    subgraph web[Public Website Server]
        www["Public Website<br/>Product catalog, blog<br/>24/7 uptime<br/>Behind CDN"]
    end
    subgraph dev[Development/Staging Server]
        devenv["Dev & Staging<br/>Email campaigns<br/>Powers off nights/weekends"]
    end
    subgraph search[Search Service Server]
        searchsvc["Vector Search<br/>Filter extraction<br/>Related searches<br/>24/7 uptime"]
    end
    subgraph cache[Cache Server - Valkey]
        valkey["RediSearch indexes<br/>Query embeddings<br/>Autocomplete<br/>24/7 uptime"]
    end
    www --> searchsvc
    searchsvc --> valkey
```

## Why Separate Servers?
### 1. Resource Isolation

Each server runs one workload type:

**Web Server:** Optimized for HTTP request handling

- High network bandwidth
- Moderate CPU
- Moderate memory
- Fast disk for caching

**Search Service:** Optimized for vector operations

- High CPU (cosine similarity)
- High memory (embedding cache)
- Low disk I/O

**Cache Server:** Optimized for memory access

- Very high memory
- Low CPU
- Fast network

**Benefit:** No resource contention between workloads.
### 2. Independent Scaling

We can scale each component independently:

- High traffic? Add more web servers behind the load balancer.
- Slow search? Upgrade search server CPU or add replicas.
- Cache misses? Increase cache server memory.

**Benefit:** Scale only what needs scaling, not everything.
### 3. Fault Isolation

Failures are contained:

- **Search service down?** Website still serves cached results.
- **Cache server down?** Search service computes without cache (slower but functional).
- **Web server down?** Internal ERP unaffected.

**Benefit:** Partial failures don't cascade to the entire system.
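The cache-server fallback described above can be sketched in a few lines. This is a minimal illustration, not our actual code: `CacheDown` and the dict-backed `FakeCache` are hypothetical stand-ins for a real Valkey client and its connection errors.

```python
class CacheDown(Exception):
    """Raised by the cache client when the cache server is unreachable."""

class FakeCache:
    """Dict-backed stand-in for a Valkey client; can simulate an outage."""
    def __init__(self, available=True):
        self.available = available
        self.store = {}

    def get(self, key):
        if not self.available:
            raise CacheDown()
        return self.store.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheDown()
        self.store[key] = value

def search(query, cache, compute_results):
    """Serve from cache when possible; compute directly if the cache is down."""
    try:
        cached = cache.get(query)
        if cached is not None:
            return cached
        results = compute_results(query)
        cache.set(query, results)
        return results
    except CacheDown:
        # Slower but functional: the search service keeps working without cache.
        return compute_results(query)
```

The point of the pattern is that a cache outage degrades latency, not availability: the `except` branch turns a dependency failure into a slower code path instead of an error page.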
### 4. Deployment Safety

We deploy through environments:

1. **Development** → test changes with debug tools
2. **Staging** → test in a production-like environment
3. **Production** → deploy with confidence

**Benefit:** Catch issues before they reach users.
### 5. Cost Optimization

We optimize costs per server:

- **Development server:** Powers off nights/weekends (50% cost reduction)
- **IPv6-only servers:** No elastic IP costs ($5/month savings per instance)
- **Right-sized instances:** Pay only for resources needed per workload

**Benefit:** Lower infrastructure costs without sacrificing performance.
## Data Flow Between Servers

Understanding how data flows helps explain why separation matters:

### User Request Flow

**Product Page Request:**

```mermaid
sequenceDiagram
    participant User
    participant CDN
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    User->>CDN: Request product page
    CDN->>CDN: Check cache
    alt Cache hit
        CDN->>User: Serve cached page
    else Cache miss
        CDN->>Web: Forward request
        Web->>Search: Get related products
        Search->>Cache: Query embeddings
        Cache->>Search: Return results
        Search->>Web: Related products
        Web->>CDN: Rendered page
        CDN->>CDN: Cache response
        CDN->>User: Serve page
    end
```

**Search Query Flow:**
```mermaid
sequenceDiagram
    participant User
    participant Web as Web Server
    participant Search as Search Service
    participant Cache as Cache Server
    User->>Web: Submit search query
    Web->>Search: Forward query
    Search->>Search: Extract filters
    Search->>Search: Compute embedding
    Search->>Cache: Vector search
    Cache->>Search: Ranked results
    Search->>Web: Product matches
    Web->>User: Render results
```

**Benefit:** Each server does what it's optimized for.
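The "compute embedding" and "vector search" steps in the flow above boil down to a cosine-similarity ranking. A minimal sketch, assuming embeddings are NumPy float arrays (the real service adds filter extraction, caching, and the RediSearch index; `top_k` is an illustrative name):

```python
import numpy as np

def top_k(query_vec, product_vecs, k=3):
    """Rank products by cosine similarity to the query embedding.

    query_vec: (d,) array; product_vecs: (n, d) array of product embeddings.
    Returns indices of the k best matches, best first.
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    p = product_vecs / np.linalg.norm(product_vecs, axis=1, keepdims=True)
    scores = p @ q  # one similarity score per product
    return np.argsort(scores)[::-1][:k]
```

Because the scores are a single matrix-vector product, this is exactly the CPU- and memory-heavy workload the search server is sized for.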
## Storage Strategy Per Server

Different servers use different storage strategies:

### Web Server

**S3-backed static files:** Images, CSS, JS served from S3

- High availability
- No local disk usage
- CDN-friendly

**Local cache:** Nginx caches rendered pages

- Fast repeated access
- Automatic invalidation

### Search Service

**NumPy arrays:** Product embeddings stored as binary arrays

- Fast vector operations
- Memory-mapped for efficiency
- 65K products × 768 dimensions ≈ 190 MB

**JSON files:** Filter mappings, phrase tables

- Human-readable
- Easy to update
- Small size (<10 MB)
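The memory-mapping mentioned above means the embeddings file is paged in from disk on demand rather than read into RAM at startup. A sketch assuming float32 embeddings saved with `np.save` (the small stand-in file here is for illustration; the array shape is scaled down):

```python
import os
import tempfile

import numpy as np

N_PRODUCTS, DIM = 65_000, 768

# float32 embeddings: 65,000 × 768 × 4 bytes ≈ 190 MB, matching the figure above.
size_mb = N_PRODUCTS * DIM * 4 / (1024 ** 2)

# Demonstrate memory-mapping on a small stand-in file: mmap_mode="r" gives a
# read-only view backed by the file, so only the rows actually touched are read.
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
np.save(path, np.ones((100, DIM), dtype=np.float32))
embeddings = np.load(path, mmap_mode="r")
row = np.asarray(embeddings[42])  # only the touched pages are paged in
```

This keeps service startup fast and lets the OS page cache manage memory, which matters when the same server also holds the model and per-query working buffers.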
### Cache Server

**Valkey (Redis-compatible):** Query embeddings, popular queries, autocomplete

- In-memory for speed
- RediSearch for vector indexes
- Persistence for durability

**Why different storage?**

- **NumPy:** Optimized for vector math (cosine similarity)
- **JSON:** Optimized for human editing (filter rules)
- **Valkey:** Optimized for key-value lookups (query cache)
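Storing query embeddings in a key-value cache means converting float arrays to bytes and back. One common approach (our exact wire format may differ) is `tobytes()` on write and `frombuffer()` on read; the function names here are illustrative:

```python
import numpy as np

def to_cache_value(embedding: np.ndarray) -> bytes:
    """Serialize an embedding for storage under a cache key."""
    return embedding.astype(np.float32).tobytes()

def from_cache_value(raw: bytes, dim: int = 768) -> np.ndarray:
    """Deserialize a cached embedding back into a float32 vector."""
    vec = np.frombuffer(raw, dtype=np.float32)
    assert vec.shape == (dim,), "cached value has unexpected dimensionality"
    return vec
```

With a real Valkey client these would pair with plain `SET`/`GET` calls (or a RediSearch vector field); the fixed float32/768-dim format is what makes the cached values compatible with the NumPy arrays on the search server.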
## CDN Architecture

The public website sits behind a Content Delivery Network:

### What Gets Cached

**Static assets (long TTL):**

- Images, CSS, JavaScript
- Fonts, icons
- Cached for 1 year

**Product pages (medium TTL):**

- Product descriptions
- Specifications
- Cached for 1 hour

**Query pages (short TTL):**

- Search results
- Filter combinations
- Cached for 5 minutes

**Not cached:**

- User-specific content
- API endpoints
- Dynamic search
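The TTL tiers above translate to `Cache-Control` headers the origin sends with each response; the CDN honors them per object. A sketch of the mapping (tier names and the `cache_control` helper are illustrative, not our actual middleware):

```python
# Seconds per TTL tier, matching the policy described above.
TTL_SECONDS = {
    "static": 365 * 24 * 3600,  # images, CSS, JS: 1 year
    "product": 3600,            # product pages: 1 hour
    "query": 300,               # search/filter pages: 5 minutes
}

def cache_control(tier: str) -> str:
    """Return the Cache-Control header value for a content tier."""
    if tier not in TTL_SECONDS:
        # User-specific content, API endpoints, dynamic search: never cached.
        return "no-store"
    return f"public, max-age={TTL_SECONDS[tier]}"
```

Keeping the policy at the origin (rather than in CDN rules alone) means staging and production behave the same way, which is exactly what the deployment pipeline below relies on.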
### Why CDN Matters

- **Global latency:** Edge locations serve content closer to users
- **Origin protection:** CDN absorbs traffic spikes, protecting the origin server
- **DDoS mitigation:** CDN filters malicious traffic before it reaches the origin
- **Cost reduction:** Fewer origin requests = lower compute costs
## Deployment Pipeline

We deploy through environments to catch issues early:

### Development Environment

**Purpose:** Rapid iteration with debug tools

**Features:**

- Auto-reload on code changes
- Detailed error pages
- Debug toolbar
- No caching

**Benefit:** Fast feedback loop for developers.

### Staging Environment

**Purpose:** Production-like testing

**Features:**

- Same configuration as production
- Same server setup (Gunicorn, Nginx)
- Same caching behavior
- Isolated from production data

**Benefit:** Catch production-specific issues before deployment.

### Production Environment

**Purpose:** Serve real users

**Features:**

- Optimized for performance
- Full caching enabled
- Monitoring and alerting
- Auto-restart on crashes

**Benefit:** Stable, reliable service.
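The per-environment differences above can be captured as a small settings table keyed by an environment variable. A sketch only: the `APP_ENV` variable, `Settings` fields, and values are illustrative, not our actual configuration.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    debug: bool
    cache_enabled: bool
    auto_reload: bool

SETTINGS = {
    "development": Settings(debug=True, cache_enabled=False, auto_reload=True),
    # Staging deliberately mirrors production except for the data it touches.
    "staging": Settings(debug=False, cache_enabled=True, auto_reload=False),
    "production": Settings(debug=False, cache_enabled=True, auto_reload=False),
}

def load_settings() -> Settings:
    """Select settings for the current environment (defaulting to development)."""
    env = os.environ.get("APP_ENV", "development")
    return SETTINGS[env]
```

Making staging and production entries identical in code is what gives staging its value: any behavioral difference between the two must come from data or infrastructure, not configuration drift.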
## References

### Technical Concepts

- Separation of Concerns - Wikipedia
- Content Delivery Network (CDN) - Wikipedia
- Horizontal Scaling - Wikipedia
- Cosine Similarity - Wikipedia

### AWS Services

- CloudFront - AWS CDN documentation
- S3 - AWS object storage documentation
- DynamoDB - AWS NoSQL database documentation

### Related Articles

- Search Service Architecture - Standalone search service details
- SEO Embedding Strategy - Why we use all-mpnet-base-v2
- SEO Product Matching - How vector search works
- Translation System - Multi-language support architecture
## Summary

Our multi-server architecture separates concerns across specialized servers:

**Production ERP Server:**

- Internal management system
- 24/7 uptime
- Shared storage and email

**Public Website Server:**

- Customer-facing website
- 24/7 uptime
- Behind CDN with S3-backed static files

**Development/Staging Server:**

- Safe testing environments
- Email campaign processing
- Cost-optimized (powers off nights/weekends)

**Search Service Server:**

- Vector similarity search
- Filter extraction
- NumPy-backed embeddings

**Cache Server (Valkey):**

- Query embeddings cache
- RediSearch vector indexes
- Popular queries ranking

**Key Benefits:**

- ✅ **Resource isolation:** No contention between workloads
- ✅ **Independent scaling:** Scale only what needs scaling
- ✅ **Fault isolation:** Failures don't cascade
- ✅ **Deployment safety:** Test before production
- ✅ **Cost optimization:** Right-sized instances, power schedules
- ✅ **Performance:** CDN caching, specialized storage strategies

This architecture enables us to serve millions of requests while maintaining high availability, low latency, and manageable costs.