Translation Queue: Batch Processing System
This article explains how we queue and batch-process translations to optimize AI API usage.
The Problem: On-Demand Translation is Expensive
Translating content on-demand has issues:
-
Slow: Each translation takes 1-2 seconds
-
Expensive: API costs per request
-
Redundant: Same text translated multiple times
-
Blocking: User waits for translation
We need a better approach.
The Solution: Queue and Batch
Queue: Collect translation requests
Batch: Process multiple translations together
Cache: Store results for reuse
Schedule: Process queue periodically (not real-time)
Queue Structure
Queue File
Location: JSON file on disk
Format: Array of translation requests
Fields:
-
text: English text to translate -
target_lang: Language code (hi, de, fr, etc.) -
context: Where text appears (product, query, article) -
priority: High/normal/low
Adding to Queue
When translation missing:
queue_translation(text, target_lang, context="product")
Deduplication: Check if already queued
Validation: Reject garbage values
Batch Processing
Script
Location: scripts/web/process_translation_queue.py
Schedule: Runs every 6 hours via cron
Lock file: Prevents concurrent runs
Process Flow
1. Load queue: Read all pending requests
2. Group by language: Batch same-language requests
3. Deduplicate: Remove duplicates within batch
4. Check cache: Skip already-translated texts
5. Translate batch: Send to DeepSeek API
6. Parse results: Extract translations from response
7. Save to cache: Store in phrase tables
8. Clear queue: Remove processed requests
Batch Translation
API Call
Model: DeepSeek-V3 (via Together.ai)
System prompt: Cached (same for all batches in language)
User prompt: Variable (batch-specific)
Format: Numbered list
Example:
Translate these 10 texts:
1. Mini PC
2. Thin Client
3. Compact Desktop
...
Response:
1. मिनी पीसी
2. थिन क्लाइंट
3. कॉम्पैक्ट डेस्कटॉप
...
Parsing
Extract translations by line number:
-
Remove number prefix (
1.,2., etc.) -
Match to original texts by position
-
Validate count matches
Error Handling
API failure: Retry with fallback API
Parse failure: Return original texts
Partial success: Save successful translations, requeue failures
Caching Strategy
Phrase Tables
Location: JSON files per language
Format: {"English": "Translation"}
Loading: Loaded once at startup
Benefit: Fast lookups, no API calls
Cache Hit Rate
First run: Low (everything new)
Subsequent runs: High (most texts cached)
Benefit: Reduced API costs
Preservation Rules
During translation, we preserve:
Brand names: Thinvent®, Intel®, AMD®
HTML tags: <p>, <br>, <strong>
URLs: https://www.thinvent.in
SKUs: Treo-N100-8-256
Numbers: 8GB, 256GB, 4 cores
Implementation: Regex patterns in system prompt
Language Detection
Before translating, check if already translated:
Method: Character set analysis
Hindi: Devanagari script
Chinese: CJK characters
Arabic: Arabic script
Benefit: Skip unnecessary translations
Priority Handling
High priority: Product names, features (process first)
Normal priority: Descriptions, articles (process second)
Low priority: Old content, rarely viewed (process last)
Benefit: Important content translated first
Scheduling
Cron Job
Frequency: Every 6 hours
Command: python3 scripts/web/process_translation_queue.py
Lock file: /tmp/process_translation_queue.lock
Benefit: Automatic processing, no manual intervention
Weekly Tasks
Articles: Translate new articles weekly
Babel strings: Update template translations weekly
Script: scripts/web/translate_articles_weekly.sh
Monitoring
Queue Size
Track pending requests:
-
Total requests
-
Requests per language
-
Oldest request age
Alert: If queue grows too large
Translation Stats
Track processing:
-
Translations per batch
-
API success rate
-
Cache hit rate
-
Processing time
Cost Tracking
Monitor API usage:
-
Requests per day
-
Tokens per request
-
Cost per language
References
Related Articles
-
Translation System - Three-technology hybrid
-
Content AI Generation - DeepSeek integration
-
Language Detection - User language preference
Summary
Translation queue enables efficient batch processing:
Queue:
-
✅ Collect translation requests
-
✅ Deduplicate within batch
-
✅ Priority handling
-
✅ Validation and filtering
Batch processing:
-
✅ Group by language
-
✅ Send to DeepSeek API
-
✅ Parse numbered responses
-
✅ Save to phrase tables
Caching:
-
✅ Check cache before translating
-
✅ High cache hit rate
-
✅ Reduced API costs
Scheduling:
-
✅ Every 6 hours via cron
-
✅ Lock file prevents concurrent runs
-
✅ Weekly article translations
Preservation:
-
✅ Brand names
-
✅ HTML tags
-
✅ URLs and SKUs
This approach reduces API costs and improves translation quality through batching and caching.