Translation Queue: Batch Processing System

This article explains how we queue and batch-process translations to optimize AI API usage.

The Problem: On-Demand Translation is Expensive

Translating content on demand has four problems:

Slow: Each translation takes 1-2 seconds
Expensive: Every request incurs an API charge
Redundant: The same text is translated multiple times
Blocking: The user waits while the translation completes

We need a better approach.

The Solution: Queue and Batch

Queue: Collect translation requests

Batch: Process multiple translations together

Cache: Store results for reuse

Schedule: Process queue periodically (not real-time)

Queue Structure

Queue File

Location: JSON file on disk

Format: Array of translation requests

Fields:

text: English text to translate
target_lang: Language code (hi, de, fr, etc.)
context: Where text appears (product, query, article)
priority: High/normal/low
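
As an illustration, a single queue entry with the four fields above might look like the following. The make_request helper, the lowercase priority values, and the sample data are assumptions made for this sketch, not taken from the production code.

```python
import json

VALID_PRIORITIES = {"high", "normal", "low"}  # assumed lowercase spelling

def make_request(text, target_lang, context="product", priority="normal"):
    """Build one queue entry with the four fields described above."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    return {
        "text": text,
        "target_lang": target_lang,
        "context": context,
        "priority": priority,
    }

# The queue file is a JSON array of such entries.
queue = [
    make_request("Mini PC", "hi", priority="high"),
    make_request("Thin Client", "de"),
]
print(json.dumps(queue, indent=2, ensure_ascii=False))
```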

Adding to Queue

When a translation is missing:

queue_translation(text, target_lang, context="product")

Deduplication: Check if already queued

Validation: Reject garbage values
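
A minimal sketch of how queue_translation could combine deduplication and validation is shown below. The queue_path and priority parameters, the boolean return value, and the exact definition of "garbage" (non-strings and text without any letters) are assumptions; the real function may apply different rules.

```python
import json
import os
import tempfile

def queue_translation(text, target_lang, context="product",
                      priority="normal", queue_path="translation_queue.json"):
    """Append a request to the JSON queue file; return True if it was added."""
    # Validation (assumed rule): reject non-strings and text with no letters.
    if not isinstance(text, str) or not any(ch.isalpha() for ch in text):
        return False
    text = text.strip()

    queue = []
    if os.path.exists(queue_path):
        with open(queue_path, encoding="utf-8") as f:
            queue = json.load(f)

    # Deduplication: the same text for the same language is queued only once.
    if any(r["text"] == text and r["target_lang"] == target_lang for r in queue):
        return False

    queue.append({"text": text, "target_lang": target_lang,
                  "context": context, "priority": priority})
    with open(queue_path, "w", encoding="utf-8") as f:
        json.dump(queue, f, ensure_ascii=False, indent=2)
    return True

# Demo against a throwaway queue file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "translation_queue.json")
    first = queue_translation("Mini PC", "hi", queue_path=path)
    duplicate = queue_translation("Mini PC", "hi", queue_path=path)
    garbage = queue_translation("   ", "hi", queue_path=path)
print(first, duplicate, garbage)  # True False False
```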

Batch Processing

Script

Location: scripts/web/process_translation_queue.py

Schedule: Runs every 6 hours via cron

Lock file: Prevents concurrent runs
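
One common way to implement such a lock file is atomic exclusive creation. The sketch below assumes that approach; the single_instance helper is hypothetical, and the actual script may use a different mechanism (for example an advisory file lock).

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file while the enclosed block runs."""
    # O_CREAT | O_EXCL makes creation atomic: it raises FileExistsError
    # if another run already holds the lock file.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

# Demo in a temporary directory: a second "run" is refused while the first holds the lock.
with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "process_translation_queue.lock")
    second_run_blocked = False
    with single_instance(lock_path):
        try:
            with single_instance(lock_path):
                pass
        except FileExistsError:
            second_run_blocked = True
    lock_released = not os.path.exists(lock_path)
print(second_run_blocked, lock_released)  # True True
```

A lock of this kind stays behind if the process is killed before cleanup, so a stale lock file has to be removed by hand or detected by checking the recorded PID.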

Process Flow

1. Load queue: Read all pending requests

2. Group by language: Batch same-language requests

3. Deduplicate: Remove duplicates within batch

4. Check cache: Skip already-translated texts

5. Translate batch: Send to DeepSeek API

6. Parse results: Extract translations from response

7. Save to cache: Store in phrase tables

8. Clear queue: Remove processed requests
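
Steps 2 to 4 can be sketched as a single pass over the queue. The build_batches function and the in-memory shape of the phrase tables ({lang: {english: translation}}) are assumptions made for illustration.

```python
from collections import defaultdict

def build_batches(queue, phrase_tables):
    """Group pending requests by language, skipping cached and duplicate texts."""
    batches = defaultdict(list)
    for request in queue:
        lang, text = request["target_lang"], request["text"]
        if text in phrase_tables.get(lang, {}):
            continue  # step 4: already translated, no API call needed
        if text in batches[lang]:
            continue  # step 3: duplicate within the batch
        batches[lang].append(text)  # step 2: one batch per language
    return {lang: texts for lang, texts in batches.items() if texts}

queue = [
    {"text": "Mini PC", "target_lang": "hi"},
    {"text": "Thin Client", "target_lang": "hi"},
    {"text": "Mini PC", "target_lang": "hi"},
    {"text": "Mini PC", "target_lang": "de"},
]
phrase_tables = {"hi": {"Thin Client": "थिन क्लाइंट"}}
batches = build_batches(queue, phrase_tables)
print(batches)  # {'hi': ['Mini PC'], 'de': ['Mini PC']}
```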

Batch Translation

API Call

Model: DeepSeek-V3 (via Together.ai)

System prompt: Cached (identical for every batch in a given language)

User prompt: Variable (batch-specific)

Format: Numbered list

Example:

Translate these 10 texts:
1. Mini PC
2. Thin Client
3. Compact Desktop
...

Response:

1. मिनी पीसी
2. थिन क्लाइंट
3. कॉम्पैक्ट डेस्कटॉप
...
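
The user prompt in the example can be generated from a list of texts as follows. build_user_prompt is a hypothetical helper, and the sketch assumes each text fits on a single line, since embedded newlines would break the numbering.

```python
def build_user_prompt(texts):
    """Format a batch as a numbered list, one text per line."""
    lines = [f"Translate these {len(texts)} texts:"]
    lines += [f"{i}. {text}" for i, text in enumerate(texts, start=1)]
    return "\n".join(lines)

prompt = build_user_prompt(["Mini PC", "Thin Client", "Compact Desktop"])
print(prompt)
```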

Parsing

Extract translations by line number:

Remove number prefix (1., 2., etc.)
Match to original texts by position
Validate count matches
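
The three parsing steps can be sketched like this. parse_numbered_response is a hypothetical name, and returning None on a count mismatch is an assumed convention that lets the caller decide how to handle the parse failure.

```python
import re

# Matches "1. text" or "1) text" and captures the text after the number prefix.
NUMBER_PREFIX = re.compile(r"^\s*\d+[.)]\s*(.*)$")

def parse_numbered_response(response, originals):
    """Map each original text to its translation, or return None on a count mismatch."""
    translations = []
    for line in response.splitlines():
        match = NUMBER_PREFIX.match(line)
        if match:  # lines without a number prefix are ignored
            translations.append(match.group(1).strip())
    if len(translations) != len(originals):
        return None  # validation: counts must match
    # Match to the original texts by position.
    return dict(zip(originals, translations))

originals = ["Mini PC", "Thin Client", "Compact Desktop"]
response = "1. मिनी पीसी\n2. थिन क्लाइंट\n3. कॉम्पैक्ट डेस्कटॉप"
parsed = parse_numbered_response(response, originals)
short = parse_numbered_response("1. मिनी पीसी", originals)
print(parsed)
print(short)  # None
```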

Error Handling

API failure: Retry with fallback API

Parse failure: Return original texts

Partial success: Save successful translations, requeue failures
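
The fallback and partial-success rules might be combined roughly as below. Both helpers and the stub translators are hypothetical; the real API clients are replaced by plain functions so that the sketch runs without network access.

```python
def translate_with_fallback(texts, primary, fallback):
    """Try the primary translator; on any error, try the fallback once."""
    try:
        return primary(texts)
    except Exception:  # broad on purpose in this sketch: any API failure triggers the fallback
        return fallback(texts)

def split_results(texts, translations):
    """Separate usable translations from texts that should be requeued."""
    saved, requeue = {}, []
    for text, translated in zip(texts, translations):
        if translated and translated.strip():
            saved[text] = translated.strip()
        else:
            requeue.append(text)  # missing or empty translation
    return saved, requeue

# Stub translators standing in for the real API clients.
def failing_primary(texts):
    raise ConnectionError("primary API unavailable")

def stub_fallback(texts):
    canned = {"Mini PC": "मिनी पीसी"}
    return [canned.get(t) for t in texts]

texts = ["Mini PC", "Thin Client"]
translations = translate_with_fallback(texts, failing_primary, stub_fallback)
saved, requeue = split_results(texts, translations)
print(saved, requeue)
```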

Caching Strategy

Phrase Tables

Location: JSON files per language

Format: {"English": "Translation"}

Loading: Loaded once at startup

Benefit: Fast lookups, no API calls
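
A phrase-table loader along these lines would give dictionary-speed lookups after a single read at startup. The one-file-per-language naming (hi.json, de.json, and so on) and both function names are assumptions.

```python
import json
import tempfile
from pathlib import Path

def load_phrase_tables(directory):
    """Load every <lang>.json file into {lang: {english: translation}}."""
    tables = {}
    for path in Path(directory).glob("*.json"):
        with path.open(encoding="utf-8") as f:
            tables[path.stem] = json.load(f)
    return tables

def lookup(tables, text, lang):
    """Return the cached translation, or None on a cache miss."""
    return tables.get(lang, {}).get(text)

# Demo with a throwaway phrase table for Hindi.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "hi.json").write_text(
        json.dumps({"Mini PC": "मिनी पीसी"}, ensure_ascii=False), encoding="utf-8")
    tables = load_phrase_tables(tmp)

hit = lookup(tables, "Mini PC", "hi")
miss = lookup(tables, "Thin Client", "hi")
print(hit, miss)
```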

Cache Hit Rate

First run: Low (everything new)

Subsequent runs: High (most texts cached)

Benefit: Reduced API costs

Preservation Rules

During translation, we preserve:

Brand names: Thinvent®, Intel®, AMD®

HTML tags: <p>, <br>, <strong>

URLs: https://www.thinvent.in

SKUs: Treo-N100-8-256

Numbers: 8GB, 256GB, 4 cores

Implementation: Regex patterns in system prompt
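
Because the preservation rules are enforced through the prompt, it can be useful to verify afterwards that protected tokens survived. The check below is a sketch of one such verification, not the production implementation; the patterns (including the guessed SKU shape) and function names are assumptions.

```python
import re

# Assumed patterns for the protected categories listed above.
PROTECTED_PATTERNS = [
    re.compile(r"\b(?:Thinvent|Intel|AMD)®"),              # brand names
    re.compile(r"</?[A-Za-z][^>]*>"),                       # HTML tags
    re.compile(r"https?://[^\s<>\"]+"),                     # URLs
    re.compile(r"\b[A-Z][A-Za-z]+(?:-[A-Za-z0-9]+){2,}\b"), # SKUs such as Treo-N100-8-256
    re.compile(r"\b\d+\s?(?:GB|TB|MB)\b"),                  # sizes such as 8GB
]

def protected_tokens(text):
    """Collect every substring that must appear unchanged in the translation."""
    tokens = []
    for pattern in PROTECTED_PATTERNS:
        tokens.extend(pattern.findall(text))
    return tokens

def missing_tokens(source, translated):
    """Return protected tokens from the source that the translation lost."""
    return [t for t in protected_tokens(source) if t not in translated]

source = "<p>Thinvent® Treo-N100-8-256 with 8GB RAM: https://www.thinvent.in</p>"
good = "<p>Thinvent® Treo-N100-8-256 8GB रैम के साथ: https://www.thinvent.in</p>"
bad = "<p>थिनवेंट® Treo-N100-8-256 8 जीबी रैम के साथ</p>"
print(missing_tokens(source, good))  # []
print(missing_tokens(source, bad))
```

A translation with a non-empty result from missing_tokens could be treated as a failure and requeued.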

Language Detection

Before translating, check whether the text is already in the target language:

Method: Character set analysis

Hindi: Devanagari script

Chinese: CJK characters

Arabic: Arabic script

Benefit: Skip unnecessary translations
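
Character set analysis can be approximated by measuring what share of a text's letters fall inside the target script's Unicode block. The language codes zh and ar, the 0.5 threshold, and the function names are assumptions; languages that share the Latin script with English (de, fr) cannot be detected this way and are always reported as untranslated.

```python
# Unicode blocks: Devanagari, CJK Unified Ideographs, Arabic.
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),
    "zh": (0x4E00, 0x9FFF),
    "ar": (0x0600, 0x06FF),
}

def script_ratio(text, target_lang):
    """Fraction of alphabetic characters that belong to the target script."""
    low, high = SCRIPT_RANGES[target_lang]
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(1 for ch in letters if low <= ord(ch) <= high)
    return in_script / len(letters)

def is_already_translated(text, target_lang, threshold=0.5):
    """True if most letters are already in the target language's script."""
    if target_lang not in SCRIPT_RANGES:
        return False  # no script-based signal for this language
    return script_ratio(text, target_lang) >= threshold

print(is_already_translated("मिनी पीसी", "hi"))  # True
print(is_already_translated("Mini PC", "hi"))    # False
print(is_already_translated("Mini PC", "de"))    # False
```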

Priority Handling

High priority: Product names, features (process first)

Normal priority: Descriptions, articles (process second)

Low priority: Old content, rarely viewed (process last)

Benefit: Important content translated first
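
Ordering the queue by priority takes a single stable sort, which also keeps requests of equal priority in their original order. The lowercase priority values and the default of normal for entries without a priority are assumptions.

```python
PRIORITY_ORDER = {"high": 0, "normal": 1, "low": 2}

def sort_by_priority(queue):
    """Return requests ordered high, normal, low; sorted() is stable within a level."""
    return sorted(
        queue,
        key=lambda r: PRIORITY_ORDER.get(r.get("priority", "normal"), 1),
    )

queue = [
    {"text": "Old press release", "priority": "low"},
    {"text": "Product description", "priority": "normal"},
    {"text": "Mini PC", "priority": "high"},
    {"text": "Article body"},  # no priority given, treated as normal
]
ordered = [r["text"] for r in sort_by_priority(queue)]
print(ordered)
# ['Mini PC', 'Product description', 'Article body', 'Old press release']
```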

Scheduling

Cron Job

Frequency: Every 6 hours

Command: python3 scripts/web/process_translation_queue.py

Lock file: /tmp/process_translation_queue.lock

Benefit: Automatic processing, no manual intervention

Weekly Tasks

Articles: Translate new articles weekly

Babel strings: Update template translations weekly

Script: scripts/web/translate_articles_weekly.sh

Monitoring

Queue Size

Track pending requests:

Total requests
Requests per language
Oldest request age

Alert: If queue grows too large
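
The three queue metrics could be computed as follows. Note that the queue fields listed earlier do not include a timestamp, so the queued_at field (Unix seconds) is a hypothetical addition, as are the queue_stats function and the MAX_QUEUE_SIZE threshold.

```python
from collections import Counter

MAX_QUEUE_SIZE = 1000  # hypothetical alert threshold

def queue_stats(queue, now):
    """Summarise pending requests: total, per language, and oldest age in seconds."""
    per_language = Counter(r["target_lang"] for r in queue)
    timestamps = [r["queued_at"] for r in queue if "queued_at" in r]
    oldest_age = now - min(timestamps) if timestamps else 0
    return {
        "total": len(queue),
        "per_language": dict(per_language),
        "oldest_age_seconds": oldest_age,
        "alert": len(queue) > MAX_QUEUE_SIZE,
    }

queue = [
    {"text": "Mini PC", "target_lang": "hi", "queued_at": 1_000},
    {"text": "Thin Client", "target_lang": "hi", "queued_at": 4_000},
    {"text": "Mini PC", "target_lang": "de", "queued_at": 2_500},
]
stats = queue_stats(queue, now=10_000)
print(stats)
```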

Translation Stats

Track processing:

Translations per batch
API success rate
Cache hit rate
Processing time

Cost Tracking

Monitor API usage:

Requests per day
Tokens per request
Cost per language
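
Cost per language follows from token counts and the provider's price. In the sketch below the usage record shape and function name are assumptions, and the price of 1.0 per million tokens is a placeholder rather than an actual rate.

```python
from collections import defaultdict

def cost_per_language(usage_records, price_per_million_tokens):
    """Sum tokens per language and convert them to cost at the given price."""
    tokens_by_lang = defaultdict(int)
    for record in usage_records:
        tokens_by_lang[record["target_lang"]] += record["tokens"]
    return {
        lang: tokens / 1_000_000 * price_per_million_tokens
        for lang, tokens in tokens_by_lang.items()
    }

usage_records = [
    {"target_lang": "hi", "tokens": 1200},
    {"target_lang": "hi", "tokens": 800},
    {"target_lang": "de", "tokens": 500},
]
# Placeholder price; substitute the provider's current rate.
costs = cost_per_language(usage_records, price_per_million_tokens=1.0)
print(costs)
```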

References

Translation System - Three-technology hybrid
Content AI Generation - DeepSeek integration
Language Detection - User language preference

Summary

Translation queue enables efficient batch processing:

Queue:

✅ Collect translation requests
✅ Deduplicate within batch
✅ Priority handling
✅ Validation and filtering

Batch processing:

✅ Group by language
✅ Send to DeepSeek API
✅ Parse numbered responses
✅ Save to phrase tables

Caching:

✅ Check cache before translating
✅ High cache hit rate
✅ Reduced API costs

Scheduling:

✅ Every 6 hours via cron
✅ Lock file prevents concurrent runs
✅ Weekly article translations

Preservation:

✅ Brand names
✅ HTML tags
✅ URLs and SKUs

This approach reduces API costs and keeps users from waiting on translations by combining batching, caching, and scheduled processing.
