Translation Queue: Batch Processing System

This article explains how we queue and batch-process translations to optimize AI API usage.

The Problem: On-Demand Translation is Expensive

Translating content on demand has four problems:

  • Slow: Each translation takes 1-2 seconds

  • Expensive: Every request incurs an API cost

  • Redundant: The same text is translated multiple times

  • Blocking: The user waits for the translation

We need a better approach.

The Solution: Queue and Batch

Queue: Collect translation requests

Batch: Process multiple translations together

Cache: Store results for reuse

Schedule: Process queue periodically (not real-time)

Queue Structure

Queue File

Location: JSON file on disk

Format: Array of translation requests

Fields:

  • text: English text to translate

  • target_lang: Language code (hi, de, fr, etc.)

  • context: Where text appears (product, query, article)

  • priority: High/normal/low

Adding to Queue

When a translation is missing:

queue_translation(text, target_lang, context="product")

Deduplication: Check if already queued

Validation: Reject garbage values
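
The enqueue step could be sketched as follows. This is a minimal illustration, not the actual implementation: the file path, the is_valid helper, the validation rules (non-empty text, a 2-3 letter language code) and the choice of (text, target_lang) as the deduplication key are all assumptions. Only the field names and the queue_translation call come from this article.

```python
import json
from pathlib import Path

# Hypothetical path; the article only says the queue is a JSON file on disk.
QUEUE_PATH = Path("translation_queue.json")
VALID_PRIORITIES = {"high", "normal", "low"}


def is_valid(text, target_lang):
    """Assumed validation: non-empty text and a 2-3 letter language code."""
    if not isinstance(text, str) or not text.strip():
        return False
    return (
        isinstance(target_lang, str)
        and target_lang.isalpha()
        and 2 <= len(target_lang) <= 3
    )


def queue_translation(text, target_lang, context="product",
                      priority="normal", path=QUEUE_PATH):
    """Append a request to the queue file; return True if it was added."""
    if not is_valid(text, target_lang) or priority not in VALID_PRIORITIES:
        return False
    queue = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    # Deduplicate on (text, target_lang): the same text may be requested many times.
    if any(r["text"] == text and r["target_lang"] == target_lang for r in queue):
        return False
    queue.append({
        "text": text,
        "target_lang": target_lang,
        "context": context,
        "priority": priority,
    })
    path.write_text(json.dumps(queue, ensure_ascii=False, indent=2), encoding="utf-8")
    return True
```

The function returns False for both rejected and already-queued requests, so callers can treat it as fire-and-forget.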

Batch Processing

Script

Location: scripts/web/process_translation_queue.py

Schedule: Runs every 6 hours via cron

Lock file: Prevents concurrent runs
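
One common way to implement such a lock on Linux is an advisory flock on the lock file. The sketch below assumes that technique. The article does not say how the script takes its lock, and acquire_lock and run_exclusively are illustrative names.

```python
import fcntl

# Lock path as given in the Scheduling section of this article.
LOCK_PATH = "/tmp/process_translation_queue.lock"


def acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def run_exclusively(job, path=LOCK_PATH):
    """Run job() only if no other run holds the lock; return True if it ran."""
    lock = acquire_lock(path)
    if lock is None:
        return False
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

Because the kernel drops the lock when the process exits, a crashed run does not leave a stale lock behind, which is the main advantage over checking whether the file exists.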

Process Flow

1. Load queue: Read all pending requests

2. Group by language: Batch same-language requests

3. Deduplicate: Remove duplicates within batch

4. Check cache: Skip already-translated texts

5. Translate batch: Send to DeepSeek API

6. Parse results: Extract translations from response

7. Save to cache: Store in phrase tables

8. Clear queue: Remove processed requests
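
The flow above can be sketched as a single function. This is an illustration under assumptions: the queue and cache are passed in as plain Python objects, translate_batch is a hypothetical callable standing in for steps 5 and 6, and a failed language batch is kept in the queue for the next run.

```python
from collections import defaultdict


def process_queue(queue, cache, translate_batch):
    """Run one pass over the queue and return the requests still pending.

    queue: list of request dicts with "text" and "target_lang".
    cache: {lang: {english: translation}} phrase tables, updated in place.
    translate_batch: hypothetical callable (texts, lang) -> list of translations.
    """
    # Steps 2-3: group by language; dict keys keep first-seen order and drop duplicates.
    by_lang = defaultdict(dict)
    for req in queue:
        by_lang[req["target_lang"]][req["text"]] = None

    failed_langs = set()
    for lang, texts in by_lang.items():
        table = cache.setdefault(lang, {})
        # Step 4: skip texts that already have a cached translation.
        pending = [t for t in texts if t not in table]
        if not pending:
            continue
        try:
            results = translate_batch(pending, lang)  # steps 5-6
        except Exception:
            failed_langs.add(lang)
            continue
        if len(results) != len(pending):
            failed_langs.add(lang)
            continue
        table.update(zip(pending, results))  # step 7

    # Step 8: drop processed requests, keep those whose batch failed.
    return [r for r in queue if r["target_lang"] in failed_langs]
```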

Batch Translation

API Call

Model: DeepSeek-V3 (via Together.ai)

System prompt: Cached (same for all batches in a language)

User prompt: Variable (batch-specific)

Format: Numbered list

Example:

Translate these 10 texts:
1. Mini PC
2. Thin Client
3. Compact Desktop
...

Response:

1. मिनी पीसी
2. थिन क्लाइंट
3. कॉम्पैक्ट डेस्कटॉप
...

Parsing

Extract translations by line number:

  • Remove number prefix (1., 2., etc.)

  • Match to original texts by position

  • Validate count matches
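
A minimal sketch of building the numbered prompt and parsing the numbered reply. The function names are illustrative. The sketch assumes each text fits on one line, and it treats any gap in the numbering or a count mismatch as a parse failure.

```python
import re

# Matches lines such as "1. text" or "2) text".
_NUMBERED = re.compile(r"^\s*(\d+)[.)]\s*(.*)$")


def build_user_prompt(texts):
    """Format the batch as a numbered list, one text per line."""
    lines = [f"Translate these {len(texts)} texts:"]
    lines += [f"{i}. {t}" for i, t in enumerate(texts, start=1)]
    return "\n".join(lines)


def parse_numbered_response(response, expected_count):
    """Return translations in order, or None if the numbering or count is off."""
    translations = []
    for line in response.splitlines():
        match = _NUMBERED.match(line)
        if not match:
            continue  # ignore blank lines and any unnumbered preamble
        number, text = int(match.group(1)), match.group(2).strip()
        if number != len(translations) + 1:
            return None  # a skipped or repeated number breaks positional matching
        translations.append(text)
    return translations if len(translations) == expected_count else None
```

Returning None instead of a partial list keeps position-based matching safe: a single missing line would otherwise shift every later translation onto the wrong source text.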

Error Handling

API failure: Retry with fallback API

Parse failure: Return original texts

Partial success: Save successful translations, requeue failures
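
These three rules could look roughly like this. The primary and fallback arguments are hypothetical callables that raise on API errors. The article does not name the fallback API, and treating an empty result as a failed entry is an assumption.

```python
def translate_with_fallback(texts, lang, primary, fallback):
    """Try the primary API, then the fallback.

    Returns (results, ok). On total failure the original texts are returned
    unchanged with ok=False, so callers can display them as-is.
    """
    for api in (primary, fallback):
        try:
            results = api(texts, lang)
        except Exception:
            continue  # API failure: move on to the next provider
        if len(results) == len(texts):
            return results, True
    return list(texts), False


def split_results(texts, results):
    """Separate usable translations from entries that should be requeued."""
    saved, requeue = {}, []
    for source, translated in zip(texts, results):
        if translated and translated.strip():
            saved[source] = translated.strip()
        else:
            requeue.append(source)  # empty result: try again on the next run
    return saved, requeue
```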

Caching Strategy

Phrase Tables

Location: JSON files per language

Format: {"English": "Translation"}

Loading: Loaded once at startup

Benefit: Fast lookups, no API calls
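
A phrase table of this shape can be wrapped in a small class. The sketch assumes one file per language named after its code (for example hi.json) and rewrites the file on every store. Both choices are illustrative, not taken from the actual code.

```python
import json
from pathlib import Path


class PhraseTable:
    """In-memory phrase tables backed by one JSON file per language."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.tables = {}
        # Load every table once, at startup.
        for path in self.directory.glob("*.json"):
            self.tables[path.stem] = json.loads(path.read_text(encoding="utf-8"))

    def lookup(self, text, lang):
        """Return the cached translation, or None on a cache miss."""
        return self.tables.get(lang, {}).get(text)

    def store(self, text, lang, translation):
        """Add a translation and persist that language's table to disk."""
        table = self.tables.setdefault(lang, {})
        table[text] = translation
        path = self.directory / f"{lang}.json"
        path.write_text(json.dumps(table, ensure_ascii=False, indent=2),
                        encoding="utf-8")
```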

Cache Hit Rate

First run: Low (everything new)

Subsequent runs: High (most texts cached)

Benefit: Reduced API costs

Preservation Rules

During translation, we preserve:

Brand names: Thinvent®, Intel®, AMD®

HTML tags: <p>, <br>, <strong>

URLs: https://www.thinvent.in

SKUs: Treo-N100-8-256

Numbers: 8GB, 256GB, 4 cores

Implementation: Regex patterns in system prompt
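
Because the model can still ignore prompt instructions, a post-translation check is a useful complement. The sketch below is not the mechanism described above: it is an assumed extra validation step that extracts protected tokens from the source and reports any that are missing from the translation. The patterns are illustrative and would need tuning for real content.

```python
import re

# Illustrative patterns for the protected categories listed above.
PROTECTED_PATTERNS = [
    re.compile(r"</?[A-Za-z][^>]*>"),                    # HTML tags
    re.compile(r'https?://[^\s<>"]+'),                   # URLs
    re.compile(r"\b[A-Za-z]+(?:-[A-Za-z0-9]+){2,}\b"),   # SKUs such as Treo-N100-8-256
    re.compile(r"\b\w+®"),                               # registered brand names
    re.compile(r"\b\d+(?:\.\d+)?(?:GB|TB|MB)?\b"),       # numbers, optionally with a unit
]


def protected_tokens(text):
    """Collect every substring that must survive translation unchanged."""
    tokens = []
    for pattern in PROTECTED_PATTERNS:
        tokens.extend(pattern.findall(text))
    return tokens


def missing_tokens(source, translation):
    """Return the protected tokens from source that are absent from translation."""
    return [t for t in protected_tokens(source) if t not in translation]
```

A non-empty result from missing_tokens could be treated like a parse failure, so the text is requeued rather than cached.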

Language Detection

Before translating, check whether the text is already in the target language:

Method: Character set analysis

Hindi: Devanagari script

Chinese: CJK characters

Arabic: Arabic script

Benefit: Skip unnecessary translations
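
Character set analysis can be done with Unicode block ranges. In this sketch the "zh" and "ar" language codes, the 0.5 threshold and the function names are assumptions. The approach only works for languages with a distinctive script, so Latin-script targets such as de or fr always report "not translated".

```python
# Unicode block ranges for the scripts named above.
SCRIPT_RANGES = {
    "hi": [(0x0900, 0x097F)],  # Devanagari
    "zh": [(0x4E00, 0x9FFF)],  # CJK Unified Ideographs
    "ar": [(0x0600, 0x06FF)],  # Arabic
}


def script_ratio(text, lang):
    """Fraction of the letters in text that belong to the script used by lang."""
    ranges = SCRIPT_RANGES.get(lang)
    letters = [c for c in text if c.isalpha()]
    if not ranges or not letters:
        return 0.0
    hits = sum(1 for c in letters
               if any(lo <= ord(c) <= hi for lo, hi in ranges))
    return hits / len(letters)


def already_translated(text, lang, threshold=0.5):
    """Treat text as translated when most of its letters are in the target script."""
    return script_ratio(text, lang) >= threshold
```

Using a ratio rather than "any character matches" lets mixed strings through correctly: a Hindi sentence that keeps an English brand name still counts as translated.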

Priority Handling

High priority: Product names, features (process first)

Normal priority: Descriptions, articles (process second)

Low priority: Old content, rarely viewed (process last)

Benefit: Important content translated first
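
Ordering the queue by priority takes one stable sort. The lowercase priority values and the default of "normal" for requests without a priority are assumptions.

```python
# Lower rank is processed earlier.
PRIORITY_ORDER = {"high": 0, "normal": 1, "low": 2}


def sort_by_priority(queue):
    """Return requests ordered high, normal, low.

    sorted() is stable, so requests with equal priority keep their queue order.
    """
    return sorted(
        queue,
        key=lambda r: PRIORITY_ORDER.get(r.get("priority", "normal"), 1),
    )
```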

Scheduling

Cron Job

Frequency: Every 6 hours

Command: python3 scripts/web/process_translation_queue.py

Lock file: /tmp/process_translation_queue.lock

Benefit: Automatic processing, no manual intervention

Weekly Tasks

Articles: Translate new articles weekly

Babel strings: Update template translations weekly

Script: scripts/web/translate_articles_weekly.sh

Monitoring

Queue Size

Track pending requests:

  • Total requests

  • Requests per language

  • Oldest request age

Alert: If queue grows too large
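
The queue metrics above can be computed directly from the queue file contents. Tracking request age requires a timestamp on each request, and the queued_at field (Unix seconds) used here is hypothetical because it is not among the fields listed earlier. The alert thresholds are placeholders too.

```python
import time
from collections import Counter


def queue_stats(queue, now=None):
    """Summarize pending requests: total, per language, and oldest age in seconds."""
    now = time.time() if now is None else now
    per_lang = Counter(r["target_lang"] for r in queue)
    # "queued_at" is a hypothetical field; requests without it are skipped.
    ages = [now - r["queued_at"] for r in queue if "queued_at" in r]
    return {
        "total": len(queue),
        "per_language": dict(per_lang),
        "oldest_age_seconds": max(ages, default=0.0),
    }


def should_alert(stats, max_total=5000, max_age_seconds=24 * 3600):
    """Placeholder thresholds: alert on a large backlog or a very old request."""
    return (stats["total"] > max_total
            or stats["oldest_age_seconds"] > max_age_seconds)
```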

Translation Stats

Track processing:

  • Translations per batch

  • API success rate

  • Cache hit rate

  • Processing time

Cost Tracking

Monitor API usage:

  • Requests per day

  • Tokens per request

  • Cost per language

Summary

Translation queue enables efficient batch processing:

Queue:

  • ✅ Collect translation requests

  • ✅ Deduplicate within batch

  • ✅ Priority handling

  • ✅ Validation and filtering

Batch processing:

  • ✅ Group by language

  • ✅ Send to DeepSeek API

  • ✅ Parse numbered responses

  • ✅ Save to phrase tables

Caching:

  • ✅ Check cache before translating

  • ✅ High cache hit rate

  • ✅ Reduced API costs

Scheduling:

  • ✅ Every 6 hours via cron

  • ✅ Lock file prevents concurrent runs

  • ✅ Weekly article translations

Preservation:

  • ✅ Brand names

  • ✅ HTML tags

  • ✅ URLs and SKUs

This approach reduces API costs through batching and caching, and keeps translations consistent because each text is translated once and then reused from the phrase tables.

