Translation System: Three-Technology Hybrid Approach

This article explains our translation system that combines three technologies—Flask-Babel for templates, TranslationManager for phrase tables, and DeepSeek AI for content translation—to provide comprehensive multilingual support.

The Problem: Translating a Complex Website

Our website has multiple types of content that need translation:

  • Static templates: Navigation, buttons, labels (500+ strings)

  • Dynamic content: Product names, descriptions, features (10,000+ strings)

  • User-generated content: Reviews, comments, support tickets

  • Technical terms: Brand names, SKUs, URLs (must NOT be translated)

A single translation approach doesn't work:

  • Machine translation: Fast but inaccurate for technical terms

  • Manual translation: Accurate but slow and expensive

  • Template-only: Doesn't handle dynamic content

We need a hybrid approach that combines the best of all three.

Translation Architecture

graph TD
    Request[User Request<br/>lang=hi]
    subgraph Templates
        Babel[Flask-Babel<br/>.po files]
        Static[Static strings<br/>buttons, labels]
    end
    subgraph Dynamic
        TM[TranslationManager<br/>phrase tables]
        Phrases[Product names<br/>features, terms]
    end
    subgraph AI
        DS[DeepSeek API<br/>AI translation]
        Content[Descriptions<br/>articles, long text]
    end
    Request --> Babel
    Request --> TM
    Request --> DS
    Babel --> Static
    TM --> Phrases
    DS --> Content
    Static --> Response[Translated Page]
    Phrases --> Response
    Content --> Response

Three Technologies

1. Flask-Babel: Template Translations

Flask-Babel handles static template strings using gettext .po files.

  • Use case: Navigation, buttons, labels, error messages

  • Example:

# Template
{{ _('Add to Cart') }}

# .po file (Hindi)
msgid "Add to Cart"
msgstr "कार्ट में जोड़ें"

Benefits:

  • ✅ Standard i18n approach

  • ✅ Translation tools support (Poedit, Weblate)

  • ✅ Compile-time validation

  • ✅ Fast lookup (compiled .mo files)

Limitations:

  • ❌ Only works for static strings

  • ❌ Requires code changes to add new strings

  • ❌ No dynamic content support

2. TranslationManager: Phrase Tables

Our custom TranslationManager class handles dynamic content using JSON phrase tables.

  • Use case: Product names, features, specifications, descriptions

  • Example:

{
  "Mini PC": "मिनी पीसी",
  "Intel N100": "Intel N100",
  "8GB RAM": "8GB RAM",
  "256GB SSD": "256GB SSD"
}

Benefits:

  • ✅ Dynamic content support

  • ✅ Runtime updates (no restart needed)

  • ✅ Preservation rules (brand names, SKUs)

  • ✅ Regional formatting (decimal separators)

  • ✅ Translation queue (missing strings)

Limitations:

  • ❌ Manual phrase management

  • ❌ No context for translators

  • ❌ Requires initial translation

3. DeepSeek AI: Content Translation

The DeepSeek AI model handles long-form content translation.

  • Use case: Blog articles, product descriptions, marketing copy

  • Example:

prompt = f"Translate to {lang}: {text}"
response = deepseek.generate(prompt)

Benefits:

  • ✅ Handles long content (1000+ words)

  • ✅ Context-aware translation

  • ✅ Natural language output

  • ✅ Batch processing

Limitations:

  • ❌ API cost per translation

  • ❌ Requires internet connection

  • ❌ May translate technical terms incorrectly

TranslationManager Architecture

Phrase Table Structure

Phrase tables are stored as JSON files per language:

app/shared/translation/phrase_tables/

Each file maps English phrases to translations:

{
  "Mini PC": "मिनी पीसी",
  "Thin Client": "थिन क्लाइंट",
  "Industrial PC": "औद्योगिक पीसी",
  "All-in-One": "ऑल-इन-वन"
}
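Loading these tables at startup can be sketched as follows; this is an illustrative loader, not the production TranslationManager (the directory layout is the one above, and the convention that the filename stem is the language code is an assumption):

```python
import json
from pathlib import Path

def load_phrase_tables(base_dir):
    """Load one JSON phrase table per language (e.g. hi.json -> 'hi')."""
    tables = {}
    for path in Path(base_dir).glob("*.json"):
        lang = path.stem  # filename without extension is the language code
        with open(path, encoding="utf-8") as f:
            tables[lang] = json.load(f)
    return tables
```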

Translation Lookup

The get() method performs multi-tier lookup:

1. Preservation Check:

if self._should_skip_translation(text, lang):
    return text  # Don't translate brand names, SKUs, URLs

2. Comma Splitting (for long strings):

if len(text) > 150 and "," in text:
    parts = text.split(",")
    # Fall back to the original part if no translation is found
    return ", ".join(self.get(p.strip(), lang) or p.strip() for p in parts)

3. Phrase Table Lookup:

if text in self.phrase_tables[lang]:
    return self.phrase_tables[lang][text]

4. Legacy Cache Promotion:

if text_hash in self.legacy_caches[lang]:
    translation = self.legacy_caches[lang][text_hash]
    self.set(text, lang, translation)  # Promote to phrase table
    return translation

5. Queue if Missing:

if queue_if_missing:
    self.add_to_queue(text, lang)
return None  # No translation found
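The five tiers above can be combined into a single method. The class below is a minimal self-contained sketch of that flow, not the production TranslationManager; the preservation check is reduced to the URL rule, and the MD5 keying of the legacy cache is an assumption based on the hash-based entries shown later:

```python
import hashlib

class TranslationLookup:
    """Minimal sketch of the five-tier get() lookup."""

    def __init__(self, phrase_tables, legacy_caches):
        self.phrase_tables = phrase_tables  # {lang: {text: translation}}
        self.legacy_caches = legacy_caches  # {lang: {md5(text): translation}}
        self.queue = []

    def _should_skip_translation(self, text, lang):
        # Simplified: only the URL rule; the full system also checks
        # brand names, technical terms, and SKU patterns
        return text.startswith(("/", "http://", "https://"))

    def get(self, text, lang, queue_if_missing=True):
        # 1. Preservation: protected content is returned untouched
        if self._should_skip_translation(text, lang):
            return text
        # 2. Comma splitting: translate long comma-separated strings piecewise
        if len(text) > 150 and "," in text:
            return ", ".join(self.get(p.strip(), lang) or p.strip()
                             for p in text.split(","))
        # 3. Phrase table: direct dictionary hit
        if text in self.phrase_tables.get(lang, {}):
            return self.phrase_tables[lang][text]
        # 4. Legacy cache: promote an old hash-keyed entry to the phrase table
        text_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
        if text_hash in self.legacy_caches.get(lang, {}):
            translation = self.legacy_caches[lang][text_hash]
            self.phrase_tables.setdefault(lang, {})[text] = translation
            return translation
        # 5. Queue the miss for batch translation
        if queue_if_missing:
            self.queue.append({"text": text, "lang": lang})
        return None
```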

Preservation Rules

We preserve specific content types:

Brand Names:

BRAND_NAMES_PRESERVE = [
    "Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
    "Windows", "Linux", "WiFi", "Bluetooth"
]

Technical Terms:

TECHNICAL_TERMS_PRESERVE = [
    "HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
    "DDR4", "SSD", "NVMe", "PCIe", "SATA"
]

SKUs (pattern-based):

if not " " in text and text.count("-") >= 2:
    return True  # Preserve "Treo-N100-8-256-2H-W6-11P"

URLs:

if text.startswith(("/", "http://", "https://")):
    return True
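Putting the four rules together, the preservation predicate can be sketched as one function (the list contents are abbreviated from the tables above; the check order is an assumption):

```python
BRAND_NAMES_PRESERVE = {"Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
                        "Windows", "Linux", "WiFi", "Bluetooth"}
TECHNICAL_TERMS_PRESERVE = {"HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
                            "DDR4", "SSD", "NVMe", "PCIe", "SATA"}

def should_skip_translation(text):
    """True when the text must be preserved verbatim."""
    # URLs and site paths
    if text.startswith(("/", "http://", "https://")):
        return True
    # SKU pattern: no spaces, at least two hyphens
    if " " not in text and text.count("-") >= 2:
        return True
    # Exact brand names and technical terms
    if text in BRAND_NAMES_PRESERVE or text in TECHNICAL_TERMS_PRESERVE:
        return True
    return False
```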

Regional Formatting

For European locales (German, French, Spanish), we apply regional formatting:

Decimal Separator:

# English: 34.00
# German:  34,00
text = text.replace(".", ",")

Voltage/Amperage:

# English: 12.5V
# German:  12,5V
text = re.sub(r"(\d+)\.(\d+)V", r"\1,\2V", text)

This ensures numbers display correctly for each locale.
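A combined formatting helper might look like this. Note that it restricts the swap to digits on both sides of the dot (like the voltage regex above) so that full stops in prose are left alone; treating that as the general rule, rather than a blanket replace, is an assumption of this sketch:

```python
import re

EUROPEAN_LOCALES = {"de", "fr", "es"}

def apply_regional_format(text, lang):
    """Swap the decimal separator in numeric values for European locales."""
    if lang not in EUROPEAN_LOCALES:
        return text
    # Only touch a dot between digits, so sentence punctuation is untouched
    return re.sub(r"(\d)\.(\d)", r"\1,\2", text)
```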

Translation Queue

Missing translations are queued for batch processing:

[
  {
    "text": "Compact Desktop",
    "lang": "hi",
    "hash": "a1b2c3d4e5f6"
  },
  {
    "text": "Fanless Design",
    "lang": "hi",
    "hash": "f6e5d4c3b2a1"
  }
]

A background script processes the queue using DeepSeek AI:

for item in queue:
    translation = deepseek.translate(item["text"], item["lang"])
    translation_manager.set(item["text"], item["lang"], translation)
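Enqueuing a missing string can be sketched as below. The 12-character MD5 prefix matches the hash length in the sample entries above, but the exact hashing scheme and the dedupe check are assumptions of this sketch:

```python
import hashlib

def add_to_queue(queue, text, lang):
    """Append a missing string to the queue, keyed by a short hash for dedupe."""
    text_hash = hashlib.md5(text.encode("utf-8")).hexdigest()[:12]
    # Skip entries already queued for this language
    if any(i["hash"] == text_hash and i["lang"] == lang for i in queue):
        return False
    queue.append({"text": text, "lang": lang, "hash": text_hash})
    return True
```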

File Locking

We use fcntl file locking to prevent concurrent writes:

with open(lock_path, "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)
    try:
        # Read, modify, write phrase table
        with open(table_path, "r+") as f:
            data = json.load(f)
            data.update(new_translations)
            f.seek(0)
            json.dump(data, f)
            f.truncate()
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)

This ensures multiple processes don't corrupt the phrase tables.

Language Detection

We detect query language using Unicode ranges and stopwords:

Unicode Range Detection

Devanagari (Hindi, Marathi):

if 0x0900 <= ord(char) <= 0x097F:
    return "hi"

Bengali:

if 0x0980 <= ord(char) <= 0x09FF:
    return "bn"

Arabic:

if 0x0600 <= ord(char) <= 0x06FF:
    return "ar"

Cyrillic (Russian):

if 0x0400 <= ord(char) <= 0x04FF:
    return "ru"

CJK (Chinese, Japanese, Korean):

if 0x4E00 <= ord(char) <= 0x9FFF:  # Kanji/Hanzi (shared by zh/ja)
    has_cjk = True
if 0x3040 <= ord(char) <= 0x30FF:  # Hiragana/Katakana
    return "ja"
if 0xAC00 <= ord(char) <= 0xD7AF:  # Hangul
    return "ko"

# After scanning all characters: Han with no kana means Chinese
if has_cjk:
    return "zh"

Stopword Detection

For Latin-based languages, we use stopwords:

Spanish:

SPANISH_STOPWORDS = {
    "el", "la", "los", "las", "un", "una", "para", "con",
    "en", "por", "que", "es", "su", "y", "del", "al"
}

French:

FRENCH_STOPWORDS = {
    "le", "la", "les", "des", "du", "un", "une", "pour",
    "avec", "en", "est", "sur", "et", "au", "dans"
}

German:

GERMAN_STOPWORDS = {
    "der", "die", "das", "ein", "eine", "für", "mit",
    "ist", "und", "auf", "den", "dem", "bei", "von"
}

If any stopword matches, we return that language.
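Because some stopwords are shared across languages ("la" is both Spanish and French, for example), a sketch of this check is safer if it scores overlap per language rather than returning on the first match; that scoring refinement is an assumption of this sketch, not necessarily how the production code breaks ties:

```python
STOPWORDS = {
    "es": {"el", "la", "los", "las", "un", "una", "para", "con",
           "en", "por", "que", "es", "su", "y", "del", "al"},
    "fr": {"le", "la", "les", "des", "du", "un", "une", "pour",
           "avec", "en", "est", "sur", "et", "au", "dans"},
    "de": {"der", "die", "das", "ein", "eine", "für", "mit",
           "ist", "und", "auf", "den", "dem", "bei", "von"},
}

def detect_stopword_language(text):
    """Score each language by stopword overlap; no hits returns None."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```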

Dynamic Language Detection

Users can specify language via:

1. URL Parameter

/p/Treo-N100-8-256-2H-W6-11P?lang=hi

2. Cookie

response.set_cookie("lang", "hi", max_age=31536000)

3. Browser Accept-Language Header

lang = request.accept_languages.best_match(["en", "hi", "es", "fr", "de"])

Priority: URL param > Cookie > Accept-Language > Default (en)
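That priority chain reduces to a small resolver; the function name, the requirement that a candidate be in the supported set, and the default list are illustrative:

```python
def resolve_language(url_param, cookie_lang, accept_language_best,
                     supported=("en", "hi", "es", "fr", "de")):
    """Apply the chain: URL param > cookie > Accept-Language > default 'en'."""
    for candidate in (url_param, cookie_lang, accept_language_best):
        if candidate in supported:
            return candidate
    return "en"
```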

Recursive Translation

We recursively translate nested data structures:

def translate_data(obj, lang):
    if isinstance(obj, dict):
        return {k: translate_data(v, lang) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_data(item, lang) for item in obj]
    elif isinstance(obj, str):
        return translation_manager.get(obj, lang) or obj
    return obj

Example:

data = {
    "title": "Mini PC",
    "features": {
        "RAM": "8GB",
        "Storage": "256GB SSD"
    },
    "price": 25000
}

translated = translate_data(data, "hi")
# Result:
# {
#     "title": "मिनी पीसी",
#     "features": {
#         "RAM": "8GB",
#         "Storage": "256GB SSD"
#     },
#     "price": 25000
# }

Feature Translation

Product features require special handling:

def translate_features(features, lang):
    translated = {}
    for heading, feature_dict in features.items():
        # Translate the section heading
        translated_heading = translation_manager.get(heading, lang) or heading

        # Translate feature names and values, falling back to the original
        translated_features = {}
        for name, value in feature_dict.items():
            translated_name = translation_manager.get(name, lang) or name
            translated_value = translation_manager.get(value, lang) or value
            translated_features[translated_name] = translated_value

        translated[translated_heading] = translated_features

    return translated

Example:

features = {
    "Processing": {
        "Processor": "Intel N100",
        "Cores": "4",
        "RAM": "8GB"
    }
}

translated = translate_features(features, "hi")
# Result:
# {
#     "प्रोसेसिंग": {
#         "प्रोसेसर": "Intel N100",
#         "कोर": "4",
#         "RAM": "8GB"
#     }
# }

Integration with SEO Pipeline

The translation system integrates with the SEO pipeline:

Query Language Propagation

When a user searches in Hindi, we propagate the language to related searches:

if SEO_ENABLE_LANGUAGE_PROPAGATION:
    lang = detect_query_language(query)
    if lang != "en":
        url = f"{url}?lang={lang}"

This ensures users stay in their preferred language when clicking related searches.

Translated Query Pages

Query pages are generated in multiple languages:

/q/mini-pc          (English)
/q/mini-pc?lang=hi  (Hindi)
/q/mini-pc?lang=es  (Spanish)

The content is translated using TranslationManager.

Performance Characteristics

Phrase Table Lookup:

  • Time: O(1) dictionary lookup (~1μs)

  • Memory: ~5 MB per language (10,000 phrases)

Language Detection:

  • Time: O(n) where n = string length (~10μs for 100 chars)

  • Memory: Negligible

Recursive Translation:

  • Time: O(n) where n = number of strings (~1ms for 100 strings)

  • Memory: Proportional to data structure size

File Locking:

  • Time: ~1ms per lock acquisition

  • Contention: Rare (writes are infrequent)


Summary

Our translation system uses three technologies:

Flask-Babel (.po files):

  • Static template strings (navigation, buttons, labels)

  • Standard i18n approach

  • Compile-time validation

TranslationManager (JSON phrase tables):

  • Dynamic content (product names, features, descriptions)

  • Runtime updates

  • Preservation rules (brand names, SKUs, URLs)

  • Regional formatting (decimal separators)

  • Translation queue (missing strings)

DeepSeek AI (API):

  • Long-form content (blog articles, marketing copy)

  • Context-aware translation

  • Batch processing

Language Detection:

  • Unicode ranges (Devanagari, Arabic, Cyrillic, CJK)

  • Stopwords (Spanish, French, German, Italian, Portuguese)

  • URL param > Cookie > Accept-Language header

Integration:

  • Recursive translation for nested data

  • Feature translation for product specifications

  • Query language propagation for SEO

This hybrid approach provides comprehensive multilingual support while preserving technical accuracy and brand consistency.

