Translation System: Three-Technology Hybrid Approach

This article explains our translation system that combines three technologies—Flask-Babel for templates, TranslationManager for phrase tables, and DeepSeek AI for content translation—to provide comprehensive multilingual support.

The Problem: Translating a Complex Website

Our website has multiple types of content that need translation:

  • Static templates: Navigation, buttons, labels (500+ strings)

  • Dynamic content: Product names, descriptions, features (10,000+ strings)

  • User-generated content: Reviews, comments, support tickets

  • Technical terms: Brand names, SKUs, URLs (must NOT be translated)

A single translation approach doesn't work:

  • Machine translation: Fast but inaccurate for technical terms

  • Manual translation: Accurate but slow and expensive

  • Template-only: Doesn't handle dynamic content

We need a hybrid approach that combines the best of all three.

Translation Architecture

graph TD
    Request[User Request<br/>lang=hi]
    subgraph Templates
        Babel[Flask-Babel<br/>.po files]
        Static[Static strings<br/>buttons, labels]
    end
    subgraph Dynamic
        TM[TranslationManager<br/>phrase tables]
        Phrases[Product names<br/>features, terms]
    end
    subgraph AI
        DS[DeepSeek API<br/>AI translation]
        Content[Descriptions<br/>articles, long text]
    end
    Request --> Babel
    Request --> TM
    Request --> DS
    Babel --> Static
    TM --> Phrases
    DS --> Content
    Static --> Response[Translated Page]
    Phrases --> Response
    Content --> Response

Three Technologies

1. Flask-Babel: Template Translations

Flask-Babel handles static template strings using gettext .po files.

  • Use case: Navigation, buttons, labels, error messages

  • Example:

# Template
{{ _('Add to Cart') }}

# .po file (Hindi)
msgid "Add to Cart"
msgstr "कार्ट में जोड़ें"

Benefits:

  • ✅ Standard i18n approach

  • ✅ Translation tools support (Poedit, Weblate)

  • ✅ Compile-time validation

  • ✅ Fast lookup (compiled .mo files)

Limitations:

  • ❌ Only works for static strings

  • ❌ Requires code changes to add new strings

  • ❌ No dynamic content support

2. TranslationManager: Phrase Tables

Our custom TranslationManager class handles dynamic content using JSON phrase tables.

  • Use case: Product names, features, specifications, descriptions

  • Example:

{
  "Mini PC": "मिनी पीसी",
  "Intel N100": "Intel N100",
  "8GB RAM": "8GB RAM",
  "256GB SSD": "256GB SSD"
}

Benefits:

  • ✅ Dynamic content support

  • ✅ Runtime updates (no restart needed)

  • ✅ Preservation rules (brand names, SKUs)

  • ✅ Regional formatting (decimal separators)

  • ✅ Translation queue (missing strings)

Limitations:

  • ❌ Manual phrase management

  • ❌ No context for translators

  • ❌ Requires initial translation

3. DeepSeek AI: Content Translation

The DeepSeek AI model handles long-form content translation.

  • Use case: Blog articles, product descriptions, marketing copy

  • Example:

prompt = f"Translate to {lang}: {text}"
response = deepseek.generate(prompt)

Benefits:

  • ✅ Handles long content (1000+ words)

  • ✅ Context-aware translation

  • ✅ Natural language output

  • ✅ Batch processing

Limitations:

  • ❌ API cost per translation

  • ❌ Requires internet connection

  • ❌ May translate technical terms incorrectly

TranslationManager Architecture

Phrase Table Structure

Phrase tables are stored as JSON files per language:

app/shared/translation/phrase_tables/

Each file maps English phrases to translations:

{
  "Mini PC": "मिनी पीसी",
  "Thin Client": "थिन क्लाइंट",
  "Industrial PC": "औद्योगिक पीसी",
  "All-in-One": "ऑल-इन-वन"
}
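Loading these tables at startup can be sketched as follows; this is an illustrative loader, not the production TranslationManager (the directory layout is the one above, and the convention that the filename stem is the language code is an assumption):

```python
import json
from pathlib import Path

def load_phrase_tables(base_dir):
    """Load one JSON phrase table per language (e.g. hi.json -> 'hi')."""
    tables = {}
    for path in Path(base_dir).glob("*.json"):
        lang = path.stem  # filename without extension is the language code
        with open(path, encoding="utf-8") as f:
            tables[lang] = json.load(f)
    return tables
```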

Translation Lookup

The get() method performs multi-tier lookup:

1. Preservation Check:

if self._should_skip_translation(text, lang):
    return text  # Don't translate brand names, SKUs, URLs

2. Comma Splitting (for long strings):

if len(text) > 150 and "," in text:
    parts = text.split(",")
    # Fall back to the original part if no translation is found
    return ", ".join(self.get(p.strip(), lang) or p.strip() for p in parts)

3. Phrase Table Lookup:

if text in self.phrase_tables[lang]:
    return self.phrase_tables[lang][text]

4. Legacy Cache Promotion:

if text_hash in self.legacy_caches[lang]:
    translation = self.legacy_caches[lang][text_hash]
    self.set(text, lang, translation)  # Promote to phrase table
    return translation

5. Queue if Missing:

if queue_if_missing:
    self.add_to_queue(text, lang)
return None  # No translation found
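The five tiers above can be combined into a single method. The class below is a minimal self-contained sketch of that flow, not the production TranslationManager; the preservation check is reduced to the URL rule, and the MD5 keying of the legacy cache is an assumption based on the hash-based entries shown later:

```python
import hashlib

class TranslationLookup:
    """Minimal sketch of the five-tier get() lookup."""

    def __init__(self, phrase_tables, legacy_caches):
        self.phrase_tables = phrase_tables  # {lang: {text: translation}}
        self.legacy_caches = legacy_caches  # {lang: {md5(text): translation}}
        self.queue = []

    def _should_skip_translation(self, text, lang):
        # Simplified: only the URL rule; the full system also checks
        # brand names, technical terms, and SKU patterns
        return text.startswith(("/", "http://", "https://"))

    def get(self, text, lang, queue_if_missing=True):
        # 1. Preservation: protected content is returned untouched
        if self._should_skip_translation(text, lang):
            return text
        # 2. Comma splitting: translate long comma-separated strings piecewise
        if len(text) > 150 and "," in text:
            return ", ".join(self.get(p.strip(), lang) or p.strip()
                             for p in text.split(","))
        # 3. Phrase table: direct dictionary hit
        if text in self.phrase_tables.get(lang, {}):
            return self.phrase_tables[lang][text]
        # 4. Legacy cache: promote an old hash-keyed entry to the phrase table
        text_hash = hashlib.md5(text.encode("utf-8")).hexdigest()
        if text_hash in self.legacy_caches.get(lang, {}):
            translation = self.legacy_caches[lang][text_hash]
            self.phrase_tables.setdefault(lang, {})[text] = translation
            return translation
        # 5. Queue the miss for batch translation
        if queue_if_missing:
            self.queue.append({"text": text, "lang": lang})
        return None
```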

Preservation Rules

We preserve specific content types:

Brand Names:

BRAND_NAMES_PRESERVE = [
    "Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
    "Windows", "Linux", "WiFi", "Bluetooth"
]

Technical Terms:

TECHNICAL_TERMS_PRESERVE = [
    "HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
    "DDR4", "SSD", "NVMe", "PCIe", "SATA"
]

SKUs (pattern-based):

if not " " in text and text.count("-") >= 2:
    return True  # Preserve "Treo-N100-8-256-2H-W6-11P"

URLs:

if text.startswith(("/", "http://", "https://")):
    return True
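Putting the four rules together, the preservation predicate can be sketched as one function (the list contents are abbreviated from the tables above; the check order is an assumption):

```python
BRAND_NAMES_PRESERVE = {"Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
                        "Windows", "Linux", "WiFi", "Bluetooth"}
TECHNICAL_TERMS_PRESERVE = {"HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
                            "DDR4", "SSD", "NVMe", "PCIe", "SATA"}

def should_skip_translation(text):
    """True when the text must be preserved verbatim."""
    # URLs and site paths
    if text.startswith(("/", "http://", "https://")):
        return True
    # SKU pattern: no spaces, at least two hyphens
    if " " not in text and text.count("-") >= 2:
        return True
    # Exact brand names and technical terms
    if text in BRAND_NAMES_PRESERVE or text in TECHNICAL_TERMS_PRESERVE:
        return True
    return False
```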

Regional Formatting

For European locales (German, French, Spanish), we apply regional formatting:

Decimal Separator:

# English: 34.00
# German:  34,00
text = text.replace(".", ",")

Voltage/Amperage:

# English: 12.5V
# German:  12,5V
text = re.sub(r"(\d+)\.(\d+)V", r"\1,\2V", text)

This ensures numbers display correctly for each locale.
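A combined formatting helper might look like this. Note that it restricts the swap to digits on both sides of the dot (like the voltage regex above) so that full stops in prose are left alone; treating that as the general rule, rather than a blanket replace, is an assumption of this sketch:

```python
import re

EUROPEAN_LOCALES = {"de", "fr", "es"}

def apply_regional_format(text, lang):
    """Swap the decimal separator in numeric values for European locales."""
    if lang not in EUROPEAN_LOCALES:
        return text
    # Only touch a dot between digits, so sentence punctuation is untouched
    return re.sub(r"(\d)\.(\d)", r"\1,\2", text)
```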

Translation Queue

Missing translations are queued for batch processing:

[
  {
    "text": "Compact Desktop",
    "lang": "hi",
    "hash": "a1b2c3d4e5f6"
  },
  {
    "text": "Fanless Design",
    "lang": "hi",
    "hash": "f6e5d4c3b2a1"
  }
]

A background script processes the queue using DeepSeek AI:

for item in queue:
    translation = deepseek.translate(item["text"], item["lang"])
    translation_manager.set(item["text"], item["lang"], translation)
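Enqueuing a missing string can be sketched as below. The 12-character MD5 prefix matches the hash length in the sample entries above, but the exact hashing scheme and the dedupe check are assumptions of this sketch:

```python
import hashlib

def add_to_queue(queue, text, lang):
    """Append a missing string to the queue, keyed by a short hash for dedupe."""
    text_hash = hashlib.md5(text.encode("utf-8")).hexdigest()[:12]
    # Skip entries already queued for this language
    if any(i["hash"] == text_hash and i["lang"] == lang for i in queue):
        return False
    queue.append({"text": text, "lang": lang, "hash": text_hash})
    return True
```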

File Locking

We use fcntl file locking to prevent concurrent writes:

with open(lock_path, "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)
    try:
        # Read, modify, write phrase table
        with open(table_path, "r+") as f:
            data = json.load(f)
            data.update(new_translations)
            f.seek(0)
            json.dump(data, f)
            f.truncate()
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)

This ensures multiple processes don't corrupt the phrase tables.

Language Detection

We detect query language using Unicode ranges and stopwords:

Unicode Range Detection

Devanagari (Hindi, Marathi):

if 0x0900 <= ord(char) <= 0x097F:
    return "hi"

Bengali:

if 0x0980 <= ord(char) <= 0x09FF:
    return "bn"

Arabic:

if 0x0600 <= ord(char) <= 0x06FF:
    return "ar"

Cyrillic (Russian):

if 0x0400 <= ord(char) <= 0x04FF:
    return "ru"

CJK (Chinese, Japanese, Korean):

if 0x4E00 <= ord(char) <= 0x9FFF:  # Kanji/Hanzi (shared by zh/ja)
    has_cjk = True
if 0x3040 <= ord(char) <= 0x30FF:  # Hiragana/Katakana
    return "ja"
if 0xAC00 <= ord(char) <= 0xD7AF:  # Hangul
    return "ko"

# After scanning all characters: Han with no kana means Chinese
if has_cjk:
    return "zh"

Stopword Detection

For Latin-based languages, we use stopwords:

Spanish:

SPANISH_STOPWORDS = {
    "el", "la", "los", "las", "un", "una", "para", "con",
    "en", "por", "que", "es", "su", "y", "del", "al"
}

French:

FRENCH_STOPWORDS = {
    "le", "la", "les", "des", "du", "un", "une", "pour",
    "avec", "en", "est", "sur", "et", "au", "dans"
}

German:

GERMAN_STOPWORDS = {
    "der", "die", "das", "ein", "eine", "für", "mit",
    "ist", "und", "auf", "den", "dem", "bei", "von"
}

If any stopword matches, we return that language.
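Because some stopwords are shared across languages ("la" is both Spanish and French, for example), a sketch of this check is safer if it scores overlap per language rather than returning on the first match; that scoring refinement is an assumption of this sketch, not necessarily how the production code breaks ties:

```python
STOPWORDS = {
    "es": {"el", "la", "los", "las", "un", "una", "para", "con",
           "en", "por", "que", "es", "su", "y", "del", "al"},
    "fr": {"le", "la", "les", "des", "du", "un", "une", "pour",
           "avec", "en", "est", "sur", "et", "au", "dans"},
    "de": {"der", "die", "das", "ein", "eine", "für", "mit",
           "ist", "und", "auf", "den", "dem", "bei", "von"},
}

def detect_stopword_language(text):
    """Score each language by stopword overlap; no hits returns None."""
    words = set(text.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```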

Dynamic Language Detection

Users can specify language via:

1. URL Parameter

/p/Treo-N100-8-256-2H-W6-11P?lang=hi

2. Cookie

response.set_cookie("lang", "hi", max_age=31536000)

3. Browser Accept-Language Header

lang = request.accept_languages.best_match(["en", "hi", "es", "fr", "de"])

Priority: URL param > Cookie > Accept-Language > Default (en)
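That priority chain reduces to a small resolver; the function name, the requirement that a candidate be in the supported set, and the default list are illustrative:

```python
def resolve_language(url_param, cookie_lang, accept_language_best,
                     supported=("en", "hi", "es", "fr", "de")):
    """Apply the chain: URL param > cookie > Accept-Language > default 'en'."""
    for candidate in (url_param, cookie_lang, accept_language_best):
        if candidate in supported:
            return candidate
    return "en"
```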

Recursive Translation

We recursively translate nested data structures:

def translate_data(obj, lang):
    if isinstance(obj, dict):
        return {k: translate_data(v, lang) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_data(item, lang) for item in obj]
    elif isinstance(obj, str):
        return translation_manager.get(obj, lang) or obj
    return obj

Example:

data = {
    "title": "Mini PC",
    "features": {
        "RAM": "8GB",
        "Storage": "256GB SSD"
    },
    "price": 25000
}

translated = translate_data(data, "hi")
# Result:
# {
#     "title": "मिनी पीसी",
#     "features": {
#         "RAM": "8GB",
#         "Storage": "256GB SSD"
#     },
#     "price": 25000
# }

Feature Translation

Product features require special handling:

def translate_features(features, lang):
    translated = {}
    for heading, feature_dict in features.items():
        # Translate the section heading
        translated_heading = translation_manager.get(heading, lang) or heading

        # Translate feature names and values, falling back to the original
        translated_features = {}
        for name, value in feature_dict.items():
            translated_name = translation_manager.get(name, lang) or name
            translated_value = translation_manager.get(value, lang) or value
            translated_features[translated_name] = translated_value

        translated[translated_heading] = translated_features

    return translated

Example:

features = {
    "Processing": {
        "Processor": "Intel N100",
        "Cores": "4",
        "RAM": "8GB"
    }
}

translated = translate_features(features, "hi")
# Result:
# {
#     "प्रोसेसिंग": {
#         "प्रोसेसर": "Intel N100",
#         "कोर": "4",
#         "RAM": "8GB"
#     }
# }

Integration with SEO Pipeline

The translation system integrates with the SEO pipeline:

Query Language Propagation

When a user searches in Hindi, we propagate the language to related searches:

if SEO_ENABLE_LANGUAGE_PROPAGATION:
    lang = detect_query_language(query)
    if lang != "en":
        url = f"{url}?lang={lang}"

This ensures users stay in their preferred language when clicking related searches.

Translated Query Pages

Query pages are generated in multiple languages:

/q/mini-pc          (English)
/q/mini-pc?lang=hi  (Hindi)
/q/mini-pc?lang=es  (Spanish)

The content is translated using TranslationManager.

Performance Characteristics

Phrase Table Lookup:

  • Time: O(1) dictionary lookup (~1μs)

  • Memory: ~5 MB per language (10,000 phrases)

Language Detection:

  • Time: O(n) where n = string length (~10μs for 100 chars)

  • Memory: Negligible

Recursive Translation:

  • Time: O(n) where n = number of strings (~1ms for 100 strings)

  • Memory: Proportional to data structure size

File Locking:

  • Time: ~1ms per lock acquisition

  • Contention: Rare (writes are infrequent)


Summary

Our translation system uses three technologies:

Flask-Babel (.po files):

  • Static template strings (navigation, buttons, labels)

  • Standard i18n approach

  • Compile-time validation

TranslationManager (JSON phrase tables):

  • Dynamic content (product names, features, descriptions)

  • Runtime updates

  • Preservation rules (brand names, SKUs, URLs)

  • Regional formatting (decimal separators)

  • Translation queue (missing strings)

DeepSeek AI (API):

  • Long-form content (blog articles, marketing copy)

  • Context-aware translation

  • Batch processing

Language Detection:

  • Unicode ranges (Devanagari, Arabic, Cyrillic, CJK)

  • Stopwords (Spanish, French, German, Italian, Portuguese)

  • URL param > Cookie > Accept-Language header

Integration:

  • Recursive translation for nested data

  • Feature translation for product specifications

  • Query language propagation for SEO

This hybrid approach provides comprehensive multilingual support while preserving technical accuracy and brand consistency.

