Translation System: Three-Technology Hybrid Approach
This article explains our translation system, which combines three technologies to provide multilingual support across the site: Flask-Babel for templates, TranslationManager for phrase tables, and DeepSeek AI for content translation.
The Problem: Translating a Complex Website
Our website has multiple types of content that need translation:
- Static templates: Navigation, buttons, labels (500+ strings)
- Dynamic content: Product names, descriptions, features (10,000+ strings)
- User-generated content: Reviews, comments, support tickets
- Technical terms: Brand names, SKUs, URLs (must NOT be translated)
A single translation approach doesn't work:
- Machine translation: Fast but inaccurate for technical terms
- Manual translation: Accurate but slow and expensive
- Template-only: Doesn't handle dynamic content
We need a hybrid approach that combines the best of all three.
Translation Architecture
graph TD
    Request["User Request<br/>lang=hi"]
    subgraph Templates
        Babel["Flask-Babel<br/>.po files"]
        Static["Static strings<br/>buttons, labels"]
    end
    subgraph Dynamic
        TM["TranslationManager<br/>phrase tables"]
        Phrases["Product names<br/>features, terms"]
    end
    subgraph AI
        DS["DeepSeek API<br/>AI translation"]
        Content["Descriptions<br/>articles, long text"]
    end
    Request --> Babel
    Request --> TM
    Request --> DS
    Babel --> Static
    TM --> Phrases
    DS --> Content
    Static --> Response[Translated Page]
    Phrases --> Response
    Content --> Response
Three Technologies
1. Flask-Babel: Template Translations
Flask-Babel handles static template strings using gettext .po files.
- Use case: Navigation, buttons, labels, error messages
- Example:
# Template
{{ _('Add to Cart') }}
# .po file (Hindi)
msgid "Add to Cart"
msgstr "कार्ट में जोड़ें"
Benefits:
- ✅ Standard i18n approach
- ✅ Translation tools support (Poedit, Weblate)
- ✅ Compile-time validation
- ✅ Fast lookup (compiled .mo files)
Limitations:
- ❌ Only works for static strings
- ❌ Requires code changes to add new strings
- ❌ No dynamic content support
2. TranslationManager: Phrase Tables
Our custom TranslationManager class handles dynamic content using JSON phrase tables.
- Use case: Product names, features, specifications, descriptions
- Example:
{
    "Mini PC": "मिनी पीसी",
    "Intel N100": "Intel N100",
    "8GB RAM": "8GB RAM",
    "256GB SSD": "256GB SSD"
}
Benefits:
- ✅ Dynamic content support
- ✅ Runtime updates (no restart needed)
- ✅ Preservation rules (brand names, SKUs)
- ✅ Regional formatting (decimal separators)
- ✅ Translation queue (missing strings)
Limitations:
- ❌ Manual phrase management
- ❌ No context for translators
- ❌ Requires initial translation
3. DeepSeek AI: Content Translation
The DeepSeek AI model handles long-form content translation.
- Use case: Blog articles, product descriptions, marketing copy
- Example:
prompt = f"Translate to {lang}: {text}"
response = deepseek.generate(prompt)
Benefits:
- ✅ Handles long content (1000+ words)
- ✅ Context-aware translation
- ✅ Natural language output
- ✅ Batch processing
Limitations:
- ❌ API cost per translation
- ❌ Requires internet connection
- ❌ May translate technical terms incorrectly
TranslationManager Architecture
Phrase Table Structure
Phrase tables are stored as JSON files per language:
app/shared/translation/phrase_tables/
Each file maps English phrases to translations:
{
    "Mini PC": "मिनी पीसी",
    "Thin Client": "थिन क्लाइंट",
    "Industrial PC": "औद्योगिक पीसी",
    "All-in-One": "ऑल-इन-वन"
}
Translation Lookup
The get() method performs a multi-tier lookup:
1. Preservation Check:
if self._should_skip_translation(text, lang):
    return text  # Don't translate brand names, SKUs, URLs
2. Comma Splitting (for long strings):
if len(text) > 150 and "," in text:
    # Strip whitespace so each part can match its phrase table key
    parts = [p.strip() for p in text.split(",")]
    # Fall back to the original part when no translation exists
    return ", ".join(self.get(p, lang) or p for p in parts)
3. Phrase Table Lookup:
if text in self.phrase_tables[lang]:
    return self.phrase_tables[lang][text]
4. Legacy Cache Promotion:
if text_hash in self.legacy_caches[lang]:
    translation = self.legacy_caches[lang][text_hash]
    self.set(text, lang, translation)  # Promote to phrase table
    return translation
5. Queue if Missing:
if queue_if_missing:
    self.add_to_queue(text, lang)
return None  # No translation found
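The five tiers above can be read as one method. The following is a minimal sketch of how they might fit together, not the actual TranslationManager: the class name `PhraseLookup`, the `preserved` set, the in-memory `queue` list, and the SHA-256-based `_hash` helper are all assumptions made for illustration (the article does not say how hashes are computed).

```python
import hashlib


class PhraseLookup:
    """Illustrative stand-in for TranslationManager; all names are assumptions."""

    def __init__(self, phrase_tables, legacy_caches=None, preserved=()):
        self.phrase_tables = phrase_tables        # {lang: {english: translation}}
        self.legacy_caches = legacy_caches or {}  # {lang: {text_hash: translation}}
        self.preserved = set(preserved)           # terms returned untranslated
        self.queue = []                           # (text, lang) pairs awaiting translation

    @staticmethod
    def _hash(text):
        # Assumed scheme: first 12 hex characters of a SHA-256 digest.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    def get(self, text, lang, queue_if_missing=True):
        # Tier 1: preserved terms pass through unchanged.
        if text in self.preserved:
            return text
        # Tier 2: long comma-separated strings are translated piece by piece.
        if len(text) > 150 and "," in text:
            parts = [p.strip() for p in text.split(",")]
            return ", ".join(self.get(p, lang, queue_if_missing) or p for p in parts)
        # Tier 3: direct phrase table hit.
        table = self.phrase_tables.setdefault(lang, {})
        if text in table:
            return table[text]
        # Tier 4: promote a legacy cache hit into the phrase table.
        cached = self.legacy_caches.get(lang, {}).get(self._hash(text))
        if cached is not None:
            table[text] = cached
            return cached
        # Tier 5: record the miss once so a batch job can fill it in later.
        if queue_if_missing and (text, lang) not in self.queue:
            self.queue.append((text, lang))
        return None
```

Callers that need a displayable string can write `lookup.get(text, lang) or text`, which is the same fallback the comma-splitting tier applies to each part.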
Preservation Rules
We preserve specific content types:
Brand Names:
BRAND_NAMES_PRESERVE = [
    "Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
    "Windows", "Linux", "WiFi", "Bluetooth"
]
Technical Terms:
TECHNICAL_TERMS_PRESERVE = [
    "HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
    "DDR4", "SSD", "NVMe", "PCIe", "SATA"
]
SKUs (pattern-based):
if " " not in text and text.count("-") >= 2:
    return True  # Preserve "Treo-N100-8-256-2H-W6-11P"
URLs:
if text.startswith(("/", "http://", "https://")):
    return True
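Taken together, the four rules above amount to a single predicate. The sketch below combines them under two assumptions that the article does not confirm: list entries are matched against the whole string, and the match is case-insensitive. The function name `should_skip_translation` mirrors the private method used in the lookup, but this is not its actual implementation.

```python
BRAND_NAMES_PRESERVE = [
    "Thinvent", "Intel", "AMD", "Microsoft", "Ubuntu",
    "Windows", "Linux", "WiFi", "Bluetooth",
]
TECHNICAL_TERMS_PRESERVE = [
    "HDMI", "DisplayPort", "VGA", "USB", "Ethernet",
    "DDR4", "SSD", "NVMe", "PCIe", "SATA",
]

# Assumption: whole-string, case-insensitive matching against both lists.
_PRESERVED = {term.lower() for term in BRAND_NAMES_PRESERVE + TECHNICAL_TERMS_PRESERVE}


def should_skip_translation(text):
    """Return True when text should be shown untranslated."""
    stripped = text.strip()
    if stripped.lower() in _PRESERVED:
        return True  # brand name or technical term
    if " " not in stripped and stripped.count("-") >= 2:
        return True  # SKU-like token such as "Treo-N100-8-256-2H-W6-11P"
    if stripped.startswith(("/", "http://", "https://")):
        return True  # relative or absolute URL
    return False
```

One thing worth checking against real data: the hyphen-count heuristic also matches ordinary hyphenated phrases with no spaces. "All-in-One" has two hyphens, so this predicate preserves it even though the phrase table holds a Hindi translation for it.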
Regional Formatting
For European locales (German, French, Spanish), we apply regional formatting:
Decimal Separator:
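# English: 34.00
# German: 34,00
# Replace only dots that sit between two digits, so sentence-ending periods are left alone
text = re.sub(r"(?<=\d)\.(?=\d)", ",", text)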
Voltage/Amperage:
# English: 12.5V
# German: 12,5V
text = re.sub(r"(\d+)\.(\d+)V", r"\1,\2V", text)
This ensures numbers display correctly for each locale.
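As a rough sketch of how this could be packaged, the function below applies the comma decimal separator only for the three locales named above and only to dots between digits. The name `format_numbers` and the `COMMA_DECIMAL_LANGS` set are hypothetical.

```python
import re

# Assumption: these language codes correspond to the locales listed above.
COMMA_DECIMAL_LANGS = {"de", "fr", "es"}

# A dot with a digit on both sides; lookarounds leave the digits in place.
_DECIMAL_POINT = re.compile(r"(?<=\d)\.(?=\d)")


def format_numbers(text, lang):
    """Swap decimal points for commas in locales that use a comma separator."""
    if lang not in COMMA_DECIMAL_LANGS:
        return text
    return _DECIMAL_POINT.sub(",", text)
```

Because the pattern requires a digit on each side, "12.5V" is covered without a separate voltage rule, and a period that ends a sentence is untouched. Dotted version strings and IP addresses would still be rewritten, so they need to be excluded before formatting.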
Translation Queue
Missing translations are queued for batch processing:
[
    {
        "text": "Compact Desktop",
        "lang": "hi",
        "hash": "a1b2c3d4e5f6"
    },
    {
        "text": "Fanless Design",
        "lang": "hi",
        "hash": "f6e5d4c3b2a1"
    }
]
A background script processes the queue using DeepSeek AI:
for item in queue:
    translation = deepseek.translate(item["text"], item["lang"])
    translation_manager.set(item["text"], item["lang"], translation)
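The loop above assumes every API call succeeds. A slightly more defensive sketch is shown below: it skips duplicate entries and collects failures for a later retry. The helpers `queue_entry` and `process_queue` are hypothetical, the translator and the store are passed in as plain callables, and the hash scheme (first 12 hex characters of a SHA-256 digest) is an assumption, since the article does not say how hashes are derived.

```python
import hashlib


def queue_entry(text, lang):
    """Build a queue item in the shape shown above (hash scheme is assumed)."""
    digest = hashlib.sha256(f"{lang}:{text}".encode("utf-8")).hexdigest()[:12]
    return {"text": text, "lang": lang, "hash": digest}


def process_queue(queue, translate, store):
    """Translate each distinct item once; return the items that failed."""
    seen, failed = set(), []
    for item in queue:
        if item["hash"] in seen:
            continue  # duplicate entry, already handled in this run
        seen.add(item["hash"])
        try:
            translation = translate(item["text"], item["lang"])
        except Exception:
            failed.append(item)  # keep for a later retry
            continue
        if translation:
            store(item["text"], item["lang"], translation)
        else:
            failed.append(item)  # empty result counts as a failure
    return failed
```

In the setup described here, `translate` would wrap the DeepSeek call and `store` would be `translation_manager.set`; the returned list can be written back as the new queue.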
File Locking
We use fcntl file locking to prevent concurrent writes:
with open(lock_path, "w") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)
    try:
        # Read, modify, write phrase table
        with open(table_path, "r+") as f:
            data = json.load(f)
            data.update(new_translations)
            f.seek(0)
            json.dump(data, f)
            f.truncate()
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
This ensures multiple processes don't corrupt the phrase tables.
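The lock protects writers from each other, but a reader that does not take the lock could still see a half-written file. One common way to close that gap, sketched below as a variation rather than as the system's actual code, is to write to a temporary file and rename it into place while holding the lock. The function name `update_phrase_table` and the `.lock` suffix are assumptions, and `fcntl` is available on Unix-like systems only.

```python
import fcntl
import json
import os
import tempfile


def update_phrase_table(table_path, new_translations):
    """Merge new_translations into the JSON table under an exclusive lock."""
    lock_path = table_path + ".lock"  # assumed naming for the lock file
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            try:
                with open(table_path, "r", encoding="utf-8") as f:
                    data = json.load(f)
            except FileNotFoundError:
                data = {}  # first write for this language
            data.update(new_translations)
            # Write a sibling temp file, then rename it over the table so
            # readers see either the old file or the new one, never a mix.
            directory = os.path.dirname(os.path.abspath(table_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(data, tmp, ensure_ascii=False, indent=2)
            os.replace(tmp_path, table_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return data
```

Passing `ensure_ascii=False` keeps Devanagari and other non-ASCII text readable in the file instead of escaping it as `\uXXXX` sequences.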
Language Detection
We detect query language using Unicode ranges and stopwords:
Unicode Range Detection
Devanagari (Hindi, Marathi):
if 0x0900 <= ord(char) <= 0x097F:
    return "hi"
Bengali:
if 0x0980 <= ord(char) <= 0x09FF:
    return "bn"
Arabic:
if 0x0600 <= ord(char) <= 0x06FF:
    return "ar"
Cyrillic (Russian):
if 0x0400 <= ord(char) <= 0x04FF:
    return "ru"
CJK (Chinese, Japanese, Korean):
if 0x4E00 <= ord(char) <= 0x9FFF:  # Kanji/Hanzi
    has_cjk = True
if 0x3040 <= ord(char) <= 0x30FF:  # Hiragana/Katakana
    return "ja"
if 0xAC00 <= ord(char) <= 0xD7AF:  # Hangul
    return "ko"
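The fragments above fit naturally into one scan over the query. The sketch below shows one way to arrange them. The function name `detect_script_language` is hypothetical, and the final fallback to "zh" when only Han characters were seen is an assumption: the article sets `has_cjk` but does not show what happens to it afterwards.

```python
# (start, end, language) for scripts that map to a single language code
SCRIPT_RANGES = [
    (0x0900, 0x097F, "hi"),  # Devanagari
    (0x0980, 0x09FF, "bn"),  # Bengali
    (0x0600, 0x06FF, "ar"),  # Arabic
    (0x0400, 0x04FF, "ru"),  # Cyrillic
]


def detect_script_language(text):
    """Return a language code based on the first decisive character, else None."""
    has_cjk = False
    for char in text:
        code = ord(char)
        for start, end, lang in SCRIPT_RANGES:
            if start <= code <= end:
                return lang
        if 0x3040 <= code <= 0x30FF:  # Hiragana/Katakana: decisively Japanese
            return "ja"
        if 0xAC00 <= code <= 0xD7AF:  # Hangul: decisively Korean
            return "ko"
        if 0x4E00 <= code <= 0x9FFF:  # Han characters are shared, keep scanning
            has_cjk = True
    # Assumed fallback: Han characters with no kana or Hangul are treated as Chinese.
    return "zh" if has_cjk else None
```

Deferring the decision on Han characters matters for Japanese queries that begin with kanji: the scan continues until it reaches kana, and only falls back to Chinese if none appears.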
Stopword Detection
For Latin-based languages, we use stopwords:
Spanish:
SPANISH_STOPWORDS = {
    "el", "la", "los", "las", "un", "una", "para", "con",
    "en", "por", "que", "es", "su", "y", "del", "al"
}
French:
FRENCH_STOPWORDS = {
    "le", "la", "les", "des", "du", "un", "une", "pour",
    "avec", "en", "est", "sur", "et", "au", "dans"
}
German:
GERMAN_STOPWORDS = {
    "der", "die", "das", "ein", "eine", "für", "mit",
    "ist", "und", "auf", "den", "dem", "bei", "von"
}
If any word in the query matches one of these stopword sets, we return that set's language.
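Some words appear in more than one set ("la", "un", and "en" are in both the Spanish and French lists), so a first-match rule depends on the order in which the sets are checked. The sketch below swaps in a different technique, counting matches per language and refusing to guess on a tie. It is an illustration of that alternative, not the detection code described above, and the name `detect_latin_language` is hypothetical.

```python
import re

STOPWORDS = {
    "es": {"el", "la", "los", "las", "un", "una", "para", "con",
           "en", "por", "que", "es", "su", "y", "del", "al"},
    "fr": {"le", "la", "les", "des", "du", "un", "une", "pour",
           "avec", "en", "est", "sur", "et", "au", "dans"},
    "de": {"der", "die", "das", "ein", "eine", "für", "mit",
           "ist", "und", "auf", "den", "dem", "bei", "von"},
}


def detect_latin_language(query):
    """Pick the language with the most stopword hits; None if none or tied."""
    # Runs of letters only, so digits and punctuation never count as words.
    words = set(re.findall(r"[^\W\d_]+", query.lower()))
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores.values())
    if best == 0:
        return None
    winners = [lang for lang, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None
```

Returning None on a tie lets the caller fall back to the next signal, such as the cookie or the Accept-Language header.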
Dynamic Language Detection
Users can specify language via:
1. URL Parameter
/p/Treo-N100-8-256-2H-W6-11P?lang=hi
2. Cookie
response.set_cookie("lang", "hi", max_age=31536000)
3. Browser Accept-Language Header
lang = request.accept_languages.best_match(["en", "hi", "es", "fr", "de"])
Priority: URL param > Cookie > Accept-Language > Default (en)
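The priority order can be expressed as a small pure function that takes the three candidate values and returns the first supported one. This is a sketch under assumptions: the name `resolve_language` and the `SUPPORTED_LANGS` list (copied from the `best_match` call above) are illustrative.

```python
SUPPORTED_LANGS = ["en", "hi", "es", "fr", "de"]
DEFAULT_LANG = "en"


def resolve_language(url_lang=None, cookie_lang=None, header_lang=None):
    """Apply URL param > cookie > Accept-Language > default."""
    for candidate in (url_lang, cookie_lang, header_lang):
        # Unsupported or missing values fall through to the next source.
        if candidate in SUPPORTED_LANGS:
            return candidate
    return DEFAULT_LANG
```

Inside a Flask view this could be called as `resolve_language(request.args.get("lang"), request.cookies.get("lang"), request.accept_languages.best_match(SUPPORTED_LANGS))`. Keeping the function free of Flask imports makes the priority rule easy to unit test.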
Recursive Translation
We recursively translate nested data structures:
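def translate_data(obj, lang):
    if isinstance(obj, dict):
        # Keys are left untouched; only values are translated
        return {k: translate_data(v, lang) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [translate_data(item, lang) for item in obj]
    elif isinstance(obj, str):
        # Fall back to the original string when no translation exists
        return translation_manager.get(obj, lang) or obj
    return obj  # Numbers, booleans, and None pass through unchanged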
Example:
data = {
    "title": "Mini PC",
    "features": {
        "RAM": "8GB",
        "Storage": "256GB SSD"
    },
    "price": 25000
}
translated = translate_data(data, "hi")
# Result:
# {
#     "title": "मिनी पीसी",
#     "features": {
#         "RAM": "8GB",
#         "Storage": "256GB SSD"
#     },
#     "price": 25000
# }
Feature Translation
Product features require special handling:
def translate_features(features, lang):
    translated = {}
    for heading, feature_dict in features.items():
        # Translate heading, falling back to the original when missing
        translated_heading = translation_manager.get(heading, lang) or heading
        # Translate feature names and values
        translated_features = {}
        for name, value in feature_dict.items():
            translated_name = translation_manager.get(name, lang) or name
            translated_value = translation_manager.get(value, lang) or value
            translated_features[translated_name] = translated_value
        translated[translated_heading] = translated_features
    return translated
Example:
features = {
    "Processing": {
        "Processor": "Intel N100",
        "Cores": "4",
        "RAM": "8GB"
    }
}
translated = translate_features(features, "hi")
# Result:
# {
#     "प्रोसेसिंग": {
#         "प्रोसेसर": "Intel N100",
#         "कोर": "4",
#         "RAM": "8GB"
#     }
# }
Integration with SEO Pipeline
The translation system integrates with the SEO pipeline:
Query Language Propagation
When a user searches in Hindi, we propagate the language to related searches:
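if SEO_ENABLE_LANGUAGE_PROPAGATION:
    lang = detect_query_language(query)
    if lang != "en":
        # Use "&" when the URL already carries a query string
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}lang={lang}"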
This ensures users stay in their preferred language when clicking related searches.
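Appending the parameter by string concatenation gets awkward once a URL already has a query string, an existing lang value, or a fragment. A sketch using the standard library's `urllib.parse` handles those cases; the helper name `with_lang` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_lang(url, lang, default_lang="en"):
    """Return url with lang set as a query parameter (skipped for the default)."""
    if lang == default_lang:
        return url
    parts = urlsplit(url)
    # Drop any existing lang value so the parameter appears exactly once.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "lang"]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_lang("/q/mini-pc?page=2", "hi"))  # /q/mini-pc?page=2&lang=hi
```

Other query parameters and any fragment are carried over unchanged, so the helper can be applied to related-search links regardless of how they were built.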
Translated Query Pages
Query pages are generated in multiple languages:
/q/mini-pc (English)
/q/mini-pc?lang=hi (Hindi)
/q/mini-pc?lang=es (Spanish)
The content is translated using TranslationManager.
Performance Characteristics
Phrase Table Lookup:
- Time: O(1) dictionary lookup (~1μs)
- Memory: ~5 MB per language (10,000 phrases)
Language Detection:
- Time: O(n) where n = string length (~10μs for 100 chars)
- Memory: Negligible
Recursive Translation:
- Time: O(n) where n = number of strings (~1ms for 100 strings)
- Memory: Proportional to data structure size
File Locking:
- Time: ~1ms per lock acquisition
- Contention: Rare (writes are infrequent)
Summary
Our translation system uses three technologies:
Flask-Babel (.po files):
- Static template strings (navigation, buttons, labels)
- Standard i18n approach
- Compile-time validation
TranslationManager (JSON phrase tables):
- Dynamic content (product names, features, descriptions)
- Runtime updates
- Preservation rules (brand names, SKUs, URLs)
- Regional formatting (decimal separators)
- Translation queue (missing strings)
DeepSeek AI (API):
- Long-form content (blog articles, marketing copy)
- Context-aware translation
- Batch processing
Language Detection:
- Unicode ranges (Devanagari, Bengali, Arabic, Cyrillic, CJK)
- Stopwords (Spanish, French, German)
- URL param > Cookie > Accept-Language header > Default (en)
Integration:
- Recursive translation for nested data
- Feature translation for product specifications
- Query language propagation for SEO
This hybrid approach provides comprehensive multilingual support while preserving technical accuracy and brand consistency.