Design i18n — Graphite

Reference Answer

For a full example answer with detailed architecture diagrams and deep dives, see our Design a Distributed Cache guide. The cache guide covers multi-tier caching, invalidation strategies, and read-heavy scaling patterns that are central to serving translations at global scale.

Also review the Caching, Message Queues, and Databases building blocks for background on edge caching, event-driven invalidation, and durable storage for translation data.

Problem Statement

Design an internationalization (i18n) system that enables a social media platform to efficiently support hundreds of languages for text content and assets across the entire site. Engineers integrate stable message keys (e.g., t("auth.login.title")) and the system resolves them into localized strings and assets for each user's locale, handling pluralization, gender, date and number formatting, and right-to-left layout hints.

The system is extremely read-heavy: every page render for every user requires resolving dozens of translation keys. Human translators (and optionally machine translation) produce and review translations through a workflow pipeline. The core challenges are threefold: delivering translations with single-digit-millisecond latency at global scale through aggressive caching; maintaining a safe rollout mechanism with versioning and fallbacks so deployments never break strings; and orchestrating a human-in-the-loop translation pipeline that provides context, quality review, and approval before publishing. Interviewers want to see how you separate the source of truth from caches, design safe invalidation, and handle the developer experience for translation key management.

Key Requirements

Functional

Locale-aware rendering -- users see the site in their preferred language with correct regional formatting, pluralization, and sensible fallbacks when translations are missing
Localized assets -- static content and assets (images, email templates, notifications) are served consistently in the user's locale across web and mobile clients
Translation workflow -- translators submit, review, and approve translations through a managed workflow that supports human review and optional machine translation seeding
Developer integration -- engineers use stable translation keys with versioning, so deployments can roll forward or back without breaking visible strings

Non-Functional

Scalability -- serve translations for 10+ million concurrent users across hundreds of locales with sub-5ms resolution latency at the edge
Reliability -- 99.99% availability for translation resolution; fallback to parent locale or default language if a specific translation is missing
Latency -- translation bundle delivery under 10ms from edge caches; cache invalidation propagates within 30 seconds of a publish event
Consistency -- eventual consistency between the translation source of truth and edge caches is acceptable, but published bundles must be atomically consistent (no partial updates)

What Interviewers Focus On

Based on real interview experiences, these are the areas interviewers probe most deeply:

1. Multi-Tier Caching and Invalidation

i18n is a read-dominated workload where every request resolves many strings. Interviewers want to see an aggressive caching strategy that keeps latency low without serving stale translations after updates.

Hints to consider:

Design a three-tier cache: client-side bundle cache (localStorage or service worker), CDN/edge cache for locale-namespace bundles, and a regional Redis layer as the application-level cache
Use content-hash versioned bundle URLs so new translations get new URLs and old cached bundles remain valid until clients refresh
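Implement targeted invalidation: when a translation is published, invalidate only what changed for the affected locale-namespace bundle (for example, the short-lived entry that maps the bundle to its current content hash) rather than flushing the entire cache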
Pre-warm edge caches for high-traffic locales during scheduled publish windows to avoid cache stampedes
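
To make the content-hash hint concrete, here is a minimal sketch of deriving a versioned bundle URL from the bundle's contents. The function name bundle_url, the 8-character SHA-256 prefix, and the exact URL layout are assumptions for illustration, not a prescribed format.

```python
import hashlib
import json


def bundle_url(locale: str, namespace: str, messages: dict[str, str]) -> str:
    """Return a URL whose path changes only when the bundle content changes."""
    # Serialize deterministically (sorted keys, fixed separators) so that
    # identical content always produces the same hash, whatever the key order.
    payload = json.dumps(
        messages, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    ).encode("utf-8")
    # Assumption: an 8-character prefix is enough to tell versions apart here.
    digest = hashlib.sha256(payload).hexdigest()[:8]
    return f"/bundles/{locale}/{namespace}.{digest}.json"


if __name__ == "__main__":
    print(bundle_url("en-US", "checkout", {"summary.total": "Total"}))
```

Because the URL is a pure function of the content, a bundle at a given URL never changes and can be cached with a long lifetime; publishing a new translation produces a new URL instead of mutating an old one.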

2. Translation Key Versioning and Fallback Strategy

Changing or deleting translation keys without versioning breaks deployed clients and causes blank UI text. Interviewers probe whether you have a safe rollout and rollback mechanism.

Hints to consider:

Associate each translation bundle with a content hash or version number; clients request a specific version pinned at deploy time
Implement locale fallback chains (e.g., fr-CA falls back to fr, which falls back to en) so missing translations in a specific variant still display something meaningful
Support deprecation warnings for keys that are scheduled for removal, giving engineers time to migrate before deletion
Store effective-dated key versions so you can reproduce exactly what users saw on any historical date for debugging or compliance
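
The fallback chain in the hints above can be sketched as follows. This is a simplified illustration that builds the chain by dropping trailing subtags; production systems often use CLDR parent-locale data instead, and the names fallback_chain and resolve are hypothetical.

```python
def fallback_chain(locale: str, default: str = "en") -> list[str]:
    """Build a lookup order such as fr-CA -> fr -> en."""
    parts = locale.split("-")
    # Drop one trailing subtag at a time: "fr-CA" yields "fr-CA", then "fr".
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def resolve(key: str, locale: str, bundles: dict[str, dict[str, str]],
            default: str = "en") -> str:
    """Return the first translation found along the fallback chain."""
    for candidate in fallback_chain(locale, default):
        message = bundles.get(candidate, {}).get(key)
        if message is not None:
            return message
    # Last resort (an assumption): show the key itself rather than blank text.
    return key


if __name__ == "__main__":
    bundles = {
        "en": {"auth.login.title": "Sign in", "auth.login.help": "Need help?"},
        "fr": {"auth.login.title": "Connexion"},
        "fr-CA": {},
    }
    print(resolve("auth.login.title", "fr-CA", bundles))
    print(resolve("auth.login.help", "fr-CA", bundles))
```

Returning the key as the final fallback makes a missing translation visible in the UI and easy to trace, at the cost of showing users a raw identifier; returning the default-language string is the more common choice once the default locale is guaranteed complete.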

3. Human-in-the-Loop Translation Pipeline

Quality translations require context, review, and approval. Interviewers want to see a durable workflow that moves translations through extraction, optional machine translation, human review, QA, and publish stages.

Hints to consider:

Model the pipeline as a multi-step workflow (extract new keys, seed with machine translation, assign to human translator, review, QA, approve, publish) with durable state tracking at each stage
Provide translators with context: screenshots of where the string appears, character limits, pluralization rules, and related strings
Use Kafka to emit events at each workflow transition, enabling audit logging, metrics dashboards, and notifications to stakeholders
Support batch operations so a translator can review an entire namespace at once rather than individual keys
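
One way to picture the workflow described above is as a small state machine that records an event on every transition. The sketch below is illustrative only: the stage names mirror the pipeline in the hints, the allowed "send back" transitions are assumptions, and the in-memory events list stands in for publishing to a Kafka topic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    EXTRACTED = "extracted"
    MT_SEEDED = "mt_seeded"
    ASSIGNED = "assigned"
    IN_REVIEW = "in_review"
    QA = "qa"
    APPROVED = "approved"
    PUBLISHED = "published"


# Allowed transitions. Machine translation seeding is optional, and review or
# QA may send a task back to the translator (both are assumptions).
ALLOWED = {
    Stage.EXTRACTED: {Stage.MT_SEEDED, Stage.ASSIGNED},
    Stage.MT_SEEDED: {Stage.ASSIGNED},
    Stage.ASSIGNED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.QA, Stage.ASSIGNED},
    Stage.QA: {Stage.APPROVED, Stage.ASSIGNED},
    Stage.APPROVED: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}


@dataclass
class TranslationTask:
    key: str
    locale: str
    stage: Stage = Stage.EXTRACTED
    # Stand-in for an event stream; a real system would publish these.
    events: list[dict] = field(default_factory=list)

    def advance(self, target: Stage, actor: str) -> None:
        """Move to the target stage if allowed and record an audit event."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.value} -> {target.value} is not allowed")
        self.events.append({
            "key": self.key,
            "locale": self.locale,
            "from": self.stage.value,
            "to": target.value,
            "actor": actor,
        })
        self.stage = target
```

Rejecting illegal transitions in one place means a string cannot reach the published stage without passing review and QA, and the recorded events give audit logs and dashboards a single source to consume.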

4. Developer Experience and Key Management

The system must be easy for engineers to use without creating orphaned or conflicting keys. Interviewers assess how you integrate with the development workflow.

Hints to consider:

Provide a CLI tool or build plugin that extracts new translation keys from source code and registers them in the translation management system automatically
Detect unused keys by comparing the set of keys in source code against the registry and flagging stale entries for cleanup
Namespace keys by feature or page (e.g., checkout.summary.total) to organize translations logically and allow per-namespace cache bundles
Support ICU MessageFormat or a similar standard for pluralization, gender, and interpolation to avoid ad-hoc string concatenation
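
As a rough illustration of the extraction and unused-key hints above, the sketch below scans source text for t("...") calls and compares the result with a registry. The regular expression only handles literal string arguments, and extract_keys and diff_keys are hypothetical names; a real build plugin would more likely walk the syntax tree.

```python
import re

# Matches t("some.key") or t('some.key') with a literal string argument.
# The word boundary keeps calls such as format("x") from matching.
KEY_PATTERN = re.compile(r"""\bt\(\s*["']([A-Za-z0-9_.]+)["']""")


def extract_keys(source: str) -> set[str]:
    """Collect every translation key referenced in a source file."""
    return set(KEY_PATTERN.findall(source))


def diff_keys(source_keys: set[str], registry_keys: set[str]) -> dict[str, list[str]]:
    """Report keys to register and registry keys that no code references."""
    return {
        "new": sorted(source_keys - registry_keys),
        "unused": sorted(registry_keys - source_keys),
    }


if __name__ == "__main__":
    source = 'title = t("auth.login.title"); total = t("checkout.summary.total")'
    registry = {"auth.login.title", "legacy.banner.text"}
    print(diff_keys(extract_keys(source), registry))
```

Keys built dynamically at runtime are invisible to this kind of static scan, so flagged "unused" keys are better treated as candidates for review than deleted automatically.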

Suggested Approach

Step 1: Clarify Requirements

Start by confirming scope and priorities. Ask how many locales the platform supports and whether all locales require complete coverage or if partial coverage with fallbacks is acceptable. Clarify the expected read scale (requests per second) and whether translations change frequently (daily) or infrequently (per release). Verify whether machine translation is in scope or if all translations are human-authored. Establish latency targets for translation resolution and acceptable cache staleness after a publish event.

Step 2: High-Level Architecture

Sketch the core components: a Translation Management Service (TMS) that stores the source of truth for keys and translations in DynamoDB or PostgreSQL, a Workflow Engine that orchestrates the translation pipeline stages, a Bundle Publisher that compiles locale-namespace bundles and pushes them to a CDN, a Redis Cache Layer for application-level resolution, and a Client SDK that fetches and caches bundles locally. Show two data flows: the write path (engineer adds key, translator provides translation, reviewer approves, publisher compiles and pushes bundle) and the read path (client requests versioned bundle URL from CDN, falls back to Redis, falls back to origin database).

Step 3: Deep Dive on Caching and Invalidation

Walk through the read path in detail. When a user loads a page, the client SDK checks its local bundle cache for the current version. If missing, it requests the bundle from the CDN using a versioned URL (e.g., /bundles/en-US/checkout.v3a7f2.json). On a CDN miss, the request reaches the regional Redis cache, which holds pre-compiled bundles. On a Redis miss, the application compiles the bundle from the database, writes it to Redis, and returns it. When a translator publishes an update, the Bundle Publisher compiles a new bundle with a new content hash, writes it to Redis and the CDN, and sends a notification to clients to refresh. Discuss how content-hash URLs avoid cache purge complexity: old versions remain cached and valid, while new deploys reference the new hash.
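
The read path above can be sketched as a read-through lookup across ordered tiers. This is a simplified single-process model: the Tier class is a hypothetical in-memory stand-in for the client cache, CDN, and Redis, and in a real deployment each tier populates itself on the way back rather than being written by one function.

```python
from typing import Callable, Optional

Bundle = dict[str, str]


class Tier:
    """In-memory stand-in for one cache tier (client, CDN, or Redis)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, Bundle] = {}
        self.hits = 0

    def get(self, url: str) -> Optional[Bundle]:
        bundle = self.store.get(url)
        if bundle is not None:
            self.hits += 1
        return bundle

    def put(self, url: str, bundle: Bundle) -> None:
        self.store[url] = bundle


def fetch_bundle(url: str, tiers: list[Tier],
                 compile_from_db: Callable[[str], Bundle]) -> Bundle:
    """Check tiers in order, fall back to the database, then backfill misses."""
    missed: list[Tier] = []
    for tier in tiers:
        bundle = tier.get(url)
        if bundle is not None:
            break
        missed.append(tier)
    else:
        # Every tier missed: compile the bundle from the source of truth.
        bundle = compile_from_db(url)
    # Backfill each tier that missed so the next request stops earlier.
    for tier in missed:
        tier.put(url, bundle)
    return bundle
```

Since versioned URLs identify immutable content, backfilling never risks overwriting a newer bundle with an older one; a new publish simply shows up under a different URL.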

Step 4: Address Secondary Concerns

Cover the translation workflow: new keys enter a "needs translation" queue, optionally seeded with machine translation, assigned to human translators with context and screenshots, reviewed by a second translator, and published on approval. Discuss monitoring: track cache hit rates per locale, bundle size growth, translation coverage percentage per locale, and pipeline throughput (keys translated per day). Address disaster recovery: replicate the translation database across regions and maintain pre-compiled bundle snapshots in object storage as a fallback if Redis and the CDN both fail. Mention security: restrict publish permissions to approved translators and reviewers to prevent unauthorized content changes.
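
For the coverage metric mentioned above, a minimal sketch of the calculation might look like this. The function name coverage_by_locale and the input shapes are assumptions; the idea is simply the share of source keys that have a translation in each locale.

```python
def coverage_by_locale(source_keys: set[str],
                       translations: dict[str, dict[str, str]]) -> dict[str, float]:
    """Percentage of source keys that have a translation, per locale."""
    if not source_keys:
        # Nothing to translate: treat every locale as fully covered.
        return {locale: 100.0 for locale in translations}
    return {
        # Intersect with source_keys so stale keys do not inflate coverage.
        locale: round(100.0 * len(source_keys & set(messages)) / len(source_keys), 1)
        for locale, messages in translations.items()
    }


if __name__ == "__main__":
    keys = {"auth.login.title", "auth.login.help"}
    print(coverage_by_locale(keys, {"fr": {"auth.login.title": "Connexion"}}))
```

Tracking this number per locale over time shows which locales lean most heavily on fallbacks and where translator capacity is falling behind new key creation.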

Real Interview Quotes

"Design a system for building a translator service where it translates the web content (just static content). Core entities: users, human translators, engineering devs."

Related Learning

Distributed Cache -- multi-tier caching architecture and invalidation strategies
Caching -- edge and application-level caching for read-heavy workloads
Message Queues -- Kafka for translation workflow events and cache invalidation signals
Databases -- DynamoDB or PostgreSQL for durable translation key-value storage
CDN -- edge delivery of locale-specific translation bundles
