You are building a content delivery system that serves localized content to users in their preferred language. When a browser makes an HTTP request, it includes an Accept-Language header that specifies which languages the user prefers, ordered by preference.
Your server supports a limited set of languages, and you need to determine which of the requested languages it can actually serve. The goal is to return the list of supported languages that match the user's preferences, in the user's preference order.
The problem is divided into four progressive parts, each adding more complexity to the language matching logic:
1. Exact Matching: Match language tags exactly (e.g., "en-US" matches only "en-US")
2. Prefix Matching: Support generic language codes that match specific variants (e.g., "en" matches "en-US", "en-GB")
3. Wildcard Support: Handle the wildcard "*" which means "any other language"
4. Quality Factors: Parse and respect weighted preferences using q-factors (e.g., "en;q=0.8")
This problem has multiple parts; ask the interviewer how many there are so you can manage your time
Start with the simplest matching logic and extend it as parts get more complex
Write your own test cases and ensure your code compiles and runs correctly
Your function receives two parameters:
Accept-Language Header (string): A comma-separated list of language tags, e.g. "en-US, fr-CA, fr-FR"
Supported Languages (list): Languages your server can provide, e.g. ["en-US", "fr-FR", "es-ES"]
The returned languages should be ordered by the user's preference (same order as in the header).
Implement a function parse_accept_language(accept_header, supported_languages) that returns the intersection of requested and supported languages, maintaining the preference order from the header.
At this stage, only exact string matches count. A language tag like "en-US" will only match if the server supports exactly "en-US".
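A minimal sketch of the exact-matching stage might look like the following. It assumes tags are compared case-sensitively after trimming surrounding whitespace, and that a tag repeated in the header is returned only once; neither detail is specified above, so treat both as assumptions.

```python
def parse_accept_language(accept_header, supported_languages):
    """Return supported languages that exactly match a requested tag, in header order."""
    supported = set(supported_languages)  # O(1) membership checks
    result = []
    seen = set()  # assumption: a tag repeated in the header is reported once
    for raw_tag in accept_header.split(","):
        tag = raw_tag.strip()
        if tag and tag in supported and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result


print(parse_accept_language("en-US, fr-CA, fr-FR", ["en-US", "fr-FR", "es-ES"]))
# ['en-US', 'fr-FR']
```

Iterating over the header rather than the supported list is what preserves the user's preference order.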
Extend your function to support generic language codes without region specifiers. A tag like "en" should match any English variant supported by the server ("en-US", "en-GB", "en-CA", etc.).
Language tags follow the pattern language-REGION where the language code comes before the hyphen (e.g., "en", "fr") and the region code comes after (e.g., "US", "CA", "GB"). A generic tag contains only the language code without a region (e.g., "en", "fr").
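One possible way to extend the function for generic tags is sketched below. Two choices here are assumptions rather than part of the problem statement: variants matched by a generic tag are returned in the order they appear in the supported list, and a language already matched by an earlier header entry is not repeated.

```python
def parse_accept_language(accept_header, supported_languages):
    """Match exact tags and generic tags ("en" matches "en-US", "en-GB", ...)."""
    result = []
    seen = set()  # assumption: each supported language appears at most once
    for raw_tag in accept_header.split(","):
        tag = raw_tag.strip()
        if not tag:
            continue
        if "-" in tag:
            # Region-specific tag: exact match only.
            matches = [lang for lang in supported_languages if lang == tag]
        else:
            # Generic tag: the bare language code or any "<language>-REGION" variant.
            prefix = tag + "-"
            matches = [
                lang for lang in supported_languages
                if lang == tag or lang.startswith(prefix)
            ]
        for lang in matches:
            if lang not in seen:
                seen.add(lang)
                result.append(lang)
    return result


print(parse_accept_language("en, fr-FR", ["en-US", "fr-FR", "en-GB", "es-ES"]))
# ['en-US', 'en-GB', 'fr-FR']
```

Matching on `tag + "-"` rather than on the bare prefix avoids false positives, so that a generic tag such as "en" does not match an unrelated code that merely starts with the same letters.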
Parse q-factors from language tags (format: language;q=value)
Default q-factor is 1.0 for entries without explicit weights
Sort results by q-factor in descending order
Use stable sort: maintain original order for languages with equal q-factors
Handle q=0 as lowest priority
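The q-factor rules above can be sketched as follows. This version covers exact and generic-tag matching only and leaves out the wildcard. The helper name parse_entry is hypothetical, and two behaviors are assumptions: a malformed weight falls back to the default of 1.0, and q=0 entries are kept and simply sorted last, following the "lowest priority" wording above.

```python
def parse_entry(entry):
    """Split 'en;q=0.8' into ('en', 0.8); a missing q defaults to 1.0."""
    tag, _, params = entry.partition(";")
    q = 1.0
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value.strip())
            except ValueError:
                q = 1.0  # assumption: malformed weights fall back to the default
    return tag.strip(), q


def parse_accept_language(accept_header, supported_languages):
    """Return matching supported languages ordered by descending q-factor."""
    entries = [parse_entry(e) for e in accept_header.split(",") if e.strip()]
    # list.sort is stable, so entries with equal q keep their header order.
    entries.sort(key=lambda item: -item[1])

    result = []
    seen = set()
    for tag, _q in entries:
        if "-" in tag:
            matches = [lang for lang in supported_languages if lang == tag]
        else:
            prefix = tag + "-"
            matches = [
                lang for lang in supported_languages
                if lang == tag or lang.startswith(prefix)
            ]
        for lang in matches:
            if lang not in seen:
                seen.add(lang)
                result.append(lang)
    return result


print(parse_accept_language(
    "fr-FR;q=0.5, en;q=0.8, es-ES",
    ["en-US", "fr-FR", "en-GB", "es-ES"],
))
# ['es-ES', 'en-US', 'en-GB', 'fr-FR']
```

Sorting the parsed header entries before matching, instead of sorting the final result, means every variant matched by a generic tag inherits that tag's weight without extra bookkeeping.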