BCP 47 / IETF Language Tag Regex for Python

/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/

What this pattern does

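This page provides a multi-part regular expression, ported and verified for Python, that matches a common subset of BCP 47 (IETF) language tags: a 2- or 3-letter primary language subtag, an optional 4-letter script subtag, and an optional region subtag (two uppercase letters or three digits), in that order and separated by hyphens. It does not cover variant, extension, or private-use subtags, and it checks the shape of a tag only, not whether each subtag is actually registered. The snippet below can be dropped into a Python project, whether you are validating in a Django view, a FastAPI endpoint, or a standalone data-processing script.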

Python Implementation

Python
# BCP 47 / IETF Language Tag
# ReDoS-safe | RegexVault — Localization > Locale & Language

import re

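# language (2-3 letters), optional script (4 letters),
# optional region (2 uppercase letters or 3 digits)
bcp_47_ietf_language_tag_pattern = re.compile(r'^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$')

def validate_bcp_47_ietf_language_tag(value: str) -> bool:
    """Return True if value has the language[-script][-region] shape."""
    # fullmatch requires the whole string to match, so a trailing newline is rejected
    return bool(bcp_47_ietf_language_tag_pattern.fullmatch(value))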

# Example
print(validate_bcp_47_ietf_language_tag("en"))  # True

Test Cases

Matches (Valid)
en
zh-CN
zh-Hant
en-US
zh-Hant-TW
sr-Latn-RS
ar-001

Rejects (Invalid)
english
EN_US
zh-ABCDE
en-US-
zh-TW-Hant
en--US
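
The cases listed above can be checked mechanically. The following is a minimal sketch, not part of the original snippet: the names `VALID`, `INVALID`, and `check_cases` are made up for illustration, and the pattern is repeated so the block runs on its own.

```python
import re

# Same pattern as in the Python implementation above, repeated here
# so that this sketch is self-contained.
pattern = re.compile(r'^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$')

# Inputs taken from the Test Cases section.
VALID = ["en", "zh-CN", "zh-Hant", "en-US", "zh-Hant-TW", "sr-Latn-RS", "ar-001"]
INVALID = ["english", "EN_US", "zh-ABCDE", "en-US-", "zh-TW-Hant", "en--US"]

def check_cases() -> list:
    """Return the inputs whose result disagrees with the lists above."""
    failures = []
    for tag in VALID:
        if not pattern.fullmatch(tag):
            failures.append(tag)  # expected a match, got none
    for tag in INVALID:
        if pattern.fullmatch(tag):
            failures.append(tag)  # expected a rejection, got a match
    return failures

print(check_cases())  # [] when every case behaves as listed
```

An empty list means the pattern agrees with every listed case. In a real project the same data would more naturally live in a parametrized unit test.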

When to use this pattern

This pattern comes from the Localization > Locale & Language category and carries RegexVault's ReDoS-safe certification. That is particularly important in Python web servers, where CPU-bound regex operations can stall concurrent request handling. RegexVault audits patterns against known backtracking attack vectors. In this pattern every repetition is bounded (there is no `*` or `+`), so the amount of backtracking on any input is limited to a small constant.
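
As a rough sanity check of the ReDoS claim, one can time the pattern against a deliberately long, non-matching input. This is only an illustrative sketch, not a description of how RegexVault performs its audit. The helper `time_match` and the chosen input are assumptions made for the example.

```python
import re
import time

# Same pattern as in the Python implementation above, repeated here
# so that this sketch is self-contained.
pattern = re.compile(r'^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$')

def time_match(value: str):
    """Return (matched, elapsed_seconds) for a single fullmatch call."""
    start = time.perf_counter()
    matched = bool(pattern.fullmatch(value))
    return matched, time.perf_counter() - start

# A long run of letters followed by a dangling hyphen: it can never match,
# and a backtracking-prone pattern would be slow to find that out.
hostile = "a" * 100_000 + "-"
matched, elapsed = time_match(hostile)
print(matched, f"{elapsed:.6f}s")
```

The call should return `False` almost immediately, because the engine gives up after trying the two possible lengths of the language subtag. A timing run like this is evidence, not proof. Capping input length before validation remains a sensible extra defence.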

Common Pitfalls

zh-CN does not specify a script: it means Chinese as used in China, which is conventionally written in Simplified characters. Use zh-Hans when you need to state Simplified script explicitly. Unicode CLDR uses BCP 47.
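
To make the distinction concrete, the sketch below reports the script subtag only when the tag states one and never infers it from the region. The helper `explicit_script` and the capturing variant of the pattern are assumptions for illustration. They are not part of the RegexVault snippet.

```python
import re

# Variant of the pattern above with one capturing group around the script subtag.
script_pattern = re.compile(
    r'^[a-zA-Z]{2,3}(?:-([a-zA-Z]{4}))?(?:-(?:[A-Z]{2}|[0-9]{3}))?$'
)

def explicit_script(value: str):
    """Return the script subtag if the tag states one, else None."""
    match = script_pattern.fullmatch(value)
    return match.group(1) if match else None

print(explicit_script("zh-CN"))       # None: no script is stated
print(explicit_script("zh-Hans"))     # Hans
print(explicit_script("zh-Hans-CN"))  # Hans
```

Code that needs to know the script for zh-CN has to apply a separate default (for example from CLDR likely-subtags data) and cannot read it from the tag itself.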

Technical Notes

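Language subtag examples: en = English, zh = Chinese, es = Spanish. Script subtags (ISO 15924): Hant = Traditional Chinese, Hans = Simplified Chinese, Latn = Latin. Region subtags: US, CN, TW (ISO 3166-1 alpha-2), or a three-digit UN M.49 area code such as 001 (World). Use the ISO 639-1 (2-letter) code where one exists, otherwise the ISO 639-2 (3-letter) code.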
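
If the individual subtags are needed rather than a yes/no answer, the same pattern can be written with named groups. This is a sketch under assumptions: the group names `language`, `script`, and `region` and the helper `split_tag` are chosen for illustration and do not come from the original snippet.

```python
import re
from typing import Dict, Optional

# Same structure as the pattern above, with a named group per subtag.
named_pattern = re.compile(
    r'^(?P<language>[a-zA-Z]{2,3})'
    r'(?:-(?P<script>[a-zA-Z]{4}))?'
    r'(?:-(?P<region>[A-Z]{2}|[0-9]{3}))?$'
)

def split_tag(value: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a tag into its subtags, or return None if it does not match."""
    match = named_pattern.fullmatch(value)
    return match.groupdict() if match else None

print(split_tag("zh-Hant-TW"))  # {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
print(split_tag("ar-001"))      # {'language': 'ar', 'script': None, 'region': '001'}
print(split_tag("zh-TW-Hant"))  # None: script must come before region
```

Subtags that are absent come back as `None`, so callers can tell "not stated" apart from an empty string.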
