
BCP 47 / IETF Language Tag Regex for PHP

/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/

What this pattern does

This page provides a multi-part regular expression for matching BCP 47 / IETF language tags, ported and verified for PHP. It covers the common language[-script][-region] shape: a 2- or 3-letter language subtag, an optional 4-letter script subtag, and an optional region subtag (two uppercase letters or three digits). Variants, extensions, and private-use subtags are not matched. The snippet below is ready to drop into your PHP project, whether you're validating in a Laravel validator, a WordPress plugin, or a standalone PHP script.

PHP Implementation

PHP
<?php
// BCP 47 / IETF Language Tag
// ReDoS-safe | RegexVault — Localization > Locale & Language

define('BCP_47_IETF_LANGUAGE_TAG_PATTERN', '/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/');

function validate_bcp_47_ietf_language_tag(string $input): bool {
    return (bool) preg_match(BCP_47_IETF_LANGUAGE_TAG_PATTERN, $input);
}

// Example
var_dump(validate_bcp_47_ietf_language_tag("en")); // bool(true)
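
One caveat worth checking before relying on the snippet above: in PCRE, "$" without the D modifier also matches just before a trailing newline, so the string "en-US\n" passes validation. The sketch below is a hedged variant, not part of the vault entry. The names BCP_47_STRICT_PATTERN and validate_bcp_47_strict are hypothetical, and the only change to the pattern is the added D (PCRE_DOLLAR_ENDONLY) modifier.

```php
<?php
// Hypothetical strict variant: same pattern, plus the D modifier so that
// "$" matches only at the very end of the subject string.
define('BCP_47_STRICT_PATTERN', '/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/D');

function validate_bcp_47_strict(string $input): bool {
    // Compare against 1 so that an engine error (false) is never treated as a match.
    return preg_match(BCP_47_STRICT_PATTERN, $input) === 1;
}

var_dump(validate_bcp_47_strict("en-US"));   // bool(true)
var_dump(validate_bcp_47_strict("en-US\n")); // bool(false)
```

If input is already trimmed before validation, the original pattern behaves the same; the modifier only matters for untrimmed input.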

Test Cases

Matches (Valid)
en
zh-CN
zh-Hant
en-US
zh-Hant-TW
sr-Latn-RS
ar-001

Rejects (Invalid)
english
EN_US
zh-ABCDE
en-US-
zh-TW-Hant
en--US
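
The cases listed above can be checked mechanically. The following is a minimal sketch that runs the pattern against both lists; the variable names are illustrative and the pattern is copied unchanged from the implementation section.

```php
<?php
$pattern = '/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/';

$valid   = ['en', 'zh-CN', 'zh-Hant', 'en-US', 'zh-Hant-TW', 'sr-Latn-RS', 'ar-001'];
$invalid = ['english', 'EN_US', 'zh-ABCDE', 'en-US-', 'zh-TW-Hant', 'en--US'];

// Collect every case whose result differs from the expectation.
$failures = [];
foreach ($valid as $tag) {
    if (preg_match($pattern, $tag) !== 1) {
        $failures[] = "should match: $tag";
    }
}
foreach ($invalid as $tag) {
    if (preg_match($pattern, $tag) !== 0) {
        $failures[] = "should reject: $tag";
    }
}

echo $failures ? implode("\n", $failures) . "\n" : "all cases pass\n";
```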

When to use this pattern

This pattern is drawn from the Localization > Locale & Language category and carries a ReDoS-safe certification. That matters in PHP because when PCRE exceeds its backtracking limit (pcre.backtrack_limit), preg_match returns false instead of raising an error, so a failure on malicious input can pass silently as a non-match. RegexVault audits patterns against known backtracking attack vectors, so you have that context before using this regex in a high-stakes production environment.
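
Because preg_match returns false on an engine error, and a (bool) cast turns that into an ordinary non-match, one option is to check the return value explicitly. The following is a minimal sketch under that assumption; the constant LANGUAGE_TAG_PATTERN and the helper match_language_tag_or_fail are hypothetical names, while preg_last_error() is a standard PHP function.

```php
<?php
const LANGUAGE_TAG_PATTERN = '/^[a-zA-Z]{2,3}(?:-[a-zA-Z]{4})?(?:-(?:[A-Z]{2}|[0-9]{3}))?$/';

// Hypothetical helper: distinguishes "no match" from "PCRE error".
function match_language_tag_or_fail(string $input): bool {
    $result = preg_match(LANGUAGE_TAG_PATTERN, $input);
    if ($result === false) {
        // For example PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is exceeded.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $result === 1;
}

var_dump(match_language_tag_or_fail('sr-Latn-RS')); // bool(true)
var_dump(match_language_tag_or_fail('en--US'));     // bool(false)
```

Whether to throw, log, or reject on error is a design choice; the point of the sketch is only that the error case is no longer indistinguishable from an invalid tag.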

Common Pitfalls

zh-CN does not specify a script: it means Chinese as used in China, which is conventionally written in Simplified characters. Use zh-Hans to request Simplified script explicitly. Unicode CLDR also uses BCP 47 tags for its locale identifiers.

Technical Notes

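Language subtag examples: en=English, zh=Chinese, es=Spanish. Script subtags (ISO 15924): Hant=Traditional Chinese, Hans=Simplified Chinese, Latn=Latin. Region subtags: US, CN, TW (ISO 3166-1 alpha-2), or a three-digit UN M.49 code such as 001 (World). For the language subtag, use the two-letter ISO 639-1 code where one exists and a three-letter ISO 639 code otherwise.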
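
To make the subtag roles described above concrete, the sketch below adds named capture groups to the same pattern structure and returns the language, script, and region separately. This is an illustration, not part of the vault entry: LANGUAGE_TAG_PARTS_PATTERN and parse_language_tag are hypothetical names.

```php
<?php
// Same structure as the vault pattern, with named groups added for illustration.
const LANGUAGE_TAG_PARTS_PATTERN =
    '/^(?<language>[a-zA-Z]{2,3})(?:-(?<script>[a-zA-Z]{4}))?(?:-(?<region>[A-Z]{2}|[0-9]{3}))?$/';

// Hypothetical helper: returns the subtags, or null when the tag does not match.
function parse_language_tag(string $tag): ?array {
    // PREG_UNMATCHED_AS_NULL reports skipped optional groups as null instead of "".
    if (preg_match(LANGUAGE_TAG_PARTS_PATTERN, $tag, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null;
    }
    return [
        'language' => $m['language'],
        'script'   => $m['script'] ?? null,
        'region'   => $m['region'] ?? null,
    ];
}

var_dump(parse_language_tag('zh-Hant-TW'));
```

For zh-Hant-TW this yields language zh, script Hant, and region TW; for ar-001 the script is null and the region is the UN M.49 code 001.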
