
ICU Locale Identifier Regex for Java

/^[a-z]{2,3}(?:_(?:[A-Z]{2}|[0-9]{3})(?:_(?:[A-Z0-9]{2,8}))?)?$/

What this pattern does

This page provides a multi-part regular expression for matching ICU locale identifiers, ported and verified for Java. It accepts a lowercase language code of two or three letters, optionally followed by an underscore and a region (two uppercase letters or three digits), and then optionally an underscore and a variant of two to eight uppercase letters or digits. The snippet below is ready to drop into your Java project, whether you're validating in a Spring Boot controller, a Jakarta EE service, or a standalone utility class.

Java Implementation

// ICU Locale Identifier
// ReDoS-safe | RegexVault — Localization > Locale & Language

import java.util.regex.Pattern;

public class IcuLocaleIdentifierValidator {
    private static final Pattern PATTERN =
        Pattern.compile("^[a-z]{2,3}(?:_(?:[A-Z]{2}|[0-9]{3})(?:_(?:[A-Z0-9]{2,8}))?)?$");

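    // Returns true only when the whole input is a well-formed identifier; null is rejected.
    public static boolean validate(String input) {
        return input != null && PATTERN.matcher(input).matches();
    }

    // Examples
    public static void main(String[] args) {
        System.out.println(validate("en"));     // true
        System.out.println(validate("es_419")); // true
        System.out.println(validate("en-US"));  // false (hyphen separator)
    }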
}

Test Cases

Matches (Valid)    Rejects (Invalid)
en                 en-US
en_US              EN_US
zh_CN              en_
zh_TW              _US
pt_BR              en_US_POSIX_extra
es_419
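
The test cases above can be checked mechanically. The following is a minimal sketch, assuming Java 9 or later (for List.of); the class name IcuLocaleTestCases and its helper method are illustrative and not part of the RegexVault snippet.

```java
import java.util.List;
import java.util.regex.Pattern;

final class IcuLocaleTestCases {
    // Same pattern as the one shown on this page.
    static final Pattern PATTERN =
        Pattern.compile("^[a-z]{2,3}(?:_(?:[A-Z]{2}|[0-9]{3})(?:_(?:[A-Z0-9]{2,8}))?)?$");

    // Inputs listed under "Matches (Valid)".
    static final List<String> VALID =
        List.of("en", "en_US", "zh_CN", "zh_TW", "pt_BR", "es_419");

    // Inputs listed under "Rejects (Invalid)".
    static final List<String> INVALID =
        List.of("en-US", "EN_US", "en_", "_US", "en_US_POSIX_extra");

    static boolean matches(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Every valid input should print "true", every invalid one "false".
        for (String s : VALID) {
            System.out.println(s + " -> " + matches(s));
        }
        for (String s : INVALID) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Keeping the two lists next to the pattern makes it easy to turn them into unit tests in whatever test framework the project already uses.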

When to use this pattern

This pattern is drawn from the Localization > Locale & Language category and carries a ReDoS-safe certification. That matters in Java applications because the JVM regex engine uses backtracking and is susceptible to ReDoS without careful pattern design. This pattern uses only bounded quantifiers and has no nested repetition, so it leaves little room for catastrophic backtracking. RegexVault audits patterns against known backtracking attack vectors, so you have that context before using this regex in a high-stakes production environment.

Common Pitfalls

The ICU underscore format and the BCP 47 hyphen format are not interchangeable. Know which format your framework expects. Java's ResourceBundle uses underscores in bundle names (for example, Messages_en_US.properties); the HTML lang attribute uses hyphens.
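
If input may arrive in either form, one option is to normalize the separator before validating. The sketch below is an assumption-laden illustration, not part of the RegexVault snippet: the class LocaleSeparatorNormalizer and its methods are hypothetical, and the swap only changes the separator character. It does not fix letter case, and tags with script subtags (such as zh-Hant-TW) are still rejected because this pattern has no script component.

```java
import java.util.regex.Pattern;

final class LocaleSeparatorNormalizer {
    // Same pattern as the one shown on this page.
    static final Pattern ICU_PATTERN =
        Pattern.compile("^[a-z]{2,3}(?:_(?:[A-Z]{2}|[0-9]{3})(?:_(?:[A-Z0-9]{2,8}))?)?$");

    // Hypothetical helper: replaces BCP 47 hyphens with ICU underscores.
    static String toUnderscoreForm(String tag) {
        return tag.replace('-', '_');
    }

    // Hypothetical helper: replaces ICU underscores with BCP 47 hyphens.
    static String toHyphenForm(String id) {
        return id.replace('_', '-');
    }

    // Validates after swapping separators; null is rejected.
    static boolean isValidAfterNormalizing(String tag) {
        return tag != null && ICU_PATTERN.matcher(toUnderscoreForm(tag)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidAfterNormalizing("en-US"));      // true
        System.out.println(isValidAfterNormalizing("en-us"));      // false (lowercase region)
        System.out.println(isValidAfterNormalizing("zh-Hant-TW")); // false (script subtag)
        System.out.println(toHyphenForm("es_419"));                // es-419
    }
}
```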

Technical Notes

ICU uses underscores as separators (en_US) while BCP 47 uses hyphens (en-US). Java's Locale class produces the underscore format from toString() and the BCP 47 hyphen format from toLanguageTag(). Unicode CLDR supports both via normalization. es_419 is Spanish for Latin America (UN area code 419).
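
The two formats can be seen side by side using the standard java.util.Locale API. This is a small sketch for illustration; the class name LocaleFormatDemo and the helper underscoreForm are made up for this example.

```java
import java.util.Locale;

final class LocaleFormatDemo {
    // Parses a BCP 47 tag and returns Locale.toString(), which uses underscores.
    static String underscoreForm(String bcp47Tag) {
        return Locale.forLanguageTag(bcp47Tag).toString();
    }

    public static void main(String[] args) {
        // Spanish for Latin America: the region is the UN area code 419.
        Locale latinAmericanSpanish = Locale.forLanguageTag("es-419");
        System.out.println(latinAmericanSpanish.toString());      // es_419
        System.out.println(latinAmericanSpanish.toLanguageTag()); // es-419
        System.out.println(latinAmericanSpanish.getCountry());    // 419

        // The built-in constant for US English shows the same split.
        System.out.println(Locale.US.toString());      // en_US
        System.out.println(Locale.US.toLanguageTag()); // en-US
    }
}
```

Parsing with Locale.forLanguageTag and printing with toString() is one way to obtain an underscore-form string that can then be checked against the pattern on this page.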
