![Page 1: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/1.jpg)
Language Tags and Locale Identifiers
A Status Report
![Page 2: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/2.jpg)
Presenter and Agenda
Addison Phillips
Internationalization Architect, Yahoo!
Co-Editor, Language Tag Registry Update (LTRU) Working Group (RFC 3066bis, draft-matching)
Language tags
Locale identifiers
![Page 3: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/3.jpg)
Languages? Locales?
What’s a language tag?
What the #@&%$ is a locale?
Why do identifiers matter?
![Page 4: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/4.jpg)
Language Tags
Enable presentation, selection, and negotiation of content
Defined by BCP 47
– Widely used! XML, HTML, RSS, MIME, SOAP, SMTP, LDAP, CSS, XSL, CCXML, Java, C#, ASP, Perl…
– Well understood (?)
![Page 5: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/5.jpg)
Locale Identifiers
Different ideas:
– Accept-Locale vs. Accept-Language
– URIs/URNs, etc.
– CLDR/LDML
And requirements:
– Operating environments and harmonization
– App servers
– Web services
New solution? Cost of adoption:
– UTF-8 to the browser: 8 long years
![Page 6: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/6.jpg)
In the Beginning
Received wisdom from the Dark Ages
Locales:
– japanese, french, german, C
– ENU, FRA, JPN
– ja_JP.PCK
– AMERICAN_AMERICA.WE8ISO8859P1
Languages…
… looked a lot like locales (and vice versa)
![Page 7: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/7.jpg)
Locales and Language Tags meet
Conversations in Prague…
– Language tags are being used as locale identifiers anyway…
– Not going to need a big new thing…
– Just a few things to fix…
… we can do this really fast
![Page 8: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/8.jpg)
BCP 47 Basic Structure
Alphanumeric (ASCII only) subtags
Up to eight characters long
Separated by hyphens
Case not important (i.e. zh = ZH = zH = Zh)
1*8alphanum *[ “-” 1*8alphanum ]
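The structural rules on this slide can be sketched as a small well-formedness check. This is a minimal illustration, not a validator: the function names `is_well_formed` and `same_tag` are made up for this example, and the check only covers the shape described above (ASCII alphanumeric subtags of one to eight characters, joined by hyphens, compared without regard to case).

```python
import re

# One to eight ASCII letters or digits, then zero or more "-subtag" repeats.
_TAG_SHAPE = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if the tag has the basic hyphen-separated subtag shape."""
    return _TAG_SHAPE.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags ignoring case, since case carries no meaning."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    for candidate in ["zh-TW", "i-klingon", "en--US", "toolongsubtag-US"]:
        print(candidate, is_well_formed(candidate))
    print(same_tag("zh", "zH"))
```

A check like this says nothing about whether the subtags mean anything; it only rejects strings that cannot be tags at all.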
![Page 9: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/9.jpg)
RFC 1766
zh-TW
– zh: ISO 639-1 (alpha 2)
– TW: ISO 3166 (alpha 2)
i-klingon
– Registered value
![Page 10: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/10.jpg)
RFC 3066
sco-GB
– sco: ISO 639-2 (alpha 3 codes)
But use alpha 2 codes when they exist:
– en-GB (not eng-GB)
![Page 11: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/11.jpg)
Problems
Script variation:
– zh-Hant/zh-Hans
– (sr-Cyrl/sr-Latn, az-Arab/az-Latn/az-Cyrl, etc.)
Obsolescence of registrations:
– art-lojban (now jbo), i-klingon (now tlh)
Instability in underlying standards:
– sr-CS (CS used to be Czechoslovakia…)
![Page 12: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/12.jpg)
And More Problems
Lack of scripts
Little support for registered values in software
Reassignment of values by ISO 3166
Lack of consistent tag formation (Chinese dialects?)
Standards not readily available, bad references
Bad implementation assumptions
– 1*8alphanum *[ “-” 1*8alphanum ]
– 2*3ALPHA [ “-” 2ALPHA ]
Many registrations to cover small variations
– 8 German registrations to cover two variations
![Page 13: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/13.jpg)
LTRU and “draft-registry”
Defines a generative syntax
– Machine readable
– Future proof, extensible
Defines a single source
– Stable subtags, no conflicts
– Machine readable
Defines when to use subtags
– (sometimes)
![Page 14: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/14.jpg)
RFC 3066bis and LTRU
sl-Latn-IT-rozaj-x-mine
– sl: ISO 639-1/2 (alpha 2/3)
– Latn: ISO 15924 script codes (alpha 4)
– IT: ISO 3166 (alpha 2) or UN M49
– rozaj: Registered variants (any number)
– x-mine: Private Use and Extension
![Page 15: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/15.jpg)
More Examples
es-419 (Spanish for Americas)
en-US (English for USA)
de-CH-1996 (Old tags are all valid)
sl-rozaj-nedis (Multiple variants)
zh-t-wadegile (Extensions)
![Page 16: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/16.jpg)
Benefits
Subtag registry in one place: one source.
Subtags identified by length/content
Extensible
Compatible with RFC 3066 tags
Stable: subtags are forever
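To illustrate what "identified by length/content" buys an implementer, here is a rough sketch that labels each subtag from its shape alone, without consulting the registry. The function name `classify` and the output format are assumptions for this example. It deliberately ignores extended language subtags and grandfathered tags such as i-klingon, and it does not check that any subtag is actually registered.

```python
def classify(tag: str) -> list[tuple[str, str]]:
    """Label subtags by length and content only (simplified sketch)."""
    parts = tag.split("-")
    # Assumption: the first subtag is always treated as the language.
    labelled = [("language", parts[0].lower())]
    i = 1
    while i < len(parts):
        sub = parts[i]
        if len(sub) == 1:
            if sub.lower() == "x":
                # Private use swallows everything to the end of the tag.
                labelled.append(("privateuse", "-".join(parts[i:]).lower()))
                break
            # Any other singleton opens an extension that runs until
            # the next single-character subtag.
            j = i + 1
            while j < len(parts) and len(parts[j]) > 1:
                j += 1
            labelled.append(("extension", "-".join(parts[i:j]).lower()))
            i = j
            continue
        if len(sub) == 4 and sub.isalpha():
            labelled.append(("script", sub.title()))      # e.g. Latn
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            labelled.append(("region", sub.upper()))      # e.g. IT, 419
        elif 5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit()):
            labelled.append(("variant", sub.lower()))     # e.g. rozaj, 1996
        else:
            labelled.append(("unknown", sub))
        i += 1
    return labelled


if __name__ == "__main__":
    for example in ["sl-Latn-IT-rozaj-x-mine", "es-419", "de-CH-1996"]:
        print(example, classify(example))
```

Because a four-letter subtag can only be a script and a two-letter or three-digit subtag after the language can only be a region, no lookup table is needed just to split a tag into its roles.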
![Page 17: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/17.jpg)
Problems
Matching
– Does “en-US” match “en-Latn-US”?
Tag choices
– Users have more to choose from.
Implementations
– More to do, more to think about
– (easier to parse, process, support the good stuff)
![Page 18: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/18.jpg)
Tag Matching
Uses “Language Ranges” in a “Language Priority List” to select sets of content according to the language tag
Four schemes
– Basic Filtering
– Extended Filtering
– Scored Filtering
– Lookup
![Page 19: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/19.jpg)
Filtering
Ranges specify the least specific item
– “en” matches “en”, “en-US”, “en-Brai”, “en-boont”
Basic matching uses plain prefixes
Extended matching can match “inside bits”
– “en-*-US”
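The two filtering schemes can be sketched as follows. The function names are made up for this example. The basic filter is the plain prefix rule from the slide. The extended filter follows the algorithm as it was later published in RFC 4647, so details may differ from the draft discussed in this talk.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Prefix match: the range equals the tag or is followed by a hyphen."""
    rng, t = language_range.lower(), tag.lower()
    return rng == "*" or t == rng or t.startswith(rng + "-")


def extended_filter(language_range: str, tag: str) -> bool:
    """Subtag-wise match that lets the range skip over subtags in the tag."""
    rng = language_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if rng[0] != "*" and rng[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(rng):
        if rng[ri] == "*":
            ri += 1                 # a wildcard matches zero or more subtags
        elif ti >= len(t):
            return False            # range wants more than the tag has
        elif rng[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False            # never skip into an extension or private use
        else:
            ti += 1                 # skip an "inside bit" of the tag
    return True


if __name__ == "__main__":
    for tag in ["en", "en-US", "en-Brai", "en-boont", "eng"]:
        print("en", tag, basic_filter("en", tag))
    print(extended_filter("en-*-US", "en-Latn-US"))
```

Note that basic filtering with the range “en-US” does not match “en-Latn-US”, which is exactly the matching problem raised earlier; the extended scheme does match it.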
![Page 20: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/20.jpg)
Scored Filtering
Assigns a “weight” or “score” to each match
Result set is ordered by match quality
Postulated by John Cowan
![Page 21: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/21.jpg)
Lookup
Range specifies the most specific tag in a match.
– “en-US” matches “en” and “en-US” but not “en-US-boont”
Mirrors the locale fallback mechanism and many language negotiation schemes.
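A minimal sketch of lookup as progressive truncation, the same idea as locale fallback. The function name `lookup` and its `default` parameter are assumptions for this example. Dropping a trailing single-character subtag together with the subtag after it follows the rule as later published in RFC 4647, which may differ from the draft discussed here.

```python
def lookup(priority_list, available, default=None):
    """Return the first available tag reached by truncating each range."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue                      # a bare wildcard selects nothing here
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()                 # fall back to a less specific tag
            # Do not leave a dangling singleton such as "x" at the end.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


if __name__ == "__main__":
    print(lookup(["en-US"], ["en", "en-US-boont"]))
    print(lookup(["fr-CA", "en-US"], ["en", "fr"]))
```

Unlike filtering, lookup returns exactly one result, which is why it suits tasks like choosing a single resource bundle.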
![Page 22: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/22.jpg)
What Do I Do (Content Author)?
Not much.
– Existing tags are all still valid: tagging is mostly unchanged.
– Resist temptation to (ab)use the private use subtags.
Unless your language has script variations:
– Tag content with the appropriate script subtag(s)
Script subtags only apply to a small number of languages: “zh”, “sr”, “uz”, “az”, “mn”, and a very small number of others.
![Page 23: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/23.jpg)
What Do I Do (Programmer)?
Check code for compliance with 3066bis
– Decide on well-formed or validating
– Implement suppress-script
– Change to using the registry
– Bother infrastructure folks (Java, MS, Mozilla, etc.) to implement the standard
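As one illustration of the suppress-script item, the sketch below drops a script subtag when the registry marks it as the default script for that language. The `SUPPRESS_SCRIPT` table is a tiny hand-copied excerpt used only for illustration, and the function name is made up; real code would load the values from the subtag registry rather than hard-code them.

```python
# Hand-copied excerpt for illustration; load the full registry in real code.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "fr": "Latn",
    "de": "Latn",
    "ru": "Cyrl",
    "ja": "Jpan",
}


def strip_suppressed_script(tag: str) -> str:
    """Remove the script subtag if it is the language's suppressed script."""
    subtags = tag.split("-")
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed is not None and suppressed.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


if __name__ == "__main__":
    print(strip_suppressed_script("en-Latn-US"))
    print(strip_suppressed_script("zh-Hant-TW"))
```

Normalizing tags this way before matching is one possible answer to whether “en-US” should match “en-Latn-US”: after normalization they are the same tag.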
![Page 24: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/24.jpg)
What Do I Do (End-User)?
Check and update your language ranges.
Tag content wisely.
![Page 25: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/25.jpg)
LTRU Milestone Dates
(Done) RFC 3066bis
– Registry went live in December 2005
Produce “Matching” RFC
– Draft-11 available (WG Last Call started… Monday)
(Anticipated) Produce RFC 3066ter
– This includes ISO 639-3 support, extended language subtags, and possibly ISO 639-6
![Page 26: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/26.jpg)
Things to Read
Registry Draft
– http://www.inter-locale.com
– http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt
Matching Draft
– http://www.inter-locale.com
LTRU Mailing List
– https://www1.ietf.org/mailman/listinfo/ltru
![Page 27: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/27.jpg)
Things to Do (languages)
Get involved in LTRU
Get involved in W3C I18N Core WG!
Write implementations
Work on adoption of 3066bis: understand the impact
Then get involved with Locale identifiers…
![Page 28: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/28.jpg)
Back to Locales…
IUC 20 Round Table
Suzanne Topping’s Multilingual Article
Tex Texin and the Locales list…
![Page 29: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/29.jpg)
Locale Identifiers and Web Services
![Page 30: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/30.jpg)
W3C and Unicode
W3C
– Identifiers and cross-over with language tags
– Web services
– XML, HTML
Unicode Consortium
– LDML
– CLDR
– Standards for content
![Page 31: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/31.jpg)
“Language Tags and Locale Identifiers” SPEC
First Working Draft coming soon
– URIs?
– Simple tags?
![Page 32: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/32.jpg)
WS-I18N SPEC
First Working Draft now available:
– http://www.w3.org/TR/ws-i18n
![Page 33: Language Tags and Locale Identifiers A Status Report](https://reader035.vdocuments.site/reader035/viewer/2022062511/551466f9550346414e8b5bd8/html5/thumbnails/33.jpg)
Ideas?