Language Code

From MusicBrainz Wiki
Revision as of 15:39, 15 March 2009 by Nikki (talk | contribs)

Language Codes, Scripts, and Associated Country Codes

The following table lists the ISO 639 two-letter language codes (or three-letter codes where no two-letter code exists), along with the associated ISO 15924 script code(s), ISO 3166 country code(s), and character encodings.

  • Note that ISO 639-2 defines two sets of three-letter codes: bibliographic and terminologic. The bibliographic codes resemble Z39.53 and the English names of languages; the terminologic codes resemble the two-letter codes and the names the languages have for themselves. The two sets differ for 22 languages (see note [1] below for the complete list); in every case the language also has a two-letter code. We should pick one of these two sets and standardize on it. Other metadata projects, like Dublin Core, suggest that the "bibliographic set is preferred for metadata because of its widespread use in bibliographic agencies." On the other hand, the terminologic set would be less confusing due to its similarity with the two-letter codes. I would suggest that MusicBrainz use only the terminologic (ISO 639-2/T) codes. @alex

In the interest of brevity and relevance, the complete ISO 639 list is not used. The following categories are omitted:

  • Historical variants of modern languages (e.g. Old French, Ancient Greek)
  • Historical languages without modern pronunciation (e.g. Linear B, Phoenician, Egyptian)
  • Sign languages (for obvious reasons: this is MusicBrainz)
  • Catch-all codes (with "(Other)")
  • Languages written only in scripts not representable with Unicode

Furthermore, to be listed, a language has to have at least one of the following:

  • A Wikipedia with more than 100 articles
  • A GNU/Linux locale for perror()
  • A Windows or Macintosh language code
  • A Google localization (ISO 639 code required; no h4x0r or Swedish Chef)
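
The exclusion and inclusion rules above can be read as a simple predicate. The following is only a sketch under assumptions: the `LanguageEvidence` record and all of its field names are hypothetical and do not come from any MusicBrainz code; they merely restate the two lists as data.

```python
from dataclasses import dataclass


@dataclass
class LanguageEvidence:
    """Hypothetical record of what is known about one ISO 639 entry."""
    # Inclusion evidence (at least one is required).
    wikipedia_articles: int = 0
    has_gnu_linux_locale: bool = False
    has_windows_or_mac_code: bool = False
    has_google_localization: bool = False
    # Exclusion flags (any one of these omits the entry).
    is_historical: bool = False
    is_sign_language: bool = False
    is_catch_all: bool = False
    unicode_representable: bool = True


def qualifies(lang: LanguageEvidence) -> bool:
    """Return True if the entry would be listed under the rules above."""
    excluded = (
        lang.is_historical
        or lang.is_sign_language
        or lang.is_catch_all
        or not lang.unicode_representable
    )
    if excluded:
        return False
    # "More than 100 articles" is read as strictly greater than 100.
    return (
        lang.wikipedia_articles > 100
        or lang.has_gnu_linux_locale
        or lang.has_windows_or_mac_code
        or lang.has_google_localization
    )
```

For example, `qualifies(LanguageEvidence(wikipedia_articles=101))` is true, while an entry flagged as a sign language is rejected no matter what other evidence it has.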

Code  Language  Script     Countries                                              Charsets
ar    Arabic    Arab       AE BH DZ EG IQ JO KW LB LY MA MR OM PS SA SD SY TN YE  8859-6
en    English   Latn       GB US + many others                                    ASCII 8859-1 8859-*
el    Greek     Grek       GR                                                     8859-7
he    Hebrew    Hebr       IL                                                     windows-1255 8859-8 8859-8-i
ja    Japanese  Hani+Hrkt  JP                                                     Shift-JIS EUC-JP UTF-8
ko    Korean    Hang+Hani  KP KR                                                  EUC-KR UTF-8
ru    Russian   Cyrl       RU                                                     KOI8-R windows-1251 8859-5
tlh   Klingon   Latn       -                                                      ASCII
th    Thai      Thai       TH                                                     8859-11
zh    Chinese   Hant       HK TW                                                  Big5 UTF-8
zh    Chinese   Hans       CN HK                                                  GB18030 UTF-8

(This list is incomplete; it is only intended to demonstrate the table contents.)
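
As an illustration of how rows of this table might be held in code, here is a minimal sketch. The `LanguageRow` type, the `tags` and `rows_for_country` helpers, and the language-script-country tag format are assumptions made for the example, not an existing MusicBrainz schema; only three rows from the table are included.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LanguageRow:
    """One table row: language code, name, scripts, countries, charsets."""
    code: str
    name: str
    scripts: Tuple[str, ...]
    countries: Tuple[str, ...]
    charsets: Tuple[str, ...]


# A few rows copied from the table above.
ROWS = [
    LanguageRow("he", "Hebrew", ("Hebr",), ("IL",),
                ("windows-1255", "8859-8", "8859-8-i")),
    LanguageRow("ru", "Russian", ("Cyrl",), ("RU",),
                ("KOI8-R", "windows-1251", "8859-5")),
    LanguageRow("th", "Thai", ("Thai",), ("TH",), ("8859-11",)),
]


def tags(row: LanguageRow) -> List[str]:
    """Combine code, script and country into tags such as 'he-Hebr-IL'."""
    return [f"{row.code}-{s}-{c}" for s in row.scripts for c in row.countries]


def rows_for_country(country: str) -> List[LanguageRow]:
    """Return every row whose country list contains the given ISO 3166 code."""
    return [r for r in ROWS if country in r.countries]
```

Storing scripts, countries and charsets as tuples keeps the one-to-many relationships of the table explicit, so a language written in two scripts (such as the two Chinese rows) can be handled as either two rows or one row with two scripts.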


[1] Differences between ISO 639-2/B and 639-2/T:
Bibl Term 639-1 Language
alb sqi sq Albanian
arm hye hy Armenian
baq eus eu Basque
bur mya my Burmese
chi zho zh Chinese
cze ces cs Czech
dut nld nl Dutch
fre fra fr French
geo kat ka Georgian
ger deu de German
gre ell el Greek
ice isl is Icelandic
mac mkd mk Macedonian
mao mri mi Maori
may msa ms Malay
per fas fa Persian
rum ron ro Romanian
scc srp sr Serbian
scr hrv hr Croatian
slo slk sk Slovak
tib bod bo Tibetan
wel cym cy Welsh
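
If the terminologic set is adopted, incoming bibliographic codes would need to be normalized. The sketch below shows one way this could be done using the rows of the table above; the function names `to_terminologic` and `to_two_letter` are hypothetical, and treating any unlisted code as already terminologic is an assumption based on the statement that the two sets differ only for these 22 languages.

```python
# Rows copied from the table above: (bibliographic, terminologic, ISO 639-1).
B_T_ROWS = [
    ("alb", "sqi", "sq"), ("arm", "hye", "hy"), ("baq", "eus", "eu"),
    ("bur", "mya", "my"), ("chi", "zho", "zh"), ("cze", "ces", "cs"),
    ("dut", "nld", "nl"), ("fre", "fra", "fr"), ("geo", "kat", "ka"),
    ("ger", "deu", "de"), ("gre", "ell", "el"), ("ice", "isl", "is"),
    ("mac", "mkd", "mk"), ("mao", "mri", "mi"), ("may", "msa", "ms"),
    ("per", "fas", "fa"), ("rum", "ron", "ro"), ("scc", "srp", "sr"),
    ("scr", "hrv", "hr"), ("slo", "slk", "sk"), ("tib", "bod", "bo"),
    ("wel", "cym", "cy"),
]

B_TO_T = {b: t for b, t, _ in B_T_ROWS}
T_TO_TWO_LETTER = {t: a for _, t, a in B_T_ROWS}


def to_terminologic(code: str) -> str:
    """Map an ISO 639-2/B code to its 639-2/T form.

    Codes not in the table are returned unchanged (lowercased), on the
    assumption that their B and T forms are identical.
    """
    code = code.strip().lower()
    return B_TO_T.get(code, code)


def to_two_letter(code: str):
    """Return the two-letter code for a B or T code in the table, else None."""
    return T_TO_TWO_LETTER.get(to_terminologic(code))
```

For example, `to_terminologic("ger")` gives `"deu"`, and `to_two_letter("fre")` gives `"fr"`. A code such as `"eng"` passes through `to_terminologic` unchanged, and `to_two_letter("eng")` returns `None` because this sketch only knows the 22 rows listed here.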