Revision as of 15:39, 15 March 2009

Language Codes, Scripts, and Associated Country Codes

The following table lists the ISO 639 two-letter language codes (or three-letter codes where no two-letter code exists), along with the associated ISO 15924 script code(s), ISO 3166 country code(s), and charset encodings.

Note that ISO 639-2 defines two sets of three-letter codes: bibliographic and terminologic. The bibliographic codes resemble Z39.53 and the English names of languages; the terminologic codes resemble the two-letter codes and the names languages have for themselves. The two sets differ for 22 languages (see note [1] below for the complete list); in every case the language also has a two-letter code. We should pick one of these two sets and standardize on it. Other metadata projects, like Dublin Core, suggest that the "bibliographic set is preferred for metadata because of its widespread use in bibliographic agencies." On the other hand, the terminologic set would be less confusing because of its similarity to the two-letter codes. I suggest that MusicBrainz use only the terminologic (ISO 639-2/T) codes. @alex

In the interest of brevity and relevance, the complete ISO 639 list is not used. The following categories are omitted:

Historical variants of modern languages (e.g. Old French, Ancient Greek)
Historical languages without modern pronunciation (e.g. Linear B, Phoenician, Egyptian)
Sign languages (for obvious reasons: this is MusicBrainz)
Catch-all codes (with "(Other)")
Languages written only in scripts not representable with Unicode

Furthermore, to be listed, a language has to have at least one of the following:

A Wikipedia with more than 100 articles
A GNU/Linux locale for perror()
A Windows or Macintosh language code
A Google localization (ISO 639 code required; no h4x0r or Swedish Chef)
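
The two rules above (the omitted categories, then the "at least one of" criteria) can be sketched as a single predicate. This is only an illustration: the `Language` record, its field names, and the category labels are hypothetical and are not part of ISO 639 or of any MusicBrainz schema.

```python
from dataclasses import dataclass

# Hypothetical labels for the omitted categories listed above.
OMITTED_CATEGORIES = {
    "historical_variant",       # e.g. Old French, Ancient Greek
    "historical_unpronounced",  # e.g. Linear B, Phoenician, Egyptian
    "sign_language",
    "catch_all",                # codes marked "(Other)"
    "non_unicode_script",
}


@dataclass
class Language:
    """Illustrative record; all field names are assumptions."""
    code: str
    category: str = "living"
    wikipedia_articles: int = 0
    has_perror_locale: bool = False
    has_os_language_code: bool = False   # Windows or Macintosh
    has_google_localization: bool = False


def is_listed(lang: Language) -> bool:
    """Return True if the language would appear in the table."""
    if lang.category in OMITTED_CATEGORIES:
        return False
    # At least one of the four inclusion criteria must hold.
    return (
        lang.wikipedia_articles > 100
        or lang.has_perror_locale
        or lang.has_os_language_code
        or lang.has_google_localization
    )
```

Checking the omitted categories first means that a historical variant stays excluded even if it happens to have a large Wikipedia.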

Code	Language	Script	Countries	Charsets
ar	Arabic	Arab	AE BH DZ EG IQ JO KW LB LY MA MR OM PS SA SD SY TN YE	8859-6
en	English	Latn	GB US + many others	ASCII 8859-1 8859-*
el	Greek	Grek	GR	8859-7
he	Hebrew	Hebr	IL	windows-1255 8859-8 8859-8-i
ja	Japanese	Hani+Hrkt	JP	Shift-JIS EUC-JP UTF-8
ko	Korean	Hang+Hani	KP KR	EUC-KR UTF-8
ru	Russian	Cyrl	RU	KOI8-R windows-1251 8859-5
tlh	Klingon	Latn		ASCII
th	Thai	Thai	TH	8859-11
zh	Chinese	Hant	HK TW	Big5 UTF-8
zh	Chinese	Hans	CN HK	GB18030 UTF-8

(This table is incomplete; it is intended only to demonstrate the table contents.)
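
As a rough sketch of how one row of this table could be read by a program, the snippet below splits a tab-separated line into its five columns. The `LanguageRow` type and `parse_row` function are hypothetical names chosen for illustration, and the sketch assumes that scripts are joined with `+` while countries and charsets are separated by spaces, as in the sample rows.

```python
from typing import List, NamedTuple


class LanguageRow(NamedTuple):
    """One table row; the type and field names are illustrative only."""
    code: str
    name: str
    scripts: List[str]
    countries: List[str]
    charsets: List[str]


def parse_row(line: str) -> LanguageRow:
    """Parse 'code<TAB>name<TAB>scripts<TAB>countries<TAB>charsets'."""
    code, name, scripts, countries, charsets = line.rstrip("\n").split("\t")
    return LanguageRow(
        code=code,
        name=name,
        scripts=scripts.split("+"),    # e.g. a combined script entry
        countries=countries.split(),   # an empty column yields []
        charsets=charsets.split(),
    )


hebrew = parse_row("he\tHebrew\tHebr\tIL\twindows-1255 8859-8 8859-8-i")
klingon = parse_row("tlh\tKlingon\tLatn\t\tASCII")
```

Free-text entries such as "GB US + many others" would need special handling before being parsed this way.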

 [1] Differences between ISO 639-2/B and 639-2/T:

Bibl	Term	639-1	Language
alb	sqi	sq	Albanian
arm	hye	hy	Armenian
baq	eus	eu	Basque
bur	mya	my	Burmese
chi	zho	zh	Chinese
cze	ces	cs	Czech
dut	nld	nl	Dutch
fre	fra	fr	French
geo	kat	ka	Georgian
ger	deu	de	German
gre	ell	el	Greek
ice	isl	is	Icelandic
mac	mkd	mk	Macedonian
mao	mri	mi	Maori
may	msa	ms	Malay
per	fas	fa	Persian
rum	ron	ro	Romanian
scc	srp	sr	Serbian
scr	hrv	hr	Croatian
slo	slk	sk	Slovak
tib	bod	bo	Tibetan
wel	cym	cy	Welsh
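
If the terminologic set were adopted, incoming bibliographic codes could be normalized with a simple lookup built from the table above. This is a minimal sketch; `B_TO_T` and `to_terminologic` are hypothetical names, not part of any existing MusicBrainz code.

```python
# Bibliographic (639-2/B) -> terminologic (639-2/T), taken from the table above.
B_TO_T = {
    "alb": "sqi", "arm": "hye", "baq": "eus", "bur": "mya",
    "chi": "zho", "cze": "ces", "dut": "nld", "fre": "fra",
    "geo": "kat", "ger": "deu", "gre": "ell", "ice": "isl",
    "mac": "mkd", "mao": "mri", "may": "msa", "per": "fas",
    "rum": "ron", "scc": "srp", "scr": "hrv", "slo": "slk",
    "tib": "bod", "wel": "cym",
}


def to_terminologic(code: str) -> str:
    """Map a three-letter code to its 639-2/T form.

    Codes that are the same in both sets (or are already terminologic)
    are returned unchanged, lowercased.
    """
    code = code.lower()
    return B_TO_T.get(code, code)
```

For example, `to_terminologic("ger")` gives `"deu"`, while a code such as `"eng"`, which is identical in both sets, passes through unchanged.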

Author: @alex (User:Dupuy)
