Chinese Notes

From MusicBrainz Wiki
Revision as of 20:41, 2 June 2009 by Foolip (Punctuation and spacing: list of punctuation)

Notes on Chinese artists and releases in MusicBrainz

These notes cover open issues and tips for editors and voters concerning Chinese artists and releases in MusicBrainz.

Please contribute and comment at will!

Romanization systems

Wikipedia covers romanization of Chinese in great depth, but a quick summary is in order.

Hanyu Pinyin (漢語拼音/汉语拼音) is the romanization system used in mainland China and is also an ISO standard for the romanization of Mandarin Chinese. Students of Chinese typically learn this system, which is probably why virtually all transliterated releases in MusicBrainz use it.

Mandarin Chinese is also the official language of Taiwan, and since 2009 Hanyu Pinyin has been the official romanization system there. For political and historical reasons, however, a mixture of systems is in use, including Wade-Giles, MPS2, Tongyong Pinyin and Hanyu Pinyin. Personal names are usually based on Wade-Giles spellings, which is what you will see in the sort names of Taiwanese artists. More details on Wikipedia.

Cantonese-speaking artists (Hong Kong, Macao, Guangdong) typically do not use any of the systems mentioned above, but rather a romanization based on Cantonese pronunciation.

Wikipedia's list of common Chinese surnames provides a very good overview of common family names and their romanization using different systems.

Furthermore, there are different standards for spacing and capitalization of romanized names. In mainland China the given name is typically written as one word, without spacing. In Taiwan and Hong Kong, however, two-syllable given names are usually written with a hyphen between the syllables, sometimes with the second syllable capitalized as well.
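The conventions above can be illustrated with a minimal Python sketch. The function name and the three style labels ("mainland", "hyphen", "hyphen-caps") are hypothetical, invented here for illustration, and the sketch ignores details such as the apostrophe Hanyu Pinyin sometimes needs between syllables.

```python
def format_given_name(syllables, style):
    """Join romanized given-name syllables according to a regional convention.

    `style` is a hypothetical label, not a MusicBrainz term:
      "mainland"    -> syllables run together, e.g. Xiaoping
      "hyphen"      -> hyphenated, second syllable lower case, e.g. Li-chun
      "hyphen-caps" -> hyphenated, every syllable capitalized, e.g. Li-Chun
    """
    parts = [s.lower() for s in syllables]
    if not parts:
        return ""
    if style == "mainland":
        # No spacing; only the first letter of the whole name is capitalized.
        return "".join(parts).capitalize()
    if style == "hyphen":
        # Hyphen between syllables; only the first syllable is capitalized.
        return "-".join([parts[0].capitalize()] + parts[1:])
    if style == "hyphen-caps":
        # Hyphen between syllables; each syllable is capitalized.
        return "-".join(p.capitalize() for p in parts)
    raise ValueError("unknown style: %r" % style)


print(format_given_name(["xiao", "ping"], "mainland"))
print(format_given_name(["li", "chun"], "hyphen"))
print(format_given_name(["li", "chun"], "hyphen-caps"))
```

This only formats syllables that have already been romanized; choosing the romanization system itself is a separate decision.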

It is also common for artists to choose their own "English" name or a non-standard romanization, so some care should be taken before changing the sort names of Chinese artists with whom you are not familiar.

Traditional and simplified Chinese

The current text search does not convert between traditional and simplified Chinese. As a result, many duplicate artists have been entered (this can be fixed with ArtistAlias), and Picard lookups fail when a release is stored in the other script. No bugs have been filed for this issue; filing one would be a first step.
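As a rough illustration of what script-insensitive matching could look like, here is a minimal Python sketch. It is not how the MusicBrainz search works; the mapping table and function names are hypothetical, and the table holds only a handful of example characters where a real one would need thousands of entries.

```python
# Tiny illustrative traditional -> simplified mapping (examples only).
TRAD_TO_SIMP = {
    "漢": "汉",
    "語": "语",
    "藝": "艺",
    "標": "标",
    "題": "题",
}


def fold_script(text):
    """Return a comparison key with known traditional characters simplified."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def same_name(a, b):
    """True if two names are equal once both are folded to the same script."""
    return fold_script(a) == fold_script(b)


print(same_name("漢語拼音", "汉语拼音"))
```

Folding toward simplified is used here only as a comparison key. Several traditional characters can share one simplified form, so the folded text should not be written back as a conversion.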

Style issues

Punctuation and spacing

Classical Chinese uses neither punctuation nor spacing, but modern Chinese has adopted the common punctuation marks of Latin scripts. These marks are, however, usually written in their full-width forms.

Half-width , . ? ! : ; ( )
Full-width , . ? ! : ; ( )

Which form is used is currently very inconsistent in the database and depends on the preference of the editor.
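For editors who want to check or convert titles in bulk, the relationship between the two forms can be sketched in Python. The full-width forms of ASCII punctuation sit at a fixed offset of 0xFEE0 from their half-width counterparts. The function names are hypothetical, and the sketch assumes only the eight marks listed above; in particular it maps "." to U+FF0E FULLWIDTH FULL STOP rather than to the ideographic full stop U+3002.

```python
HALF_WIDTH = ",.?!:;()"

# Each full-width form is the ASCII code point plus 0xFEE0.
TO_FULL = {ord(c): ord(c) + 0xFEE0 for c in HALF_WIDTH}
TO_HALF = {full: half for half, full in TO_FULL.items()}


def to_fullwidth(text):
    """Replace the listed half-width marks with their full-width forms."""
    return text.translate(TO_FULL)


def to_halfwidth(text):
    """Replace the listed full-width marks with their half-width forms."""
    return text.translate(TO_HALF)


title = "標題(某某版)"
print(to_fullwidth(title))
print(to_halfwidth(to_fullwidth(title)) == title)
```

All other characters, including letters and digits, pass through unchanged.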

There is also some inconsistency in how extra title information is formatted:

  1. 標題(某某版) [full-width brackets]
  2. 標題(某某版) [half-width brackets]
  3. 標題 (某某版) [half-width brackets with leading space]

Extra Title Information Style only states that such information "must be entered in parentheses after the Main Title", which is true of all three formats above.
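If one of the three formats were chosen, existing titles could be brought into line mechanically. The Python sketch below rewrites a trailing bracketed suffix to the third format (half-width brackets with a leading space). That target is picked only for illustration and is not a guideline decision; the pattern and function name are likewise hypothetical.

```python
import re

# A trailing bracketed suffix in half- or full-width brackets,
# with optional whitespace before it.
ETI_SUFFIX = re.compile(r"\s*[(\uFF08]([^()\uFF08\uFF09]*)[)\uFF09]\s*$")


def normalize_eti(title):
    """Rewrite a trailing bracketed suffix as ' (...)' with half-width brackets."""
    return ETI_SUFFIX.sub(lambda m: " (" + m.group(1).strip() + ")", title)


for raw in ["標題\uFF08某某版\uFF09", "標題(某某版)", "標題 (某某版)"]:
    print(normalize_eti(raw))
```

Titles without a trailing bracketed suffix are returned unchanged, and only the last suffix is touched.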

For reference, here is all punctuation that can be encoded in Big5 and GB2312 (but excluding ASCII). If some punctuation symbol is not in the list it's a good indication that it's foreign to Chinese and probably should be avoided in Chinese release/track titles.

Char Big5 GB2312 Unicode
· Yes No U+00B7 MIDDLE DOT
– Yes No U+2013 EN DASH
— Yes No U+2014 EM DASH
― No Yes U+2015 HORIZONTAL BAR
‖ No Yes U+2016 DOUBLE VERTICAL LINE
‘ Yes Yes U+2018 LEFT SINGLE QUOTATION MARK
’ Yes Yes U+2019 RIGHT SINGLE QUOTATION MARK
“ Yes Yes U+201C LEFT DOUBLE QUOTATION MARK
” Yes Yes U+201D RIGHT DOUBLE QUOTATION MARK
• Yes No U+2022 BULLET
‥ Yes No U+2025 TWO DOT LEADER
… Yes Yes U+2026 HORIZONTAL ELLIPSIS
‰ No Yes U+2030 PER MILLE SIGN
′ Yes Yes U+2032 PRIME
″ No Yes U+2033 DOUBLE PRIME
‵ Yes No U+2035 REVERSED PRIME
※ Yes Yes U+203B REFERENCE MARK
‾ Yes No U+203E OVERLINE
、 Yes Yes U+3001 IDEOGRAPHIC COMMA
。 Yes Yes U+3002 IDEOGRAPHIC FULL STOP
〃 Yes Yes U+3003 DITTO MARK
〈 Yes Yes U+3008 LEFT ANGLE BRACKET
〉 Yes Yes U+3009 RIGHT ANGLE BRACKET
《 Yes Yes U+300A LEFT DOUBLE ANGLE BRACKET
》 Yes Yes U+300B RIGHT DOUBLE ANGLE BRACKET
「 Yes Yes U+300C LEFT CORNER BRACKET
」 Yes Yes U+300D RIGHT CORNER BRACKET
『 Yes Yes U+300E LEFT WHITE CORNER BRACKET
』 Yes Yes U+300F RIGHT WHITE CORNER BRACKET
【 Yes Yes U+3010 LEFT BLACK LENTICULAR BRACKET
】 Yes Yes U+3011 RIGHT BLACK LENTICULAR BRACKET
〔 Yes Yes U+3014 LEFT TORTOISE SHELL BRACKET
〕 Yes Yes U+3015 RIGHT TORTOISE SHELL BRACKET
〖 No Yes U+3016 LEFT WHITE LENTICULAR BRACKET
〗 No Yes U+3017 RIGHT WHITE LENTICULAR BRACKET
〝 Yes No U+301D REVERSED DOUBLE PRIME QUOTATION MARK
〞 Yes No U+301E DOUBLE PRIME QUOTATION MARK
・ No Yes U+30FB KATAKANA MIDDLE DOT
︰ Yes No U+FE30 PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
︱ Yes No U+FE31 PRESENTATION FORM FOR VERTICAL EM DASH
︳ Yes No U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE
︴ Yes No U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
︵ Yes No U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
︶ Yes No U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
︷ Yes No U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
︸ Yes No U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
︹ Yes No U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
︺ Yes No U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
︻ Yes No U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
︼ Yes No U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
︽ Yes No U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
︾ Yes No U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
︿ Yes No U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
﹀ Yes No U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
﹁ Yes No U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
﹂ Yes No U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
﹃ Yes No U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
﹄ Yes No U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
﹉ Yes No U+FE49 DASHED OVERLINE
﹊ Yes No U+FE4A CENTRELINE OVERLINE
﹋ Yes No U+FE4B WAVY OVERLINE
﹌ Yes No U+FE4C DOUBLE WAVY OVERLINE
﹍ Yes No U+FE4D DASHED LOW LINE
﹎ Yes No U+FE4E CENTRELINE LOW LINE
﹏ Yes No U+FE4F WAVY LOW LINE
﹐ Yes No U+FE50 SMALL COMMA
﹒ Yes No U+FE52 SMALL FULL STOP
﹔ Yes No U+FE54 SMALL SEMICOLON
﹕ Yes No U+FE55 SMALL COLON
﹖ Yes No U+FE56 SMALL QUESTION MARK
﹗ Yes No U+FE57 SMALL EXCLAMATION MARK
﹙ Yes No U+FE59 SMALL LEFT PARENTHESIS
﹚ Yes No U+FE5A SMALL RIGHT PARENTHESIS
﹛ Yes No U+FE5B SMALL LEFT CURLY BRACKET
﹜ Yes No U+FE5C SMALL RIGHT CURLY BRACKET
﹝ Yes No U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
﹞ Yes No U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
﹟ Yes No U+FE5F SMALL NUMBER SIGN
﹠ Yes No U+FE60 SMALL AMPERSAND
﹡ Yes No U+FE61 SMALL ASTERISK
﹣ Yes No U+FE63 SMALL HYPHEN-MINUS
﹪ Yes No U+FE6A SMALL PERCENT SIGN
﹫ Yes No U+FE6B SMALL COMMERCIAL AT
! Yes Yes U+FF01 FULLWIDTH EXCLAMATION MARK
" No Yes U+FF02 FULLWIDTH QUOTATION MARK
# Yes Yes U+FF03 FULLWIDTH NUMBER SIGN
% Yes Yes U+FF05 FULLWIDTH PERCENT SIGN
& Yes Yes U+FF06 FULLWIDTH AMPERSAND
' No Yes U+FF07 FULLWIDTH APOSTROPHE
( Yes Yes U+FF08 FULLWIDTH LEFT PARENTHESIS
) Yes Yes U+FF09 FULLWIDTH RIGHT PARENTHESIS
* Yes Yes U+FF0A FULLWIDTH ASTERISK
, Yes Yes U+FF0C FULLWIDTH COMMA
- Yes Yes U+FF0D FULLWIDTH HYPHEN-MINUS
. Yes Yes U+FF0E FULLWIDTH FULL STOP
/ Yes Yes U+FF0F FULLWIDTH SOLIDUS
: Yes Yes U+FF1A FULLWIDTH COLON
; Yes Yes U+FF1B FULLWIDTH SEMICOLON
? Yes Yes U+FF1F FULLWIDTH QUESTION MARK
@ Yes Yes U+FF20 FULLWIDTH COMMERCIAL AT
[ No Yes U+FF3B FULLWIDTH LEFT SQUARE BRACKET
\ Yes Yes U+FF3C FULLWIDTH REVERSE SOLIDUS
] No Yes U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
_ Yes Yes U+FF3F FULLWIDTH LOW LINE
{ Yes Yes U+FF5B FULLWIDTH LEFT CURLY BRACKET
} Yes Yes U+FF5D FULLWIDTH RIGHT CURLY BRACKET
、 Yes No U+FF64 HALFWIDTH IDEOGRAPHIC COMMA
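A check of this kind can be approximated with Python's built-in `big5` and `gb2312` codecs, as in the sketch below. The helper names are hypothetical. Python's codec tables are one particular implementation of each encoding, so the results may differ from the list above for some characters; treat this as a quick sanity check rather than a reference.

```python
import unicodedata


def encodable(ch, codec):
    """True if `ch` can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def describe(ch):
    """One row in the style of the list above, computed from Python's codecs."""
    return "U+%04X %s big5=%s gb2312=%s" % (
        ord(ch),
        unicodedata.name(ch, "UNKNOWN"),
        encodable(ch, "big5"),
        encodable(ch, "gb2312"),
    )


# Report on each punctuation-like character in a title.
for ch in "標題\uFF08某某版\uFF09":
    if unicodedata.category(ch).startswith("P"):
        print(describe(ch))
```

A character that neither codec can encode is a candidate for closer inspection before it is kept in a Chinese title.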

Artist collaborations

Featuring Artist Style requires the use of "藝人甲 (feat. 藝人乙)" for the typical featured artist case. This format is not completely alien to Chinese and does appear on some covers, but there are still some entries in the database using the "藝人甲 (藝人乙合唱)" format. I've been slightly hesitant to change them, since the ARs (Advanced Relationships) already show the relationship with great clarity and because "合唱" means "sing together" and thus implies vocal performance. This is a fairly minor issue, but it should be put right in the future.
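Entries in the "合唱" format could be located, and a "feat." form proposed, with something like the following Python sketch. The pattern and function names are hypothetical. Given the caveat above that "合唱" implies vocal performance, the output is best treated as a list of candidates for manual review rather than an automatic rename.

```python
import re

# "Artist A (Artist B合唱)" in half- or full-width brackets.
HECHANG = re.compile(r"^(.+?)\s*[(\uFF08](.+?)合唱[)\uFF09]$")


def parse_hechang(name):
    """Return (main artist, featured artist), or None if the format differs."""
    m = HECHANG.match(name)
    return (m.group(1), m.group(2)) if m else None


def to_feat(name):
    """Propose the 'feat.' form for a name in the 合唱 format."""
    parsed = parse_hechang(name)
    if parsed is None:
        return name
    return "%s (feat. %s)" % parsed


print(to_feat("藝人甲 (藝人乙合唱)"))
```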

Featuring Artist Style does not clearly state how collaborations between 3 or more artists should be formatted. The de facto standard "Artist A, Artist B & Artist C" is seldom used for Chinese artists, for various reasons. This issue has been discussed on the style mailing list.

The collaboration artists in question:

Traditional Chinese Music

It's not obvious how the Classical Style Guide should be applied to traditional Chinese music. The use of English and Latin script in an otherwise Chinese context is sub-optimal. These are some releases with traditional Chinese music which may have style issues:

