Difference between revisions of "Chinese Notes"

From MusicBrainz Wiki
Revision as of 20:41, 2 June 2009

Notes on Chinese artists and releases in MusicBrainz

These are some notes about open issues and tips to editors/voters concerning the Chinese artists and releases in MusicBrainz. See also:

Please contribute and comment at will!

Romanization systems

Wikipedia covers romanization of Chinese in great depth, but a quick summary is in order.

Hanyu Pinyin (漢語拼音/汉语拼音) is the romanization system used in mainland China and is also an ISO standard for romanization of Mandarin Chinese. Students of Chinese typically learn this system, which is probably why virtually all transliterated releases in MusicBrainz use it.

Mandarin Chinese is also the official language of Taiwan, where Hanyu Pinyin has been the official romanization system since 2009. For political and historical reasons, however, a mixture of systems is in use, including Wade-Giles, MPS2, Tongyong Pinyin and Hanyu Pinyin. Personal names are usually based on Wade-Giles spellings, which is what you will see in the sort names of Taiwanese artists. More details on Wikipedia.

Cantonese-speaking artists (Hong Kong, Macao, Guangdong) will typically not use any of the systems mentioned above, but rather a romanization based on Cantonese pronunciation.

Wikipedia's list of common Chinese surnames provides a very good overview of common family names and their romanization using different systems.

Furthermore, there are different standards for spacing and capitalization of romanized names. In mainland China the given name is typically written without spacing. In Taiwan and Hong Kong, however, two-syllable given names are usually written with a hyphen between the syllables, sometimes also capitalizing the second.

It is also common for artists to choose their own "English" name or a non-standard romanization, so some care should be taken before changing the sort names of Chinese artists with which you are not familiar.
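The two spacing conventions can be illustrated with a small Python sketch. The function name `format_given_name` and its parameters are made up for this example; it only joins syllables that have already been romanized and does not choose a romanization system.

```python
def format_given_name(syllables, style="mainland", capitalize_second=False):
    """Join the romanized syllables of a given name (illustrative helper).

    style="mainland":   syllables written together, e.g. "Xiaoping".
    style="hyphenated": hyphen between syllables, as often seen in
                        Taiwan and Hong Kong, e.g. "Ying-jeou".
    capitalize_second:  also capitalize the later syllables ("Ying-Jeou").
    """
    parts = [s.lower() for s in syllables]
    if not parts:
        return ""
    if style == "mainland":
        return "".join(parts).capitalize()
    if style == "hyphenated":
        first = parts[0].capitalize()
        rest = [p.capitalize() if capitalize_second else p for p in parts[1:]]
        return "-".join([first] + rest)
    raise ValueError(f"unknown style: {style}")
```

Because artists often pick their own spelling, a helper like this is only useful for spotting likely variants of a name, not for deciding which form is correct.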

Traditional and simplified Chinese

The current text search does not convert between traditional and simplified Chinese. As a result, many duplicate artists have been entered (this can be fixed with ArtistAlias), and Picard lookups fail for releases entered in the other script. No bug has been filed for this issue yet; filing one would be a first step.
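As a rough illustration of what such a conversion could look like, the Python sketch below folds traditional characters to simplified ones before comparing two names. It is not how the MusicBrainz search works; the mapping `TRAD_TO_SIMP` is a four-character sample invented for this example, and a real table would need thousands of entries.

```python
# Tiny sample mapping (assumption for illustration only).
TRAD_TO_SIMP = {
    "\u6F22": "\u6C49",  # 漢 -> 汉
    "\u8A9E": "\u8BED",  # 語 -> 语
    "\u6A02": "\u4E50",  # 樂 -> 乐
    "\u5718": "\u56E2",  # 團 -> 团
}


def fold_to_simplified(text):
    """Replace every character found in the sample mapping, keep the rest."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def same_name(a, b):
    """Compare two names after folding both to simplified characters."""
    return fold_to_simplified(a) == fold_to_simplified(b)
```

Folding towards simplified is the easier direction, because several traditional characters can share one simplified form, whereas the reverse conversion is ambiguous.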

Style issues

Punctuation and spacing

Classical Chinese uses neither punctuation nor spacing, but modern Chinese has adopted the common punctuation marks of Latin scripts. These marks are usually written in their full-width forms, however.

Half-width  ,  .  ?  !  :  ;  (  )
Full-width  ，  ．  ？  ！  ：  ；  （  ）

Which form is used is currently very inconsistent in the database and depends on the preference of the editor.
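For editors who want to check or convert titles in bulk, the relationship between the two forms is simple: the Unicode full-width forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0. The Python sketch below relies on that fact; the function names are made up for this example, and it is not part of any MusicBrainz tool.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror ASCII U+0021..U+007E


def to_fullwidth(text, chars=",.?!:;()"):
    """Convert the listed half-width punctuation marks to full-width."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if c in chars else c for c in text
    )


def to_halfwidth(text):
    """Convert any full-width ASCII variant back to its half-width form."""
    return "".join(
        chr(ord(c) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(c) <= 0xFF5E else c
        for c in text
    )
```

Marks such as the ideographic full stop (U+3002) and ideographic comma (U+3001) lie outside this range and have no ASCII counterpart at that offset, so the sketch leaves them untouched.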

There is also some inconsistency in how extra title information is formatted:

  1. 標題（某某版） [full-width brackets]
  2. 標題(某某版) [half-width brackets]
  3. 標題 (某某版) [half-width brackets with leading space]

Extra Title Information Style only states that such information "must be entered in parentheses after the Main Title", which is true of all three formats above.
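Since all three formats carry the same information, they can be converted into one another mechanically. The Python sketch below shows one way to do so; `normalize_eti` and its style names are hypothetical, and the target style is left to the caller because no guideline currently prefers one.

```python
import re

# Trailing parenthesised group in either width, with optional leading space.
# \uFF08 and \uFF09 are the full-width left and right parentheses.
_ETI = re.compile(r"\s*[(\uFF08]([^()\uFF08\uFF09]+)[)\uFF09]\s*$")


def normalize_eti(title, style):
    """Rewrite trailing extra title information in the requested style.

    style is one of "fullwidth", "halfwidth" or "halfwidth_spaced",
    matching formats 1, 2 and 3 in the list above.
    """
    match = _ETI.search(title)
    if match is None:
        return title  # no extra title information found
    main, info = title[:match.start()], match.group(1)
    if style == "fullwidth":
        return f"{main}\uFF08{info}\uFF09"
    if style == "halfwidth":
        return f"{main}({info})"
    if style == "halfwidth_spaced":
        return f"{main} ({info})"
    raise ValueError(f"unknown style: {style}")
```

The pattern only looks at the last parenthesised group of a title, so titles with nested or multiple groups would need closer inspection by hand.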

For reference, here is all punctuation that can be encoded in Big5 and GB2312 (but excluding ASCII). If some punctuation symbol is not in the list it's a good indication that it's foreign to Chinese and probably should be avoided in Chinese release/track titles.

Char  Big5  GB2312  Unicode
·     Yes   No      U+00B7 MIDDLE DOT
–     Yes   No      U+2013 EN DASH
—     Yes   No      U+2014 EM DASH
―     No    Yes     U+2015 HORIZONTAL BAR
‖     No    Yes     U+2016 DOUBLE VERTICAL LINE
‘     Yes   Yes     U+2018 LEFT SINGLE QUOTATION MARK
’     Yes   Yes     U+2019 RIGHT SINGLE QUOTATION MARK
“     Yes   Yes     U+201C LEFT DOUBLE QUOTATION MARK
”     Yes   Yes     U+201D RIGHT DOUBLE QUOTATION MARK
•     Yes   No      U+2022 BULLET
‥     Yes   No      U+2025 TWO DOT LEADER
…     Yes   Yes     U+2026 HORIZONTAL ELLIPSIS
‰     No    Yes     U+2030 PER MILLE SIGN
′     Yes   Yes     U+2032 PRIME
″     No    Yes     U+2033 DOUBLE PRIME
‵     Yes   No      U+2035 REVERSED PRIME
※     Yes   Yes     U+203B REFERENCE MARK
‾     Yes   No      U+203E OVERLINE
、    Yes   Yes     U+3001 IDEOGRAPHIC COMMA
。    Yes   Yes     U+3002 IDEOGRAPHIC FULL STOP
〃    Yes   Yes     U+3003 DITTO MARK
〈    Yes   Yes     U+3008 LEFT ANGLE BRACKET
〉    Yes   Yes     U+3009 RIGHT ANGLE BRACKET
《    Yes   Yes     U+300A LEFT DOUBLE ANGLE BRACKET
》    Yes   Yes     U+300B RIGHT DOUBLE ANGLE BRACKET
「    Yes   Yes     U+300C LEFT CORNER BRACKET
」    Yes   Yes     U+300D RIGHT CORNER BRACKET
『    Yes   Yes     U+300E LEFT WHITE CORNER BRACKET
』    Yes   Yes     U+300F RIGHT WHITE CORNER BRACKET
【    Yes   Yes     U+3010 LEFT BLACK LENTICULAR BRACKET
】    Yes   Yes     U+3011 RIGHT BLACK LENTICULAR BRACKET
〔    Yes   Yes     U+3014 LEFT TORTOISE SHELL BRACKET
〕    Yes   Yes     U+3015 RIGHT TORTOISE SHELL BRACKET
〖    No    Yes     U+3016 LEFT WHITE LENTICULAR BRACKET
〗    No    Yes     U+3017 RIGHT WHITE LENTICULAR BRACKET
〝    Yes   No      U+301D REVERSED DOUBLE PRIME QUOTATION MARK
〞    Yes   No      U+301E DOUBLE PRIME QUOTATION MARK
・    No    Yes     U+30FB KATAKANA MIDDLE DOT
︰    Yes   No      U+FE30 PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
︱    Yes   No      U+FE31 PRESENTATION FORM FOR VERTICAL EM DASH
︳    Yes   No      U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE
︴    Yes   No      U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
︵    Yes   No      U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
︶    Yes   No      U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
︷    Yes   No      U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
︸    Yes   No      U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
︹    Yes   No      U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
︺    Yes   No      U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
︻    Yes   No      U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
︼    Yes   No      U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
︽    Yes   No      U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
︾    Yes   No      U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
︿    Yes   No      U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
﹀    Yes   No      U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
﹁    Yes   No      U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
﹂    Yes   No      U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
﹃    Yes   No      U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
﹄    Yes   No      U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
﹉    Yes   No      U+FE49 DASHED OVERLINE
﹊    Yes   No      U+FE4A CENTRELINE OVERLINE
﹋    Yes   No      U+FE4B WAVY OVERLINE
﹌    Yes   No      U+FE4C DOUBLE WAVY OVERLINE
﹍    Yes   No      U+FE4D DASHED LOW LINE
﹎    Yes   No      U+FE4E CENTRELINE LOW LINE
﹏    Yes   No      U+FE4F WAVY LOW LINE
﹐    Yes   No      U+FE50 SMALL COMMA
﹒    Yes   No      U+FE52 SMALL FULL STOP
﹔    Yes   No      U+FE54 SMALL SEMICOLON
﹕    Yes   No      U+FE55 SMALL COLON
﹖    Yes   No      U+FE56 SMALL QUESTION MARK
﹗    Yes   No      U+FE57 SMALL EXCLAMATION MARK
﹙    Yes   No      U+FE59 SMALL LEFT PARENTHESIS
﹚    Yes   No      U+FE5A SMALL RIGHT PARENTHESIS
﹛    Yes   No      U+FE5B SMALL LEFT CURLY BRACKET
﹜    Yes   No      U+FE5C SMALL RIGHT CURLY BRACKET
﹝    Yes   No      U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
﹞    Yes   No      U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
﹟    Yes   No      U+FE5F SMALL NUMBER SIGN
﹠    Yes   No      U+FE60 SMALL AMPERSAND
﹡    Yes   No      U+FE61 SMALL ASTERISK
﹣    Yes   No      U+FE63 SMALL HYPHEN-MINUS
﹪    Yes   No      U+FE6A SMALL PERCENT SIGN
﹫    Yes   No      U+FE6B SMALL COMMERCIAL AT
！    Yes   Yes     U+FF01 FULLWIDTH EXCLAMATION MARK
＂    No    Yes     U+FF02 FULLWIDTH QUOTATION MARK
＃    Yes   Yes     U+FF03 FULLWIDTH NUMBER SIGN
％    Yes   Yes     U+FF05 FULLWIDTH PERCENT SIGN
＆    Yes   Yes     U+FF06 FULLWIDTH AMPERSAND
＇    No    Yes     U+FF07 FULLWIDTH APOSTROPHE
（    Yes   Yes     U+FF08 FULLWIDTH LEFT PARENTHESIS
）    Yes   Yes     U+FF09 FULLWIDTH RIGHT PARENTHESIS
＊    Yes   Yes     U+FF0A FULLWIDTH ASTERISK
，    Yes   Yes     U+FF0C FULLWIDTH COMMA
－    Yes   Yes     U+FF0D FULLWIDTH HYPHEN-MINUS
．    Yes   Yes     U+FF0E FULLWIDTH FULL STOP
／    Yes   Yes     U+FF0F FULLWIDTH SOLIDUS
：    Yes   Yes     U+FF1A FULLWIDTH COLON
；    Yes   Yes     U+FF1B FULLWIDTH SEMICOLON
？    Yes   Yes     U+FF1F FULLWIDTH QUESTION MARK
＠    Yes   Yes     U+FF20 FULLWIDTH COMMERCIAL AT
［    No    Yes     U+FF3B FULLWIDTH LEFT SQUARE BRACKET
＼    Yes   Yes     U+FF3C FULLWIDTH REVERSE SOLIDUS
］    No    Yes     U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
＿    Yes   Yes     U+FF3F FULLWIDTH LOW LINE
｛    Yes   Yes     U+FF5B FULLWIDTH LEFT CURLY BRACKET
｝    Yes   Yes     U+FF5D FULLWIDTH RIGHT CURLY BRACKET
､     Yes   No      U+FF64 HALFWIDTH IDEOGRAPHIC COMMA
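
A similar check can be run on a title programmatically. The Python sketch below uses the standard `big5` and `gb2312` codecs to test whether a character can be encoded; the helper names are made up for this example, and Python's codec tables may differ in places from the mapping used to compile the list above, so treat its output as a hint only.

```python
import unicodedata


def encodable(ch, encoding):
    """Return True if the character can be encoded in the given codec."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage(ch):
    """Report Big5 and GB2312 coverage for one character."""
    return {"big5": encodable(ch, "big5"), "gb2312": encodable(ch, "gb2312")}


def foreign_punctuation(title):
    """List non-ASCII punctuation in a title that neither codec can encode."""
    return [
        ch
        for ch in title
        if ord(ch) > 0x7F
        and unicodedata.category(ch).startswith("P")
        and not (encodable(ch, "big5") or encodable(ch, "gb2312"))
    ]
```

A non-empty result from `foreign_punctuation` suggests the title contains punctuation worth a second look; it does not by itself mean the title is wrong.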

Artist collaborations

Featuring Artist Style requires the use of "藝人甲 (feat. 藝人乙)" for the typical featured artist case. This format is not completely alien to Chinese and does appear on some covers, but there are still some entries in the database using the "藝人甲 (藝人乙合唱)" format. I've been slightly hesitant to change them, both because the ARs already show the relationship with great clarity and because "合唱" means "sing together" and thus implies vocal performance. This is a fairly minor issue, but it should be put right in the future.
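
Entries still using the "合唱" format could be found with a pattern match. The Python sketch below only detects them and deliberately leaves any rewriting to the editor, given the reservations above; `find_hechang` is a hypothetical helper name.

```python
import re

# "Artist A (Artist B 合唱)" with half-width or full-width parentheses.
# \u5408\u5531 is 合唱; \uFF08 and \uFF09 are the full-width parentheses.
_HECHANG = re.compile(r"^(.+?)\s*[(\uFF08](.+?)\u5408\u5531[)\uFF09]$")


def find_hechang(artist_name):
    """Return (main artist, featured artist) if the 合唱 format is used."""
    match = _HECHANG.match(artist_name)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Names already in the "(feat. ...)" form do not match, so the helper returns `None` for them.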

Featuring Artist Style does not clearly state how collaborations between 3 or more artists should be formatted. The de facto standard "Artist A, Artist B & Artist C" is seldom used for Chinese artists, for various reasons. This issue has been discussed on the style mailing list.

The collaboration artists in question:

Traditional Chinese Music

It's not obvious how Classical Style Guide should be applied to traditional Chinese music. The use of English and Latin script in an otherwise Chinese context is sub-optimal. These are some releases with traditional Chinese music which may have style issues:

