History:Release Transliteration And Translation: Difference between revisions

From MusicBrainz Wiki
(reworking the whole page (Imported from MoinMoin))
 
(correct link (Imported from MoinMoin))
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Summary:''' ''This is a proposal to add a human-language to human-language track title transliteration relationship which works on Album units.''
'''Summary:''' ''This is a proposal to add a human-language to human-language track title transliteration relationship which works on Release units.''

<ul><li style="list-style-type:none">''Please [[Edit Mercilessly|EditMercilessly]]. Rephrase, don't append.''
[[Image:Attention.png]] '''Status:''' [[Needs Intertwingling|NeedsIntertwingling]]; Please [[Edit Mercilessly|EditMercilessly]]. Rephrase, don't append
</ul>


==Proposal==
==Proposal==


Add an [[Advanced Relationship Type|AdvancedRelationshipType]] that relates an album in one language to another album in another language to represent translations and transliterations ([[Translation And Transliteration Relationship Type|TranslationAndTransliterationRelationshipType]]).
Add an [[Advanced Relationship Type|AdvancedRelationshipType]] that relates an release in one language to another release in another language to represent translations and transliterations ([[Translation Transliteration Relationship Type|TranslationTransliterationRelationshipType]]).


===Definitions===
===Definitions===
<dl><dt>Translation (t9n)

Are these actually correct? <dl><dt>Translation (t9n)
<dd>
<dd>


means that one human-oriented natural language (e.g. English-US, Japanese, etc) is translated to another. Words, that have ''meaning'' in language A are translated into words that have meaning in language B.
means that a text in one language (e.g. English-US, Japanese, etc) is interpreted to produce an equivalent text in another language. Words, that have ''meaning'' in language A are translated into words that have meaning in language B.
<dt>Transliteration (t13n)
<dt>Transliteration (t13n)
<dd>
<dd>


means that words which have meaning in the original language are transliterated into the scripting conventions or another language, so that the ''sound'' is roughly the same. This can include a change of script (e.g. Katakana to Latin), but does not necessarily so (a French name can be transliterated to German).
means that characters and phonetic sounds from one alphabet are converted into the scripting conventions of another, so that the ''sound'' is roughly the same. This can include a change of script (e.g. katakana to Latin), but does not necessarily so (a French name can be transliterated to German). A transliteration can occur within a language that use multiple scripts (e.g. kanji to katakana for Japanese or Latin to Braille for English)
</dl>
</dl>


This is a interimmeasure for the current MB database and not intended to be a permanent solution. It does not solve the Artist name transliteration problem.
This is an interim measure for the current MB database and not intended to be a permanent solution. It does not solve the Artist name transliteration problem.


===Advantage(s):===
===Advantage(s):===


* The user's assertion that this album is related to this album can be stored.
* The user's assertion that this release is related to this release can be stored.
* When updating the MB database schema in a way that will truly support i18n, the information in the relationships between Album data can be retrieved and automaitcally imported.
* When updating the MB database schema in a way that will truly support i18n, the information in the relationships between Release data can be retrieved and automaitcally imported.
* (Allows the user to choose which album info when extracting from the database. Actually this functionality already exists when the Artist is the same.)
* (Allows the user to choose which release info when extracting from the database. Actually this functionality already exists when the Artist is the same.)


===Disadvantage(s):===
===Disadvantage(s):===


* Information grouped on Albums such as [[Disc I Ds|DiscIDs]] is not automatically related. Solving this woul deither need
* Information grouped on Albums such as [[Disc ID|DiscID]]s is not automatically related. Solving this would either need
** Duplication of data, i.e. storing the [[Disc ID|DiscID]] in multiple albums (how bad is this? [[Nadelnder Bambus|NadelnderBambus]] will support this).
** Duplication of data, i.e. storing the [[Disc ID|DiscID]] in multiple albums (how bad is this? [[Nadelnder Bambus|NadelnderBambus]] will support this).
** The presentation software on the MB server side needs to collect and show shared DiscIDs, etc parsing the relationships.
** The presentation software on the MB server side needs to collect and show shared [[Disc ID|DiscID]]s, etc parsing the relationships.


* If e.g. an English and a Japanese release of the same album get transliterated to Cyrillic, you will have to break the [[Don't Make Relationship Clusters|DontMakeRelationshipClusters]] rule.
* If e.g. an English and a Japanese release of the same release get transliterated to Cyrillic, you will have to break the [[Don't Make Relationship Clusters|DontMakeRelationshipClusters]] rule.
* Validity?
* Validity?
* What to do with unique Album info such as release dates or even Artist?
* What to do with unique Release info such as release dates or even Artist?
* How to publish this capability to a MB database contributor?
* How to publish this capability to a MB database contributor?


==Amendments==
==Amendments==


Amendment 1: Add a "official" or "unofficial" status to this transliteration relationship. "This is a 'official' transliteration."
Amendment 1: Add a "official" or "unofficial" status to this relationship. "This is an 'official' transliteration." or "This is an 'unofficial' translation."


Advantage(s):
Advantage(s):
* Transliteration validity (but how is validity concretely defined?)
* Validity (but how is validity concretely defined?)


Disadvantage(s):
Disadvantage(s):
* Is an unmarked Album transliteration more or less official than an "unofficial" Album transliteration?
* Is an unmarked Release more or less official than an "unofficial" Release?


Amendment 2: Add a language identifier to this tranliteration relationship. "This is a 'Japanese' transliteration."
Amendment 2: Add a language and script identifier to each end of the relationship. "This is an 'English, Latin to English, katakana' transliteration." or "This is a 'Japanese, Kanji & Kana to English, Latin' translation."


Advantage(s):
Advantage(s):
Line 54: Line 53:


Disadvantage(s):
Disadvantage(s):
* Some may find the indentifiers in transliteration relationships confusing as the language does not change, only the script.
* None?


==Background: Discussion on the Mailing List==
==Background: Discussion on the Mailing List==
Line 62: Line 61:
===Some Relevant Points===
===Some Relevant Points===


* Recently there was some discussion about how to deal with Kate Bush's [http://musicbrainz.org/album/f205627f-b70a-409d-adbe-66289b614e80.html Aerial] which contains a track named <pi>. To appease clients that can't deal with unicode, it looks as though it was decided to create two identical versions of this album, one with the track named the symbol <pi> (shouldn't that be capital <Pi>?) and one named "Pi". Beside this the two albums are identical. In fact, there is a third album representing the Japanese release with the same discid as well but with Japanese naming. This approach seems to me to be an unmaintainable solution. You end up with redundant (and unnormalized) data that need to be maintained in parallel...
* Recently there was some discussion about how to deal with Kate Bush's [http://musicbrainz.org/album/f205627f-b70a-409d-adbe-66289b614e80.html Aerial] which contains a track named <pi>. To appease clients that can't deal with unicode, it looks as though it was decided to create two identical versions of this release, one with the track named the symbol <pi> (shouldn't that be capital <Pi>?) and one named "Pi". Beside this the two albums are identical. In fact, there is a third release representing the Japanese release with the same discid as well but with Japanese naming. This approach seems to me to be an unmaintainable solution. You end up with redundant (and unnormalized) data that need to be maintained in parallel...
* People will not like this, since arbitrary scipts and laguages can be stored in the db. Fact is, however, that the db cannot deal with the relationships between such translitterations in a proper way.
* People will not like this, since arbitrary scipts and laguages can be stored in the db. Fact is, however, that the db cannot deal with the relationships between such transliterations in a proper way.
<ul><li style="list-style-type:none">Could you elaborate on this? I was thinking that there could be an album-unit relationship entity called "xxx is a transliteration of yyy" that could be added in the mean time. It would help to identify data in future improvements. Does it make sense?
<ul><li style="list-style-type:none">Could you elaborate on this? I was thinking that there could be an release-unit relationship entity called "xxx is a transliteration of yyy" that could be added in the mean time. It would help to identify data in future improvements. Does it make sense?
</ul>
</ul>
* Albums are often not pure transliterations or translations in the real world. The Japanese entry for the 2 disc Ariel release mentioned above is ~67% transliterated, ~22% translated, and ~11% untouched.


[[Category:To Be Reviewed]]
[[Category:To Be Reviewed]]

Revision as of 20:54, 4 November 2006

Summary: This is a proposal to add a human-language to human-language track title transliteration relationship which works on Release units.

Status: NeedsIntertwingling; Please EditMercilessly. Rephrase, don't append.

Proposal

Add an AdvancedRelationshipType that relates a release in one language to another release in another language to represent translations and transliterations (TranslationTransliterationRelationshipType).

Definitions

Translation (t9n)
means that a text in one language (e.g. English-US, Japanese, etc.) is interpreted to produce an equivalent text in another language. Words that have meaning in language A are translated into words that have meaning in language B.
Transliteration (t13n)
means that characters and phonetic sounds from one alphabet are converted into the scripting conventions of another, so that the sound is roughly the same. This can include a change of script (e.g. katakana to Latin), but does not necessarily do so (a French name can be transliterated to German). A transliteration can also occur within a language that uses multiple scripts (e.g. kanji to katakana for Japanese or Latin to Braille for English).

This is an interim measure for the current MB database and not intended to be a permanent solution. It does not solve the Artist name transliteration problem.

Advantage(s):

  • The user's assertion that this release is related to this release can be stored.
  • When updating the MB database schema in a way that will truly support i18n, the information in the relationships between Release data can be retrieved and automatically imported.
  • (Allows the user to choose which release info to use when extracting from the database. Actually this functionality already exists when the Artist is the same.)

Disadvantage(s):

  • Information grouped on Albums, such as DiscIDs, is not automatically related. Solving this would need either:
    • Duplication of data, i.e. storing the DiscID in multiple albums (how bad is this? NadelnderBambus will support this), or
    • The presentation software on the MB server side to collect and show shared DiscIDs, etc. by parsing the relationships.
  • If, e.g., an English and a Japanese version of the same release are both transliterated to Cyrillic, you will have to break the DontMakeRelationshipClusters rule.
  • Validity?
  • What to do with unique Release info such as release dates or even Artist?
  • How to publish this capability to MB database contributors?
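
The second option in the list above (having the server collect shared DiscIDs by parsing the relationships) could look roughly like the following sketch. It is only an illustration under assumptions: the function name `shared_disc_ids`, the representation of relationships as pairs of release identifiers, and the `disc_ids` mapping are all hypothetical and are not part of the MB schema.

```python
from collections import defaultdict, deque


def shared_disc_ids(start, links, disc_ids):
    """Collect every DiscID reachable from `start` via relationships.

    start    -- hypothetical identifier of the release being displayed
    links    -- iterable of (release_a, release_b) pairs, one per
                translation/transliteration relationship (assumed shape)
    disc_ids -- dict mapping a release identifier to its own DiscIDs
    """
    # Relationships are followed in both directions.
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        release = queue.popleft()
        found |= set(disc_ids.get(release, ()))
        # Visit related releases that have not been handled yet.
        for other in neighbours[release] - seen:
            seen.add(other)
            queue.append(other)
    return found
```

Because the traversal is transitive, an English, a Japanese and a Cyrillic release end up with the same set of DiscIDs however the three are linked, with no need to store a DiscID more than once.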

Amendments

Amendment 1: Add an "official" or "unofficial" status to this relationship. "This is an 'official' transliteration." or "This is an 'unofficial' translation."

Advantage(s):

  • Validity (but how is validity concretely defined?)

Disadvantage(s):

  • Is an unmarked Release more or less official than an "unofficial" Release?

Amendment 2: Add a language and script identifier to each end of the relationship. "This is an 'English, Latin to English, katakana' transliteration." or "This is a 'Japanese, Kanji & Kana to English, Latin' translation."

Advantage(s):

  • Human language can be identified for filtering purposes, thus people preferring a certain language can receive the appropriate transliteration if available. (future server functionality?)

Disadvantage(s):

  • Some may find the identifiers in transliteration relationships confusing as the language does not change, only the script.
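
Taken together, the two amendments suggest a relationship record that carries a status flag and a language/script pair at each end. A minimal sketch, assuming hypothetical names (`End`, `ReleaseLink`, `Kind`) and free-text language and script labels; none of this reflects an actual MB schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    TRANSLATION = "translation"
    TRANSLITERATION = "transliteration"


@dataclass(frozen=True)
class End:
    release_id: str  # hypothetical identifier of a release
    language: str    # e.g. "English"
    script: str      # e.g. "Latin"


@dataclass(frozen=True)
class ReleaseLink:
    source: End
    target: End
    kind: Kind
    # Amendment 1: True = official, False = unofficial,
    # None = unmarked (the proposal leaves the meaning of this case open).
    official: Optional[bool] = None

    def describe(self) -> str:
        """Render the link in the style of the Amendment 2 examples."""
        status = {True: "official ", False: "unofficial ", None: ""}[self.official]
        return (
            f"{self.source.language}, {self.source.script} to "
            f"{self.target.language}, {self.target.script}: "
            f"{status}{self.kind.value}"
        )
```

Keeping the language and script as separate fields on each end lets a transliteration be recorded with the same language on both sides and only the script differing, which is the case the disadvantage above worries about.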

Background: Discussion on the Mailing List

This was discussed on the UsersMailingList under the subject "Duplicate albums for transliteration". It is currently (2005-12) being discussed on the StyleMailingList under the subject "Cyrrilic".

Some Relevant Points

  • Recently there was some discussion about how to deal with Kate Bush's Aerial, which contains a track named <pi>. To appease clients that can't deal with Unicode, it looks as though it was decided to create two identical versions of this release, one with the track named with the symbol <pi> (shouldn't that be capital <Pi>?) and one with it named "Pi". Besides this, the two albums are identical. In fact, there is also a third release, representing the Japanese release, with the same DiscID but with Japanese naming. This approach seems to me to be an unmaintainable solution. You end up with redundant (and unnormalized) data that need to be maintained in parallel...
  • People will not like this, since arbitrary scripts and languages can be stored in the db. Fact is, however, that the db cannot deal with the relationships between such transliterations in a proper way.
  • Could you elaborate on this? I was thinking that there could be a release-unit relationship entity called "xxx is a transliteration of yyy" that could be added in the meantime. It would help to identify data in future improvements. Does it make sense?
  • Albums are often not pure transliterations or translations in the real world. The Japanese entry for the 2-disc Aerial release mentioned above is ~67% transliterated, ~22% translated, and ~11% untouched.