History:Release Transliteration And Translation

From MusicBrainz Wiki
Status: This page describes a failed proposal. It is not official, and should only be used, if at all, as the basis for a new proposal.



Proposal number: RFC-Unassigned
Champion: None
Status: Failed; officially closed as abandoned on March 24, 2010
This proposal was not tracked in Trac.



Summary: This is a proposal to add a human-language to human-language track title transliteration relationship which works on Release units.

Status: NeedsIntertwingling; Please EditMercilessly. Rephrase, don't append.

Proposal

Add an AdvancedRelationshipType that relates a release in one language to a release in another language, to represent translations and transliterations (TranslationTransliterationRelationshipType).

Definitions

Translation (t9n)
means that a text in one language (e.g. English-US, Japanese, etc.) is interpreted to produce an equivalent text in another language. Words that have meaning in language A are translated into words that have meaning in language B.
Transliteration (t13n)
means that characters and phonetic sounds from one alphabet are converted into the scripting conventions of another, so that the sound is roughly the same. This can include a change of script (e.g. katakana to Latin), but does not necessarily do so (a French name can be transliterated to German). A transliteration can also occur within a language that uses multiple scripts (e.g. kanji to katakana for Japanese, or Latin to Braille for English).

This is an interim measure for the current MB database and not intended to be a permanent solution. It does not solve the Artist name transliteration problem.

Advantage(s):

  • The user's assertion that one release is related to another release can be stored.
  • When the MB database schema is updated in a way that truly supports i18n, the information in the relationships between Release data can be retrieved and automatically imported.
  • (Allows the user to choose which release info to use when extracting from the database. Actually, this functionality already exists when the Artist is the same.)

Disadvantage(s):

  • Information grouped on Albums, such as DiscIDs, is not automatically related. Solving this would need either
    • Duplication of data, i.e. storing the DiscID in multiple albums (how bad is this? NadelnderBambus will support this), or
    • Presentation software on the MB server side that collects and shows shared DiscIDs, etc. by parsing the relationships.
  • If, e.g., an English and a Japanese version of the same release both get transliterated to Cyrillic, you will have to break the DontMakeRelationshipClusters rule.
  • Validity?
  • What to do with unique Release info such as release dates or even Artist?
  • How to publish this capability to a MB database contributor?

Amendments

Amendment 1: Add an "official" or "unofficial" status to this relationship. "This is an 'official' transliteration." or "This is an 'unofficial' translation."

Advantage(s):

  • Validity (but how is validity concretely defined?)

Disadvantage(s):

  • Is an unmarked Release more or less official than an "unofficial" Release?

Amendment 2: Add a language and script identifier to each end of the relationship. "This is an 'English, Latin to English, katakana' transliteration." or "This is a 'Japanese, Kanji & Kana to English, Latin' translation."

Advantage(s):

  • The human language can be identified for filtering purposes, so people who prefer a certain language can receive the appropriate transliteration if one is available. (Future server functionality?)

Disadvantage(s):

  • Some may find the identifiers in transliteration relationships confusing, as the language does not change, only the script.
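
To make the proposal and its two amendments concrete, here is a minimal Python sketch of what one such relationship record could hold. It is an illustration only, not the MusicBrainz schema: the class names, field names, release identifiers ("rel-en", "rel-kata", "rel-ja") and the find_versions helper are all hypothetical. The language and script values are taken from the examples in Amendment 2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    TRANSLATION = "translation"
    TRANSLITERATION = "transliteration"


@dataclass(frozen=True)
class LangScript:
    """Language and script identifier for one end of the relationship (Amendment 2)."""
    language: str
    script: str


@dataclass(frozen=True)
class ReleaseRelationship:
    source_release: str  # hypothetical release identifier
    target_release: str  # hypothetical release identifier
    kind: Kind
    source: LangScript
    target: LangScript
    official: Optional[bool] = None  # Amendment 1; None means "unmarked"

    def describe(self) -> str:
        status = {True: "official", False: "unofficial", None: "unmarked"}[self.official]
        return (
            f"This is an {status} '{self.source.language}, {self.source.script} "
            f"to {self.target.language}, {self.target.script}' {self.kind.value}."
        )


def find_versions(
    relationships: List[ReleaseRelationship],
    release: str,
    language: str,
    script: Optional[str] = None,
) -> List[str]:
    """Return releases related to `release` whose target end matches the preference."""
    return [
        r.target_release
        for r in relationships
        if r.source_release == release
        and r.target.language == language
        and (script is None or r.target.script == script)
    ]


# Illustrative data only; the identifiers do not refer to real database entries.
EXAMPLES = [
    ReleaseRelationship(
        "rel-en", "rel-kata", Kind.TRANSLITERATION,
        LangScript("English", "Latin"), LangScript("English", "katakana"),
        official=False,
    ),
    ReleaseRelationship(
        "rel-ja", "rel-en", Kind.TRANSLATION,
        LangScript("Japanese", "Kanji & Kana"), LangScript("English", "Latin"),
    ),
]
```

In this sketch, find_versions(EXAMPLES, "rel-ja", "English") would pick out the English translation of the Japanese release, which is the kind of language filtering Amendment 2 has in mind. The third "unmarked" state of the official field reflects the open question raised under Amendment 1.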

Background: Discussion on the Mailing List

This was discussed on the UsersMailingList as Duplicate albums for transliteration. As of 2005-12, it is being discussed on the StyleMailingList as Cyrrilic.

Some Relevant Points

  • Recently there was some discussion about how to deal with Kate Bush's Aerial, which contains a track named <pi>. To appease clients that can't deal with Unicode, it looks as though it was decided to create two versions of this release, one with the track named with the symbol <pi> (shouldn't that be capital <Pi>?) and one with it named "Pi". Besides this, the two albums are identical. In fact, there is also a third release, representing the Japanese release, with the same DiscID but with Japanese naming. This approach seems to me to be an unmaintainable solution. You end up with redundant (and unnormalized) data that need to be maintained in parallel...
  • People will not like this, since arbitrary scripts and languages can be stored in the db. The fact is, however, that the db cannot deal with the relationships between such transliterations in a proper way.
  • Could you elaborate on this? I was thinking that there could be a release-unit relationship entity called "xxx is a transliteration of yyy" that could be added in the meantime. It would help to identify data in future improvements. Does it make sense?
  • In the real world, albums are often not pure transliterations or translations. The Japanese entry for the 2-disc Aerial release mentioned above is ~67% transliterated, ~22% translated, and ~11% untouched.