Internationalization: Difference between revisions

From MusicBrainz Wiki
(→‎Track times: Dropped as no localization system even Unicode CLDR seems to support this anyway)
(→‎Languages: Unlisting the unreviewed/inactive language pages (Esperanto/Finnish/Polish) - Feel free to (re)list any page active (again))
 
(47 intermediate revisions by 5 users not shown)
This page is about '''internationalization''' of MetaBrainz projects, which means making the projects support different languages and work as expected in different regions.
The main section below is about '''translation''' of the projects’ user interface: being able to provide the software itself in different languages.
The rest is about more general issues, such as correctly handling data from different languages and different regions.

== Translation ==

This section is intended for translators of projects related to the MetaBrainz community.

=== Platform ===

We have a '''[https://translations.metabrainz.org dedicated translation platform]''' which you can use with your MusicBrainz account. It is hosted by the [https://weblate.org/about/ Weblate team], with a data processing agreement in conformity with the GDPR. Note that the legal terms are generic to all Weblate instances and that “purchase” isn’t available on the MetaBrainz Weblate instance.

See the [https://docs.weblate.org/en/latest/user/translating.html Weblate user docs] for a general introduction on how to use the platform.

=== Projects ===

Each of the [https://translations.metabrainz.org/projects/ translation projects] has an “Info” tab linking to one of the wiki pages below. Each page gathers project-specific information for translators and, more generally, information about internationalization for that project:

* [[MusicBrainz_Server/Internationalization|MusicBrainz Server and data]]
* [[MusicBrainz_Picard/Internationalization|MusicBrainz Picard]]
* [[MetaBrainz:MetaBrainz.org_Website/Internationalization|MetaBrainz.org website]]
* [[CritiqueBrainz:Internationalization|CritiqueBrainz]]

Note:
The following official MetaBrainz projects aren’t translatable at the moment: [[BookBrainz]], [[Cover Art Archive]], and [[ListenBrainz]].
The archived projects [[AcousticBrainz]] and [[MessyBrainz]] have never been (and won’t ever be) translatable.

Our platform isn’t limited to official projects maintained by the Foundation; we also host translations for community projects. If you have a MetaBrainz-related project you want to make translatable, such as a script or extension, get in touch!

=== Languages ===

Among [https://translations.metabrainz.org/languages/ all the languages we currently support on Weblate], a fair number are likely to contain only old translations with no recent activity, and not all projects are available for all languages yet. You can find out more by clicking on the desired language: projects supporting the language are listed on the “Projects” tab, and activity data is on the “Information” tab.

If a language is not available at all, or your language is not available for the project you want to translate, just ask for it in the [https://community.metabrainz.org/c/internationalization/21 “Internationalization” community forum category].

To help with coordinating translations in the same language, you can use Weblate [https://docs.weblate.org/en/latest/user/glossary.html glossaries] and [https://docs.weblate.org/en/latest/user/translating.html#comments comments], the MetaBrainz [https://community.metabrainz.org/ community forums] (see [[#Questions or problems|Questions or problems]] below), and/or a wiki page (create one if missing):

* [[Internationalization/French|French]]
* [[Internationalization/German|German]]
* [[Internationalization/Italian|Italian]]
* [[Internationalization/Spanish|Spanish]]

== Beyond translation ==

=== [https://community.metabrainz.org Forums] ===

[https://www.discourse.org/ Discourse]’s user interface is [https://meta.discourse.org/t/contribute-a-translation-to-discourse/14882 localized].

* The few topics written in a language other than English can use [https://community.metabrainz.org/tags tags] such as [https://community.metabrainz.org/tag/de <code>de</code>] for Deutsch (German). A separate selector is available, and language names are localized using the [https://meta.discourse.org/t/multilingual-plugin/142740 Multilingual plugin].<br/>There is a [https://community.metabrainz.org/t/forums-internationalization/136199 topic] about making this practice official.
* Pinned welcome topics are not localized; the topic mentioned above also covers localizing welcome topics.
* Category titles are now localized using the Multilingual plugin.
* Some semi-static texts are not localized. This could be implemented in the [https://github.com/paviliondev/discourse-multilingual Multilingual Plugin] if anyone feels up to coding for this Ruby project.
* Content is not translated. The [https://meta.discourse.org/t/discourse-translator/32630 Translator plugin] could be considered.
* The ListenBrainz plugin can be translated in the project [https://translations.metabrainz.org/projects/discourse-listenbrainz/ Discourse ListenBrainz] on the MetaBrainz Weblate translation platform.

=== [https://tickets.metabrainz.org Tickets] ===

[https://jira.atlassian.com Jira]’s user interface is localized.

Note that the content doesn’t have to be localized, since English is the working language of the MetaBrainz team.
(Tickets reported in another language would probably be translated into English and answered in both languages.)

=== [https://wiki.musicbrainz.org Wiki] ===

[[mw:MediaWiki|MediaWiki]]’s user interface is localized but not the content, which is generally assumed to be in English (with a few exceptions such as language translation project pages).

We gave up using the [[mw:Content translation|Content translation]] tool for two reasons. It had incompatibilities at the time. It also could not ensure consistency with other translations unless the [[mw:Extension:Translate|Translate]] extension replaced Transifex or Weblate. See the comments on [[jira:OTHER-350]] for details.

=== [https://translations.metabrainz.org/ Translation platform] ===

Weblate’s user interface itself is [https://hosted.weblate.org/engage/weblate/ localized].

== Questions or problems ==

To discuss:
* Project-specific translation: Use the [https://community.metabrainz.org/tag/translation “translation” forum tag] in the [https://community.metabrainz.org/categories category] for that project.
* Project-specific internationalization: Use the [https://community.metabrainz.org/tag/internationalization “internationalization” forum tag] in the [https://community.metabrainz.org/categories category] for that project.
* General translation/internationalization: Use the [https://community.metabrainz.org/c/internationalization/21 “Internationalization” forum category] with the appropriate tags.

To report problems:
* Search for [https://tickets.metabrainz.org/issues/?jql=labels%20%3D%20internationalization%20ORDER%20BY%20status%20ASC%2C%20resolution%20DESC tickets with “internationalization” label] and create a new ticket if not found.

For more real-time interactive conversation with developers, you’re welcome to ask in the [[Communication/IRC|#metabrainz IRC channel]].

[[Category:Development]] [[Category:Internationalization]]

=Internationalization (i18n) and Multiple Language Support=
==Overview==

One of the goals of MusicBrainz is to store information about music from all over the world. That music is written in many languages, so support for those languages is essential. In the future, we also want people to be able to use MusicBrainz in ''any'' language, not just English. The people who know the most about music in other languages are often native speakers of those languages.

The decision to use Unicode for MusicBrainz was an important first step on the road to internationalization. It has allowed entry of hundreds of [[International Artists]] with works in dozens of languages, but much work remains to be done.

The work of adapting software so that it can be used with different languages or in different regions is called '''internationalization''' (abbreviated as I18N). Translating it into each of those languages and regions is called '''localization''' (abbreviated as L10N). Both are substantial efforts, but they need different resources. Internationalization requires specialized understanding of aspects of many languages. That expertise is often easier to find than the native linguistic ability in non-Western languages needed for localization.

----

The following is a breakdown of the many issues for i18n and l10n by area. Issues that should have RFEs filed are marked with '''RFE ME''' and a note on the priority (low, med, high). Where RFEs have already been filed, they should be linked.

==Database==

Many of the most crucial issues for i18n involve the database schema, which needs to support the additional data required to properly localize artists, releases, etc. The localization itself is done by moderators, and can even be done to some extent without full i18n support in the database.

===Countries===

Releases currently record a country with a two-character country code from a fixed set of country codes; see [[Release/Country]] for details.

Country names are localizable as a separate resource in Transifex and are updated from the MusicBrainz Server code repository when beta is updated.
Localized names are used on the website for both display and editing.

Only the documentation page [[Release/Country]] is not localized for now; see [[jira:MBS-13109]] for follow-up.

===Locales===

Just as releases have countries associated with them, artists, aliases, and releases should have locales associated with them. This would capture the language (and country and encoding variants) of the names and titles.

The [http://www.openi18n.org/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=46&page=1 Open I18N guidelines for locale names] should be used where possible. The basic format for standard locales is ''lc''<code>_</code>''CC''<code>.</code>''CSet'', where:
* ''lc'' is an ISO 639 two-letter [[Language Code|LanguageCode]] (three-letter codes may be used if no two-letter code exists).
* ''CC'' is an ISO 3166 two-letter [[Country Code|CountryCode]].
* ''CSet'' is an IANA-registered preferred MIME encoding name. If none is preferred, it is a standard name from the [http://www.openi18n.org/docs/html/CodesetAliasTable-V10.html Open I18N Codeset Alias Table].

Alternatively, we could use the convention adopted for CSS (and other XML/HTML/HTTP?), which uses a hyphen ("-") as the separator for all components instead of an underscore ("_") and a period ("."). The disadvantage of that form is that it doesn't allow you to omit leading components.

The Open I18N guidelines could be extended in two ways. First, any of language/country/encoding could be omitted; if either of the last two components is omitted, its preceding separator would also be omitted. Second, another variant component could be added, as noted below. Both extensions add functionality that may be very useful.

In most cases the language code alone would be used, but there would be uses for country variants. For example, the group is known as "Yazoo" ("en") but in the U.S. as "Yaz" ("en_US"). Simplified Chinese is often identified as "zh_CN" and Traditional Chinese as "zh_TW", although this is not strictly correct, since both are used outside those regions. See [[#zh|below]] for a discussion of Chinese languages and scripts.

It may be that the best solution is to add more components for scripts and "dialects", each preceded by a hyphen "-". For example, "zh-hant-guoyu_CN.UTF-8" would indicate a title in Mandarin (guoyu), using Traditional (hant) Chinese characters, in the PRC, with UTF-8 encoding. But this could be overkill.

On the other hand, many languages use multiple scripts, typically Latin "Latn", Cyrillic "Cyrl" or Arabic "Arab". See the [http://www.iana.org/assignments/language-tags IANA language tags] for examples like Azerbaijani. There are others, like Moldovan. There are also many cases where very similar dialects (e.g. Hindi-Urdu, Serbo-Croatian) are divided mostly by their use of different scripts.

Encoding components could be used to identify misencoded alias names, e.g. "zh.BIG5" for an alias with Big5-misencoded Chinese. They could even be used to automatically generate misencoding aliases for artist names in common character encodings.

It might be desirable to have a fourth variant component (preceded by ":" or another character?) to identify multiple variants. It could represent misspellings (e.g. "en:TYPO"). It could also represent performance variants associated with particular releases: for example, a release could be marked "en:2" to get the second variant of the artist name marked with the same locale. There's a [http://lists.musicbrainz.org/pipermail/musicbrainz-users/2004-November/018909.html discussion] of why this might be desirable on the [[Mailing List|MailingList]].

Some possible examples for usage:
* ".UTF-8" (standard alias for any UTF-8 locale in absence of a more specific match; the preferred Artist Name might have this locale implicitly)
* ".ISO-8859-1" (Latin-1 representation)
* ".ASCII" (an ASCII representation without accents etc.)
* "en" (English name, typically for a non-English artist)
* "en_US" (Preferred name in USA, e.g. "Yaz")

Users could specify their preferred locale in their preferences. It might also be possible to glean something from the Accept-Language header and similar headers in HTTP requests.

===Artists===

One of the most pressing needs is i18n of artist names. All transliterations and translations can be supported by aliases, but currently only the "official" name is used for tagging ([[Artist Sort Name|ArtistSortName]]s are displayed but not yet tagged). Tagging of non-Latin names is poorly supported. As a result, the existing localization of artist names into Japanese and Chinese (or even Cyrillic) creates problems for other users. This is especially true on [[Various Artists|VariousArtists]] releases where one or two artists with non-Latin names appear together with mostly Western artists.

====ArtistNames====

A default locale for releases by the artist (and locale indicator for the official [[Artist Name|ArtistName]]) should be added. '''RFE ME med'''

====ArtistAliases====

Since an artist can have an unlimited number of [[Artist Alias|artist aliases]], there is some support for i18n already. The [[Misencoding FAQ|MisencodingFAQ]] has served as a training manual for a number of moderators who have done an excellent job of adding aliases in different languages. [http://sourceforge.net/tracker/index.php?func=detail&aid=1059830&group_id=19506&atid=369506 RFE 1059830] '''high''' suggests adding locales to each alias to indicate their language.

<span id="sortname"></span>
====ArtistSortNames====

Currently, the database only supports a single sortname for each artist. To provide a consistent sort order across multiple alphabets, the generally accepted guideline is to use only roman (Latin) alphabet characters in [[Artist Sort Name|ArtistSortName]]s. This is less than ideal, but solving the problem internationally is extremely complex: the rules for sorting vary by locale, and so do conventions about spelling out numbers in names. More important i18n and l10n problems need a lot of work, and this relatively unimportant one is hard to solve. It is therefore probably best to postpone a better solution until after the i18n effort is largely complete. In the meantime, some of the following points should probably be added to the [[Style Guideline|style guidelines]]:
# [[Artist Sort Name|ArtistSortName]]s should be restricted to the Latin-1 (8859-1) character set (''current convention allows any roman characters, even Vietnamese'')
# Sortnames should indicate the family name in Asian languages with a comma, even though reversal is unneeded, e.g. <code><nowiki> Mao, Tse-Tung </nowiki></code>
# Sortnames should be transliterations of the official artist name, not translations (''currently, translations are sometimes used'')
# Transliterations should use the artist's home country's standard transliteration into roman characters supported by 8859-1
# If there is no standard transliteration, the standard English transliteration should be used; transliterations into other languages may be different (e.g. Ч = Ch in English, but Ч = Tch in French)
# However, common English spellings should be preferred, e.g. Tchaikovsky, not Chajkovskij.

Points 1 and 3 represent a change from existing convention, and comments are welcome on the mailing list. In particular, point 3 is often broken for Asian artists who have "English names" that are not transliterations but are closer to alternate names.

For a more international sort, there could be a point requiring numbers to be written numerically (e.g. <code><nowiki> 4 Tops, The </nowiki></code> rather than <code><nowiki> Four Tops, The </nowiki></code>). However, most languages specify sorting numbers as if written out in full (in the local language, of course), so this is likely to meet weak acceptance and strong opposition.

===Releases and Tracks===

Having multiple aliases for each track title seems far too complex a mechanism ever to be implemented. Instead, it will probably be preferable to have translations on a per-release basis. There would then be duplicate entries for releases that share the same track time data and TRMs. Release data and disc IDs may be something we don't want to share, so that the Japanese release (and disc ID) is associated with the Japanese translation but not with the English titles; this is not yet entirely clear.

Especially now that the database supports assigning [[Disc ID|DiscID]]s to multiple releases, it is quite reasonable to have [[Virtual Duplicate Release|VirtualDuplicateRelease]]s that are not truly duplicates, since they represent different translations/transliterations.

[[Advanced Relationships|AdvancedRelationships]] could potentially be used to link different releases that represent translations/transliterations of each other. [[User:TarragonAllen|TarragonAllen]]'s [[Release Groups|ReleaseGroups]] proposal could provide a framework for this as well.

It may be desirable to have per-track locale information, but this should probably be used to record the performance language of a particular track, which would not necessarily be the same as the language of the track title (especially on a translated release).

For titles where translations or transliterations appear together with the original title, perhaps a [[Style Guideline|StyleGuideline]] should specify the use of square brackets. In most cases, however, it will be preferable to have them as separate titles on duplicate releases. Parts of the original title that are written in Latin letters, e.g. (remix), should be omitted from the translated version, e.g. "Знаю Я (remix) [I Know]".

If a title is given only in translation or transliteration, do not use square brackets, e.g. "Yang Ku Tunggu" not "[Yang Ku Tunggu]".

On [[Various Artists|VariousArtists]] releases, Artist aliases that are most compatible with the locale of the release itself should be used. Thus, on an "en" compilation where a Chinese artist appears, her "en" alias (if any) would be preferred to her official "zh" name. There would probably need to be some interaction with user preferences here as well.

===Unicode issues===

There are sometimes multiple ways to represent the same symbols with different Unicode code point sequences (e.g. using combining accent marks). It may be desirable to enter these as aliases, while a normalized form should be used for all names and titles in the database. See [http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0002.html http://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0002.html]. It's possible that the MusicBrainz server and/or database software may (or should) do this already.

There's a Perl Unicode normalization tool, [http://www.w3.org/International/charlint/ Charlint], that could be used or adapted to do this normalization.
* For artist names and aliases, which are supposed to be unique, NFKC would probably be best. NFKC is Normalization Form KC: compatibility decomposition followed by canonical composition.
* For release and track titles, NFC would probably be better. NFC is Normalization Form C: canonical decomposition followed by canonical composition, which doesn't change the visible appearance.
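The two normalization forms suggested above can be illustrated with Python's standard-library `unicodedata` module instead of Charlint. This is a minimal sketch. The `normalize_name` helper and its `is_artist_name` flag are hypothetical and only express the NFKC-for-names, NFC-for-titles split proposed here.

```python
import unicodedata


def normalize_name(text: str, is_artist_name: bool) -> str:
    """Normalize a name or title as proposed above (hypothetical helper).

    Artist names and aliases use NFKC, which also folds compatibility
    characters such as fullwidth Latin letters; release and track titles
    use NFC, which only merges canonically equivalent sequences.
    """
    form = "NFKC" if is_artist_name else "NFC"
    return unicodedata.normalize(form, text)


composed = "Bj\u00f6rk"     # "ö" as one code point (U+00F6)
decomposed = "Bjo\u0308rk"  # "o" + COMBINING DIAERESIS (U+0308)
fullwidth = "\uff21\uff22\uff23"  # fullwidth "ＡＢＣ"

print(normalize_name(decomposed, is_artist_name=False) == composed)  # True
print(normalize_name(fullwidth, is_artist_name=True))   # ABC
print(normalize_name(fullwidth, is_artist_name=False))  # ＡＢＣ (unchanged)
```

Both forms merge the two encodings of "Björk". Only NFKC changes the fullwidth letters, which is why it is suggested for names that should be unique but not for titles.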

There are also ranges of Unicode characters that should be avoided as they do not provide increased range of expression but merely create interoperability issues for those without complete Unicode fonts. In particular, the following should be explicitly prohibited by [[Style Guideline|style guidelines]] (and perhaps enforced by the database):
# Soft hyphen '''U+00AD'''
# Non-breaking space '''U+00A0'''
# [http://www.fileformat.info/info/unicode/block/halfwidth_and_fullwidth_forms/list.htm Fullwidth latin and halfwidth kana/hangul] '''U+FF00-FFEF'''
# [http://www.unicode.org/faq/utf_bom.html#BOM Byte order mark] '''U+FEFF'''
# Narrow non-breaking space '''U+202F'''
# Ideographic space '''U+3000'''
# Medium Mathematical space '''U+205F'''
# [http://www.fileformat.info/info/unicode/block/general_punctuation/list.htm Typographic spaces] '''U+2000-200B'''
# [http://www.fileformat.info/info/unicode/block/number_forms/list.htm Roman numeral characters, e.g. single character IX] '''U+2160-217F'''
# Private use surrogates and codes '''U+DB80-DBFF''' '''U+E000-F8FF'''
# Control characters '''U+0000-001F''' '''U+007F''' '''U+0080-009F'''

The first two are not specifically Unicode issues as they occur in Latin-1, but these plus the third are among the ones where database enforcement is most desirable as they can lead to artist names that are visually identical in appearance but which are, in fact, different. '''RFE ME low'''
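Database or editor-side enforcement of the list above could start with a simple scan of each name against the listed ranges. This is a sketch only. `PROHIBITED_RANGES` and `find_prohibited` are hypothetical names, and the ranges are transcribed from the numbered list rather than taken from any existing MusicBrainz check.

```python
# Inclusive code point ranges transcribed from the list above (hypothetical table).
PROHIBITED_RANGES = [
    (0x0000, 0x001F, "control character"),
    (0x007F, 0x007F, "control character"),
    (0x0080, 0x009F, "control character"),
    (0x00A0, 0x00A0, "non-breaking space"),
    (0x00AD, 0x00AD, "soft hyphen"),
    (0x2000, 0x200B, "typographic space"),
    (0x202F, 0x202F, "narrow non-breaking space"),
    (0x205F, 0x205F, "medium mathematical space"),
    (0x2160, 0x217F, "Roman numeral character"),
    (0x3000, 0x3000, "ideographic space"),
    (0xDB80, 0xDBFF, "private use surrogate"),
    (0xE000, 0xF8FF, "private use character"),
    (0xFEFF, 0xFEFF, "byte order mark"),
    (0xFF00, 0xFFEF, "fullwidth/halfwidth form"),
]


def find_prohibited(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, reason) for every prohibited character."""
    problems = []
    for index, char in enumerate(text):
        cp = ord(char)
        for low, high, reason in PROHIBITED_RANGES:
            if low <= cp <= high:
                problems.append((index, f"U+{cp:04X}", reason))
                break  # report each character once
    return problems


print(find_prohibited("Bj\u00f6rk"))        # []
print(find_prohibited("Some\u00a0Artist"))  # [(4, 'U+00A0', 'non-breaking space')]
```

Reporting the index and reason, rather than silently stripping characters, lets an editor see why two visually identical names differ.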

===Language-specific issues===

<span id="zh"></span>
====Chinese scripts and languages====

There may be some mapping from simplified to traditional Chinese and ''vice-versa'' that could be automatically applied. At least, when searching for simplified or traditional Chinese characters with Google, both traditional and simplified versions are considered to match; there may be more information on this in the Wikipedia multi-lingual coordination project. It would be nice to have some sort of automatic conversion between these, at least for searching. '''RFE ME low'''
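One way to get the search-only conversion described above is to fold traditional characters to simplified ones when building search keys. This works because the mapping is largely many-to-one in that direction. For example, both 發 and 髮 simplify to 发, so the reverse direction cannot be automated safely for display. The `TRAD_TO_SIMP` table below is a tiny hypothetical sample; a real implementation would need a large curated table.

```python
# Tiny illustrative traditional -> simplified table (hypothetical sample only).
TRAD_TO_SIMP = {
    "樂": "乐",  # music
    "國": "国",  # country
    "東": "东",  # east
    "發": "发",  # emit, develop
    "髮": "发",  # hair: a second traditional character with the same simplified form
}


def search_key(text: str) -> str:
    """Fold traditional characters to simplified ones for matching only."""
    return "".join(TRAD_TO_SIMP.get(char, char) for char in text)


print(search_key("音樂") == search_key("音乐"))  # True
```

The stored titles stay untouched; only the index key is folded, so searching in either script finds both.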

There are also issues with the locale names. As noted above, zh_CN is often used to indicate Simplified Chinese and zh_TW to indicate Traditional Chinese, since those are the official scripts of the PRC and Taiwan, respectively. However, both scripts are used elsewhere as well, so this is a bit of a hack.

Furthermore, although Chinese is written in only two ways, it actually comprises several distinct languages: Mandarin is the standard, but there are also Cantonese, Hakka, Xiang, and others. As long as we are only concerned with titles, we mostly need to worry only about Simplified vs. Traditional. If we use locales to indicate performance languages like Cantonese, it gets messy: there is some usage of zh_HK to indicate Cantonese, but what about Hakka?

Even without considering performance language, transliterated titles may need better indicators of language. A locale of zh.ASCII would imply Pinyin transliteration (based on Mandarin pronunciation) by default. If the language were actually Cantonese or Hakka, would the locale be zh_HK.ASCII, or something else?
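The extended locale names discussed on this page (e.g. "zh_HK.ASCII", ".UTF-8", "en:TYPO", "zh-hant-guoyu_CN.UTF-8") can be parsed mechanically. This is a sketch under stated assumptions. The grammar `language[-subtag...][_COUNTRY][.CODESET][:VARIANT]` is a hypothetical combination of the Open I18N-style format with the optional components, hyphenated script/dialect subtags, and ":" variant proposed in the Locales section. It is not an existing MusicBrainz feature, and `parse_locale` and `Locale` are illustrative names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical grammar: language[-subtag...][_COUNTRY][.CODESET][:VARIANT],
# where every component may be omitted (e.g. ".UTF-8" or "en:2").
LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})?"
    r"(?P<subtags>(?:-[A-Za-z0-9]+)*)"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?::(?P<variant>[A-Za-z0-9]+))?$"
)


@dataclass
class Locale:
    language: Optional[str]
    subtags: list[str] = field(default_factory=list)
    country: Optional[str] = None
    codeset: Optional[str] = None
    variant: Optional[str] = None


def parse_locale(name: str) -> Locale:
    """Split an extended locale name into its optional components."""
    match = LOCALE_RE.match(name)
    if not name or match is None:
        raise ValueError(f"not a valid locale name: {name!r}")
    subtags = [s for s in match.group("subtags").split("-") if s]
    if subtags and match.group("language") is None:
        raise ValueError(f"script/dialect subtags need a language: {name!r}")
    return Locale(
        language=match.group("language"),
        subtags=subtags,
        country=match.group("country"),
        codeset=match.group("codeset"),
        variant=match.group("variant"),
    )


print(parse_locale("zh-hant-guoyu_CN.UTF-8"))
# Locale(language='zh', subtags=['hant', 'guoyu'], country='CN', codeset='UTF-8', variant=None)
```

Keeping each component optional makes names like ".ISO-8859-1" and "en:TYPO" valid. That is the main advantage claimed above over the all-hyphen CSS convention.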

====Icelandic sorting====

Although Icelandic uses only characters from the Latin-1 alphabet, there are some unusual locale issues for Icelandic artists. As last names in Icelandic are patronymics rather than family names, and as such are less significant, the phone book is sorted by first names. It would probably be a good idea to have a [[Style Guideline|StyleGuideline]] that specifies that the [[Artist Sort Name|ArtistSortName]] for [[Icelandic Artists|IcelandicArtists]] should '''not''' be reversed. (Fortunately, this isn't an issue for the most famous Icelandic artist, [[artist:87c5dedd-371d-4a53-9f7f-80522fb7f3cb|Björk]], as she only uses one name artistically.)
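The proposed Icelandic exception amounts to skipping the usual "Family, Given" reversal when generating a sort name. In the hypothetical helper below, the `reverse` flag stands in for however the artist's Icelandic naming would actually be recorded. The rule "the last word is the family name" is itself a simplifying assumption.

```python
def make_sort_name(name: str, reverse: bool = True) -> str:
    """Build a sort name from a personal name (hypothetical helper).

    With reverse=True the last word is treated as the family name and moved
    to the front ("Given Family" -> "Family, Given"). For Icelandic artists,
    whose last names are patronymics, pass reverse=False to keep name order.
    """
    parts = name.split()
    if not reverse or len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(make_sort_name("John Smith"))                        # Smith, John
print(make_sort_name("Guðrún Jónsdóttir", reverse=False))  # Guðrún Jónsdóttir
print(make_sort_name("Björk"))                             # Björk
```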

==Web Server==

===Messages and Text on Web pages===

All text and messages in the user interface of the website are currently in English and lack internationalization. Internationalization itself is technically fairly straightforward, but localization raises much larger problems:
* How can you avoid making the process of creating new pages awkward and painful?
* How do you handle translations when some of the volunteer translators for a language are lagging behind and you are ready to roll out a new version of the website?
There are several social and technical issues to address here.

We may want to provide something like a wiki interface to allow moderators to "Edit the translation of this page" to update and edit the translated content of the pages; this will make providing new translations as easy as reasonably possible. It doesn't address certain aspects of the user interface that aren't embodied in a web page, e.g. the email sent with moderation notes.

The new generation server is currently being translated on [https://www.transifex.com/projects/p/musicbrainz/ Transifex]. Don't hesitate to join the team for a language you speak and help translate the web server messages.

Also, see the [[Server Internationalization|NGS Server Internationalization]] for language-specific translation guidelines and additional information.

===FreeDB importing===

Now that it is possible to select different encodings for the [[FreeDB]] data, this is less of a problem. However, there are "breakaway" [[FreeDB]] registries apart from [[FreeDB]].org, especially for non-Western artists: there is at least one in Japan, and I would not be surprised to find one for India. There should be some way to support importing from alternate [[FreeDB]] servers. '''RFE ME med'''

===Browse Artists===

Currently, [[Browse Artists|BrowseArtists]] uses a Latin alphabet with no accents or non-English letters; accents and ligatures/digraphs like ae are mapped to unaccented letters. Since [[Artist Sort Name|ArtistSortName]]s are restricted to Latin-1 by convention (or by a future [[Style Guideline|StyleGuideline]]), as noted [[#sortname|above]], this is not a terribly limiting factor. A helpful short-term fix would be to display the regular artist name together with the sortname if they differ. '''RFE ME low'''
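The Latin-1 sortname convention and the accent-stripping browse index described above can both be sketched in a few lines. This assumes the browse index uses 26 unaccented buckets plus a catch-all. The `'#'` bucket, the `SPECIAL_FOLDS` choices, and both function names are hypothetical, not the actual BrowseArtists implementation. NFKD decomposition handles accented letters such as "Å", but letters like "Æ" and "Ø" have no decomposition and need an explicit table.

```python
import unicodedata

# Latin-1 letters with no Unicode decomposition, folded by assumption.
SPECIAL_FOLDS = {"Æ": "A", "Ø": "O", "Þ": "T", "Ð": "D"}


def is_latin1(sort_name: str) -> bool:
    """True if the sort name fits the proposed Latin-1 (ISO 8859-1) rule."""
    try:
        sort_name.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def browse_letter(sort_name: str) -> str:
    """Return the unaccented A-Z browse bucket for a sort name, else '#'."""
    if not sort_name:
        return "#"
    first = sort_name[0].upper()
    first = SPECIAL_FOLDS.get(first, first)
    # NFKD splits "Å" into "A" + COMBINING RING ABOVE; keep only the base letter.
    base = unicodedata.normalize("NFKD", first)[0]
    return base if "A" <= base <= "Z" else "#"


print(is_latin1("Björk"), is_latin1("Nguyễn"))  # True False
print(browse_letter("Åsa"), browse_letter("Ørjan"), browse_letter("3 Doors Down"))  # A O #
```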

A complete fix should allow artists to have multiple locale-dependent sortnames, and permit users to choose the desired alphabet to browse in a drop-down or some such. However, given the general uselessness of alphabet-based browsing with a database containing over 100 thousand artists, this is unlikely to ever be worth implementing. Significant improvements to browsing would only come as part of a complete overhaul, probably using genre information that is also not currently stored in the database.

===Quick Searches===

The existing [[Artist Search|ArtistSearch]], [[Release Search|ReleaseSearch]], and [[Track Search|TrackSearch]] are all based on keyword indexes that use non-alphanumeric characters to break names and titles into words. This works well for simple phonetic scripts, but does poorly with ideographic scripts that don't generally use whitespace and punctuation to separate words. An improvement would be to treat all ideographic characters (Japanese kanji; most, but not all, Chinese characters; Korean hanja and maybe hangul, possibly others) as standalone words. '''RFE ME med'''
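The suggested improvement can be sketched as a keyword splitter that breaks on non-alphanumerics, as now, but emits every ideograph as a word of its own. The Unicode ranges below are an approximation covering the main CJK ideograph blocks. Hangul is deliberately left out because Korean normally separates words with spaces. `is_ideograph` and `keywords` are hypothetical names, not the existing indexer.

```python
def is_ideograph(char: str) -> bool:
    """Rough test for CJK ideographs (main blocks only; an approximation)."""
    cp = ord(char)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0xF900 <= cp <= 0xFAFF   # CJK Compatibility Ideographs
    )


def keywords(text: str) -> list[str]:
    """Split on non-alphanumerics, but emit each ideograph as its own word."""
    words: list[str] = []
    current = ""
    for char in text.lower():
        if is_ideograph(char):
            if current:
                words.append(current)
                current = ""
            words.append(char)
        elif char.isalnum():  # letters in any script, including kana
            current += char
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(keywords("Guns N' Roses"))  # ['guns', 'n', 'roses']
print(keywords("宇多田ヒカル"))     # ['宇', '多', '田', 'ヒカル']
```

The katakana run stays together because kana are phonetic, while each kanji becomes searchable on its own.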

===Advanced Search===

Currently, [[Advanced Search|AdvancedSearch]] uses aliases to generate a candidate set of artists that match (based on a quick search) but the relevance ranking is based only on the [[Artist Name|ArtistName]]; this gives suboptimal results when a different character set is used, e.g. Advanced Search for artist name Wong.

===LuceneSearch===

It should be possible to use locale data to distinguish between correct names and misspelled names added as aliases for search purposes. A better search, like that provided by Lucene, would not rely on aliases entered with typos, but rather use a phonetic similarity algorithm of some sort, so it should ignore misspellings.
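As one concrete example of a phonetic similarity algorithm, American Soundex maps similar-sounding Latin-script names to the same four-character code. It is swapped in here purely for illustration; the text does not prescribe any particular algorithm, and this is not how Lucene search works. Soundex is also built around English spelling. In this sketch, letters outside its table (vowels, Y, and non-ASCII letters such as "Ö") just act as separators, which shows why phonetic matching is itself an i18n concern.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for char in letters[1:]:
        code = SOUNDEX_CODES.get(char, "")
        if code and code != previous:
            result.append(code)
        # H and W do not separate letters with the same code; vowels do.
        if char not in "HW":
            previous = code
    return ("".join(result) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```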

There are probably other issues for internationalization of [[Lucene Search|Lucene searching]]; this area requires more exploration.

===Automatic Transliteration===

Automatic transliteration could be done for many languages if no transliterated/translated alias is available. For best results it is necessary to know the language. For example, Cyrillic script is used by several languages, and transliteration from Russian will differ subtly from transliteration from Ukrainian or Azerbaijani. In the case of Chinese, differences between dialects are even more dramatic. For Japanese, where identical kanji can have multiple different readings, the correct transliteration may not be easy to determine at all. In addition, individual artists may often prefer a nonstandard transliteration of their names, or may have an "English" name that isn't really a transliteration.
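A table-driven transliterator shows why the language, not just the script, must be known. The tables below cover only the handful of letters needed for the demonstration, so they are illustrative assumptions, not complete romanization standards. They do reflect a real difference: Г and И are usually romanized "G" and "I" from Russian but "H" and "Y" from Ukrainian.

```python
# Partial, illustrative tables; real ones must cover the whole alphabet.
TABLES = {
    "ru": {"а": "a", "г": "g", "и": "i", "л": "l", "н": "n", "ч": "ch"},
    "uk": {"а": "a", "г": "h", "и": "y", "л": "l", "н": "n", "ч": "ch"},
}


def transliterate(text: str, language: str) -> str:
    """Romanize Cyrillic text using the table for the given language."""
    table = TABLES[language]
    out = []
    for char in text:
        latin = table.get(char.lower(), char)  # unknown characters pass through
        out.append(latin.capitalize() if char.isupper() else latin)
    return "".join(out)


print(transliterate("Галина", "ru"))  # Galina
print(transliterate("Галина", "uk"))  # Halyna
```

The same Cyrillic name yields two different romanizations, so an automatic alias generator would need the locale information proposed earlier on this page.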

===Non-linguistic Locale issues===

There are some issues that depend purely on non-linguistic aspects of the locale.

====Release dates====

Different countries use different date formats. While the server uses the international ISO date format (2004-12-08) users may prefer to see other formats (European dd.mm.yyyy or American mm/dd/yyyy). '''RFE ME low'''
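Displaying the stored ISO date in a user's preferred format is a small formatting step. In this sketch, the preference keys are hypothetical, and the input is assumed to be a complete YYYY-MM-DD date.

```python
from datetime import date

# Hypothetical user-preference keys mapped to strftime patterns.
DATE_FORMATS = {
    "iso": "%Y-%m-%d",       # 2004-12-08 (as stored by the server)
    "european": "%d.%m.%Y",  # 08.12.2004
    "american": "%m/%d/%Y",  # 12/08/2004
}


def format_release_date(iso_date: str, preference: str) -> str:
    """Render a complete ISO date (YYYY-MM-DD) in the user's preferred style."""
    return date.fromisoformat(iso_date).strftime(DATE_FORMATS[preference])


print(format_release_date("2004-12-08", "european"))  # 08.12.2004
```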

====Amazon links====

Currently the user can select a preference for a single Amazon site; but some releases are available only on one site. [http://sourceforge.net/tracker/index.php?func=detail&aid=1031066&group_id=19506&atid=369506 RFE 1031066] suggests that some way to have a default (or fallback) store for each country would be useful, and could be used based on release country if no match is found at the user's preferred store.
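The fallback suggested in RFE 1031066 can be expressed as a two-step lookup:
# Use the user's store if it carries the release.
# Otherwise use the default store for the release country.

The country-to-store mapping and the function name below are hypothetical; how availability would actually be determined is left open.

```python
from typing import Optional

# Hypothetical mapping from release country to a default store.
DEFAULT_STORE_BY_COUNTRY = {
    "US": "amazon.com",
    "GB": "amazon.co.uk",
    "DE": "amazon.de",
    "JP": "amazon.co.jp",
}


def choose_store(preferred: str, release_country: str, available: set[str]) -> Optional[str]:
    """Prefer the user's store; fall back to the release country's default store."""
    if preferred in available:
        return preferred
    fallback = DEFAULT_STORE_BY_COUNTRY.get(release_country)
    if fallback in available:
        return fallback
    return None  # no suitable store carries this release


print(choose_store("amazon.com", "JP", {"amazon.co.jp"}))  # amazon.co.jp
```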

===Language-specific issues===

There are some issues that are particular to specific languages. Since the browser is doing the rendering, problems like combining characters aren't an issue for [[MusicBrainz]], but some issues remain.

====Greek final sigma====

In one case, Unicode forces applications to deal with context-dependent letter forms directly rather than leaving them to the browser: the alternate form of lowercase sigma (ς) at the end of a word. [http://sourceforge.net/tracker/index.php?func=detail&aid=1021537&group_id=19506&atid=119506 RFE 1021537] points out two places where this needs to be handled: JavaScript "Guess Case", and auto-approval for case/accent-changing edits.
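The two places named in the RFE fail in different ways. Lowercasing one character at a time, as a hand-written case-guessing routine might, produces a medial σ at the end of the word. A case-only comparison is safest with Unicode case folding, which maps both σ and ς to σ. The sketch below uses Python, whose `str.lower` applies the final-sigma rule and whose `str.casefold` applies full case folding. The helper names are hypothetical, and the check covers case only, not accents.

```python
def naive_lower(text: str) -> str:
    """Lowercase one character at a time, ignoring word context (buggy)."""
    return "".join(char.lower() for char in text)


def is_case_only_change(old: str, new: str) -> bool:
    """True if two titles differ only in letter case (casefold maps ς to σ)."""
    return old.casefold() == new.casefold()


word = "ΟΔΟΣ"
print(word.lower())        # οδος  (final sigma applied)
print(naive_lower(word))   # οδοσ  (medial sigma at the end of the word)
print(is_case_only_change("ΟΔΟΣ", "οδος"))  # True
```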

====Right-to-Left Support (Arabic & Hebrew)====

Characters in these alphabets are written right-to-left. The hairy and complex bidirectionality support tries to make them display correctly, even when embedded in a page that is primarily left-to-right. However, some things get botched:
* Meta-information in parentheses, like (disc 1), is particularly mangled.
* In subscription notification emails you also get things like "2) <hebrew> open, 4 applied)". Parentheses and numbers don't override the current default direction, and parentheses are mirrored based on the current direction.

Judicious use of RTL and LTR overrides at the beginning and end of [[Artist Name|artist names]] in Arabic and Hebrew would help, although I don't believe they should be embedded in the [[Artist Name|artist names]] themselves. '''RFE ME med'''
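Wrapping names at display time, while leaving the stored artist name untouched, can be sketched as follows. Note that this swaps in the Unicode directional isolates FSI (U+2068) and PDI (U+2069), added in Unicode 6.3, instead of the overrides mentioned above. Isolates keep the name's direction from leaking into surrounding parentheses and numbers without forcing a direction on the name itself. In HTML, the `<bdi>` element serves the same purpose. The helper names are hypothetical.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def has_rtl(text: str) -> bool:
    """True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(char) in ("R", "AL") for char in text)


def isolate_for_display(name: str) -> str:
    """Wrap an RTL artist name for display only; the stored name is unchanged."""
    return f"{FSI}{name}{PDI}" if has_rtl(name) else name


# e.g. building a plain-text notification line around a Hebrew name
line = f"{isolate_for_display('שלום')} (disc 1)"
```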

Furthermore, for localization of the web server itself into Hebrew and Arabic, right justification (or really, mirror display) of all the pages layouts will be needed. The i18n support for this is surely not yet present. '''RFE ME low'''

==Wiki==

The user interface is localized, but the content is not; it is generally assumed to be in English, with a few exceptions such as language translation project pages.

We gave up on the [[mw:Content translation]] tool because of incompatibilities at the time. In addition, consistency with other translations could not be ensured without using [[mw:Extension:Translate]] instead of Transifex or Weblate; see the comments on [[jira:OTHER-350]] for details.

==Tagger==

The [[MusicBrainz Tagger|MusicBrainzTagger]] does not handle characters outside Latin-1 (ISO 8859-1) correctly, but it is probably not worth fixing. The new [[Picard Tagger|PicardTagger]] works somewhat better, but it has problems on some platforms.

===ID3 tags===

[[ID3v2 Tags|ID3v2Tags]] provide full i18n support only in ID3v2.4. (ID3v1 supports only ISO 8859-1 encoding in tags. ID3v2.3 generally supports Unicode, except in URLs and numeric strings, although taggers and players may not themselves support Unicode.) However, the tagger needs to set the encoding to UTF-8 correctly. '''RFE ME high'''
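
A minimal sketch of what "setting the encoding to UTF-8" means at the byte level. It builds a single ID3v2.4 text frame by hand; a real tagger would use a tag library and would also write the tag header. According to the ID3v2.4 specification, each text frame body starts with an encoding byte, where `0x03` means UTF-8, and frame sizes are stored as synchsafe integers. The function names are hypothetical:

```python
ENCODING_UTF8 = 0x03  # ID3v2.4 text-encoding byte for UTF-8


def synchsafe(n: int) -> bytes:
    """Encode n as a 4-byte ID3v2.4 synchsafe integer (7 significant bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value does not fit in a synchsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def text_frame(frame_id: str, text: str) -> bytes:
    """Build an ID3v2.4 text frame (e.g. TIT2 = title) whose payload is UTF-8."""
    body = bytes([ENCODING_UTF8]) + text.encode("utf-8")
    # Frame header: 4-character ID, synchsafe body size, 2 flag bytes (none set).
    header = frame_id.encode("ascii") + synchsafe(len(body)) + b"\x00\x00"
    return header + body
```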

Some users may prefer to use other encodings for their tags (e.g. Big5 or other 8859 variants); the tagger should support user selection of preferred encodings. '''RFE ME med'''

===Filenames===

Filesystem support for non-Latin filenames is patchy and inconsistent. Most systems support any ISO 8859 variant, or UTF-8, but there could be problems with other multibyte encodings. Using the existing UTF-8 encoding should be fine in most cases, but it might be desirable to allow user selection of other encodings. '''RFE ME low'''
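
A minimal sketch of user-selectable filename encodings with a UTF-8 fallback. It assumes the user's choice is given as a Python codec name, and the function name is hypothetical:

```python
def encode_filename(name: str, preferred: str = "utf-8") -> bytes:
    """Encode a filename in the user's preferred encoding, falling back to UTF-8.

    Falls back when the preferred codec cannot represent every character
    (UnicodeEncodeError) or is not a codec Python knows (LookupError).
    """
    try:
        return name.encode(preferred)
    except (UnicodeEncodeError, LookupError):
        return name.encode("utf-8")
```

A silent fallback can leave a collection with mixed encodings on disk. An alternative design is to report the unencodable name and let the user decide.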

==Discussion==

The [https://community.metabrainz.org/c/internationalization/21 Internationalization community forum category] is available for the community to discuss general internationalization issues.
To discuss the translation of project-specific terms, it is better to use the [https://community.metabrainz.org/tag/translation translation tag] in the community forum category for that project.
To discuss a project-specific internationalization matter with non-translators, it is better to use the [https://community.metabrainz.org/tag/translation internationalization tag] in the community forum category for that project.

To search for existing issues, see the [https://tickets.metabrainz.org/issues/?jql=labels%20%3D%20internationalization%20ORDER%20BY%20status%20ASC%2C%20resolution%20DESC internationalization-labeled tickets]. If none matches, create a new ticket.

There is also an old [[Talk:Internationalization]] page carried over from the previous version of this wiki page. Further extensive discussion should probably take place elsewhere (in either the community forums or tickets).

<span id="en_UK"></span> ''For consistency with the [[mw:Manual:Page_title|WikiName]] of this page, the U.S. English (en_US) spellings of internationalization and localization have been used throughout, but it is worth noting that British English (en_GB, ''not'' en_UK) spells these words with "s" instead of "z": internationalisation and localisation.''

[[Category:To Be Reviewed]] [[Category:Development]] [[Category:Internationalization]]

Latest revision as of 20:28, 20 December 2023

This page is about internationalization of MetaBrainz projects, which means making the projects support different languages and work as expected in different regions. The main section below is about translation of the projects’ user interface: being able to provide the software itself in different languages. The rest is about more general issues, such as correctly handling data from different languages and different regions.

Translation

This section is intended for translators of projects related to the MetaBrainz community.

Platform

We have a dedicated translation platform which you can use with your MusicBrainz account. It is hosted by the Weblate team, with a data processing agreement in conformity with the GDPR. Note that the legal terms are generic to all Weblate instances and that “purchase” isn’t available from the MetaBrainz Weblate instance.

See the Weblate user docs for a general introduction on how to use the platform.

Projects

Each translation project has an “Info” tab linking to one of the wiki pages below. Each of these pages gathers project-specific information for translators and, more generally, about internationalization of that project:

Note: The following official MetaBrainz projects aren’t translatable at the moment: BookBrainz, Cover Art Archive, and ListenBrainz. The archived projects AcousticBrainz and MessyBrainz have never been (and won’t ever be) translatable.

Our platform isn’t limited to official projects maintained by the Foundation, and we also host translations for community projects. If you have a MetaBrainz-related project you want to make translatable, such as a script or extension, get in touch!

Languages

Of all the languages we currently support on Weblate, a fair number probably contain only old translations with no recent activity. Not all projects are available in every language yet. You can find out more by clicking on the desired language: projects supporting the language are listed on the “Projects” tab, and activity data is on the “Information” tab.

If your language is not available at all, or not available for the project you want to translate, just ask for it in the “Internationalization” community forum category.

To help coordinate translations in the same language, you can use Weblate glossaries and comments, the MetaBrainz community forums (see “Questions or problems” below), and/or a wiki page (create one if missing):

Beyond translation

Forums

Discourse’s user interface is localized.

  • The few topics written in a language other than English can use tags such as de for Deutsch (German). A separate selector is available and language names are localized using the Multilingual plugin.
    There is a topic about making this practice official.
  • Pinned welcome topics are not localized; the topic mentioned above also covers localizing them.
  • Category titles are now localized using the Multilingual plugin.
  • Some semi-static texts are not localized. Support for them could be implemented in the Multilingual plugin if anyone feels up to coding for this Ruby project.
  • Content is not translated. The Translator plugin could be considered.
  • The ListenBrainz plugin can be translated in the project Discourse ListenBrainz on the MetaBrainz Weblate translation platform.

Tickets

Jira’s user interface is localized.

Note that the content doesn’t have to be localized, since English is the working language of the MetaBrainz team. (Tickets reported in another language would probably be translated into English and answered in both languages.)

Wiki

MediaWiki’s user interface is localized but not the content, which is generally assumed to be in English (with a few exceptions such as language translation project pages).

We gave up on the Content translation tool because of incompatibilities at the time. In addition, consistency with other translations could not be ensured without using the Translate extension instead of Transifex or Weblate; see the comments on jira:OTHER-350 for details.

Translation platform

Weblate’s user interface itself is localized.

Questions or problems

To discuss:

To report problems:

For real-time conversation with developers, you’re welcome to ask in the #metabrainz IRC channel.