MusicBrainz Server/Internationalization: Difference between revisions
(→Current features: Selected changes by User:Reosarevok)
(→Future: Trim empty subsections that don’t fit in the current layout, renamed as ideas up in the air)
==== Overview ====

One of the goals of MusicBrainz is to store information about music from all over the world, and since that music is written in many languages, support for those languages is essential. In the future, we also want people to be able to use MusicBrainz in ''any'' language, not just English, especially since the people who know the most about music in other languages are often native speakers of those languages.

The decision to use Unicode for MusicBrainz was an important first step on the road to internationalization, and it has allowed entry of hundreds of [[International Artists]] with works in dozens of languages, but there remains much work to be done. The work of adapting software so that it can be used with different languages or in different regions is called '''internationalization''' (abbreviated as I18N), and translating it into each of those languages and regions is called '''localization''' (abbreviated as L10N). Both of these are substantial efforts, but the resources needed are different. Internationalization requires specialized understanding of aspects of many languages, but that is often easier to find than the native linguistic ability in non-Western languages needed for localization.

----

The following is a breakdown of the many issues for i18n and l10n by area. Issues that should have RFEs filed are marked with '''RFE ME''' and a note on the priority (low, med, high). Where RFEs have already been filed, they should be linked.

==== Database ====

Many of the most crucial issues for i18n are with the database schema, in order to support the additional data needed to properly localize artists, releases, etc. The localization itself is done by moderators, and can even be done to some extent without full i18n support in the database.
Revision as of 11:37, 27 June 2023
Getting started
If you want to help translate, go to the Transifex page and create an account. If there is already a team for your language, you can join it; if not, you can ask for a new team to be created.
There used to be an i18n mailing list, but it has been discontinued and replaced by the new forums (using categories and tags).
Questions or problems
If you have any questions or you're having any problems, you're welcome to ask in the #metabrainz IRC channel.
If you find a bug in the server, you can enter an issue in our bug tracker.
Translation components
The following components are available for translation:
Attributes
It contains the names and descriptions of MusicBrainz entity attributes, such as artist type.
It is also used by MusicBrainz Picard.
Countries
It contains the names of release countries.
It is also used by MusicBrainz Picard.
Note that country names should be the same as area aliases; see jira:MBS-13140 for follow-up.
Only the Release/Country documentation is not localized for now; see jira:MBS-13109 for follow-up.
Instrument Descriptions
It contains only the descriptions of instruments.
Instruments
It contains only the names of instruments.
Note that instrument names should be the same as instrument aliases; see jira:MBS-13141 for follow-up.
Languages
It contains the names of languages that can be set for a release’s tracklist and a work’s lyrics.
Relationship Types
It contains the names, descriptions, and (forward/long/reverse) link phrases of relationship types as well as the names and descriptions of relationship attributes. See also Relationships.
Scripts
It contains the names of scripts that can be set for a release’s tracklist.
Note: Because of transliteration, a language is not necessarily paired with its usual script/writing system.
Server
It contains the messages shown to users and admins by the MusicBrainz website.
Statistics
It contains the events in the MusicBrainz timeline and the messages for the Database Statistics section of the website UI.
Viewing the translations
Some of the more complete translations (generally those over 50% translated) are available on the beta server at https://beta.musicbrainz.org/. The translations do not update automatically (see development beta cycle), but the beta server uses the same database as the main server. If you want to use the beta server all of the time for your editing, click the "Use beta site" link in the footer of https://musicbrainz.org/.
Variables
Translatable messages contain not only plain text or HTML markup but also replaceable variables. For example:
- In {entity1} has a BookBrainz page at {entity0}, which is a URL-Work relationship link phrase, there are two entity variables whose names should not be translated, since {entity1} will be replaced by a work title and {entity0} by a URL.
- In link phrases, variables are often used for (optional) attributes, in order to avoid inflating the number of messages. Below are examples with the “additional” attribute:
  - {additional} will be replaced by “additional” if the “additional” attribute is set, otherwise it will be removed from the text.
  - {additional:additionally} will be replaced by “additionally” if the “additional” attribute is set, otherwise it will be removed from the text.
  - {additional:an|a} will be replaced by “an” if the “additional” attribute is set, otherwise it will be replaced by “a”.
  - {additional:%|regular} will be replaced by “additional” if the “additional” attribute is set, otherwise it will be replaced by “regular”.
  - Hence, {additional} can be translated as {additional:aldona} in Esperanto.
- Note that the {instrument} and {vocals} variables are replaced by the specific instrument/vocals name: {instrument:%|instruments} will be replaced by “piano” (or its translation) if the related instrument is “piano”, otherwise it will be replaced by “instruments”.
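The substitution rules described above can be illustrated with a small Python sketch. This is not the MusicBrainz Server implementation; the function name expand and the convention of passing None for an unset attribute are assumptions made for the example. Variables that are not listed in the mapping (such as entity variables) are left untouched.

```python
import re

# Matches {name}, {name:if_set} and {name:if_set|if_unset}.
_VARIABLE = re.compile(r"\{(\w+)(?::([^|{}]*)(?:\|([^{}]*))?)?\}")


def expand(phrase, values):
    """Expand attribute variables in a link phrase (illustrative only).

    `values` maps a variable name to the attribute text when the attribute
    is set, or to None when it is unset. Variables missing from `values`
    are left as written.
    """
    def substitute(match):
        name, if_set, if_unset = match.groups()
        if name not in values:
            return match.group(0)       # unknown variable: keep it verbatim
        value = values[name]
        if value is None:               # attribute not set
            return if_unset or ""
        if if_set is None:              # plain {name}
            return value
        return if_set.replace("%", value)  # "%" stands for the attribute text

    expanded = _VARIABLE.sub(substitute, phrase)
    # Removing a variable can leave a double space behind; collapse it.
    return re.sub(r" {2,}", " ", expanded).strip()


print(expand("{additional:an|a} {additional} producer", {"additional": "additional"}))
print(expand("{additional:an|a} {additional} producer", {"additional": None}))
```

The same function covers the {instrument:%|instruments} case: with {"instrument": "piano"} the variable expands to the instrument name, and with {"instrument": None} it expands to “instruments”.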
Development
The MusicBrainz Server code uses gettext to provide automatic internationalization of the messages and texts used in the Perl code and templates.
A .pot file containing all the strings used in the server is provided; the source strings are in English.
Beyond translation
Current features
- Localized aliases whose list of locales is imported from Unicode CLDR.
- Support to search for entities by name, alias, or both using fuzzy search (the default).
- Specific sort name for each alias indicating how the entity should be sorted under that name in the given locale.
- Aliases are returned (if specifically requested) by the MusicBrainz API.
- However, aliases are not otherwise used for display in the website; see jira:MBS-11965 for follow-up.
- Language and script (a.k.a. writing system) of each release’s tracklist.
- Relationship type to link releases having translated/transliterated title and tracklist.
- The ability for editors to add pseudo-releases (to be backed with alternative tracklists) for translating/transliterating any release.
- Support to search for releases by their tracklist’s language and script.
- Language of work’s lyrics.
- Relationship type attribute to link works having translated lyrics.
- Support to search for works by lyrics’ language.
- Ability to enter localized artist names as appropriate on releases and recordings using artist credits.
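Since aliases are only returned by the MusicBrainz API when requested, a client that wants localized display names has to ask for them and pick one itself. The sketch below assumes the JSON web service's inc=aliases parameter and assumes that each alias carries "name", "locale" and "primary" fields; the helper names and the sample record are made up for illustration.

```python
from urllib.parse import urlencode


def artist_lookup_url(mbid):
    """Build an artist lookup URL that explicitly requests aliases."""
    query = urlencode({"inc": "aliases", "fmt": "json"})
    return f"https://musicbrainz.org/ws/2/artist/{mbid}?{query}"


def display_name(artist, locale):
    """Prefer the primary alias for `locale`; fall back to the main name."""
    for alias in artist.get("aliases", []):
        if alias.get("locale") == locale and alias.get("primary"):
            return alias["name"]
    return artist["name"]


# Illustrative record, not real API output.
sample = {
    "name": "Пётр Ильич Чайковский",
    "aliases": [
        {"name": "Pyotr Ilyich Tchaikovsky", "locale": "en", "primary": True},
    ],
}
print(display_name(sample, "en"))
print(display_name(sample, "fr"))
```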
Current issues
Most current issues are tracked through MusicBrainz Server internationalization tickets. Some longer-term goals are not tracked yet.
There are most likely some internationalization issues with fuzzy search in some languages (those with agglutinative words or ideographic characters). Fixing them mostly requires making proper use of language analysis from Apache Solr.
Ideas up in the air
Artists sort names
Artists currently get a (main) sort name which must be in Latin script, and translations or transliterations are used for artists with non-Latin names. This could eventually be replaced by the already existing alias sort name feature, which already allows any appropriate script for the alias locale; it might require the introduction of either a generic "Latin script" alias locale or a way to indicate Latin transliterations for non-Latin alias locales.
Automatic Transliteration
Automatic transliteration could be done for many languages if no transliterated/translated alias is available. For best results it is necessary to know the language: for example, Cyrillic script is used by several languages, and transliteration from Ukrainian differs subtly from transliteration from Azerbaijani; in the case of Chinese, differences between dialects are even more dramatic. For Japanese, where identical kanji can have multiple different readings, the correct transliteration may not be easy to determine at all. In addition, individual artists may prefer a nonstandard transliteration of their names, or may have an "English" name that isn't really a transliteration.
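To make the language dependence concrete, here is a minimal sketch with two deliberately tiny, partial mapping tables. Ukrainian and Russian are used as the example pair because the same Cyrillic letter “г” is commonly romanized as “h” for Ukrainian and “g” for Russian. The tables and the function name transliterate are illustrative assumptions, not a complete or official romanization scheme.

```python
# Partial, illustrative tables: only the letters needed for the example.
TABLES = {
    "uk": {"б": "b", "о": "o", "г": "h", "д": "d", "а": "a", "н": "n", "и": "y"},
    "ru": {"б": "b", "о": "o", "г": "g", "д": "d", "а": "a", "н": "n", "и": "i"},
}


def transliterate(text, language):
    """Transliterate `text` letter by letter using the table for `language`.

    Characters missing from the table are passed through unchanged.
    """
    table = TABLES[language]
    result = []
    for char in text:
        latin = table.get(char.lower(), char)
        result.append(latin.capitalize() if char.isupper() else latin)
    return "".join(result)


print(transliterate("Богдан", "uk"))
print(transliterate("Богдан", "ru"))
```

The same input yields two different Latin spellings depending on the declared language, which is why an automatic approach would need reliable language information and would still miss artist-preferred spellings.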