User:Ianmcorvidae/Translation Process

From MusicBrainz Wiki

Introduction to gettext and translation

MusicBrainz's translation system is based on gettext, a fairly standard way to translate applications that run on *nix platforms (such as our Linux servers). gettext is centered on the notion of "message catalogs": at the most basic, lists of bits of text (strings) and their translations. It has special handling for pluralization (which gets weird in some languages) and can attach context to strings that might otherwise be confused with one another (especially very short strings). It also lets a project split its strings across several catalogs, called "domains". MusicBrainz uses a number of domains, some with plain catalogs and some using plurals and/or contexts. A current list is available at https://www.transifex.com/projects/p/musicbrainz/resources/ (a Transifex "resource" is the same thing as a gettext "domain").
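
To make the ideas of domains, contexts, and plurals concrete, here is a minimal Python sketch of a toy catalog lookup. It is not how gettext stores or looks up messages internally; the domain names ("server", "countries"), the German strings, and the two-form plural rule are all assumptions chosen for illustration.

```python
# Toy in-memory catalogs: domain -> {(context, msgid): translation}.
# Domain names and translations here are hypothetical examples.
CATALOGS = {
    "server": {
        (None, "Artist"): "Künstler",
        # The same short string needs a context to be translated correctly.
        ("noun", "Release"): "Veröffentlichung",
        ("verb", "Release"): "Veröffentlichen",
    },
    "countries": {
        (None, "Germany"): "Deutschland",
    },
}

# Plural entries: domain -> {singular msgid: [form for n == 1, form otherwise]}.
PLURALS = {
    "server": {
        "{n} edit": ["{n} Bearbeitung", "{n} Bearbeitungen"],
    },
}


def translate(domain, msgid, context=None):
    """Return the translation, falling back to the source string."""
    return CATALOGS.get(domain, {}).get((context, msgid), msgid)


def translate_plural(domain, singular, plural, n):
    """Pick a plural form using a simple two-form rule (assumed: n == 1 is singular)."""
    index = 0 if n == 1 else 1
    forms = PLURALS.get(domain, {}).get(singular, [singular, plural])
    return forms[index].format(n=n)
```

For example, translate("server", "Release", context="noun") and translate("server", "Release", context="verb") return different strings, while an unknown string simply comes back untranslated. Languages with more than two plural forms would need a richer rule than the one assumed here.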

gettext primarily uses three file formats. A .pot (PO Template) file is a catalog of messages to be translated. A .po file is a human-readable catalog of translated messages for a given language (essentially, taking a .pot and adding translations makes it a .po). A .mo file is the computer-readable equivalent of a .po file, akin to compiled software versus its source code. The workflow is: .pot files are generated by extracting strings from the application's code (and, in MusicBrainz's case, also from the database); translators translate the source strings from the .pot files to create .po files; the .po files are then compiled into .mo files, which are used to display translated strings to users of the application.
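
The relationship between a .pot and a .po can be sketched in a few lines of Python. The sample entries and the German translation below are made up for illustration, and the parser is deliberately tiny: it only understands single-line msgid/msgstr pairs and ignores comments, contexts, plurals, escapes, and multi-line strings that real .po files may contain.

```python
# A template (.pot): every msgstr is empty.
POT_TEXT = '''\
msgid "Artist"
msgstr ""

msgid "Release"
msgstr ""
'''

# A partially translated catalog (.po) for one language (hypothetical content).
PO_TEXT_DE = '''\
msgid "Artist"
msgstr "Künstler"

msgid "Release"
msgstr ""
'''


def parse_entries(text):
    """Parse single-line msgid/msgstr pairs into a dict (simplified sketch)."""
    entries = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('msgid "') and line.endswith('"'):
            msgid = line[len('msgid "'):-1]
        elif line.startswith('msgstr "') and line.endswith('"') and msgid is not None:
            entries[msgid] = line[len('msgstr "'):-1]
            msgid = None
    return entries


def untranslated(text):
    """List the source strings that still have an empty translation."""
    return [msgid for msgid, msgstr in parse_entries(text).items() if not msgstr]
```

Here untranslated(POT_TEXT) lists every string, while untranslated(PO_TEXT_DE) lists only the ones a translator has not filled in yet. Compiling to .mo is left to the real gettext tooling.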

At present, we use Transifex, an online platform for translation. MusicBrainz is at https://www.transifex.com/projects/p/musicbrainz/. Transifex provides a command-line client, called 'tx', that allows for automating some of the things possible through the web interface, including downloading and uploading translations.

Process

  1. The various .pot files (one per domain) get generated (by someone or some automatic process) from some branch of the code, at some time (we'd discussed "at beta freeze"). This is done by running 'make pot' in the po/ folder on a machine with a checkout, a working musicbrainz-server and Locale::PO (for extracting strings from the DB and from templates), and xgettext (for extracting strings from Perl code).
  2. The .pot files are uploaded to their associated 'resources' on Transifex, either manually or automatically: if they're on my (ianmcorvidae's) update-pot branch on github with the correct names, Transifex auto-updates from them. (This autoupdate location can be changed, of course.)
  3. Translators translate things, using Transifex's web interface, downloading .po files, or whatever method they choose.
  4. Once translations exist or are up to date, anyone who wants to use them downloads the .po files for their chosen languages and domains, using the Transifex web interface or the tx client.
    • Once we have some complete/official translations, their associated .po files might get checked into git as well, so that people who want to use these official translations don't have to download them manually.
  5. After naming the .po files correctly (within the po/ folder, <domain>.<language>.po), 'make install' will compile the <domain>.<language>.mo files and install them to lib/LocaleData/<language>/LC_MESSAGES/<domain>.mo.
    • If official translations are checked in, they'd obviously be named correctly, such that step 4 would just be irrelevant entirely and those users would go to 'make install' directly.
    • On rika, we have a tx client configuration that downloads things to the right names, which we might consider checking in so that it's easy to update translations from Transifex too.
  6. With the relevant language(s) in lib/DBDefs.pm under MB_LANGUAGES, and a system locale for that language on the system running musicbrainz-server, anyone sending the appropriate Accept-Language header (configured in browser preferences) gets a translated site!
    • Having to build locales is somewhat of an annoyance, but it's incredibly easy on every system, so it's not really a big deal. The advantages of the system that imposes this requirement more than make up for it!
    • The language-switcher will change how people customarily set their language, but Accept-Language will remain as a fallback.
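
Two parts of the process above lend themselves to a short Python sketch: the file naming convention in step 5 and the Accept-Language matching in step 6. This is not musicbrainz-server's actual code (the server is written in Perl); the function names are hypothetical, and the matching rule (highest q-value first, exact tag before primary subtag, with a default of "en") is a simplified assumption.

```python
def mo_install_path(po_name):
    """Map '<domain>.<language>.po' to the .mo install location from step 5."""
    if not po_name.endswith(".po"):
        raise ValueError("expected a name like <domain>.<language>.po")
    stem = po_name[: -len(".po")]
    domain, language = stem.rsplit(".", 1)
    return f"lib/LocaleData/{language}/LC_MESSAGES/{domain}.mo"


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # Python's sort is stable, so equal q-values keep their header order.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return [tag for tag, quality in prefs if quality > 0]


def pick_language(header, available, default="en"):
    """Choose the first requested language that is enabled (simplified sketch)."""
    lookup = {code.lower(): code for code in available}
    for tag in parse_accept_language(header):
        if tag in lookup:
            return lookup[tag]
        primary = tag.split("-")[0]  # e.g. 'de-at' falls back to 'de'
        if primary in lookup:
            return lookup[primary]
    return default
```

For instance, mo_install_path("mb_server.de.po") gives lib/LocaleData/de/LC_MESSAGES/mb_server.mo, and pick_language("de-AT,de;q=0.8,en;q=0.5", ["fr", "de"]) gives "de". The list passed as `available` stands in for whatever is configured under MB_LANGUAGES.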

Notes and open questions

  • There are some problems with Transifex, and we may eventually move away from it. In any case, that would change only a few steps from the perspective of a non-translator. The main contender, in my opinion, is weblate, which screws up less frequently and has better git integration. It's fairly new; otherwise I suspect we'd have been using it already.
  • nikki also has a 'translations' branch (on github) that backs up all our translations hourly (when there are changes). I sometimes copy this to my server and my github, for further backup goodness. We might eventually integrate this process; of course, if we switch to weblate, it commits to git for us, and it'd just be a matter of pulling/pushing elsewhere occasionally.
  • The above is pretty vague about when these various things happen, because at present we don't really know. We've discussed beta freeze time for when to regenerate the .pot files, and if we check official translations into git they'd presumably be part of the releases.