Difference between revisions of "User:Ianmcorvidae/Translation Process"

From MusicBrainz Wiki

Latest revision as of 22:43, 3 July 2012

Introduction to gettext and translation

MusicBrainz's translation system is based on gettext, which is a fairly standard way to translate applications that run on *nix platforms (such as our Linux servers). gettext is centered around the notion of "message catalogs": at the most basic, lists of bits of text (strings) and their translations. It also has special handling for plurals (which get weird in some languages) and can attach context to strings that might otherwise get confused (especially very short strings). A project can use several different catalogs for its different parts; gettext calls these 'domains'. MusicBrainz uses a number of different domains, some with plain catalogs and some using plurals and/or contexts. A current list is available at https://www.transifex.com/projects/p/musicbrainz/resources/ (Transifex's "resource" is the same as a "domain" in gettext).

gettext primarily uses three file formats. A .po file is a human-readable catalog of translated messages for a given language. A .pot (PO Template) file is a catalog of messages still to be translated; essentially, taking a .pot and adding translations makes it a .po. A .mo file is the computer-readable equivalent of a .po file, akin to the compiled version of software versus its source code. The workflow follows from that: .pot files are generated by extracting strings from the application's code (and in MusicBrainz's case, also from the database); translators translate the source strings in the .pot files to create .po files; and the .po files are compiled into .mo files, which are used to display translated strings to users of the application.
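
To make the .po-to-.mo step and the plural/context handling a bit more concrete, here is a minimal Python sketch. It is not how musicbrainz-server does things (that is 'make install' in po/); it just packs a tiny hand-written catalog into the binary .mo layout and reads it back with Python's standard gettext module. The strings, the German translations and the "medium_format" context are all made up for illustration, and pgettext needs Python 3.8 or later.

```python
import gettext
import io
import struct


def compile_mo(messages):
    """Pack a {msgid: msgstr} dict into the binary GNU .mo layout."""
    # Entries are stored sorted by msgid; the empty msgid carries the header.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    ids = b""
    strs = b""
    spans = []
    for msgid, msgstr in items:
        spans.append((len(msgid), len(ids), len(msgstr), len(strs)))
        ids += msgid + b"\0"
        strs += msgstr + b"\0"
    key_table = 7 * 4                      # right after the 7-word header
    value_table = key_table + 8 * len(items)
    key_start = value_table + 8 * len(items)
    value_start = key_start + len(ids)
    # magic, version, entry count, two table offsets, empty hash table
    out = struct.pack("<7I", 0x950412DE, 0, len(items), key_table, value_table, 0, 0)
    for id_len, id_off, _, _ in spans:
        out += struct.pack("<2I", id_len, key_start + id_off)
    for _, _, str_len, str_off in spans:
        out += struct.pack("<2I", str_len, value_start + str_off)
    return out + ids + strs


# Hypothetical catalog: what a .po file for one domain and one language holds.
messages = {
    # Header entry: declares the charset and the plural rule.
    "": "Content-Type: text/plain; charset=UTF-8\n"
        "Plural-Forms: nplurals=2; plural=(n != 1);\n",
    # Plain string.
    "Artist": "Künstler",
    # Plural entry: singular and plural msgid, one msgstr per plural form.
    "%d recording\x00%d recordings": "%d Aufnahme\x00%d Aufnahmen",
    # Context entry: "context\x04msgid" keeps short strings apart.
    "medium_format\x04Other": "Sonstiges",
}

translations = gettext.GNUTranslations(io.BytesIO(compile_mo(messages)))

print(translations.gettext("Artist"))
print(translations.ngettext("%d recording", "%d recordings", 3) % 3)
print(translations.pgettext("medium_format", "Other"))
print(translations.gettext("Label"))  # not in the catalog: falls back to the msgid
```

In practice nobody writes .mo files by hand like this; the point is only that a .mo is the same catalog as the .po, rearranged so that lookups are cheap, and that plurals and contexts are just specially shaped catalog keys.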

At present, we use Transifex, an online platform for translation. MusicBrainz is at https://www.transifex.com/projects/p/musicbrainz/. Transifex provides a command-line client, called 'tx', that allows for automating some of the things possible through the web interface, including downloading and uploading translations.

Process

  1. The various .pot files (one per domain) get generated (by someone or some automatic process) from some branch of the code, at some time (we'd discussed "at beta freeze"). This is done by running 'make pot' in the po/ folder on a machine that has a checkout, a working musicbrainz-server and Locale::PO (for extraction of strings from the DB and from templates), and xgettext (for extraction of strings from Perl code).
  2. The .pot files are uploaded to their associated 'resources' on Transifex manually, or, if they're on my (ianmcorvidae's) update-pot branch on GitHub with the correct names, Transifex auto-updates from them. (This autoupdate location can be changed, of course.)
  3. Translators translate things, using Transifex's web interface, downloading .po files, or whatever method they choose.
  4. Once translations exist or are up to date, anyone who wants to use them downloads the .po files for their chosen languages and domains, using the Transifex web interface or the tx client.
    • Once we have some complete/official translations, their associated .po files might get checked into git as well, so that people who want to use these official translations don't have to download them manually.
  5. After naming the .po files correctly (within the po/ folder, <domain>.<language>.po), 'make install' will compile the <domain>.<language>.mo files and install them to lib/LocaleData/<language>/LC_MESSAGES/<domain>.mo.
    • If official translations are checked in, they'd obviously be named correctly, such that step 4 would just be irrelevant entirely and those users would go to 'make install' directly.
    • On rika, we have a tx client configuration that downloads things to the right names, which we might consider checking in so it's easy to update translations from Transifex too.
  6. With the relevant language(s) in lib/DBDefs.pm under MB_LANGUAGES, and a system locale for that language on the system running musicbrainz-server, anyone sending the appropriate Accept-Language header (configured in browser preferences) gets a translated site!
    • Having to build locales is somewhat of an annoyance, but is incredibly easy on every system, so it's not really a big deal. The advantages of using the system that makes this a requirement cancel out this detriment quite completely!
    • The language-switcher will result in changing the way people customarily set the language, but Accept-Language will still be a fallback method.
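
As a rough illustration of steps 5 and 6, the sketch below spells out the naming convention and one simple way of picking a language from an Accept-Language header. It is Python purely for illustration: the function names are hypothetical, the domain name is only an example, and the header matching is a simplified assumption, not the logic musicbrainz-server actually uses.

```python
from pathlib import PurePosixPath


def po_source_path(domain, language):
    """Where a downloaded .po file has to live before 'make install'."""
    return PurePosixPath("po") / f"{domain}.{language}.po"


def mo_install_path(domain, language):
    """Where the compiled .mo file ends up after 'make install'."""
    return PurePosixPath("lib/LocaleData") / language / "LC_MESSAGES" / f"{domain}.mo"


def parse_accept_language(header):
    """Return the language tags from the header, most preferred first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # Stable sort: tags with equal quality keep their header order.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return [tag for tag, quality in prefs if quality > 0]


def choose_language(header, available):
    """Pick the first requested language that is available, else None."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. fr-ca falls back to fr
        if primary in by_lower:
            return by_lower[primary]
    return None


# Example domain name and language list, both made up for this sketch.
print(po_source_path("mb_server", "de"))
print(mo_install_path("mb_server", "de"))
print(choose_language("fr-CA,fr;q=0.8,en;q=0.5", ["de", "en", "fr"]))
```

Here the list passed to choose_language stands in for whatever is configured under MB_LANGUAGES; a request whose header matches none of those languages gets None back, which would correspond to serving the untranslated site.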

Notes and open questions

  • There are some problems with Transifex, and we may eventually move away from it. In any case, this changes only a few steps from the perspective of a non-translator. The main contender, in my opinion, is weblate, which screws up less frequently and has better git integration. It's fairly new, or I suspect we'd have been using it already.
  • nikki also has a 'translations' branch (on GitHub) that backs up all our translations hourly (when there are changes). I copy this sometimes to my server and my GitHub, for further backup goodness. We might eventually integrate this process -- of course, if we switch to weblate, that commits to git for us and it'd just be pulling/pushing elsewhere occasionally.
  • The above is pretty vague about when these various things happen, because at present we don't really know. We've discussed beta freeze time for when to regenerate the .pot files, and if we check official translations into git they'd presumably be part of the releases.