History:Wiki Migration Bot

From MusicBrainz Wiki

The script that was used to port this wiki from UseMod to MoinMoin saved the ported pages under the username WikiMigrationBot.

The script reports any broken markup at the bottom of each broken page, and links to the WikiMigrationBotReport page. Editors can find pages that need to be fixed by following backlinks from that page.

Download:

The wiki migration bot is a Perl script, and is available for download here (direct download).

The code has been released under an Apache 2.0 license.

Features:

  • Converts markup from UseMod to MoinMoin.
  • Lines that need manual work (mainly links or images in titles) are flagged in a section at the bottom of the page, and link to WikiMigrationBotReport.
    • ImageInTitles are moved to the line above the title so that the image is displayed (looks ugly, but it's better than a broken link).
  • UseMod definition lists (SomeTerm: a definition) are converted to bulleted lists, because this allows links in the SomeTerm part to work. Two sets of square brackets at the start of the line are also stripped out, to remove the anchored bullet point used in the EditTypeTemplate.
  • Fixes some markup because MoinMoin is stricter than UseMod. For example, UseMod allows any number of ='s at the end of a title, while MoinMoin insists on the same number as at the start.
  • Replaces HTML literals (e.g. namp, ndash, bull) with text.
  • Converts HTML entity codes to Unicode characters.
    • There was an issue with a missing encode() (fixed by DaveEvans) that caused the following error message:
      Use of uninitialized value in substitution iterator at /usr/local/share/perl/5.8.4/URI/_query.pm line 16.
  • Empty pages are not ported.
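
As a rough illustration of three of the conversions listed above (heading repair, definition lists to bullets, and entity decoding), here is a minimal Python sketch. It is not the bot's code (the bot is a Perl script). The function names, the regular expressions, and the assumed UseMod definition syntax "; Term : definition" are all illustrative assumptions.

```python
import html
import re

# Assumption: a heading is a line that starts with one or more '=' and
# ends with at least one '='. The trailing run may be any length.
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")

# Assumption: a UseMod definition line looks like "; Term : definition".
# Simplification: the term is cut at the first colon, so a term that
# itself contains a colon (for example a URL) is not handled here.
DEFINITION_RE = re.compile(r"^;\s*(.+?)\s*:\s*(.*)$")


def normalize_heading(line):
    """Make the closing '=' run the same length as the opening one."""
    match = HEADING_RE.match(line)
    if match is None or not match.group(2):
        return line  # not a heading, or a line made only of '=' signs
    marks, title = match.group(1), match.group(2)
    return f"{marks} {title} {marks}"


def convert_definition(line):
    """Turn a definition-list line into a MoinMoin-style bullet."""
    match = DEFINITION_RE.match(line)
    if match is None:
        return line
    term, definition = match.group(1), match.group(2)
    return f" * {term}: {definition}"


def convert_line(line):
    """Apply entity decoding, then heading and definition fixes."""
    line = html.unescape(line)  # named and numeric entities to Unicode
    line = normalize_heading(line)
    return convert_definition(line)
```

For example, convert_line("== Title ====") gives "== Title ==", and convert_line("; SomeTerm : a definition") gives " * SomeTerm: a definition". A real converter would also need to skip pre-formatted blocks, which this sketch ignores.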

Current Issues:

  • The bot has made a bit of a hash of pages with lots of pre-formatted text like this (for example, source code or terminal dumps). In these cases I've been referring to the OldWiki and copying+pasting the content back across by hand.
  • Quite a few pages also have <nowiki> markup which probably needs to be replaced by `stuff here`
  • Likewise &fullsearch=Text <code> crops up now and again
    • only two pages, both fixed

Ported Pages with Broken Markup

Pages that need work can be found by following the backlinks from the WikiMigrationBotReport. Any other pages that need special attention should be listed here.

Resolved Issues

  • Definition list titles don't link in MoinMoin. For example, the Old/New list headings on MusicBrainzGuideline wouldn't link if they were in MoinMoin.
    • I think the neat way to fix this is to replace definition lists with Title: \n Indented Text. It looks pretty much the same, and allows linking. --JohnCarter
  • Images and links in titles aren't supported by MoinMoin; flag them up so that they can be fixed by hand.
    • How do you handle links that use images, e.g. [[[Image:something.gif]]]? Not sure how many of these there are, but there might be some. @alex
  • <nowiki></nowiki> tags can encapsulate multiple words, and there's no direct equivalent in MoinMoin.
    • The <nowiki></nowiki> pseudo-HTML that UseMod supports can be used to de-activate a single word, in which case the MoinMoin equivalent would be a leading ! (e.g. !WikiName). It can also de-activate an entire region of text, in which case the only MoinMoin equivalent would be the <code> (which is actually the same as <code><nowiki> </code></nowiki>). Can the bot flag the latter case, as it is likely to need editor attention? @alex
      • OK, the bot will flag up these cases (should be quite rare).
  • UseMod seems to interpret lines with only spaces and tabs in them as blank lines, but your script turns them into pre lines. There are such lines on DonRedman which render like this. Of course this is a rather minor bug. --DonRedman
  • Some pages use anchors like this <span id="anchorname"></span> and [http:#anchorname link to anchor]. Have you thought of them? --DonRedman
  • I wish the bot didn't complain about all those occasions when the WikiName "MusicBrainz" has been used in a heading :-)
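
Two of the points above (single-word versus multi-word <nowiki> spans, and whitespace-only lines) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bot's Perl code: the function names are hypothetical, and "single word" is taken to mean a run of letters, digits, and underscores.

```python
import re

# Non-greedy match so that two spans on one line are handled separately.
NOWIKI_RE = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL)


def convert_nowiki(text):
    """Rewrite single-word <nowiki> spans and collect the rest.

    Returns (converted_text, flagged), where flagged lists the contents
    of every span that was left untouched for an editor to review.
    """
    flagged = []

    def replace(match):
        content = match.group(1)
        if re.fullmatch(r"\w+", content):
            # Assumption: a single word is escaped with a leading '!'.
            return "!" + content
        flagged.append(content)
        return match.group(0)  # leave the span as-is and report it

    return NOWIKI_RE.sub(replace, text), flagged


def blank_whitespace_only_lines(text):
    """Treat lines holding only spaces or tabs as truly empty lines."""
    return "\n".join("" if not line.strip() else line
                     for line in text.split("\n"))
```

For example, convert_nowiki("See <nowiki>WikiName</nowiki> and <nowiki>two words</nowiki>.") rewrites the first span to !WikiName, leaves the second one in place, and returns ["two words"] as the list to flag.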