Development/Summer of Code/2020/MusicBrainz
- 1 Ideas
- 1.1 MusicBrainz data visualization with React
- 1.2 Robustify search index rebuilder
- 1.3 Spam detection with online learning
- 1.4 Automate areas management in MusicBrainz
- 1.5 Integrate Internet Archive in MusicBrainz
- 1.6 Improve editing interface for event setlist on the MusicBrainz website
- 1.7 Add support for in-place localization of the MusicBrainz website
- 1.8 Embed documentation into the MusicBrainz website
- 1.9 Improve the UX of voting for edits of the MusicBrainz database
- 1.10 Add social features to MusicBrainz
- 1.11 Integrate more *Brainz in more *Brainz
MusicBrainz data visualization with React
Proposed mentors: bitmap, reosarevok, yvanzo
Forum for discussion
MusicBrainz has a timeline of global statistics. As a starting point, it has to be rewritten from Template Toolkit and jQuery/Flot to React and Viz.js (or any other data visualization library).
Moreover, a timeline (or another kind of chart) would be very helpful for visualizing the relationships of famous artists with long careers, such as Mstislav Rostropovich. It could probably apply to famous works too, e.g. Over the Rainbow.
Note that data is already available to React components that render artist/work Relationship tabs, thus you won’t have to bother about the PostgreSQL/Perl data layer MusicBrainz is running.
This would lay the foundations of data visualization in the React-rendered website of MusicBrainz.
Robustify search index rebuilder
Proposed mentors: ruaok, yvanzo
Languages/skills: Python, PostgreSQL, RabbitMQ, Solr, threading/multiprocessing
Forum for discussion
MusicBrainz has a search feature (webpage / web service doc) which is based on Apache Solr. This search engine holds its own indexes, which must be built and updated to follow changes made to the PostgreSQL database that actually holds the MusicBrainz data. This is the job of SIR (Search Index Rebuilder). It works fine in production, but it needs some improvements to be able to:
- Self-adapt to available resources so as to avoid thrashing
- Mix both manual reindexing and live indexing
- Deploy PostgreSQL extension and triggers
- Report progress while reindexing
- Selectively reindex a given set of MB entities
- Check status of search indexes, message queue, and database
For now its scope is limited to MusicBrainz but the project is planned to handle other MetaBrainz projects in the future as well.
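Some of the improvements listed above (adapting to available resources, reindexing a chosen set of entities, reporting progress) can be sketched as follows. This is only a minimal illustration, not SIR's actual code: the function names, the batch size, and the `index_batch` callable that would send rows to Solr are all assumptions made for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def choose_worker_count(requested=None, reserve=1):
    """Cap the worker count at the CPUs available, leaving `reserve` free."""
    available = max(1, (os.cpu_count() or 1) - reserve)
    if requested is None:
        return available
    return max(1, min(requested, available))


def batches(ids, size):
    """Split a list of row IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def reindex(entity_ids, index_batch, batch_size=2, workers=None, report=print):
    """Reindex only the given entities, reporting progress after each batch.

    entity_ids: dict mapping an entity name (e.g. "artist") to row IDs.
    index_batch: hypothetical callable(entity, ids) that would update Solr.
    """
    jobs = [(entity, chunk)
            for entity, ids in entity_ids.items()
            for chunk in batches(ids, batch_size)]
    done = 0
    with ThreadPoolExecutor(max_workers=choose_worker_count(workers)) as pool:
        futures = [pool.submit(index_batch, entity, chunk)
                   for entity, chunk in jobs]
        for future in as_completed(futures):
            future.result()  # re-raise any indexing error
            done += 1
            report(f"{done}/{len(jobs)} batches indexed")
    return done
```

A real implementation would also have to coordinate with live indexing through the message queue, which this sketch leaves out.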
Spam detection with online learning
MusicBrainz is plagued by a lot of automated spam. In 2017, a new MetaBrainz project called SpamBrainz was started to automate spam detection and help admins handle spam.
An initial machine learning system was implemented by LeoVerto in 2018 (see blog post). Its Keras model, named Lodbrok, reached an outstanding accuracy after some offline training. Unfortunately, it has not yet been integrated and deployed due to a lack of time. Two main steps remain: the spam ninjas system (which has been designed but never completed), and online learning (that is, the ability to update the models according to human feedback).
For now its scope is limited to MusicBrainz but the project has been designed to be able to handle other MetaBrainz projects in the future as well.
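To illustrate what online learning means here, the sketch below updates a model one human verdict at a time. It deliberately swaps in a tiny logistic regression trained by stochastic gradient descent instead of the Keras-based Lodbrok model, and the class name, feature vectors, and learning rate are assumptions made for the example.

```python
import math


class OnlineSpamClassifier:
    """Logistic regression updated one labelled example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Return the estimated probability that the input is spam."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, is_spam):
        """Apply one gradient step from a human verdict (True = spam)."""
        error = self.predict_proba(features) - (1.0 if is_spam else 0.0)
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, features)]
        self.bias -= self.learning_rate * error
```

Each call to `update` would correspond to one piece of feedback from an admin, so the model keeps improving without a full offline retraining run.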
Automate areas management in MusicBrainz
Areas are used in MusicBrainz to locate concert halls, recording studios, artists' places of birth, and so on. But MusicBrainz does not curate geographical metadata itself and thus should rely on external databases instead. Currently, editing areas is reserved to editor dr_saunders, who voluntarily addresses issues reported via AREQ tickets provided that references are given, usually Wikidata or GeoNames. This has worked for years but has some issues:
- It takes the area editor a fair amount of time to maintain areas manually;
- Requested areas are not created immediately, and thus are not immediately available to link to;
- Areas data is missing localized names that are added to the references later on;
- Areas data silently becomes outdated, except when an editor reports issues.
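The last two issues could be detected automatically by comparing each stored area with its external reference. The following is a minimal sketch under stated assumptions: the dict shape (a `name` plus an `aliases` mapping of locale to localized name) and the function name are hypothetical, and fetching the reference record from Wikidata or GeoNames is left out.

```python
def area_differences(local, reference):
    """Compare a stored area with a record from an external source.

    Both arguments are plain dicts with a "name" and an "aliases" mapping
    of locale code to localized name (a hypothetical shape for this sketch).
    Returns a dict describing what is outdated or missing locally.
    """
    changes = {}
    if local.get("name") != reference.get("name"):
        changes["name"] = (local.get("name"), reference.get("name"))
    local_aliases = local.get("aliases", {})
    missing = {locale: name
               for locale, name in reference.get("aliases", {}).items()
               if locale not in local_aliases}
    if missing:
        changes["missing_aliases"] = missing
    return changes
```

An empty result means the stored area matches its reference; anything else could be turned into an automatic edit or a report for review.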
Integrate Internet Archive in MusicBrainz
The Internet Archive offers many resources that can mix very well with MusicBrainz.
- The Wayback Machine is able to take and render a snapshot of public webpages at any given time. It can be used for URL relationships added to the MusicBrainz database, and for references given in edit notes and annotations.
- The 78 RPMs and Cylinder Recordings is a collection of digitized recordings from physical releases of the early 20th century. Each recording comes with audio streaming, and metadata web service. It can be used to retrieve metadata automatically and to embed a player in MusicBrainz website. A lot of similar music collections are hosted by the Internet Archive.
The Internet Archive team specifically offered assistance with supporting such a project.
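As an illustration of the Wayback Machine part, the sketch below looks up the snapshot closest to a given date through the Wayback Machine availability endpoint. The helper names are made up for this example, and how the result would be stored or displayed in MusicBrainz is not covered.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query URL; `timestamp` is YYYYMMDD, optionally with hhmmss."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{WAYBACK_AVAILABILITY}?{urlencode(params)}"


def closest_snapshot(payload):
    """Extract the snapshot URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url, timestamp=None):
    """Query the service over the network and return a snapshot URL or None."""
    with urlopen(availability_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

For a URL relationship, `timestamp` could for instance be the date the relationship was added, so that the snapshot reflects what the editor saw at the time.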
Improve editing interface for event setlist on the MusicBrainz website
Since summer 2015, concerts can be added to MusicBrainz with their detailed setlist. However, this initial support requires editors to learn yet another specific syntax for setlists, without any preview. This is anything but handy.
More generally, this is related to the UX redesign of the event setlist editing UI, which is tracked in MBS-9533. Thus, it must follow our UX redesign process. So far, there are two potential ways to improve the situation:
Possible extensions to this idea:
Add support for in-place localization of the MusicBrainz website
The MusicBrainz website was originally published in English only. Since 2016, it has been available in three more languages, and beta now features six more half-completed translations. Technically, it is based on files in the GNU Gettext format which are synchronized with Transifex. However, this localization platform is not fully satisfactory regarding the context of messages, communication with translators, and some other things such as the review workflow and the glossary.
- Make the MusicBrainz Server work with a local instance of Pontoon
- Update the policy for localized messages containing links
- Deploy an instance of Pontoon at pontoon.metabrainz.org with project MusicBrainz
A possible extension to this idea is to do the same migration for other *Brainz projects.
Embed documentation into the MusicBrainz website
Most of the user documentation for the MusicBrainz project is held on the MusicBrainz Wiki and made available to MusicBrainz Server through the WikiDocs transclusion mechanism. This has some drawbacks: relevant bits of documentation cannot be displayed directly within the MusicBrainz website, localization is not enabled and would use a distinct format from the rest of the MusicBrainz website, and updating code and updating the related documentation are two distinct processes.
At the latest MetaBrainz summit, we decided to improve the situation by embedding more documentation directly into the user interface, instead of the current help links that redirect to static pages transcluded from the wiki. Most of these pages consist of a descriptive paragraph followed by a descriptive enumeration of properties, for example Release Group. These property descriptions should be embedded directly into the website pages where the property is used. The full documentation page should be built by gathering these properties and their descriptions automatically. WikiDocs pages that cannot be generated should be moved to Markdown files in the code repository.
- Embed user documentation bits into the MusicBrainz website
- Generate automatically full documentation pages for entity types and property types
- Move the rest of the documentation pages to Markdown files in the code repository
- Integrate both documentation bits/pages into the localization process
Improve the UX of voting for edits of the MusicBrainz database
Edits made to the MusicBrainz database are either automatically applied or can be voted on, usually for 7 days, depending on the edit type. Combined with the Subscription feature, this process allows editors to review the edits made by other editors.
The current issue is that many editors don’t receive any votes on their own edits. There are several ways we can imagine to address this issue: redesign of edits pages, gamification of the voting system, gamification of subscriptions, more gain from subscriptions, round-robin notification for edits made by editors who have been missing votes for a long time, and so on.
The goal is to select a few suggestions (given above or of your own) with the community and to implement them in the MusicBrainz Server. The main part of the implementation will be the user interface, to be coded using React/JSX.
Add social features to MusicBrainz
Proposed mentor: ruaok
Languages/skills: Perl and/or Python, SQL (Postgres)
Forum for discussion
We recently added event (read: concert) support to MusicBrainz. Our main motivation was to add this feature for historical concerts, but it can also be used for future concerts. In the past, the crowd-sourced concert listings on last.fm were the best place to find concerts, but in the past few years last.fm has begun to fade from people's awareness. There is a possibility that MusicBrainz can take the former place of last.fm and become the best crowd-sourced concert information site on the net. In order for this to happen, we would need to add a few more features to MusicBrainz:
- Social notifications: MB users should be able to post to Facebook/Twitter when they plan to attend a concert.
- Other features: What features should we add to build a community around concert information curation?
These social features are important for building a community of users around concerts. The goal is to engage users to enter information about concerts and venues and then talk about upcoming concerts. The more people use MusicBrainz to talk about concerts publicly, the more people will be drawn in to improve the concert listings in MusicBrainz.
Integrate more *Brainz in more *Brainz
Languages/skills: Perl and/or Python and/or Node.js, probably SQL (Postgres)
Forum for discussion
We have a bunch of different projects under the MetaBrainz umbrella by now, but they do not necessarily utilise each other to their fullest extent. MusicBrainz in particular is lacking utilisation of features/data from e.g., AcousticBrainz and ListenBrainz.
I don't have any specific things to do or not do with this, but a prospective student thinking about this should definitely approach us on IRC and talk with us about what they have in mind and if there's anything the community can think of.