Development/Summer of Code/2022/MusicBrainz

From MusicBrainz Wiki
Revision as of 18:34, 7 February 2022 by AmCap1712
[*] Don't let the idea of writing Perl discourage you from checking out some of these projects! The MusicBrainz Server is written in readable, well-structured Perl, using the web application framework Catalyst. If you're comfortable in e.g. Python or Ruby web frameworks, then you'll probably be able to jump in and understand this codebase with only a little extra effort.


MusicBrainz data visualization with React

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: JavaScript (React), data visualization
Forum for discussion

MusicBrainz has a timeline of global statistics, which is being rewritten in React and d3.js.

Moreover, a timeline (or another kind of chart) would be very helpful for visualizing the relationships of famous artists with long careers, such as Mstislav Rostropovich. For example, a history of band members would be nice to see when there are many changes over time, to better visualize which musicians played together. This could probably apply to famous works too, e.g. Over the Rainbow.
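As a rough illustration of the "which musicians played together" idea, here is a minimal Python sketch that computes the overlapping periods between band memberships. The `Membership` record and the function names are hypothetical and only stand in for whatever shape the relationship data actually has in the React components; the project itself would be written in JavaScript.

```python
from datetime import date
from itertools import combinations
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    """Hypothetical record for one "member of band" relationship."""
    member: str
    start: date
    end: Optional[date]  # None means the membership has not ended


def overlap(a, b, today):
    """Return the (start, end) period shared by two memberships, or None."""
    start = max(a.start, b.start)
    end = min(a.end or today, b.end or today)
    return (start, end) if start < end else None


def played_together(memberships, today):
    """List (member, member, start, end) for every pair that overlapped."""
    result = []
    for a, b in combinations(memberships, 2):
        if a.member == b.member:
            continue  # two stints of the same person are not a pair
        period = overlap(a, b, today)
        if period is not None:
            result.append((a.member, b.member, period[0], period[1]))
    return result
```

Each resulting tuple could then be drawn as a shared segment on a timeline chart.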

Note that the data is already available to the React components that render the artist/work Relationship tabs, so you won’t have to deal with the PostgreSQL/Perl data layer MusicBrainz runs on.

This would lay the foundations of data visualization in the React-rendered website of MusicBrainz.

Push the URL relationship editor to the next level

Proposed mentors: reosarevok, yvanzo
Languages/skills: JavaScript (React)
Forum for discussion

URLs can be added to the MusicBrainz database to link entities to external websites: artists’ official homepages, other databases, lyrics websites, streaming websites, online stores, and more. Relationships define which of these roles a URL takes for its related entity in MusicBrainz. A relationship can be given a date period when it is known, or just be marked as ended when it is no longer current. External links are mostly handled by URLCleanup.js, which has good unit test coverage.

The current implementation was reworked in 2016 and needs to be reworked again to overcome some limitations:

  • MBS-9902: Only one relationship type can be associated with an entered URL;
Allowing a set of relationship types would help with handling links to websites that offer different ways to get the music: free download, paid download, mail-order, streaming.
  • As a consequence of the above, the relationship type selector offers either only one relationship type or all relationship types, even invalid ones;
The selector should instead restrict choices to the subset of valid relationship types when more than one relationship type is allowed.
  • Another consequence of the above is that the relationship type selector allows selecting at most one relationship type at a time;
The selector should allow selecting multiple relationship types at a time, for example if a link is for both download and mail-order.
  • MBS-3774: Date period and ended flag can be set from the URL editing page only;
Allowing these relationship attributes to be set from the editing page of the linked entity too would prevent a lot of URL deletions.
  • MBS-11391: Automatic URL cleanup overwrites the originally entered URL;
Having a separate field for the cleaned-up URL would help editors understand the changes being made.
Allowing editors to bypass rules for external links would help them work around bugs and changes on other websites that have not been taken into account yet.
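The "subset of valid relationship types" and "multiple selection" points can be sketched as follows. This is only a language-agnostic illustration written in Python; the real logic lives in JavaScript (URLCleanup.js), and the hostnames, the `ALLOWED_TYPES` table and both function names are made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical table: hostname -> relationship types valid for that website.
ALLOWED_TYPES = {
    "example-store.test": {
        "free download", "paid download", "mail-order", "streaming",
    },
    "example-lyrics.test": {"lyrics"},
}

# Hypothetical full list of types, offered when the website is unknown.
ALL_TYPES = {
    "free download", "paid download", "mail-order", "streaming",
    "lyrics", "official homepage", "other databases",
}


def valid_types(url):
    """Return the relationship types the selector should offer for a URL."""
    host = urlparse(url).hostname or ""
    return ALLOWED_TYPES.get(host, ALL_TYPES)


def invalid_selection(url, selected):
    """Return the selected types that are not valid for this URL."""
    return set(selected) - valid_types(url)
```

With such a table, the selector can offer only `valid_types(url)` and accept any subset of it, so a store link can be marked as both "paid download" and "mail-order" at once.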

Possible extensions to this work include:

  • Support querying external links to get updated data (e.g. getting the Wikidata item from a Wikipedia link)
  • Support replacing smart links with multiple destination links
  • MBS-9778: Make external links more intuitive and accessible
  • Support caching icons for external links from the DDG favicon service

Complete Rust binding for the MusicBrainz API

Proposed mentors: bitmap, okno, yvanzo
Languages/skills: Rust
Forum for discussion

A Rust binding to the MusicBrainz API is being developed by okno; see the crate musicbrainz_rs. However, it is not complete yet and also needs to be updated.

The main missing features are MusicBrainz_API/Search and Cover_Art_Archive/API.

A full-featured binding would make querying the MusicBrainz API from Rust easier and more reliable.

A possible extension to this work is to use this binding in a client application; it’s up to your creativity!

Create Rust binding for the MusicBrainz database

Proposed mentors: bitmap, okno, yvanzo
Languages/skills: Rust
Forum for discussion

The MusicBrainz database has an SQL schema defined under MusicBrainz Server’s admin/sql/ directory.

A Rust binding would help Rustaceans query a local replica of the database, just like mbdata provides SQLAlchemy models for Python.

Ideally it would be generated from the SQL schema, for example using the crate sql_db_mapper, so that the binding can simply be regenerated when a schema change occurs; see blog posts about schema changes.

A possible extension to this work is to use this binding for a server-side task, and there are many!

Create Rust binding for the MusicBrainz XML schema

Proposed mentors: bitmap, okno, yvanzo
Languages/skills: Rust
Forum for discussion

The MusicBrainz web service returns MusicBrainz XML Meta Data that matches a Relax NG schema defined in mmd-schema repository.

This repository has an auto-generated Java binding in it, and mb-rngpy is an auto-generated Python binding.

A Rust binding would help with writing programs that must build valid data according to this schema.

Ideally it would be generated from the Relax NG schema, for example using the crate xgen, so that the binding can simply be regenerated when the schema changes; see mmd-schema releases.

Automate areas management in MusicBrainz

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: SQL (Postgres), Python
Forum for discussion

Areas (such as cities, regions and countries) are used in MusicBrainz to indicate the location of concert halls and recording studios, the place of birth of artists, and so on. But the goal of MusicBrainz is curating music metadata, not geographical metadata, and thus we should rely on an external database instead.

Originally areas were automatically added from Wikidata with our old Perl bot, but that was dropped when some editors started making bad edits on Wikidata to ensure some specific area was added by the bot.

Currently, area editing is mostly reserved for editor dr_saunders, who voluntarily addresses issues reported via AREQ tickets, provided that references are given (usually Wikidata or GeoNames). This has worked for years but has some issues:

  • It takes a fair amount of time for the area editor to maintain areas manually;
  • Requested areas are not created immediately, thus are not immediately available to link to, causing delays;
  • Area data is missing localized names that are added to the references later on;
  • Area data becomes silently outdated, except when an editor reports issues to be fixed by hand.

Nowadays Wikidata has much stronger anti-vandalism tools and we have the ability to report, admonish and temporarily ban any users we find trying to game the system, so we can probably go back to an automatic system using Wikidata. The old Perl bot is complicated and mostly abandoned, so ideally this would be done via the somewhat more active Python bot. This has the benefit of being able to use existing Python libraries for dealing with the Wikidata side of the task as well.

The first and main task for a student who picks this should be to look through the Python bot, add an "add_area" function to it, and find and import relevant areas in Wikidata that are still missing in MusicBrainz (with their Wikidata and GeoNames links, and marked as a part of the appropriate area already in MusicBrainz). Once this is working, the bot should also add missing aliases in other languages to the areas and keep them updated by regularly checking that they haven't been removed from Wikidata (to avoid keeping old or incorrect data).
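The two comparisons described above (areas missing from MusicBrainz, and aliases that have since disappeared from Wikidata) come down to set differences. The sketch below assumes the bot already has both sides loaded into plain Python structures; the function names, argument shapes and placeholder IDs are hypothetical and do not reflect the existing bot's code.

```python
def missing_areas(wikidata_areas, linked_wikidata_ids):
    """Return Wikidata areas that no MusicBrainz area links to yet.

    wikidata_areas: dict mapping a Wikidata item ID to its label.
    linked_wikidata_ids: set of item IDs already linked from MusicBrainz.
    """
    return {
        item_id: label
        for item_id, label in wikidata_areas.items()
        if item_id not in linked_wikidata_ids
    }


def alias_changes(mb_aliases, wikidata_labels):
    """Compare aliases as sets of (locale, name) pairs.

    Returns (to_add, to_remove): names present only in Wikidata, and names
    still in MusicBrainz that are no longer in Wikidata.
    """
    to_add = sorted(wikidata_labels - mb_aliases)
    to_remove = sorted(mb_aliases - wikidata_labels)
    return to_add, to_remove
```

Each entry returned by `missing_areas` would then be passed to the new "add_area" function, and `alias_changes` would be run periodically for every area that is already linked.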

Integrate Internet Archive in MusicBrainz

Proposed mentors: yvanzo, bitmap
Languages/skills: RabbitMQ, JavaScript (React), Perl (Catalyst) [*]
Forum for discussion

The Internet Archive offers many resources that can mix very well with MusicBrainz.

  • The Wayback Machine is able to take and render a snapshot of public webpages at any given time. It can be used for URL relationships added to the MusicBrainz database, and for references given in edit notes and annotations.
  • Full DB dumps could be created and automatically uploaded to the collection of MusicBrainz Data Dumps for the IA to process them directly.
  • Periodically re-archiving some links (mainly the ones where artists can uncontrollably change stuff without versioning, e.g. bandcamp, spotify, itunes, ...) would be pretty cool too.
  • The 78 RPMs and Cylinder Recordings is a collection of digitized recordings from physical releases of the early 20th century. Each recording comes with audio streaming and a metadata web service. It can be used to retrieve metadata automatically and to embed a player in the MusicBrainz website. A lot of similar music collections are hosted by the Internet Archive.
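For the Wayback Machine item, linking a URL relationship or edit note reference to a snapshot mostly means building the right address. The sketch below assumes the Wayback Machine's usual `web.archive.org/web/<timestamp>/<url>` layout and its availability endpoint; the function names are hypothetical and no network request is made.

```python
from datetime import datetime
from urllib.parse import urlencode

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamp, e.g. 20220207183400


def snapshot_url(url, when):
    """Address of the snapshot of `url` closest to the datetime `when`."""
    return f"https://web.archive.org/web/{when.strftime(TIMESTAMP_FORMAT)}/{url}"


def availability_query(url, when):
    """Address to ask whether a snapshot of `url` exists near `when`."""
    params = urlencode({"url": url, "timestamp": when.strftime(TIMESTAMP_FORMAT)})
    return f"https://archive.org/wayback/available?{params}"
```

For an edit note, `when` could be the time the note was written, so that readers see the referenced page as the editor saw it.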

The Internet Archive team has specifically offered assistance in supporting such a project.

Improve editing interface for event setlist on the MusicBrainz website

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*]
Forum for discussion

Since the summer of 2015, concerts can be added to MusicBrainz, including a detailed setlist. However, the implementation requires editors to know a very specific syntax for setlists, and doesn't even provide a preview option to make sure they're doing it right. This causes a lot of problems. In general, having to put the setlist together by hand is fairly user-unfriendly.

The task here is to build a new editing interface, ideally similar to the Tracklist page of the release editor, that allows users to add the information through a form that doesn't require them to learn any syntax. This will also allow us to potentially change the way we store the data in the background later on, without actually requiring big differences in the way the user-facing form works.

Add a basic in-site messaging/notification system

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion

One of the most requested features in the history of MusicBrainz is to have a way to communicate with other editors without sending an email, and to receive notifications when people comment on your edits without getting your email inbox flooded (unless you choose to).

The task here is to build a basic notification and messaging system. This has three required components: 1) Implement a backend to store the messages (a PostgreSQL table) and a basic way to send and reply to messages. 2) Implement a way to receive the equivalent of the notifications currently sent via email via the messaging system (the most basic option would probably be having it as "messages" from a system user with replying disabled). 3) Implement a notification icon and counter (probably somewhere in the top menu) to let users know they have new messages or notifications.
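A minimal sketch of components 1) to 3), using Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table layout, column names and helper functions are assumptions made for illustration only, not a proposed schema for the MusicBrainz Server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE message (
    id        INTEGER PRIMARY KEY,
    sender    TEXT,                -- NULL marks a system notification
    recipient TEXT NOT NULL,
    reply_to  INTEGER REFERENCES message(id),
    body      TEXT NOT NULL,
    is_read   INTEGER NOT NULL DEFAULT 0
);
"""


def open_db():
    """Create an in-memory database holding the message table."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def send(db, sender, recipient, body, reply_to=None):
    """Store a message; replying to a system notification is refused."""
    if reply_to is not None:
        row = db.execute(
            "SELECT sender FROM message WHERE id = ?", (reply_to,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown message")
        if row[0] is None:
            raise ValueError("system notifications cannot be replied to")
    cur = db.execute(
        "INSERT INTO message (sender, recipient, reply_to, body) "
        "VALUES (?, ?, ?, ?)",
        (sender, recipient, reply_to, body),
    )
    return cur.lastrowid


def unread_count(db, user):
    """Number to show next to the notification icon for `user`."""
    return db.execute(
        "SELECT COUNT(*) FROM message WHERE recipient = ? AND is_read = 0",
        (user,),
    ).fetchone()[0]
```

Storing notifications in the same table as messages, with an empty sender, is one way to get the "messages from a system user with replying disabled" behaviour mentioned above.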

Implement a way to thank users for their edits and edit notes

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion

Editing MusicBrainz can sometimes feel pretty lonely, even though there's a relatively large community. A lot of that is because the only reason people generally interact with each other is to notify them when something in their edits is incorrect; as such, the more experienced you are, the better your edits and the less you end up getting in contact with other users!

The task here is to put together a system in which users can thank each other for a particularly good or useful edit ("You fixed my bad addition!" / "I was going to have to spend a lot of time adding this CD I bought but you already did it yesterday!"), and also for a particularly helpful edit note ("Thanks for helping me as a new user" / "Thanks for keeping a cool head and giving a helpful answer in this potential argument"). The number of users who gave thanks for a particular edit or edit note could be shown to everyone, but who specifically did it should probably only be shown to the author of the edit or note in question.
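The visibility rule in the last sentence (the count is public, the names are shown only to the author) could look like this. The `Thanks` class and its methods are hypothetical; this is a small Python sketch of the rule, not a design for the server.

```python
class Thanks:
    """Thanks given for one edit or edit note."""

    def __init__(self, author):
        self.author = author
        self._givers = set()  # a set, so each user counts at most once

    def thank(self, user):
        """Record that `user` thanked the author."""
        if user == self.author:
            raise ValueError("users cannot thank themselves")
        self._givers.add(user)

    def view(self, viewer):
        """Return what `viewer` is allowed to see."""
        info = {"count": len(self._givers)}
        if viewer == self.author:
            # Only the author learns who gave thanks.
            info["givers"] = sorted(self._givers)
        return info
```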

Improve the UX of voting for edits of the MusicBrainz database

Proposed mentors: yvanzo, bitmap
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion

Edits made to the MusicBrainz database are either applied automatically or put to a vote, usually for 7 days, depending on the edit type. Combined with the Subscription feature, this process allows editors to review the edits made by other editors.

The current issue is that many editors don’t receive any votes on their own edits. There are several ways we can imagine addressing this issue: a redesign of the edit pages, gamification of the voting system, gamification of subscriptions, more gain from subscriptions, round-robin notification for edits made by editors who have been missing votes for a long time, and so on.

The goal is to select a few suggestions (given above or your own) with the community and to implement them in the MusicBrainz Server. The main part of the implementation will be the user interface, coded using React/JSX.

Integrate more *Brainz in more *Brainz

Languages/skills: Perl and/or Python and/or Node.js, probably SQL (Postgres)
Forum for discussion

We have a bunch of different projects under the MetaBrainz umbrella by now, but they do not necessarily utilise each other to their fullest extent. MusicBrainz in particular is lacking utilisation of features/data from e.g., AcousticBrainz and ListenBrainz.

I don't have any specific things to do or not do with this, but a prospective student thinking about this should definitely approach us on IRC and talk with us about what they have in mind and if there's anything the community can think of.