Development/Summer of Code/2022/MusicBrainz

From MusicBrainz Wiki
Revision as of 11:08, 28 February 2022 by YvanZo (talk | contribs) (→‎Ideas: add difficulty rating)
[*] Don't let the idea of writing Perl discourage you from checking out some of these projects! The MusicBrainz Server is written in readable, well-structured Perl, using the web application framework Catalyst. If you're comfortable in e.g. Python or Ruby web frameworks, then you'll probably be able to jump in and understand this codebase with only a little extra effort.

Ideas

MusicBrainz data visualization with React

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: JavaScript (React), data visualization
Forum for discussion
Estimated Project Length: 350 hours Difficulty: easy

MusicBrainz has a timeline of global statistics, which is being rewritten in React and d3.js.

Moreover, a timeline (or another kind of chart) would be very helpful for visualizing the relationships of famous artists with long careers, such as Mstislav Rostropovich. For example, a history of band members would be useful when there are many changes over time, making it easier to see which musicians played together. The same can probably apply to famous works too, e.g. Over the Rainbow.

Note that the data is already available to the React components that render the artist/work Relationships tabs, so you won’t have to deal with the PostgreSQL/Perl data layer that MusicBrainz runs on.

This would lay the foundations of data visualization in the React-rendered website of MusicBrainz.
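As a rough illustration of the data preparation behind a band-member timeline, the sketch below computes the periods during which pairs of members were in a band together. It is written in Python only to keep it short; the project itself targets React. The record layout and the names `MEMBERSHIPS`, `overlap` and `played_together` are hypothetical and do not reflect the data actually passed to the Relationships tabs.

```python
from datetime import date
from itertools import combinations

# Hypothetical membership records: (member, begin, end).
# end=None means the person is still a member.
MEMBERSHIPS = [
    ("Alice", date(1990, 1, 1), date(1995, 6, 30)),
    ("Bob", date(1993, 3, 1), None),
    ("Carol", date(1996, 1, 1), date(2000, 12, 31)),
]


def overlap(a, b, today):
    """Return the (start, end) period two memberships share, or None."""
    start = max(a[1], b[1])
    end = min(a[2] or today, b[2] or today)
    return (start, end) if start <= end else None


def played_together(memberships, today):
    """Map each pair of members to the period both were in the band."""
    result = {}
    for a, b in combinations(memberships, 2):
        shared = overlap(a, b, today)
        if shared:
            result[(a[0], b[0])] = shared
    return result
```

Each resulting pair and period could then be drawn as an overlapping bar segment in whatever chart type is chosen.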

Create Rust binding for the MusicBrainz database

Proposed mentors: bitmap, yvanzo
Languages/skills: Rust
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

The MusicBrainz database has an SQL schema defined under MusicBrainz Server’s admin/sql/ directory.

A Rust binding would help Rustaceans query a local replica of the database, just like mbdata provides SQLAlchemy models for Python.

Ideally it would be generated from the SQL schema, for example using the crate sql_db_mapper, so that the binding can simply be regenerated when a schema change occurs; see blog posts about schema changes.
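To make the idea of schema-driven generation concrete, here is a minimal sketch of a generator that reads a table definition and emits a Rust struct as text. It is not how sql_db_mapper works; the type mapping, the cut-down `EXAMPLE_SQL` table and the function names are assumptions made for illustration, and the parser only handles this simplified form (no parameterized types, constraints or defaults).

```python
import re

# Assumed SQL-to-Rust type mapping; a real generator would cover far more types.
SQL_TO_RUST = {
    "SERIAL": "i32",
    "INTEGER": "i32",
    "UUID": "String",
    "VARCHAR": "String",
    "TEXT": "String",
    "BOOLEAN": "bool",
}

# Hypothetical, cut-down table definition (not the actual MusicBrainz schema).
EXAMPLE_SQL = """
CREATE TABLE artist (
    id SERIAL NOT NULL,
    gid UUID NOT NULL,
    name VARCHAR NOT NULL,
    comment VARCHAR
);
"""


def struct_name(table):
    """Convert a snake_case table name to a CamelCase struct name."""
    return "".join(part.capitalize() for part in table.split("_"))


def generate_struct(sql):
    """Emit Rust struct source text for one simplified CREATE TABLE statement."""
    match = re.search(r"CREATE TABLE (\w+) \((.*)\);", sql, re.DOTALL)
    table, body = match.group(1), match.group(2)
    lines = [f"pub struct {struct_name(table)} {{"]
    for column in body.split(","):
        tokens = column.split()
        name, sql_type = tokens[0], tokens[1].upper()
        rust_type = SQL_TO_RUST[sql_type]
        # Nullable columns become Option<T>.
        if "NOT NULL" not in column.upper():
            rust_type = f"Option<{rust_type}>"
        lines.append(f"    pub {name}: {rust_type},")
    lines.append("}")
    return "\n".join(lines)
```

Because the output depends only on the schema text, rerunning such a generator after a schema change would refresh the binding without manual edits.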

A possible extension to this work would be to use this binding for a server-side task, and there are many!

Create Rust binding for the MusicBrainz XML schema

Proposed mentors: bitmap, yvanzo
Languages/skills: Rust
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

The MusicBrainz web service returns MusicBrainz XML Meta Data that matches a Relax NG schema defined in mmd-schema repository.

This repository has an auto-generated Java binding in it, and mb-rngpy is an auto-generated Python binding.

A Rust binding would help with writing programs that must build valid data according to this schema.

Ideally it would be generated from that schema, for example using the crate xgen, so that the binding can simply be regenerated when a schema change occurs; see mmd-schema releases.

Automate areas management in MusicBrainz

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: SQL (Postgres), Python
Forum for discussion
Estimated Project Length: 350 hours Difficulty: hard

Areas (such as cities, regions and countries) are used in MusicBrainz to indicate the location of concert halls and recording studios, the place of birth of artists, and so on. But the goal of MusicBrainz is curating music metadata, not geographical metadata, and thus we should rely on an external database instead.

Originally areas were automatically added from Wikidata with our old Perl bot, but that was dropped when some editors started making bad edits on Wikidata to ensure some specific area was added by the bot.

Currently, area editing is mostly reserved to editor dr_saunders who voluntarily addresses issues reported via AREQ tickets provided that references are given, usually Wikidata or GeoNames. This worked for years but has some issues:

  • Takes a fair amount of time for the area editor to maintain it manually;
  • Requested areas are not created immediately, thus are not immediately available to link to, causing delays;
  • Area data is missing localized names that are added to the referenced sources later on;
  • Area data becomes silently outdated, except when an editor reports issues to be fixed by hand.

Nowadays Wikidata has much stronger anti-vandalism tools and we have the ability to report, admonish and temporarily ban any users we find trying to game the system, so we can probably go back to an automatic system using Wikidata. The old Perl bot is complicated and mostly abandoned, so ideally this would be done via the somewhat more active Python bot. This has the benefit of being able to use existing Python libraries for dealing with the Wikidata side of the task as well.

The first and main task for a student who picks this should be to look through the Python bot, add an "add_area" function to it, and find and import relevant areas in Wikidata that are still missing in MusicBrainz (with their Wikidata and GeoNames links, and marked as a part of the appropriate area already in MusicBrainz). Once this is working, the bot should also add missing aliases in other languages to the areas and keep them updated by regularly checking that they haven't been removed from Wikidata (to avoid keeping old or incorrect data).
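The two steps described above (importing missing areas, then keeping aliases in sync) might be planned along the following lines. This is only a sketch of the decision logic: the record layout, the placeholder identifiers and the function names `plan_area_additions` and `sync_aliases` are hypothetical, and nothing here reflects the actual Python bot, the Wikidata API or the MusicBrainz data model.

```python
# Hypothetical records; field names and IDs are placeholders for illustration.
WIKIDATA_AREAS = [
    {"qid": "Q_CITY_A", "name": "Exampleville", "parent_qid": "Q_COUNTRY", "aliases": {"fr": "Exempleville"}},
    {"qid": "Q_CITY_B", "name": "Sampletown", "parent_qid": "Q_COUNTRY", "aliases": {}},
    {"qid": "Q_CITY_C", "name": "Orphan City", "parent_qid": "Q_UNKNOWN", "aliases": {}},
]

# Areas already in MusicBrainz, keyed by their linked Wikidata ID.
MB_AREAS = {
    "Q_COUNTRY": {"name": "Exampleland", "aliases": {}},
    "Q_CITY_B": {"name": "Sampletown", "aliases": {}},
}


def plan_area_additions(wikidata_areas, mb_areas):
    """Split Wikidata areas missing from MusicBrainz into addable and deferred ones."""
    to_add, deferred = [], []
    for area in wikidata_areas:
        if area["qid"] in mb_areas:
            continue  # already linked to a MusicBrainz area
        if area["parent_qid"] in mb_areas:
            to_add.append(area)  # can be marked as part of an existing area
        else:
            deferred.append(area)  # containing area is not in MusicBrainz yet
    return to_add, deferred


def sync_aliases(mb_aliases, wikidata_aliases):
    """Return (aliases to add or update, languages to remove) to mirror Wikidata."""
    to_add = {
        lang: name
        for lang, name in wikidata_aliases.items()
        if mb_aliases.get(lang) != name
    }
    to_remove = [lang for lang in mb_aliases if lang not in wikidata_aliases]
    return to_add, to_remove
```

Deferring areas whose containing area is missing is one possible policy; the bot could instead import parents first.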

Integrate Internet Archive in MusicBrainz

Proposed mentors: yvanzo, bitmap
Languages/skills: RabbitMQ, JavaScript (React), Perl (Catalyst) [*]
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

The Internet Archive offers many resources that can mix very well with MusicBrainz.

  • The Wayback Machine is able to take and render a snapshot of public webpages at any given time. It can be used for URL relationships added to the MusicBrainz database, and for references given in edit notes and annotations.
  • Full DB dumps could be created and automatically uploaded to the collection of MusicBrainz Data Dumps for the IA to process them directly.
  • Periodically re-archiving some links (mainly the ones where artists can change content at will without versioning, e.g. Bandcamp, Spotify, iTunes) would be pretty cool too.
  • The 78 RPMs and Cylinder Recordings is a collection of digitized recordings from physical releases of the early 20th century. Each recording comes with audio streaming and a metadata web service. It can be used to retrieve metadata automatically and to embed a player in the MusicBrainz website. A lot of similar music collections are hosted by the Internet Archive.

The Internet Archive team specifically offered assistance with supporting such a project.
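For the Wayback Machine part, a small helper could build the links to show next to a URL relationship or an edit note reference. The sketch below only constructs URLs and makes no network requests. It assumes the public `web.archive.org/web/<timestamp>/<url>` snapshot address form and the `archive.org/wayback/available` lookup endpoint; the function names are hypothetical.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(url, timestamp):
    """Address of the capture closest to timestamp (format YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def wayback_availability_url(url, timestamp=None):
    """Address of the availability lookup for a page, optionally near a timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)
```

Passing the time at which an edit was entered as the timestamp would point reviewers to the page roughly as the editor saw it.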

Improve editing interface for event setlist on the MusicBrainz website

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*]
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

Since the summer of 2015, concerts can be added to MusicBrainz, including a detailed setlist. However, the implementation requires editors to know a very specific syntax for setlists, and doesn't even provide a preview option to make sure they're doing it right. This causes a lot of problems. In general, having to put the setlist together by hand is fairly user-unfriendly.

The task here is to build a new editing interface, ideally similar to the Tracklist page of the release editor, that allows users to add the information through a form that doesn't require them to learn any syntax. This will also allow us to potentially change the way we store the data in the background later on, without actually requiring big differences in the way the user-facing form works.
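One way to keep the stored format unchanged while hiding it from editors is to let the form hold structured rows and serialize them on save. The sketch below assumes that the current syntax uses one line per entry, prefixed with `@` for an artist, `*` for a work and `#` for additional information, and that linked entities are written as `[mbid|name]`; check this against the setlist documentation before relying on it. The row layout and function names are hypothetical.

```python
# Assumed line prefixes of the current setlist syntax (an assumption, see above).
PREFIXES = {"artist": "@", "work": "*", "info": "#"}


def format_row(row):
    """Turn one form row into one setlist line."""
    prefix = PREFIXES[row["type"]]
    text = row["text"]
    # Rows linked to a MusicBrainz entity carry its MBID.
    if row.get("mbid"):
        text = f"[{row['mbid']}|{text}]"
    return f"{prefix} {text}"


def serialize_setlist(rows):
    """Serialize all form rows into the stored setlist text."""
    return "\n".join(format_row(row) for row in rows)
```

The same function could feed a live preview, and swapping it out later would change the storage format without touching the form.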

Add a basic in-site messaging/notification system

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion
Estimated Project Length: 350 hours Difficulty: hard

One of the most requested features in the history of MusicBrainz is to have a way to communicate with other editors without sending an email, and to receive notifications when people comment on your edits without getting your email inbox flooded (unless you choose to).

The task here is to build a basic notification and messaging system. This has three required components:

  1. Implement a backend to store the messages (a PostgreSQL table) and a basic way to send and reply to messages.
  2. Implement a way to receive the equivalent of the notifications currently sent via email via the messaging system (the most basic option would probably be having it as "messages" from a system user with replying disabled).
  3. Implement a notification icon and counter (probably somewhere in the top menu) to let users know they have new messages or notifications.
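The three components above can be sketched with a single table. The example uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL, so that it runs without a server; the table layout, the `SYSTEM_USER` name and the helper functions are all assumptions, not an existing MusicBrainz schema.

```python
import sqlite3

# Hypothetical sender name for notifications; replies to it are rejected.
SYSTEM_USER = "system"


def create_schema(conn):
    """Create the message table (stand-in for a PostgreSQL table)."""
    conn.execute("""
        CREATE TABLE message (
            id INTEGER PRIMARY KEY,
            sender TEXT NOT NULL,
            recipient TEXT NOT NULL,
            body TEXT NOT NULL,
            reply_to INTEGER REFERENCES message(id),
            is_read INTEGER NOT NULL DEFAULT 0
        )
    """)


def send_message(conn, sender, recipient, body, reply_to=None):
    """Store a message; replying to a system notification is not allowed."""
    if reply_to is not None:
        original = conn.execute(
            "SELECT sender FROM message WHERE id = ?", (reply_to,)
        ).fetchone()
        if original is None or original[0] == SYSTEM_USER:
            raise ValueError("cannot reply to this message")
    cur = conn.execute(
        "INSERT INTO message (sender, recipient, body, reply_to) VALUES (?, ?, ?, ?)",
        (sender, recipient, body, reply_to),
    )
    return cur.lastrowid


def mark_read(conn, message_id):
    """Flag a message as read so it no longer counts as new."""
    conn.execute("UPDATE message SET is_read = 1 WHERE id = ?", (message_id,))


def unread_count(conn, user):
    """Number shown in the notification counter for this user."""
    row = conn.execute(
        "SELECT COUNT(*) FROM message WHERE recipient = ? AND is_read = 0", (user,)
    ).fetchone()
    return row[0]
```

Treating notifications as ordinary rows from a system sender keeps the counter query identical for messages and notifications.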

Implement a way to thank users for their edits and edit notes

Proposed mentors: yvanzo, bitmap, reosarevok
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion
Estimated Project Length: 350 hours Difficulty: easy

Editing MusicBrainz can sometimes feel pretty lonely, even though there's a relatively large community. A lot of that is because the only reason people generally interact with each other is to notify them when something in their edits is incorrect; as such, the more experienced you are, the better your edits and the less you end up getting in contact with other users!

The task here is to put together a system in which users can thank each other for a particularly good or useful edit ("You fixed my bad addition!" / "I was going to have to spend a lot of time adding this CD I bought but you already did it yesterday!"), and also for a particularly helpful edit note ("Thanks for helping me as a new user" / "Thanks for keeping a cool head and giving a helpful answer in this potential argument"). The number of users who gave thanks for a particular edit or edit note could be shown to everyone, but who specifically did it should probably only be shown to the author of the edit or note in question.
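The visibility rule described above (the count is public, the names are visible only to the author) can be expressed in a few lines. The `Edit` class and function names are hypothetical, and the rule that users cannot thank themselves is an added assumption rather than something stated in the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    """Minimal stand-in for an edit or edit note that can receive thanks."""
    author: str
    thanked_by: set = field(default_factory=set)


def thank(edit, user):
    """Record thanks from a user; repeated thanks count only once."""
    if user == edit.author:
        raise ValueError("users cannot thank themselves")  # assumed rule
    edit.thanked_by.add(user)


def thanks_view(edit, viewer):
    """What a given viewer may see: everyone gets the count, the author also gets names."""
    view = {"count": len(edit.thanked_by)}
    if viewer == edit.author:
        view["users"] = sorted(edit.thanked_by)
    return view
```

Storing the thanking users as a set makes the operation idempotent, so double clicks do not inflate the count.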

Improve the UX of voting for edits of the MusicBrainz database

Proposed mentors: yvanzo, bitmap
Languages/skills: JavaScript (React), Perl (Catalyst) [*], SQL (Postgres)
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

Edits made to the MusicBrainz database are either applied automatically or opened for voting, usually for 7 days, depending on the edit type. Combined with the Subscription feature, this process allows editors to review the edits made by other editors.

The current issue is that many editors don’t receive any votes on their own edits. There are several ways we can imagine to address this issue: redesign of edits pages, gamification of the voting system, gamification of subscriptions, more gain from subscriptions, round-robin notification for edits made by editors missing votes for a long time, and so on.

The goal is to select a few suggestions (given above or of your own) with the community and to implement them in the MusicBrainz Server. The main part of the implementation will be the user interface, to be coded using React/JSX.
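As one example, the round-robin suggestion could be prototyped as below: edits that have gone without votes are handed to voters in rotation, skipping the edit's own author. This is only a sketch of one possible policy; the input layout and the name `assign_reviewers` are hypothetical.

```python
from itertools import cycle


def assign_reviewers(edits, voters):
    """Assign each unvoted edit to the next voter in rotation.

    edits: list of (edit_id, author) pairs; voters: list of user names.
    Returns {edit_id: voter}; an edit stays unassigned if its author is the only voter.
    """
    if not voters:
        return {}
    rotation = cycle(voters)
    assignments = {}
    for edit_id, author in edits:
        # Try each voter at most once per edit.
        for _ in range(len(voters)):
            voter = next(rotation)
            if voter != author:
                assignments[edit_id] = voter
                break
    return assignments
```

The rotation spreads notifications evenly, which matters if the aim is to avoid overloading the few editors who already vote a lot.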

Integrate more *Brainz in more *Brainz

Languages/skills: Perl and/or Python and/or Node.js, probably SQL (Postgres)
Forum for discussion
Estimated Project Length: 350 hours Difficulty: medium

We have a bunch of different projects under the MetaBrainz umbrella by now, but they do not necessarily utilise each other to their fullest extent. MusicBrainz in particular is lacking utilisation of features/data from e.g., AcousticBrainz and ListenBrainz.

I don't have any specific things to do or not do with this, but a prospective student thinking about this should definitely approach us on IRC and talk with us about what they have in mind and if there's anything the community can think of.