Development/Summer of Code/2024/MusicBrainz

From MusicBrainz Wiki

MusicBrainz is a community-maintained open source music encyclopaedia that collects music metadata and makes it available to the public. Try it out.

Getting Started

(see also: Getting started with GSoC)


Metadata recognition from cover art

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: React.js, WebAssembly
Forum for discussion
Estimated Project Length: 175 hours (or 350 hours if machine learning)
Difficulty: medium (or hard if machine learning)

MusicBrainz gathers metadata about releases and their cover art through the Cover Art Archive. Editors very often have to type in by hand the data printed on the cover art images. Programmatically parsing these images to extract as much metadata as possible (free text, title, artist credit, label code, barcode, tracklist…) would save them a great deal of work.

The optical character recognition engine Tesseract can be used through either Naptha’s JavaScript port Tesseract.js or Knight’s WebAssembly build tesseract-wasm. In either case, the web user interface has to be written in React.js to allow future integration into the website.

Tesseract has many parameters for tuning it to a specific use or focusing it on selected areas. However, the main part of the project will probably be turning its output into something useful. The parsing/mapping could potentially be achieved through machine learning, but that would likely double the project length.
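As a rough illustration of the non-machine-learning route, the sketch below takes raw OCR text and pulls out two of the fields listed above. It is only a sketch under stated assumptions: the names `CoverTextFields`, `isValidEan13` and `parseCoverText` are hypothetical, barcodes are assumed to be EAN-13, and label codes are assumed to be printed as "LC" followed by 4 or 5 digits. It does not call Tesseract itself.

```typescript
// Hypothetical shape for fields recovered from OCR text; not a MusicBrainz type.
interface CoverTextFields {
  barcodes: string[];       // digit strings that pass the EAN-13 checksum
  labelCode: string | null; // digits following "LC", if any
}

// EAN-13: weight the first 12 digits 1,3,1,3,... and compare the check digit.
function isValidEan13(digits: string): boolean {
  if (!/^\d{13}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(digits[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10 === Number(digits[12]);
}

function parseCoverText(ocrText: string): CoverTextFields {
  // Printed barcodes are often spaced ("4 006381 333931"), so collect runs of
  // digits and spaces within one line, then drop the spaces.
  const runs = ocrText.match(/\d[\d ]*\d/g) || [];
  const barcodes: string[] = [];
  for (const run of runs) {
    const digits = run.replace(/ /g, "");
    if (isValidEan13(digits) && barcodes.indexOf(digits) === -1) {
      barcodes.push(digits);
    }
  }
  // Assumed label code notation: "LC" followed by 4 or 5 digits.
  const lc = /\bLC[\s-]*(\d{4,5})\b/.exec(ocrText);
  return { barcodes, labelCode: lc ? lc[1] : null };
}
```

Validating the checksum is a cheap way to discard digit runs that OCR misread. Free-form fields such as titles and tracklists have no such check, which is where most of the project effort is likely to go.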

MJML-based email renderer

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: React.js, Rust
Forum for discussion
Estimated Project Length: 175 hours
Difficulty: medium

MusicBrainz Server sends emails to users on various occasions: email verification, edit notes, subscription edits, autoeditor elections… So far these emails are generated in text-only format using Template Toolkit, which has had its day.

A modern replacement could be MJML, through either the React.js wrapper mjml-react or the Rust reimplementation MRML. The first option would allow reusing some components from the website frontend, which has mostly been converted to React (jira:MBS-8609). The second option would be much faster (170× faster than the original Node.js implementation) but would require a lot of work on components.

However, a multipart email (text/html + text/plain) is still wanted here, while MJML focuses on generating HTML only. So the plain-text alternative part should probably be generated using a tool like html2text.
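To make the multipart requirement concrete, here is a minimal sketch of how the two parts could be assembled into a multipart/alternative body once the HTML and its plain-text counterpart have been produced. The function name, the `MultipartBody` shape, and the choice of `8bit` transfer encoding are assumptions for illustration; a real mail library would normally handle boundaries, encoding, and line lengths.

```typescript
// Hypothetical helper; not part of MusicBrainz Server, MJML, or html2text.
interface MultipartBody {
  contentType: string; // value for the message's Content-Type header
  body: string;        // MIME body with both alternatives
}

function buildMultipartAlternative(
  plainText: string,
  html: string,
  boundary: string,
): MultipartBody {
  const CRLF = "\r\n";
  // Each part has its own headers, a blank line, then the content.
  const part = (type: string, content: string): string =>
    [
      "--" + boundary,
      "Content-Type: " + type + "; charset=utf-8",
      "Content-Transfer-Encoding: 8bit", // assumption: no re-encoding needed
      "",
      content,
    ].join(CRLF);

  // Alternatives go from least to most preferred, so text/plain comes first
  // and text/html last.
  const body = [
    part("text/plain", plainText),
    part("text/html", html),
    "--" + boundary + "--",
    "",
  ].join(CRLF);

  return {
    contentType: 'multipart/alternative; boundary="' + boundary + '"',
    body,
  };
}
```

Here `html` would come from the MJML renderer and `plainText` from the HTML-to-text step; the boundary must be a string that appears in neither part.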

Create Rust bindings for the MusicBrainz database

Proposed mentors: bitmap, yvanzo
Languages/skills: Rust
Forum for discussion
Estimated Project Length: 350 hours
Difficulty: medium

The MusicBrainz database has an SQL schema defined under MusicBrainz Server’s admin/sql/ directory.

Rust bindings would help Rustaceans query a local replica of the database, just like mbdata provides SQLAlchemy models for Python.

Ideally the bindings would be generated from the SQL schema, for example using the crate sql-gen (which needs some improvements to fully support our schema), so that they can simply be regenerated when a schema change occurs. See the blog posts about schema changes.

The code should be backed by continuous integration tests and developer documentation.

Additionally, an application should be created to make use of the bindings. It could be an extensible MusicBrainz editing bot or a server-side task (generating data reports…), but many other applications are possible. Proposals from the candidate will be happily considered.

New display of external links

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: React.js
Forum for discussion
Estimated Project Length: 90 hours
Difficulty: easy

The MusicBrainz database stores external links related to entities (artists, labels…) as URL relationships. Only some of these links are displayed in the sidebar of the entity page. The other links are visible from the “Relationships” tab for this entity.

The main idea is to have an independent page focused on displaying these links, with a totally different layout, as shown in the mockups for jira:MBS-13444. This new page should be implemented using TanStack Table (formerly react-table) to allow grouping and sorting the URL relationships either by URL or by relationship type. Its display has to be mobile-ready, localizable, and accessible, and has to link to the related pages in the *Brainz projects. Relationship type attributes (such as video) should be taken into account, and ended relationships should optionally be available through the Wayback Machine. A condensed version of its content should replace the existing external links section in the sidebar. A third and final version should be made for use in the Relationships tab.
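The grouping requirement can be pictured with a small, library-free sketch. It does not use TanStack Table; it only shows the kind of data transformation the table would perform. The `UrlRelationship` shape and both function names are hypothetical, and the Wayback Machine link is built unconditionally for ended relationships, whereas the real page would make that optional.

```typescript
// Hypothetical row shape; the real URL relationship data has more fields.
interface UrlRelationship {
  url: string;
  linkType: string; // e.g. "official homepage"
  ended: boolean;
}

interface RelationshipGroup {
  key: string;
  items: UrlRelationship[];
}

// Group rows by URL or by relationship type, with groups sorted by key.
function groupRelationships(
  rels: UrlRelationship[],
  by: "url" | "linkType",
): RelationshipGroup[] {
  const buckets: { [key: string]: UrlRelationship[] } = {};
  for (const rel of rels) {
    const key = rel[by];
    (buckets[key] = buckets[key] || []).push(rel);
  }
  return Object.keys(buckets)
    .sort()
    .map((key) => ({ key, items: buckets[key] }));
}

// For ended relationships, point at the Wayback Machine instead of the live URL.
function displayHref(rel: UrlRelationship): string {
  return rel.ended ? "https://web.archive.org/web/" + rel.url : rel.url;
}
```

Grouping by `"url"` puts every relationship type that shares a URL in one group, while grouping by `"linkType"` collects all URLs of a given type, which matches the two views described above.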