Development/Summer of Code/2024/MusicBrainz: Difference between revisions

From MusicBrainz Wiki
(Drop the unneeded disclaimer about Perl since no idea is requiring it)
(Added an idea for review (LinkBrainz/ShareBrainz))

Revision as of 01:09, 11 January 2024

MusicBrainz is a community-maintained open source music encyclopaedia that collects music metadata and makes it available to the public. Try it out.

Getting Started

(see also: Getting started with GSoC)

Ideas

Metadata recognition from cover art

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: React.js, WebAssembly
Forum for discussion
Estimated Project Length: 175 hours (or 350 hours if machine learning)
Difficulty: medium (or hard if machine learning)

MusicBrainz gathers metadata about releases and their cover art through the Cover Art Archive. Editors very often have to type in by hand the data shown in the cover art images. Programmatically parsing these images to extract as much metadata as possible (free text, title, artist credit, label code, barcode, tracklist…) would be a drastic boost for them.

The optical character recognition engine Tesseract can be used through either Naptha’s JavaScript port, Tesseract.js, or Knight’s WebAssembly build, tesseract-wasm. In either case, the web user interface has to be written in React.js to allow future integration into the website.

Tesseract has a lot of parameters that allow tuning it for specific usage, or focusing on selected areas of an image. However, the main part of the project might be turning its output into something useful. The parsing/mapping can potentially be achieved through machine learning, but that would likely double the project length.
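
As one small illustration of turning raw OCR output into something useful, the sketch below scans recognized text for 13-digit runs and keeps only those with a valid EAN-13 check digit. It is a minimal sketch under assumptions: the input string is made-up OCR output, the function names are hypothetical, and Python is used only for illustration (the project itself would be written in JavaScript/React.js).

```python
import re


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def find_barcodes(ocr_text: str) -> list[str]:
    """Return 13-digit runs in OCR text whose EAN-13 check digit is valid.

    Hypothetical helper: single spaces between digits are tolerated because
    printed barcodes are usually shown in spaced groups.
    """
    barcodes = []
    pattern = r"(?<!\d)\d(?:[ ]?\d){12}(?!\d)"
    for match in re.finditer(pattern, ocr_text):
        digits = match.group().replace(" ", "")
        if ean13_check_digit(digits[:12]) == int(digits[12]):
            barcodes.append(digits)
    return barcodes


if __name__ == "__main__":
    # Made-up OCR output for a back cover.
    print(find_barcodes("LC 0309\n4 006381 333931\nMade in EU"))
```

A checksum filter like this only discards misread candidates; mapping the remaining text to titles, artist credits or tracklists is the harder part that the idea describes.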

MJML-based email renderer

Proposed mentors: bitmap, reosarevok, yvanzo
Languages/skills: React.js, Rust
Forum for discussion
Estimated Project Length: 175 hours
Difficulty: medium

MusicBrainz Server sends emails to users on various occasions: email verification, edit notes, subscription edits, autoeditor election… So far these emails are generated in text-only format using Template Toolkit, which has had its day.

A modern replacement could be MJML, through either the React.js wrapper mjml-react or the Rust reimplementation MRML. The first option would allow reusing some components from the website frontend, which has been mostly converted to React (jira:MBS-8609), while the second option would be blazing fast (170× faster than the original Node.js implementation) but would require much more work on components.

However, a multipart email (text/html + text/plain) is still wanted here, while MJML is focused on generating HTML only. So the plain text alternative part should probably be generated using a tool like html2text.
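
To make the target structure concrete, here is a minimal sketch of a multipart/alternative message whose plain text part is derived from the HTML part. Everything in it is an assumption for illustration: the HTML string stands in for MJML output, `html_to_text` is a crude hypothetical stand-in for a real converter such as html2text, and Python's standard library is used only to show the message layout, not as a proposal for the server implementation.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring all markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical, very rough stand-in for an HTML-to-text converter."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_multipart_email(subject: str, recipient: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative email: text/plain first, then text/html."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["To"] = recipient
    # The plain text part is generated from the HTML, as suggested above.
    msg.set_content(html_to_text(html_body))
    # Adding an alternative converts the message to multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg


if __name__ == "__main__":
    html = "<h1>Hello</h1><p>Please verify your email address.</p>"
    message = build_multipart_email("Email verification", "editor@example.org", html)
    print(message.get_content_type())
```

Mail clients that cannot or will not render HTML fall back to the first part, which is why the plain text alternative still matters even when the HTML is generated from MJML.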


LinkBrainz/ShareBrainz

An idea for the mentors to review for inclusion:

I think this might be a candidate for an enjoyable project that is less complex/entangled than usual: https://tickets.metabrainz.org/browse/MBS-13444

(I have also added this to the LB 2024 ideas list) - aerozol