Difference between revisions of "Development/Summer of Code/2021/ListenBrainz"

From MusicBrainz Wiki
Revision as of 11:36, 4 February 2021

ListenBrainz is one of the newest MetaBrainz projects. Read more information on its homepage.

Getting started

(see also: Getting started with GSoC)

If you want to work on ListenBrainz, you should show that you can set up the server software and that you understand how some of the infrastructure works. Here are some things that we might ask you about:

  • Show that you understand the goals that ListenBrainz wants to achieve, which are written on its homepage
  • Create an OAuth application on the MusicBrainz website and add the configuration information to your ListenBrainz server. Use this to log in to your server with your MusicBrainz details
  • Use the import script that is part of the ListenBrainz server to load scrobbles from Last.fm to your ListenBrainz server, or the main ListenBrainz server
  • Use your preferred programming language to write a submission tool that can send Listen data to ListenBrainz. The data doesn't have to be real; you could make up fake song names and artists.
  • Try to delete the ListenBrainz database on your local server to remove the fake data that you added.
  • Look at the list of tickets that we have open for ListenBrainz and see if you understand what tasks the tickets involve
  • If you want to, see if you can contribute to fixing a ticket. Either add a comment to the ticket or ask in IRC for clarification if you don't understand what the ticket means
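
For the submission-tool exercise in the list above, a minimal Python sketch might look like the following. It assumes the JSON layout of the public ListenBrainz submit-listens endpoint (listen_type, payload, listened_at, track_metadata) and token-based authorization. The helper names build_listen_payload and build_request, the server URL, and the token placeholder are hypothetical, so check the API documentation of the server you are targeting before relying on it.

```python
import json
import time
import urllib.request

# Assumed endpoint; point this at your local server when testing.
API_URL = "https://api.listenbrainz.org/1/submit-listens"


def build_listen_payload(artist_name, track_name, listened_at=None):
    """Build the body for one listen (field names assumed from the public API)."""
    if listened_at is None:
        listened_at = int(time.time())
    return {
        "listen_type": "single",
        "payload": [
            {
                "listened_at": listened_at,
                "track_metadata": {
                    "artist_name": artist_name,
                    "track_name": track_name,
                },
            }
        ],
    }


def build_request(user_token, body, url=API_URL):
    """Prepare (but do not send) the HTTP POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token {user_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = build_listen_payload("Fake Artist", "Fake Song")
    request = build_request("YOUR-USER-TOKEN", body)
    # Uncomment to actually submit the fake listen:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode())
    print(json.dumps(body, indent=2))
```

Building the request separately from sending it keeps the fake-data part easy to inspect before anything reaches a server.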

Ideas

Projects for March hack week:

  • Use follow/following feature
  • Timeline
  • Compatible users design and early version hacking.

Possible summer projects:

  • Tweet length reviews (backed by CB)
  • Pin my jam
  • Implement a troi plugin
  • Import Genres to MB from curated sources

Create a high performance listen ingester

Proposed mentors: mayhem, iliekcomputers
Languages/skills: Rust or Go, Python, Protobuf, RabbitMQ, TimescaleDB

ListenBrainz currently processes incoming listens in pure Python. Processing a listen requires parsing JSON, validating the data, re-serializing it to JSON, and sending it to the database component for deduplication and writing to our datastore. The current process uses too many resources and doesn't scale well. The code is also laid out in a way that makes us serialize and deserialize each listen more than once.

For this Summer of Code project we would like a student to implement a single API endpoint (submit listen) and to port our existing ingestion pipeline to use Protocol Buffers. The new ingester should parse the JSON, validate the data, and handle and report errors in exactly the same way as our production system does today. Furthermore, the incoming listen pipeline should be converted to use the new protobuf-based format for internal communication, to make the new ingester as performant as possible.
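
To make the parse-and-validate step concrete, here is a rough Python sketch of what the ingester has to do for each request body. The validation rules, the ListenValidationError type, and the function name parse_and_validate are illustrative assumptions, not the rules used in production; a real port would have to mirror the existing server's behaviour and error messages exactly.

```python
import json


class ListenValidationError(ValueError):
    """Raised when a submitted body is rejected (hypothetical error type)."""


def parse_and_validate(raw_body):
    """Parse a submit-listens body and return its list of listens.

    The checks below are assumptions made for illustration only.
    """
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ListenValidationError(f"invalid JSON: {exc.msg}") from exc

    if not isinstance(data, dict):
        raise ListenValidationError("body must be a JSON object")

    payload = data.get("payload")
    if not isinstance(payload, list) or not payload:
        raise ListenValidationError("payload must be a non-empty list")

    for index, listen in enumerate(payload):
        if not isinstance(listen, dict):
            raise ListenValidationError(f"listen {index} must be an object")

        metadata = listen.get("track_metadata")
        if not isinstance(metadata, dict):
            raise ListenValidationError(f"listen {index} has no track_metadata")

        for field in ("artist_name", "track_name"):
            value = metadata.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ListenValidationError(f"listen {index} is missing {field}")

        listened_at = listen.get("listened_at")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(listened_at, bool) or not isinstance(listened_at, int) or listened_at <= 0:
            raise ListenValidationError(f"listen {index} has an invalid listened_at")

    return payload
```

Raising one error type with a descriptive message keeps the error reporting in a single place, which should make it easier to compare the new ingester's responses with the current ones.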

This will require creating a very small Go or Rust server that handles the submit listens endpoint (the ingester), plus a tool that reads incoming listens from the RabbitMQ queue, writes them to TimescaleDB, and then passes the unique listens down another RabbitMQ pipeline (was influx_writer, will soon be timescale_writer).
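
The writer stage described above can be sketched in a few lines of Python, using in-memory stand-ins so the logic is visible without RabbitMQ or a database. Everything here is an assumption made for illustration: the uniqueness key (user name, timestamp, track name), the user_name field, and the names listen_key and write_and_forward are hypothetical, a dict stands in for the database table, and a plain callable stands in for publishing to the outgoing queue.

```python
def listen_key(listen):
    """Assumed uniqueness key: user name, timestamp and track name."""
    return (
        listen["user_name"],
        listen["listened_at"],
        listen["track_metadata"]["track_name"],
    )


def write_and_forward(incoming, store, forward):
    """Store each new listen and forward only the ones not seen before.

    incoming: iterable of listens, standing in for messages read from the queue
    store:    dict keyed by listen_key, standing in for the database table
    forward:  callable receiving the list of unique listens, standing in for
              a publish call on the outgoing queue
    """
    unique = []
    for listen in incoming:
        key = listen_key(listen)
        if key in store:
            # Duplicate: already written, so do not pass it on.
            continue
        store[key] = listen
        unique.append(listen)

    if unique:
        forward(unique)
    return unique
```

In a real implementation the deduplication would more likely be left to the database (for example through a unique constraint), with the tool forwarding only the rows that were actually inserted.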

At this point we haven't quite settled on Rust or Go for this project. Do you have a feeling for that?

Relevant links: