Development/Summer of Code/2023/ListenBrainz

From MusicBrainz Wiki
Revision as of 11:18, 17 January 2023 by Alastairp

ListenBrainz allows users to store a list of songs that they listen to and get personalised recommendations. Read more information on its homepage.

Getting started

(see also: Getting started with GSoC)

If you want to work on ListenBrainz, you should show that you are able to set up the server software and understand how some of the infrastructure works. Here are some things that we might ask you about:

  • Show that you understand the goals that ListenBrainz wants to achieve, which are written on its homepage.
  • Create an OAuth application on the MusicBrainz website and add its configuration details to your ListenBrainz server. Use this to log in to your server with your MusicBrainz account.
  • Use the import script that is part of the ListenBrainz server to load scrobbles from Last.fm into your local ListenBrainz server or into the main ListenBrainz server.
  • Use your preferred programming language to write a submission tool that can send listen data to ListenBrainz. You can make up fake song names and artists; the data doesn't have to be real.
  • Try to delete the ListenBrainz database on your local server to remove the fake data that you added.
  • Look at the list of tickets that we have open for ListenBrainz and see if you understand what tasks the tickets involve.
  • If you want to, see if you can contribute a fix for a ticket. If you don't understand what a ticket means, add a comment to it or ask for clarification in IRC.
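For the submission-tool task, a minimal sketch in Python might look like the following. It assumes the public `POST /1/submit-listens` endpoint, a user token sent in an `Authorization: Token <your token>` header, and the `import` listen type for batches of listens; check the ListenBrainz API documentation before relying on these details. The helper names `build_payload` and `submit_listens` are made up for this example.

```python
import json
import urllib.request

# Main-server endpoint (assumed); pass url= to target your local server instead.
SUBMIT_URL = "https://api.listenbrainz.org/1/submit-listens"


def build_payload(listens, listen_type="import"):
    """Build a submit-listens JSON body from (artist, track, unix_timestamp) tuples."""
    return {
        "listen_type": listen_type,
        "payload": [
            {
                "listened_at": listened_at,
                "track_metadata": {"artist_name": artist, "track_name": track},
            }
            for artist, track, listened_at in listens
        ],
    }


def submit_listens(token, listens, url=SUBMIT_URL):
    """POST the listens, authenticating with the user's ListenBrainz token."""
    body = json.dumps(build_payload(listens)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit_listens("<your user token>", [("Imaginary Band", "Made-Up Song", 1672531200)])` would submit one fake listen; keeping payload construction in a separate function makes it easy to test without touching the network.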

We're adding a number of new social features to ListenBrainz that we hope will enable people to discover more music they like and find users whose music tastes are similar to their own. We're working on some of these features now, but we will need help with others:

Create a 'More Tracks Like This' music recommendation plugin for the Troi toolkit

Proposed mentors: mayhem
Languages/skills: Python, possibly Postgres.
Estimated Project Length: Likely 175 hours unless we decide to expand the scope of the project.
Expected outcomes: One or more finished, debugged and tested plugins for Troi.

Our Troi recommendation toolkit is our playground for developing recommendation algorithms. The toolkit already knows how to fetch data from ListenBrainz for stats, tracks recommended via collaborative filtering, similar artists and similar recordings. From MusicBrainz it can fetch needed metadata such as genres and tags. The goal of this project is to take one or more seed recording MBIDs and use the data sets listed above to find recordings that are similar enough to make a playlist of tracks with a similar sound and feel to the given seed tracks.
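One way to start combining these data sets is to let every seed vote for the recordings it is similar to. The sketch below is hypothetical: it does not use Troi's actual classes, and it assumes the similar-recordings data has already been fetched into a dict mapping each recording MBID to a list of `(similar_mbid, score)` pairs. Candidates similar to more seeds rank first, with the summed score as the tie-breaker.

```python
from collections import defaultdict


def more_tracks_like_this(seed_mbids, similar_recordings, max_tracks=20):
    """Rank candidate recordings by how strongly they relate to the seed recordings.

    similar_recordings: dict mapping a recording MBID to a list of
    (similar_recording_mbid, score) pairs (hypothetical pre-fetched format).
    """
    seeds = set(seed_mbids)
    totals = defaultdict(float)  # summed similarity score per candidate
    hits = defaultdict(int)      # number of seeds that point at each candidate
    for seed in seed_mbids:
        for candidate, score in similar_recordings.get(seed, []):
            if candidate in seeds:
                continue  # never echo a seed track back into the playlist
            totals[candidate] += score
            hits[candidate] += 1
    # Prefer candidates similar to several seeds, then the higher total score.
    ranked = sorted(totals, key=lambda mbid: (hits[mbid], totals[mbid]), reverse=True)
    return ranked[:max_tracks]
```

Genre and tag data from MusicBrainz, or similar-artist data, could be layered on top as extra filters or score terms.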

This project could be a little tricky -- the quality of the playlists generated by this project depends very heavily on the quality of the datasets that feed into it. In particular, the artist and recording similarity datasets will play a very important role, but these datasets may not yet be good enough to create good playlists. However, this does not invalidate this project, nor would it cause us to fail the student -- it is understood that the output of this project will improve as the underlying data improves.

Create a 'Release Radar' plugin for the Troi toolkit

Proposed mentors: mayhem
Languages/skills: Python, possibly Postgres.
Estimated Project Length: 175 hours
Expected outcomes: One or more finished, debugged and tested plugins for Troi.

Our Troi recommendation toolkit is our playground for developing recommendation algorithms. The toolkit already knows how to fetch data from ListenBrainz for stats, tracks recommended via collaborative filtering, similar artists and similar recordings. From MusicBrainz it can fetch needed metadata such as genres and tags. This project should generate a playlist every Friday made up of selected tracks from releases that came out recently (in the last 2 weeks or so) by artists in a given user's top artists list. A MetaBrainz team member will implement an API endpoint that lists recent releases for a given user; your Troi plugin should select tracks from these releases and make an exploration playlist from them.

However, some care must be taken to not select ALL the tracks from a new release, but instead to pick some tracks that we think might be interesting to the user. How would you do this? This question is hard to answer on your own -- you will be required to engage with the ListenBrainz team in IRC to discuss this feature in detail before you make your proposal. Any proposal that does not engage the community to design this feature will not be considered for acceptance, due to the nature of this project.
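As a starting point for that discussion, here is a hypothetical selection step. The `Release` and `Track` types, the `listen_count` popularity field and the "top N tracks per release" rule are all assumptions made for illustration; the real input would come from the recent-releases endpoint, and the selection heuristic is exactly what should be worked out with the team.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Track:
    recording_mbid: str
    listen_count: int  # hypothetical popularity signal, e.g. global listen count


@dataclass
class Release:
    artist_mbid: str
    release_date: date
    tracks: list[Track]


def release_radar(releases, top_artist_mbids, today, window_days=14, per_release=2):
    """Pick a few tracks from recent releases by the user's top artists."""
    cutoff = today - timedelta(days=window_days)
    top = set(top_artist_mbids)
    playlist = []
    # Newest releases first, so the freshest music leads the playlist.
    for release in sorted(releases, key=lambda r: r.release_date, reverse=True):
        if release.artist_mbid not in top or not cutoff <= release.release_date <= today:
            continue
        # Placeholder heuristic: the most-listened tracks rather than the whole release.
        best = sorted(release.tracks, key=lambda t: t.listen_count, reverse=True)
        playlist.extend(t.recording_mbid for t in best[:per_release])
    return playlist
```

A proposal would replace the `listen_count` ordering with whatever signals the team agrees best predict interest, for example how similar a track is to what the user already listens to.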

Integrate more music services for recording listens and playing music

Proposed mentors: lucifer
Languages/skills: Python/Flask, TypeScript/React
Estimated Project Length: Can be 175 or 350 hours depending on the integration/service chosen.
Difficulty: Easy
Expected Outcomes: A new music service integration for users to play and record listens on ListenBrainz.

LB has a number of music discovery features that use BrainzPlayer to facilitate track playback. BrainzPlayer (BP) is a custom React component in LB that uses multiple data sources to search for and play a track. As of now, it supports Spotify, YouTube and SoundCloud as backends. LB also supports linking a Spotify account to record listening history. We are currently reworking how external music services are integrated in LB to make adding other services easier. We have looked into some other services and found that Deezer and Apple Music also provide both music playback and listening-history recording capabilities. Integrating these services into LB would make a good SoC project.
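To illustrate what making new services easier to add could mean on the Python side, here is a hypothetical registry in which each service declares what it can do. None of these class or function names come from the ListenBrainz codebase; they are assumptions for this sketch, and the Spotify method is a stub rather than a real API call.

```python
from abc import ABC, abstractmethod


class MusicService(ABC):
    """Hypothetical base class for an external music service integration."""

    name = ""
    can_play = False            # usable as a BrainzPlayer data source
    can_record_listens = False  # can import the user's listening history

    @abstractmethod
    def recent_listens(self, access_token):
        """Return recently played tracks as ListenBrainz-style listen dicts."""


REGISTRY = {}


def register(cls):
    """Class decorator that makes a service available to the rest of the app."""
    REGISTRY[cls.name] = cls()
    return cls


def services_that(capability):
    """Sorted names of registered services that have a given capability flag set."""
    return sorted(name for name, svc in REGISTRY.items() if getattr(svc, capability))


@register
class SpotifyService(MusicService):
    name = "spotify"
    can_play = True
    can_record_listens = True

    def recent_listens(self, access_token):
        raise NotImplementedError("call the service's recently-played API here")


@register
class YouTubeService(MusicService):
    name = "youtube"
    can_play = True

    def recent_listens(self, access_token):
        return []  # playback-only source in this sketch
```

With this shape, adding Deezer or Apple Music would amount to one new subclass, and the rest of the app could ask `services_that("can_play")` instead of hard-coding service names.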

Create a Spotify metadata cache

Proposed mentors: mayhem
Languages/skills: Python, Postgres.
Estimated Project Length: 350 hours
Expected outcomes: Debugged and tested code that loads and maintains this cache.

The BrainzPlayer, our cross-service embedded web music player, supports playing from Spotify. However, for most of the tracks played via this player, we query the Spotify Metadata API to find appropriate tracks to play. This is less than ideal, since the logic for resolving which tracks to play resides in the player. It would be much better if this data resided on the server, in the form of a cache of Spotify metadata, which would allow us to resolve the tracks on the server when we load a BrainzPlayer page.

This metadata cache consists of a new set of Postgres tables in a new schema for the data, and a process that runs continuously and listens to a RabbitMQ queue for new Spotify artists to cache. When a new artist ID is received, the process should fetch all of the releases for this artist and save them to Postgres. Periodically, the cache should also check whether any records have expired or have been marked as dirty, re-fetch the data for those records and update their expiration timestamps.
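The cache-maintenance logic can be sketched with in-memory stand-ins: a `queue.Queue` plays the role of the RabbitMQ queue and a dict plays the role of the Postgres tables. The 30-day TTL, the `fetch_releases` callable (which would wrap the Spotify API) and all class names are assumptions; the real process would loop forever, consuming messages and periodically calling `refresh_stale`.

```python
import queue
from dataclasses import dataclass
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=30)  # assumed expiry period


@dataclass
class CachedArtist:
    artist_id: str
    releases: list
    expires_at: datetime
    dirty: bool = False


class SpotifyArtistCache:
    """In-memory stand-in for the Postgres cache tables."""

    def __init__(self, fetch_releases):
        self.fetch_releases = fetch_releases  # callable: artist_id -> list of releases
        self.artists = {}

    def refresh(self, artist_id, now):
        """(Re-)fetch an artist's releases and reset its expiry timestamp."""
        self.artists[artist_id] = CachedArtist(
            artist_id, self.fetch_releases(artist_id), now + CACHE_TTL
        )

    def consume(self, incoming, now):
        """Drain new artist IDs from the queue (RabbitMQ in the real design)."""
        while True:
            try:
                artist_id = incoming.get_nowait()
            except queue.Empty:
                return
            if artist_id not in self.artists:  # already cached: nothing to do
                self.refresh(artist_id, now)

    def refresh_stale(self, now):
        """Re-fetch expired or dirty records; return the IDs that were refreshed."""
        stale = [a.artist_id for a in self.artists.values() if a.dirty or a.expires_at <= now]
        for artist_id in stale:
            self.refresh(artist_id, now)
        return stale
```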

We have a slightly more detailed write-up of this project -- please come to IRC and ask us for a link to the document that describes it if you are interested in working on this project.

If this project does not take the full 350 hours, we can also start building the lookup portion of this project: given an artist name and a recording name, find the best track in Spotify to play.
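For the lookup portion, a simple baseline is fuzzy string matching against cached tracks. This sketch uses `difflib.SequenceMatcher` from the Python standard library; the candidate tuple format, the averaging of artist and track similarity and the 0.8 threshold are all assumptions, and a real implementation would likely need better normalisation (featured artists, remaster suffixes and so on).

```python
from difflib import SequenceMatcher


def _similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_spotify_track(artist_name, recording_name, candidates, threshold=0.8):
    """Return the cached Spotify track ID that best matches artist + recording name.

    candidates: iterable of (spotify_track_id, artist_name, track_name) tuples.
    Returns None if no candidate scores at least `threshold`.
    """
    best_id, best_score = None, 0.0
    for track_id, cand_artist, cand_track in candidates:
        score = (_similarity(artist_name, cand_artist) + _similarity(recording_name, cand_track)) / 2
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id if best_score >= threshold else None
```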

Coalesce feature in ListenBrainz

Proposed mentors: akshaaatt, monkey, lucifer, mayhem
Languages/skills: Python, React, Postgres, Troi.
Estimated Project Length: 175/350 hours
Difficulty: Medium
Expected outcomes: A finished feature ready to be merged into production code.

Our Troi recommendation toolkit is our playground for developing recommendation algorithms. The toolkit already knows how to fetch data from ListenBrainz for stats, tracks recommended via collaborative filtering, similar artists and similar recordings. From MusicBrainz it can fetch needed metadata such as genres and tags. The goal of this project is to create a plugin that generates playlists based on the listening habits of two or more users (similar to Spotify's Blend feature).
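A blend can be sketched as follows: recordings that several users listen to come first, followed by the users' top recordings interleaved round-robin so that no single user dominates. This is one possible approach, not how Spotify or Troi does it; the input format (a dict of each user's ranked recording MBIDs) and the function name are hypothetical.

```python
from collections import Counter
from itertools import zip_longest


def blend(user_tops, length=20):
    """Merge several users' ranked recording lists into one shared playlist.

    user_tops: dict mapping user name -> list of recording MBIDs, best first.
    """
    # Interleave the users' lists round-robin so each user is represented fairly.
    interleaved, seen = [], set()
    for row in zip_longest(*user_tops.values()):
        for mbid in row:
            if mbid is not None and mbid not in seen:
                interleaved.append(mbid)
                seen.add(mbid)
    # Count how many users have each recording in their top list.
    counts = Counter(mbid for tops in user_tops.values() for mbid in set(tops))
    # Stable sort: recordings shared by more users move to the front,
    # otherwise the round-robin order is preserved.
    interleaved.sort(key=lambda mbid: -counts[mbid])
    return interleaved[:length]
```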

Unlike other Troi projects, this project involves a fair deal of frontend and backend work in the ListenBrainz server as well. A UI will be needed to allow multiple users to consent to creating a playlist with their combined interests. The UI should allow users to optionally configure different parameters for creating such a playlist. New database tables and API endpoints will be needed on the backend side to store these parameters and requests to generate combined playlists.

To connect Troi and the ListenBrainz server, a background process could read the database and invoke Troi patches accordingly. Finally, the actual Troi patch to generate the playlist needs to be written.
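The background process could be as simple as polling a requests table. In this hypothetical sketch, `sqlite3` stands in for Postgres, the `blend_requests` table and its columns are invented for illustration, and `generate_playlist` is a placeholder for invoking the Troi patch.

```python
import json
import sqlite3

# Hypothetical schema for storing combined-playlist requests and their results.
SCHEMA = """
CREATE TABLE blend_requests (
    id INTEGER PRIMARY KEY,
    user_names TEXT NOT NULL,   -- JSON list of participating users
    params TEXT NOT NULL,       -- JSON dict of user-chosen options
    status TEXT NOT NULL DEFAULT 'pending',
    playlist TEXT               -- JSON list filled in by the worker
)
"""


def process_pending_requests(conn, generate_playlist):
    """Run the playlist generator for every pending request; return processed IDs.

    conn: DB-API connection (sqlite3 here as a stand-in for Postgres).
    generate_playlist: callable (user_names, params) -> playlist; in the real
    system this would invoke the Troi patch.
    """
    rows = conn.execute(
        "SELECT id, user_names, params FROM blend_requests "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    done = []
    for request_id, user_names, params in rows:
        playlist = generate_playlist(json.loads(user_names), json.loads(params))
        conn.execute(
            "UPDATE blend_requests SET status = 'done', playlist = ? WHERE id = ?",
            (json.dumps(playlist), request_id),
        )
        done.append(request_id)
    conn.commit()
    return done
```

In production this function would run in a loop with a pause between iterations, and the UI could poll each request's status to show when the combined playlist is ready.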

There are a lot of parts in this project so it is fine if the contributor only wants to do some of those. These details should be discussed with the mentors beforehand so that the appropriate project schedule and length can be worked out.