Development/Summer of Code/2016

From MusicBrainz Wiki
Revision as of 20:17, 15 February 2016 by LordSputnik (talk | contribs) (Added BB import proposal)


This year Robert Kaye, Michael Wiencek and Alastair Porter will probably be amongst our mentors. That's ruaok (Robert), alastairp (Alastair Porter) and bitmap (Michael) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.


This is our set of starting ideas for 2016. Add more ideas if you have them!


Proposed mentor: ruaok or alastairp
Languages/skills: Python, Postgres, Flask

AcousticBrainz is our new project that aims to crowdsource acoustic information for all music in the world and to make it available to the public. We already have low-level information about more than a million tracks. What we need is a good way for users and developers to interact with all this data and help improve algorithms that are used to analyze it.

It would suit someone with experience or an interest in machine learning algorithms, though the majority of the project will probably involve creating infrastructure around our existing algorithms.

Ideas for this project are described on a separate page: AcousticBrainz/Ideas.

You can read more information about AcousticBrainz and some of the existing models that we have created on our blog.

Add social features to MusicBrainz

Proposed mentor: ruaok
Languages/skills: Perl and/or Python, Postgres

We recently added event (read: concert) support to MusicBrainz. Our main motivation was to add this feature for historical concerts, but it can also be used for future concerts. In the past, another crowd-sourced site was the best place to find concerts, but in the past few years it has begun to fade from people's awareness. There is a possibility that MusicBrainz can take its former place and become the best crowd-sourced concert information site on the net. In order for this to happen, we would need to add a few more features to MusicBrainz:

  • Social notifications: MB users should be able to post to Facebook/Twitter when they plan to attend a concert.
  • Other features: What features should we add to build a community around concert information curation?

These social features are important for building a community of users around concerts. The goal is to engage users to enter information about concerts and venues and then talk about upcoming concerts. The more people use MusicBrainz to talk about concerts publicly, the more people will be drawn in to improve the concert listings in MusicBrainz.

Performance improvements for CritiqueBrainz

Languages/skills: Python, SQL, PostgreSQL

Currently CritiqueBrainz uses the MusicBrainz web service to get information about release groups, artists, etc. CritiqueBrainz depends on this information heavily: every time we show a review, it needs to be accompanied by information about an entity (event or release group, depending on what was reviewed). Unfortunately, requests to the web service take a significant amount of time, and there is no way to request info about multiple entities in one request. This slows down the website significantly, especially on pages where we show multiple (10-40) reviews.

One way to improve this is to query the MusicBrainz database directly. Caching can help as well, and we already use it in some places. Once this problem is solved, it should allow us to do more advanced things.

Replace [multiple] language with proper multiple languages

Languages/skills: Perl, SQL, PostgreSQL, Python

A variety of entities (at least Works and Releases) currently support linking to a single language, but many entities are really composed of multiple different languages. This is currently "solved" by using '[Multiple languages]', but this leaves out a lot of information: you can't tell programmatically which languages are involved.

Fixing this would require a lot of changes, however: not only to the database schema, but also to the web service, our tagger Picard, and other things using the web service (programming libraries etc.). Not all of this necessarily needs to be included in the GSoC project, but the impact of the project should be considered.
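To make the goal concrete, here is a minimal sketch of what a client could do if an entity carried a list of languages instead of a single placeholder. The `Work` class, its `languages` field, and the use of three-letter language codes are assumptions for illustration only; they do not describe the current schema or web service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Work:
    """Hypothetical client-side view of a work with several languages."""

    title: str
    # Assumed representation: one language code per language involved.
    languages: List[str] = field(default_factory=list)

    def is_multilingual(self) -> bool:
        return len(self.languages) > 1

    def uses_language(self, code: str) -> bool:
        # With a real list, this question can be answered programmatically,
        # which a single "[Multiple languages]" value does not allow.
        return code in self.languages
```

For example, `Work("Example", ["eng", "fra"]).uses_language("fra")` is true, whereas a single placeholder value could only say that more than one language is present.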

Integrate more *Brainz in more *Brainz

Languages/skills: Perl and/or Python and/or Node.js, probably SQL/PostgreSQL

We have a bunch of different projects under the MetaBrainz umbrella by now, but they do not necessarily utilise each other to their fullest extent. MusicBrainz in particular is lacking utilisation of features/data from e.g., AcousticBrainz and ListenBrainz.

I don't have anything specific in mind for this, but a prospective student thinking about it should definitely approach us on IRC and talk with us about what they have in mind and whether there's anything else the community can think of.

BookBrainz Data Importing

Languages/skills: Browser JS, Node.js or Python, SQL/PostgreSQL

At last year's summit, the two BookBrainz lead developers, Leftmost and LordSputnik, worked on a plan for importing third-party data into BookBrainz. This plan has several stages. First, data sources need to be identified, including mass import sources with freely available data, such as libraries, and manual import sources, such as online book stores and other user-contributed databases. The next stage is to update the database to introduce an "Import" object, which can be used to distinguish mass-imported data from (usually better quality) user contributions. Then, actual import bots for mass import and userscripts for manual import will need to be written. Finally, it would be desirable (but not necessary if time is short) to add an interface to the BookBrainz site that allows users to review automatically imported data and approve it.
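The role of the "Import" object can be sketched roughly as follows. This is an illustrative model only: the `ImportedRecord` class, its fields, and the `approve` and `pending_review` helpers are hypothetical names, not the BookBrainz schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImportedRecord:
    """Hypothetical stand-in for an "Import" object."""

    source: str   # where the data came from, e.g. a library catalogue
    data: dict    # the imported fields, kept apart from user contributions
    approved_by: Optional[str] = None  # set once a user has reviewed it

    @property
    def pending(self) -> bool:
        return self.approved_by is None


def approve(record: ImportedRecord, reviewer: str) -> ImportedRecord:
    """Mark an imported record as reviewed and approved by a user."""
    record.approved_by = reviewer
    return record


def pending_review(records: List[ImportedRecord]) -> List[ImportedRecord]:
    """Return the imported records that still await user review."""
    return [record for record in records if record.pending]
```

Keeping the source and the review state on the record itself is one way to keep mass-imported data distinguishable from user contributions until someone has checked it.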

About proposals

Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves on having a strong community: most of us know each other in some way, and some of us know each other face to face from development summits.

A good way to get a feel for this would be to talk about your ideas and proposals on IRC. However, starting off by sending private messages to potential mentors is not a good way to introduce yourself to the community. Please don't do that!

If you're not sure where to start, Development/Summer of Code/Getting started might help.