Difference between revisions of "Development/Summer of Code/2016"

From MusicBrainz Wiki
(Mentors)
 
(21 intermediate revisions by 6 users not shown)
Line 8: Line 8:
 
<dd>[[wikipedia:Linked_data|Linked open data article]] on Wikipedia</dd>
 
<dd>[[wikipedia:Linked_data|Linked open data article]] on Wikipedia</dd>
 
<dt>Ready to apply?</dt>
 
<dt>Ready to apply?</dt>
<dd>[https://community.metabrainz.org/c/metabrainz/gsoc-applications GSoC applications @ communinity.metabrainz.org]</dd>
+
<dd>[https://community.metabrainz.org/c/metabrainz/gsoc-applications GSoC applications @ community.metabrainz.org]</dd>
 
<dd>Be aware of the content of our [[Development/Summer of Code/Application Template]]</dd>
 
<dd>Be aware of the content of our [[Development/Summer of Code/Application Template]]</dd>
 
</dl>
 
</dl>
  
== Mentors ==
+
=== Mentors ===
This year Robert Kaye, Michael Wiencek, Alastair Porter, Ben Ockmore, and Sean Burke will probably be amongst our mentors. That's ruaok (Robert), bitmap (Michael), alastairp (Alastair Porter), LordSputnik (Ben Ockmore), and Leftmost (Sean Burke) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.
 
  
== Suggestions ==
+
{| class="wikitable"
 +
|+ Mentor list
 +
! Name
 +
! [[IRC]] nick
 +
! Project
 +
|-
 +
|[[discourse_user:rob|Robert Kaye]]
 +
|ruaok
 +
|AcousticBrainz, ListenBrainz, MusicBrainz
 +
|-
 +
|[[discourse_user:bitmap|Michael Wiencek]]
 +
|bitmap
 +
|MusicBrainz
 +
|-
 +
|[[discourse_user:alatairp|Alastair Porter]]
 +
|alastairp
 +
|AcousticBrainz, ListenBrainz
 +
|-
 +
|[[discourse_user:lordsputkin|Ben Ockmore]]
 +
|LordSputnik
 +
|BookBrainz
 +
|-
 +
|[[discourse_user:leftmostcat|Sean Burke]]
 +
|Leftmost
 +
|BookBrainz
 +
|-
 +
|[[discourse_user:gentlecat|Roman Tsukanov]]
 +
|Gentlecat
 +
|CritiqueBrainz, AcousticBrainz, ListenBrainz
  
This is our set of starting ideas for 2016. Add more ideas if you have them!
+
|}
  
=== AcousticBrainz ===
+
Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.
  
Proposed mentor: ''ruaok'' or ''alastairp''<br>
+
{{Note|Contacting the mentors privately (e.g., via e-mail or private IRC messages) will get you off to a very bad start with us, and any application you then send us will almost certainly not be accepted.}}
Languages/skills: Python, Postgres, Flask
 
  
[http://acousticbrainz.org/ AcousticBrainz] is our new project that crowdsources acoustic information for all music in the world and makes it available to the public. We already have low-level information about more than three million tracks. What we need is a good way for users and developers to interact with all this data and help improve the algorithms that are used to analyze it.
+
=== About proposals ===
 +
Before you dive in and send a proposal to us through Google, it's a good idea to take some time and [[How_to_Contribute|learn about the MusicBrainz community]]. At MusicBrainz we pride ourselves on having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.
  
It would suit someone with experience or an interest in machine learning algorithms, though the majority of the project will probably involve creating infrastructure around our existing algorithms.
+
A good way to get a feel for this is to talk about your ideas and proposals on IRC. However, starting off by sending private messages to potential mentors '''is not''' a good way to introduce yourself to the community. '''Please don't do that!'''
  
Ideas for this project are described on a separate page: '''[[AcousticBrainz/Ideas]]'''.
+
If you're not sure where to start, [[Development/Summer of Code/Getting started]] might help.
  
You can read more information about AcousticBrainz and some of the existing models that we have created on our [http://blog.musicbrainz.org/category/acousticbrainz/ blog].
 
  
==== Getting started ====
+
==Projects==
If you want to work on AcousticBrainz you should show that you are able to set up the server software and understand how some of the infrastructure works. Here are some things that we might ask you about:
+
===[[AcousticBrainz]]===
* Install the server on your computer or use the Vagrant setup scripts to build a virtual machine
 
* Download the AcousticBrainz submission tool and configure it to compute features for some of your audio files and submit them to the local server that you configured
 
* Use your preferred programming language to access the API to download the data that you submitted to your server, or other data from the main AcousticBrainz server
 
* Create an OAuth application on the MusicBrainz website and add the configuration information to your AcousticBrainz server. Use this to log in to your server with your MusicBrainz details
 
* Look at the system to build a Dataset (accessible from your profile page on the AcousticBrainz server) and try and build a simple dataset
 
* Look at the [http://tickets.musicbrainz.org/browse/AB list of tickets] that we have open for AcousticBrainz and see if you understand what some of them mean. Feel free to ask questions about what they mean - some ticket descriptions don't have much detail
 
  
=== Add social features to MusicBrainz ===
+
{| style="width:60%"
 +
|-
 +
|rowspan="3"|[[file:AcousticBrainz logo small notext.png ]]
 +
|AcousticBrainz is our new project that crowdsources acoustic information for all music in the world and makes it available to the public. We already have low-level information about more than three million tracks. What we need is a good way for users and developers to interact with all this data and help improve the algorithms that are used to analyze it.
  
Proposed mentor: ''ruaok''<br>
+
It would suit someone with experience or an interest in machine learning algorithms, though the majority of the project will probably involve creating infrastructure around our existing algorithms.
Languages/skills: Perl and/or Python, Postgres
+
|-
 
+
|style="background-color:ghostwhite" | '''Languages/skills''': Python, PostgreSQL, Flask
We recently added event (read: concert) support to MusicBrainz. Our main motivation was to add this feature for historical concerts, but it can also be used for future concerts. In the past the crowd-sourced concert listings on last.fm were the best place to find concerts, but in the past few years last.fm has begun to fade from people's awareness. There is a possibility that MusicBrainz can take the former place of last.fm and become the best crowd-sourced concert information site on the net. In order for this to happen, we would need to add a few more features to MusicBrainz:
+
|-
 
+
| style="text-align:center" | [[Development/Summer of Code/2016/AcousticBrainz|Ideas page]] | [http://acousticbrainz.org/ Main page] | [https://community.metabrainz.org/c/acousticbrainz Forums] | [http://blog.musicbrainz.org/category/acousticbrainz/ Blog]
* Social notifications: MB users should be able to post to Facebook/Twitter when they plan to attend a concert.
+
|}
* Other features: What features should we add to build a community around concert information curation?
 
 
 
These social features are important for building a community of users around concerts. The goal is to engage users to enter information about concerts and venues and then talk about upcoming concerts. The more people use MusicBrainz to talk about concerts publicly, the more people will be drawn in to improve the concert listings in MusicBrainz.
 
 
 
=== Performance improvements for CritiqueBrainz ===
 
 
 
Proposed mentor: ''Gentlecat''<br>
 
Languages/skills: Python, Flask, SQL, PostgreSQL
 
 
 
Currently CritiqueBrainz uses the MusicBrainz web service to get information about release groups, artists, etc. CritiqueBrainz depends on this information heavily: every time we show a review, it needs to be accompanied by information about an entity (an event or a release group, depending on what was reviewed). Unfortunately, requests to the web service take a significant amount of time, and there is no way to request info about multiple entities in one request. This slows down the website significantly, especially on pages where we show multiple (10-40) reviews.
 
 
 
One way to improve this is to query the MusicBrainz database directly. Caching can help as well, and we already use it in some places. Once this problem is solved, it should allow us to do more advanced things.
 
 
 
=== Improve database access in CritiqueBrainz ===
 
 
 
Proposed mentor: ''Gentlecat''<br>
 
Languages/skills: Python, Flask, SQL, PostgreSQL
 
 
 
From the start, the CritiqueBrainz server has used the SQLAlchemy ORM to interact with the database. Unfortunately, we have noticed that it adds too many constraints that we have to work around: writing complex queries and updating old ones is harder, and caching becomes more complicated. Apart from this, a lot of implicit things happen in the background when you use an ORM. Database access code has ended up spread out all over the place (even in templates).
 
 
 
It might be worth replacing all ORM usage in CritiqueBrainz with raw SQL queries, and improving the code around it. We already have a similar implementation in the AcousticBrainz project, which can be used as a reference.
 
 
 
=== Replace [multiple] language with proper multiple languages ===
 
 
 
Languages/skills: Perl, SQL, PostgreSQL, Python
 
 
 
A variety of entities (at least Works and Releases) currently support linking to a single language, but many entities really contain several languages. This is currently "solved" by using '[Multiple languages]', but this leaves out a lot of information: you can't tell programmatically which languages are involved.
 
 
 
Changing this would, however, require changes not only to the database schema, but also to the web service, our tagger Picard, and other things using the web service (programming libraries etc.). Not all of this necessarily needs to be included in the GSoC project, but the impact of the project should be considered.
 
 
 
* Related ticket: http://tickets.musicbrainz.org/browse/MBS-5452 (only for works)
 
* Related IRC discussion: http://chatlogs.metabrainz.org/brainzbot/musicbrainz/msg/3503635/
 
 
 
=== Integrate more *Brainz in more *Brainz ===
 
 
 
Languages/skills: Perl and/or Python and/or Node.js, probably SQL/PostgreSQL
 
 
 
We have [https://metabrainz.org/projects a bunch of different projects] under the MetaBrainz umbrella by now, but they do not necessarily utilise each other to their fullest extent. MusicBrainz in particular is lacking utilisation of features/data from e.g., AcousticBrainz and ListenBrainz.
 
 
 
I don't have any specific things to do or not do with this, but a prospective student thinking about this should definitely approach us on [[Communication/IRC|IRC]] and talk with us about what they have in mind and if there's anything the community can think of.
 
 
 
=== BookBrainz Data Importing ===
 
 
 
Proposed mentors: ''LordSputnik'' or ''Leftmost''<br>
 
Languages/skills: Browser JS, Node.js or Python, SQL/PostgreSQL
 
 
 
At last year's summit, the two BookBrainz lead developers, Leftmost and LordSputnik, worked on a plan for importing third-party data into BookBrainz. This plan has several stages. First, data sources need to be identified, including mass import sources with freely available data, such as libraries, and manual import sources, such as online book stores and other user-contributed databases. The next stage is to update the database to introduce an "Import" object, which can be used to distinguish mass imported data from (usually better quality) user contributions. Then, actual import bots for mass import and userscripts for manual import will need to be written. Finally, it would be desirable (but not necessary if time is short) to introduce an interface to the BookBrainz site to allow users to review automatically imported data and approve it.
 
 
 
=== ListenBrainz ===
 
ListenBrainz is one of the newest MetaBrainz projects. Read more information on [https://listenbrainz.org its homepage].
 
  
==== Getting started ====
+
<hr />
If you want to work on ListenBrainz you should show that you are able to set up the server software and understand how some of the infrastructure works. Here are some things that we might ask you about:
 
* Show that you understand the goals that ListenBrainz wants to achieve, which are written on its homepage
 
* Install the server on your computer or use the Vagrant setup scripts to build a virtual machine
 
* Create an OAuth application on the MusicBrainz website and add the configuration information to your ListenBrainz server. Use this to log in to your server with your MusicBrainz details
 
* Use the import script that is part of the ListenBrainz server to load scrobbles from last.fm to your ListenBrainz server, or the main ListenBrainz server
 
* Use your preferred programming language to write a submission tool that can send Listen data to ListenBrainz. You could make up some fake data for song names and artists. This data doesn't have to be real.
 
* Try and delete the ListenBrainz database on your local server to remove the fake data that you added.
 
* Look at the [http://tickets.musicbrainz.org/browse/LB list of tickets] that we have open for ListenBrainz and see if you understand what tasks the tickets involve
 
* If you want to, see if you can contribute to fixing a ticket. Either add a comment to the ticket or ask in IRC for clarification if you don't understand what the ticket means
 
  
=== ListenBrainz: A submission API compatible with Last.fm scrobblers ===
+
===[[BookBrainz]]===
 +
{| style="width:60%"
 +
|-
 +
|rowspan = 3| [[file:BookBrainz logo small notext.png]]
 +
|BookBrainz is a database of book metadata.
  
Proposed mentors: ruaok, alastairp<br>
+
This year we're interested in projects that help us get more data. The three suggested ideas to build proposals around are data importing, a web API and gamification of editing. Please see our sub-project ideas page for information on getting started and more details about the ideas themselves.
Languages/skills: Python
+
|-
 +
| style="background-color:ghostwhite" | '''Top 3 Desired Skills''': Node.js, Python, SQL
 +
|-
 +
| style="text-align:center"|[[Development/Summer of Code/2016/BookBrainz|Ideas page]] | [[bb:|Main page]] | [https://community.metabrainz.org/c/bookbrainz Forums]
 +
|}
  
Right now ListenBrainz has its own API documented at https://listenbrainz.readthedocs.org/en/latest/. It'd be great if there were an additional web service layered on top of that one which spoke the Last.fm API, so it could be used as a proxy for existing Last.fm clients (ideally submitting the listens to both sites).
+
<hr />
  
=== ListenBrainz: Statistics ===
+
===[[CritiqueBrainz]]===
 +
{| style="width:60%"
 +
|-
 +
| rowspan = 3| [[file:CritiqueBrainz logo small notext.png]]
 +
|Fills the gap between music critics and raw data by providing a platform created for the sole purpose of Creative Commons licensed reviews.
 +
|-
 +
| style = "background-color:ghostwhite" | '''Languages/skills''': Python, Flask, SQL, PostgreSQL
 +
|-
 +
| style="text-align:center"|[[Development/Summer of Code/2016/CritiqueBrainz|Ideas page]] | [[cb:|Main page]] | [https://community.metabrainz.org/c/metabrainz Forums]
 +
|}
  
Proposed mentors: ruaok, alastairp<br>
+
<hr />
Languages/skills: Python
 
  
User profiles on ListenBrainz are basically just a flat list of all your listens right now. We need to generate stats based on these listens: top artists (artist credits?), albums (release groups might be more useful than releases), recordings (works?), etc. Besides users, stats for the artists themselves would be nice (what was already mentioned, but as an aggregate across all users). This should be accomplished by streaming all of our data to Google's BigQuery and then building the needed systems to create statistics from that.
+
===[[ListenBrainz]]===
 +
{| style="width:60%"
 +
|-
 +
| rowspan = 3 | [[file:ListenBrainz logo small notext.png ]]
 +
|An open source music website that allows users to import their listen history. One of the goals is for this data to be used for building open music recommendation systems.
 +
|-
 +
|-
 +
| style = "background-color:ghostwhite" | '''Languages/skills''': Python
 +
|-
 +
| style="text-align:center"|[[Development/Summer of Code/2016/ListenBrainz|Ideas page]] | [https://listenbrainz.org/ Main page]
 +
|}
  
=== ListenBrainz: A way to associate listens with MBIDs ===
+
<hr />
  
Proposed mentors: ruaok, alastairp<br>
+
===[[MusicBrainz]]===
Languages/skills: Python
+
{| style="width:60%"
 +
|-
 +
| rowspan = 3 | [[file:MusicBrainz logo small notext.png]]
 +
|A community-maintained open source music encyclopedia that collects music metadata and makes it available to the public.
 +
|-
 +
| style = "background-color:ghostwhite" | '''Languages/skills''': JavaScript (React), Perl, Python, PostgreSQL, SQL
 +
|-
 +
| style="text-align:center"|[[Development/Summer of Code/2016/MusicBrainz|Ideas page]] | [[mb:|Main page]] | [https://community.metabrainz.org/c/musicbrainz Forums]
 +
|}
  
Last.fm is broken because of the terrible way it handles metadata (artists with the same name are jumbled into a single page; at the same time, there are often multiple pages for the same artist/album/track due to spelling variations). ListenBrainz is smarter by taking advantage of MBIDs. But there needs to be some sort of interface for identifying listens as being for a particular track (or recording) MBID. This could allow the user to identify an album they listened to on Spotify as the same one they listen to in iTunes a few days later. Then they wouldn't remain separate artists or albums in the stats due to differences in metadata alone.
+
<hr />
  
== About proposals ==
 
Before you dive in and send a proposal to us through Google, it's a good idea to take some time and [[How_to_Contribute|learn about the MusicBrainz community]]. At MusicBrainz we pride ourselves on having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.
 
  
A good way to get a feel for this is to talk about your ideas and proposals on IRC. However, starting off by sending private messages to potential mentors '''is not''' a good way to introduce yourself to the community. '''Please don't do that!'''
 
  
If you're not sure where to start, [[Development/Summer of Code/Getting started]] might help.
 
  
 
[[Category:Development]]
 
[[Category:Development]]

Latest revision as of 07:26, 22 March 2016

Where to start

New to MetaBrainz?
List of MetaBrainz projects
New to MetaBrainz development and/or GSoC?
Getting started with GSoC
New to the idea of linked open data?
Linked open data article on Wikipedia
Ready to apply?
GSoC applications @ community.metabrainz.org
Be aware of the content of our Development/Summer of Code/Application Template

Mentors

{| class="wikitable"
|+ Mentor list
! Name
! IRC nick
! Project
|-
| Robert Kaye
| ruaok
| AcousticBrainz, ListenBrainz, MusicBrainz
|-
| Michael Wiencek
| bitmap
| MusicBrainz
|-
| Alastair Porter
| alastairp
| AcousticBrainz, ListenBrainz
|-
| Ben Ockmore
| LordSputnik
| BookBrainz
|-
| Sean Burke
| Leftmost
| BookBrainz
|-
| Roman Tsukanov
| Gentlecat
| CritiqueBrainz, AcousticBrainz, ListenBrainz
|}

Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.

Note: Contacting the mentors privately (e.g., via e-mail or private IRC messages) will get you off to a very bad start with us, and any application you then send us will almost certainly not be accepted.

About proposals

Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves on having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.

A good way to get a feel for this is to talk about your ideas and proposals on IRC. However, starting off by sending private messages to potential mentors is not a good way to introduce yourself to the community. Please don't do that!

If you're not sure where to start, Development/Summer of Code/Getting started might help.


Projects

AcousticBrainz

AcousticBrainz is our new project that crowdsources acoustic information for all music in the world and makes it available to the public. We already have low-level information about more than three million tracks. What we need is a good way for users and developers to interact with all this data and help improve the algorithms that are used to analyze it.

It would suit someone with experience or an interest in machine learning algorithms, though the majority of the project will probably involve creating infrastructure around our existing algorithms.

Languages/skills: Python, PostgreSQL, Flask
Ideas page | Main page | Forums | Blog

BookBrainz

BookBrainz is a database of book metadata.

This year we're interested in projects that help us get more data. The three suggested ideas to build proposals around are data importing, a web API and gamification of editing. Please see our sub-project ideas page for information on getting started and more details about the ideas themselves.

Top 3 Desired Skills: Node.js, Python, SQL
Ideas page | Main page | Forums

CritiqueBrainz

Fills the gap between music critics and raw data by providing a platform created for the sole purpose of Creative Commons licensed reviews.
Languages/skills: Python, Flask, SQL, PostgreSQL
Ideas page | Main page | Forums

ListenBrainz

An open source music website that allows users to import their listen history. One of the goals is for this data to be used for building open music recommendation systems.
Languages/skills: Python
Ideas page | Main page

MusicBrainz

A community-maintained open source music encyclopedia that collects music metadata and makes it available to the public.
Languages/skills: JavaScript (React), Perl, Python, PostgreSQL, SQL
Ideas page | Main page | Forums