Development/Summer of Code/2017
This page captures our ideas for Google Summer of Code projects for 2017:
Bitmap: Please add ideas!
BigQuery upload and statistics
New machine learning infrastructure
Storage for detailed analysis files
Direct access to MusicBrainz database
Proposed mentor: Gentlecat
Languages/skills: Python, Flask, SQL (PostgreSQL, SQLAlchemy), Docker, Consul
So far, the biggest cause of slowdowns in CritiqueBrainz is requests to the MusicBrainz web service. The MusicBrainz web service itself is not slow, but some CritiqueBrainz pages require a lot of MusicBrainz data, which can take a very long time to retrieve. The delay comes either from the complexity of a single request or from the number of requests (when showing multiple items, since there is no way to make batch requests).
New infrastructure allows us to easily read data directly from the MusicBrainz database. Doing this in CritiqueBrainz will probably give a significant speedup.
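As a rough illustration of why direct database access helps with the "multiple items" case, the sketch below fetches several entities with a single SQL query instead of issuing one web service request per item. It is only a sketch under assumptions: the `release_group` table, its `mbid` and `name` columns, and the helper `fetch_release_groups` are hypothetical and do not reflect the real MusicBrainz schema, and an in-memory SQLite database stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3


def fetch_release_groups(conn, mbids):
    """Return {mbid: name} for all given MBIDs using one query.

    Hypothetical helper: table and column names are illustrative only.
    """
    if not mbids:
        return {}
    # One placeholder per MBID keeps the query parameterized.
    placeholders = ",".join("?" for _ in mbids)
    rows = conn.execute(
        f"SELECT mbid, name FROM release_group WHERE mbid IN ({placeholders})",
        list(mbids),
    ).fetchall()
    return {mbid: name for mbid, name in rows}


# Stand-in database with made-up rows (not real MusicBrainz data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE release_group (mbid TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO release_group VALUES (?, ?)",
    [("a-1", "First Album"), ("b-2", "Second Album"), ("c-3", "Third Album")],
)

# One round trip for three lookups; unknown MBIDs are simply absent.
result = fetch_release_groups(conn, ["a-1", "c-3", "missing"])
print(result)
```

With the web service, the same page would need one request per item; here the cost is a single round trip regardless of how many items are shown.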
Create charts/graphs for user behaviour
Proposed mentors: mayhem, alastairp
Languages/skills: Python, Flask, BigQuery, InfluxDB, data science, graphing, visualization, data architecture
ListenBrainz is preparing to stream its listen data to BigQuery, where anyone can access it in real time. We would like a student to build a general charting/graphing system on top of this BigQuery data that lets future contributors explore it. Any user should be able to craft a query that can be turned into a graph or visualization on the ListenBrainz site with minimal effort. A user who crafts an interesting query should be able to open a pull request supplying the details of the query so that the LB team can add the graph to the site.
This project requires building the behind-the-scenes BigQuery access, caching, periodic updates and synchronization between the ListenBrainz server and the BigQuery data store.
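The caching and periodic update part could look roughly like the following: a query result is kept in memory and only re-fetched once it is older than a chosen interval. This is a minimal sketch, not a proposed design. The class `CachedQuery`, the one-hour interval, and `fake_listen_count_query` (which stands in for a real BigQuery call) are all assumptions made for illustration.

```python
import time


class CachedQuery:
    """Cache one query result and re-run the query once it is stale."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._value = None
        self._fetched_at = None      # None means nothing is cached yet

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._run_query()
            self._fetched_at = now
        return self._value


calls = []


def fake_listen_count_query():
    # Placeholder for an expensive remote query; returns made-up numbers.
    calls.append(1)
    return {"listens_today": 100 * len(calls)}


now = [0.0]  # fake clock value, in seconds
chart = CachedQuery(fake_listen_count_query, ttl_seconds=3600, clock=lambda: now[0])

first = chart.get()    # runs the query
second = chart.get()   # served from the cache
now[0] = 3600.0        # one hour later
third = chart.get()    # cache is stale, so the query runs again
```

A chart page would call `chart.get()` on every view, so the data store is queried at most once per interval no matter how many visitors load the graph.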
A way to associate listens with MBIDs
Proposed mentors: ruaok, alastairp, gentlecat
Forum for discussion
Last.fm is broken because of the terrible way it handles metadata: artists with the same name are jumbled into a single page, and at the same time there are often multiple pages for the same artist/album/track due to spelling variations. ListenBrainz is smarter because it takes advantage of MBIDs. But there needs to be some sort of interface for identifying listens as being for a particular track (or recording) MBID. This could allow the user to identify an album they listened to on Spotify as the same one they listened to in iTunes a few days later. Then the two would not remain separate artists or albums in the stats due to differences in metadata alone.
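To make the idea concrete, here is a small sketch of how a user-confirmed association might be stored and applied: listens whose metadata differs only in case or spacing resolve to the same identifier and are counted together. Everything here is hypothetical, including the `normalize` rule, the `confirmed` mapping, and the placeholder identifier `"mbid-placeholder-1"`, which is not a real MBID.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial spelling variants match."""
    return " ".join(text.casefold().split())


def listen_key(listen):
    return (normalize(listen["artist"]), normalize(listen["track"]))


# Mapping a user has confirmed through some (hypothetical) interface.
confirmed = {("the beatles", "hey jude"): "mbid-placeholder-1"}

listens = [
    {"artist": "The Beatles", "track": "Hey Jude", "source": "spotify"},
    {"artist": "the beatles ", "track": "Hey  Jude", "source": "itunes"},
    {"artist": "Unknown Band", "track": "Some Song", "source": "itunes"},
]

# Resolve each listen to an MBID where a mapping exists, else None.
resolved = [confirmed.get(listen_key(listen)) for listen in listens]

# Stats are then grouped by MBID rather than by raw metadata strings.
plays_per_mbid = Counter(mbid for mbid in resolved if mbid is not None)
```

Real matching would need to handle far messier variations than case and spacing, which is exactly why an interface for users to confirm or correct associations is needed.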
Proposed mentors: LordSputnik or Leftmost
Languages/skills: Browser JS, Node.js or Python, SQL/PostgreSQL
Forum for discussion
At last year's summit, the two BookBrainz lead developers, Leftmost and LordSputnik, worked on a plan for importing third-party data into BookBrainz. This plan has several stages. First, data sources need to be identified, including mass import sources with freely available data, such as libraries, and manual import sources, such as online book stores and other user-contributed databases. The next stage is to update the database to introduce an "Import" object, which can be used to distinguish mass-imported data from (usually better quality) user contributions. Then, actual import bots for mass import and userscripts for manual import will need to be written. Finally, it would be desirable (but not necessary if time is short) to add an interface to the BookBrainz site that allows users to review automatically imported data and approve it.
Proposed Mentors: LordSputnik/Leftmost
Languages/skills: Node.js, ES6, Python, Redis, OAuth
Forum for discussion
We’re currently in the process of switching to Node.js for all server-side code. As part of this, our schema has been redesigned, and the current Python-based web API will no longer work.
We'd like a new and improved JSON web API to be designed and implemented. The design would clearly describe the result of each different query to the web API, and give examples of output. It would also describe the workings of any additional features to be implemented, for example authentication, caching and rate limiting. Authentication in the web API is a particular challenge, since the current MB OAuth setup requires a GUI.
The web API should be written using the koa.js Node.js server framework, so that the resulting code is as clean and minimal as possible. Tests should be written in parallel with the implementation, adapting and expanding on the tests for the existing Python web API. The priority for this task is a solid plan and quality code, not a complete implementation (although that would be nice!).
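As one example of the kind of behaviour the design document would need to pin down, rate limiting is often described in terms of a token bucket: each client may make a short burst of requests, after which requests are allowed only as tokens refill. The sketch below shows that logic only. It is written in Python for brevity, whereas the project itself targets koa.js. The class `TokenBucket` and the chosen limits are assumptions, not part of any existing BookBrainz code.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at a fixed rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock            # injectable so the demo can fake time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens earned since the last call, never exceeding capacity.
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


now = [0.0]  # fake clock value, in seconds
bucket = TokenBucket(capacity=2, refill_per_second=1, clock=lambda: now[0])

results = [bucket.allow(), bucket.allow(), bucket.allow()]  # burst of three
now[0] = 1.0                                               # one second later
results.append(bucket.allow())
```

In a web API, a request for which `allow()` returns `False` would typically be answered with HTTP status 429 (Too Many Requests).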