Difference between revisions of "Development/Summer of Code/2014"

From MusicBrainz Wiki

Revision as of 22:39, 24 February 2014

Mentors

This year Robert Kaye, Ian McEwen and Michael Wiencek will probably be amongst our mentors. That's ruaok (Robert), ianmcorvidae (Ian) and bitmap (Michael) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.

Suggestions

This is our set of starting ideas for 2014. Add more ideas if you have them!

Add Events to MusicBrainz

Proposed mentors: ianmcorvidae

After many years of wanting "Events" (concerts, performances, etc.), we're finally in a position to take on that project. We'd like a student who is already familiar with MusicBrainz and how our dev team works to implement Events in MusicBrainz. We are unlikely to consider students who are new to MusicBrainz for this project, due to its involved nature.

Read-only, browser-oriented site

Proposed mentors: ruaok, navap, Freso

There's been some discussion of implementing a site better geared toward casual browsing users, rather than editors, containing stuff like Wikipedia bios, reviews, embedded streaming for those recordings we have relationships for, etc. This is very open-ended at the moment, but some likely issues are maintainability (potentially two codebases!?), what exactly needs to be shown, the effect on the existing site, etc. It's an open question whether this would constitute changes to the current site (most likely when not logged in) or whether this would be an additional site/codebase/view on our data.

Move MusicBrainz Search to SOLR

Proposed mentors: ruaok

Currently MusicBrainz uses custom search code that rebuilds indexes every few hours. We'd like someone to work on replacing our custom code with Apache SOLR and also work out a way to implement in-place index updates to give us near real-time index updating capabilities. Students who work on this should be familiar with SOLR, JSON, Perl, Postgres and Python. Understanding how MusicBrainz works and having contributed to the project before GSoC would be a great plus.

Finish & Deploy CritiqueBrainz

Proposed mentors: ruaok

Last year we had a student write the CritiqueBrainz project, which allows editors to write non-neutral point-of-view music reviews. Our student did a great job and finished everything he set out to do; however, there wasn't enough time to actually deploy the project and fix initial bugs. In this proposed project one student would spend roughly the first half of the summer finishing up the project, adding more styling, fixing open bugs and writing documentation for users and developers. In the latter half of the summer the student would work to deploy the project on MusicBrainz' servers and run a short alpha testing period. During this phase the student would fix bugs that appear and generally work to get the site stable and running well. The student who works on this should be familiar with Linux, Python, Postgres and nginx. Any experience hosting sites would be a big plus. This project would be a great opportunity for someone who already knows how to code, but would love to learn more about finishing and deploying a web site.

Give Picard a website

Currently Picard's "website" is https://musicbrainz.org/doc/MusicBrainz_Picard - a doc page buried in the MusicBrainz site. It's hard to navigate and the surrounding context is all MusicBrainz, not Picard. One idea would be to give Picard its own smaller site, similar to http://metabrainz.org/. This would have its own menu with a link back to MusicBrainz's main site and links to downloads, plugins, documentation and the tagger support section of the forums.

A separate site could be used to improve plugin support. Right now, if you want to add a plugin to MusicBrainz_Picard/Plugins, you need to figure out that you can go to the wiki from the /doc/ page in the first place and also find somewhere to host your plugin. If you want to download a new plugin, you have to go to the page, download the plugin and then install it manually. If we had a site for it, we could have a database to store plugins in; users could log in (using their MB account details), upload a plugin and set various details about it (license, compatible versions, compatible OSes). The site could then offer an API for Picard to call, so users would not have to download and install plugins manually. It could also track how many downloads a plugin has (which could help us decide which features to add to Picard itself), and it would make it possible for Picard to notify users when a newer version of a plugin they use is available.
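
As a rough illustration of the update notification idea, the Python sketch below compares installed plugin versions against the versions a plugin site might report. Nothing here is an existing Picard or MusicBrainz API: the PluginInfo record, its fields and the find_updates function are hypothetical names, and versions are assumed to be plain dotted numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginInfo:
    """Hypothetical plugin record, as a plugin site might store it."""
    name: str
    version: str        # assumed to be a dotted numeric string, e.g. "1.2.0"
    downloads: int = 0  # download counter the site could maintain


def parse_version(version: str) -> tuple:
    """Turn "1.10.2" into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_updates(installed: dict, available: list) -> list:
    """Return the available plugins that are newer than the installed copies.

    `installed` maps plugin name -> installed version string.
    Plugins that are not installed are ignored.
    """
    updates = []
    for plugin in available:
        current = installed.get(plugin.name)
        if current is not None and parse_version(plugin.version) > parse_version(current):
            updates.append(plugin)
    return updates
```

Parsing the version into a tuple of integers matters here: comparing the raw strings would rank "1.10.0" below "1.2.0".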


Write a Geordi data matching client for the Internet Archive

Proposed mentors: ianmcorvidae

Please see the project "Improving Geordi" above for some background on this project.

We would like to see a student write a data matching client for Geordi. Geordi, our third-party data store, contains a large collection of music metadata from the Internet Archive. Large chunks of the data in this archive have never been matched to MusicBrainz data. This project would involve writing a Geordi API client that matches the Internet Archive data to data in MusicBrainz. The code written for this project should be as generic as possible to enable us to match other data sets in the future. The client would need to query Geordi for unmatched data, attempt to match it to MusicBrainz and then record possible matches inside Geordi. It's quite probable that some of this project would involve working with Geordi itself to add features desirable or necessary for a client of this sort.

In contrast to the above project, this project works with Geordi not from the perspective of an editor but from the perspective of someone with an external dataset who would like to use Geordi's facilities to connect into MusicBrainz.

The student working on this project would need strong skills in whatever language they choose to use for the project (Python is highly recommended, as we have some code to start from) and general knowledge of MusicBrainz. Knowledge of data matching algorithms and fuzzy string comparison algorithms would be a plus. Python would also be useful for work on Geordi proper.
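
To give a feel for the matching step of such a client, here is a minimal Python sketch that scores one unmatched item against a list of candidates using fuzzy string comparison. It is only an illustration: it relies on the standard library's difflib rather than any particular matching algorithm, and the "artist", "title" and "mbid" keys, the averaging of the two scores and the 0.85 threshold are all assumptions, not part of Geordi's or MusicBrainz's actual API. A real client would fetch the unmatched items from Geordi and the candidates from MusicBrainz, then write the resulting pairs back to Geordi.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1], based on difflib's SequenceMatcher ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_item(item: dict, candidates: list, threshold: float = 0.85) -> list:
    """Score one unmatched item against candidate releases.

    `item` and each candidate are dicts with hypothetical "artist" and
    "title" keys; candidates also carry an "mbid" key. Returns
    (mbid, score) pairs at or above `threshold`, best match first.
    """
    scored = []
    for cand in candidates:
        # Average the artist and title similarity (an arbitrary weighting).
        score = (similarity(item["artist"], cand["artist"])
                 + similarity(item["title"], cand["title"])) / 2
        if score >= threshold:
            scored.append((cand["mbid"], round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the scoring function separate from any Geordi-specific code is one way to keep the matcher generic enough to reuse with other data sets.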

Proposals

About proposals

Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves on having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.

A good way to get a feel for this would be to lurk on IRC, or to talk about your proposals on the mailing lists.