Development/Summer of Code/2013: Difference between revisions

From MusicBrainz Wiki
No edit summary
 
(10 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Mentors ==
== Mentors ==
This year Robert Kaye, Oliver Charles and Kuno Woudt will probably be amongst our mentors. That's ruaok (Robert), warp (Kuno) and ocharles (Oliver) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.
This year Robert Kaye, Oliver Charles, Kuno Woudt, and Ian McEwen will probably be amongst our mentors. That's ruaok (Robert), warp (Kuno), ocharles (Oliver), and ianmcorvidae (Ian) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.


== Suggestions ==
== Suggestions ==
Line 14: Line 14:
=== Read-only, browser-oriented site ===
=== Read-only, browser-oriented site ===


'''Proposed mentors''': ''[[User:Reosarevok|reosarevok]],'' ''[[User:PavanChander|navap]]''
'''Proposed mentors''': ''[[User:Reosarevok|reosarevok]],'' ''[[User:PavanChander|navap]]'', ''[[User:Freso|Freso]]''


There's been some [[MusicBrainz_Summit/2012_Mini-Summit/Notes|discussion]] of implementing a site better geared toward casual browsing users, rather than editors, containing stuff like Wikipedia bios, reviews, embedded streaming for those recordings we have relationships for, etc. This is very open-ended at the moment, but some likely issues are in terms of maintainability (potentially two codebases!?), what exactly needs to be shown, effect on the existing site, etc. It's an open question whether this would constitute changes to the current site (when not logged-in, most likely) or whether this would be an additional site/codebase/view on our data.
There's been some [[MusicBrainz_Summit/2012_Mini-Summit/Notes|discussion]] of implementing a site better geared toward casual browsing users, rather than editors, containing stuff like Wikipedia bios, reviews, embedded streaming for those recordings we have relationships for, etc. This is very open-ended at the moment, but some likely issues are in terms of maintainability (potentially two codebases!?), what exactly needs to be shown, effect on the existing site, etc. It's an open question whether this would constitute changes to the current site (when not logged-in, most likely) or whether this would be an additional site/codebase/view on our data.
Line 23: Line 23:


Currently we don't have a proper seperation between mbserver and the database, we have a webservice that can be used to retrieve most things from the database but it is tied into mbserver.
Currently we don't have a proper seperation between mbserver and the database, we have a webservice that can be used to retrieve most things from the database but it is tied into mbserver.
You cannot currently deploy only the webservice. We also have a seperate Search Server application that searches over Lucene indexes built from the database, this provides another part of the webservice but does not have access to the actual database at search time.
You cannot currently deploy only the webservice. We also have a separate Search Server application that searches over Lucene indexes built from the database, this provides another part of the webservice but does not have access to the actual database at search time.


[http://www.hibernate.org/subprojects/search.html|[Hibernate Search]] is a library that brings together Lucene for searching and Hibernate ORM for retrieving data. I think it would be a perfect fit for us allowing us to provide a single web service with search and lookup in one. Data could be returned in multiple formats or as objects, and the advanced mapping features
[http://www.hibernate.org/subprojects/search.html|[Hibernate Search]] is a library that brings together Lucene for searching and Hibernate ORM for retrieving data. I think it would be a perfect fit for us allowing us to provide a single web service with search and lookup in one. Data could be returned in multiple formats or as objects, and the advanced mapping features
Line 30: Line 30:


IMO would be very useful to try out using Hibernate Search for serving one entity (i.e Label) and see if it is worth pursuing further.
IMO would be very useful to try out using Hibernate Search for serving one entity (i.e Label) and see if it is worth pursuing further.

=== Rearchitect/Improve Release Editor ===

'''Proposed mentors:''' ''[[User:OliverCharles|ocharles]],'' ''[[User:Kuno|warp]],'' ''?''

The Release Editor is complex code which is currently pretty bug-ridden and hard to work with. It could use rearchitecting to be a lot more stable, performant, and easier to improve; probably its UI could use work too.

'''NOTE: this would be a ''huge'' project and probably requires some significant expertise. Especially UI-related changes are not well-defined and significant design would be needed. Apply with caution!'''

=== Give Picard a website ===

Currently Picard's "website" is https://musicbrainz.org/doc/MusicBrainz_Picard - a doc page buried in the MusicBrainz site. It's hard to navigate and the surrounding context is all MusicBrainz, not Picard. One idea would be to give Picard its own smaller site, similar to http://metabrainz.org/. This would have its own menu with a link back to MusicBrainz's main site and links to downloads, plugins, documentation and the tagger support section of the forums.

A separate site could be used to improve plugin support. Right now if you want to add a plugin to [[MusicBrainz_Picard/Plugins]] you need to figure out that you can go to the wiki from the /doc/ page in the first place and also find somewhere host your plugin. If you want to download a new plugin, you have to go to the page, download the plugin and then install it manually. If we had a site for it, we could have a database to store plugins in, users could log in (using their MB account details), upload a plugin and set various details about it (license, compatible versions, compatible OSes). It could then have an API for Picard to call instead of users having to download and install manually, and could track how many downloads a plugin has (which could help us decide which features to add to Picard itself) and it would make it possible for Picard to notify users when a newer version of a plugin they use is available.

=== Improve Geordi ===

'''Proposed mentors''': ''[[User:Ianmcorvidae|ianmcorvidae]]''

[[Geordi]] is our newly-minted, beta tool for connecting with arbitrary, likely external, structured data sources. At present, we have a bunch of metadata from the Internet Archive; sources such as Discogs and Jamendo are planned. This project would look at improving geordi from the perspective of a MusicBrainz editor, a task which could take a lot of forms:
* fleshing out further the importer, which is functional but sparse, making its interface more consistent, and adding features
* writing tools to connect geordi and the main musicbrainz site -- userscripts (on MB, on geordi, on other sites?), geordi patches, musicbrainz-server patches, whatever's appropriate
* mapping and displaying more information that can be extracted from data sets
* all sorts of other stuff! Make a proposal early and discuss what you'd like to see out of geordi.

The student working on this project would probably need strong skills in python and knowledge of editor behavior (probably by being one!) and MusicBrainz features that can be exploited/connected to geordi. Skills in javascript and perl would help a lot too.

=== Write a Geordi data matching client for the Internet Archive ===

'''Proposed mentors''': ''[[User:Ianmcorvidae|ianmcorvidae]]''

Please see the project "Improving Geordi" above for some background on this project.

We would like to see a student write a data matching client for Geordi. Geordi, our third party data store contains a large collection of music metadata from the Internet Archive. Large chunks of the data in this archive has never been matched to MusicBrainz data. This project would involve writing a Geordi API client that match the Internet Archive data to data in MusicBrainz. The code that would be created for this project should be as generic as possible to enable us to match other data sets in the future. This project would need to query Geordi for unmatched data, attempt to match it to MusicBrainz and then record possible matches inside Geordi. It's quite probable that some of this project would involve working with Geordi itself to add features desirable or necessary for a client of this sort.

In contrast to the above project, this project works with Geordi not from the perspective of an editor but from the perspective of someone with an external dataset who would like to use geordi's faculties to connect into MusicBrainz.

The student working on this project would need strong skills in whatever language they choose to use for the project (python is highly recommended, as we have some code to start from) and general knowledge of MusicBrainz. Knowledge of data matching algorithms and fuzzy string comparison algorithms would be a plus. Python would also be useful for work on Geordi proper.


== Proposals ==
== Proposals ==
== About proposals ==
== About proposals ==
Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves for having a strong community - most of us know each other in same way, and some of us know each other face to face from development summits.
Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves for having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.


A good way to get a feel of this would be to lurk around in [[IRC]], or to talk about your proposals on the [[Communication/Mailing Lists|mailing lists]].
A good way to get a feel of this would be to lurk around in [[IRC]], or to talk about your proposals on the [[Communication/Mailing Lists|mailing lists]].

Latest revision as of 09:53, 24 August 2013

Mentors

This year Robert Kaye, Oliver Charles, Kuno Woudt, and Ian McEwen will probably be amongst our mentors. That's ruaok (Robert), warp (Kuno), ocharles (Oliver), and ianmcorvidae (Ian) on IRC, if you want to come and speak to us first. Some potential mentors are listed by each project; this is far from a normative list, but it might give you somebody to ask about the project.

Suggestions

Right now these suggestions are primarily copied from our 2012 page. Add more ideas if you have them!

Repository for Creative Commons-licensed reviews

Proposed mentors: ruaok, ocharles, warp

There's been some discussion of implementing a system somewhat similar to the Cover Art Archive for Creative Commons-licensed album reviews. The BBC has a site for reviews which provides some of the functionality desired, but some folks would like us to have something more open and more our own; it's pretty open-ended at the moment what this could mean though.

Read-only, browser-oriented site

Proposed mentors: reosarevok, navap, Freso

There's been some discussion of implementing a site better geared toward casual browsing users, rather than editors, containing stuff like Wikipedia bios, reviews, embedded streaming for those recordings we have relationships for, etc. This is very open-ended at the moment, but some likely issues are in terms of maintainability (potentially two codebases!?), what exactly needs to be shown, effect on the existing site, etc. It's an open question whether this would constitute changes to the current site (when not logged-in, most likely) or whether this would be an additional site/codebase/view on our data.

Implement Webservice/Search using Hibernate Search Library

Proposed mentors: ijabz

Currently we don't have a proper separation between mbserver and the database. We have a webservice that can be used to retrieve most things from the database, but it is tied into mbserver, so you cannot currently deploy only the webservice. We also have a separate Search Server application that searches over Lucene indexes built from the database; this provides another part of the webservice but does not have access to the actual database at search time.

Hibernate Search is a library that brings together Lucene for searching and Hibernate ORM for retrieving data. I think it would be a perfect fit for us, allowing us to provide a single web service with search and lookup in one. Data could be returned in multiple formats or as objects, and the advanced mapping features of Hibernate Core would solve the issues MusicBrainz has serving certain requests (such as all releases by Mozart). Hibernate Core also makes it easy to modify the database, and this could well serve as the basis for the editable webservice.

IMO it would be very useful to try out Hibernate Search for serving one entity (e.g. Label) and see if it is worth pursuing further.
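To make the "search and lookup in one" idea a bit more concrete, here is a toy sketch in Python. It is not Hibernate Search and uses none of its API: the `LabelService` class, its methods, and the sample labels are all made up for illustration. A plain dict stands in for the database and a small inverted index stands in for Lucene. The point is only that a search can resolve to ids in the index and then return full records from the same service that answers direct lookups.

```python
from collections import defaultdict


class LabelService:
    """Toy service: an inverted index for search plus a dict standing in for the database."""

    def __init__(self):
        self._rows = {}                 # label id -> full record (the "database")
        self._index = defaultdict(set)  # lowercase token -> label ids (the "search index")

    def add(self, label_id, name, country):
        """Store a record and index every word of its name."""
        self._rows[label_id] = {"id": label_id, "name": name, "country": country}
        for token in name.lower().split():
            self._index[token].add(label_id)

    def lookup(self, label_id):
        """Direct lookup by id; returns None if the id is unknown."""
        return self._rows.get(label_id)

    def search(self, query):
        """Find ids whose names contain every query token, then return the full records."""
        tokens = query.lower().split()
        if not tokens:
            return []
        ids = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return [self._rows[i] for i in sorted(ids)]


service = LabelService()
service.add(1, "Example Records", "GB")
service.add(2, "Sample Tune", "GB")
service.add(3, "Example Singles", "US")
matches = service.search("example")  # records 1 and 3, served with full data
```

In this sketch the search result already carries the complete record, which is the part the current Search Server cannot do because it has no database access at search time.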

Rearchitect/Improve Release Editor

Proposed mentors: ocharles, warp, ?

The Release Editor is complex code which is currently pretty bug-ridden and hard to work with. It could use rearchitecting to be a lot more stable, performant, and easier to improve; probably its UI could use work too.

NOTE: this would be a huge project and probably requires some significant expertise. Especially UI-related changes are not well-defined and significant design would be needed. Apply with caution!

Give Picard a website

Currently Picard's "website" is https://musicbrainz.org/doc/MusicBrainz_Picard - a doc page buried in the MusicBrainz site. It's hard to navigate and the surrounding context is all MusicBrainz, not Picard. One idea would be to give Picard its own smaller site, similar to http://metabrainz.org/. This would have its own menu with a link back to MusicBrainz's main site and links to downloads, plugins, documentation and the tagger support section of the forums.

A separate site could be used to improve plugin support. Right now, if you want to add a plugin to MusicBrainz_Picard/Plugins, you need to figure out that you can get to the wiki from the /doc/ page in the first place and also find somewhere to host your plugin. If you want to download a new plugin, you have to go to the page, download the plugin and then install it manually. If we had a site for it, we could have a database to store plugins in; users could log in (using their MB account details), upload a plugin and set various details about it (license, compatible versions, compatible OSes). The site could then have an API for Picard to call instead of users having to download and install manually, and could track how many downloads a plugin has (which could help us decide which features to add to Picard itself). It would also make it possible for Picard to notify users when a newer version of a plugin they use is available.
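As a rough illustration of the "notify users about newer plugin versions" part, here is a minimal Python sketch. Nothing here is an existing Picard or MusicBrainz API: the shape of the `published` list (dicts with `name` and `version` keys), the function names, and the plugin names are assumptions about what such a plugin site might return. It also assumes plain dotted numeric version strings.

```python
def parse_version(version):
    """Turn a dotted version string such as "1.10.2" into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def available_updates(installed, published):
    """Return {plugin name: newest version} for plugins that have a newer published version.

    `installed` maps plugin name -> installed version string.
    `published` is a list of dicts, as a hypothetical plugin API might return them,
    each with "name" and "version" keys.
    """
    newest = {}
    for entry in published:
        name, version = entry["name"], entry["version"]
        if name not in newest or parse_version(version) > parse_version(newest[name]):
            newest[name] = version
    return {
        name: newest[name]
        for name, current in installed.items()
        if name in newest and parse_version(newest[name]) > parse_version(current)
    }


installed = {"exampleplugin": "0.4", "otherplugin": "1.2"}
published = [
    {"name": "exampleplugin", "version": "0.10"},
    {"name": "exampleplugin", "version": "0.9"},
    {"name": "otherplugin", "version": "1.2"},
]
updates = available_updates(installed, published)
```

Comparing versions as integer tuples rather than strings matters here: as strings, "0.9" would sort above "0.10".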

Improve Geordi

Proposed mentors: ianmcorvidae

Geordi is our newly-minted, beta tool for connecting with arbitrary, likely external, structured data sources. At present, we have a bunch of metadata from the Internet Archive; sources such as Discogs and Jamendo are planned. This project would look at improving geordi from the perspective of a MusicBrainz editor, a task which could take a lot of forms:

  • fleshing out further the importer, which is functional but sparse, making its interface more consistent, and adding features
  • writing tools to connect geordi and the main musicbrainz site -- userscripts (on MB, on geordi, on other sites?), geordi patches, musicbrainz-server patches, whatever's appropriate
  • mapping and displaying more information that can be extracted from data sets
  • all sorts of other stuff! Make a proposal early and discuss what you'd like to see out of geordi.

The student working on this project would probably need strong skills in Python and knowledge of editor behavior (probably by being one!) and MusicBrainz features that can be exploited/connected to geordi. Skills in JavaScript and Perl would help a lot too.

Write a Geordi data matching client for the Internet Archive

Proposed mentors: ianmcorvidae

Please see the project "Improving Geordi" above for some background on this project.

We would like to see a student write a data matching client for Geordi. Geordi, our third-party data store, contains a large collection of music metadata from the Internet Archive. Large chunks of the data in this archive have never been matched to MusicBrainz data. This project would involve writing a Geordi API client that matches the Internet Archive data to data in MusicBrainz. The code created for this project should be as generic as possible to enable us to match other data sets in the future. The client would need to query Geordi for unmatched data, attempt to match it to MusicBrainz and then record possible matches inside Geordi. It's quite probable that some of this project would involve working with Geordi itself to add features desirable or necessary for a client of this sort.

In contrast to the above project, this project works with Geordi not from the perspective of an editor but from the perspective of someone with an external dataset who would like to use geordi's faculties to connect into MusicBrainz.

The student working on this project would need strong skills in whatever language they choose to use for the project (Python is highly recommended, as we have some code to start from) and general knowledge of MusicBrainz. Knowledge of data matching algorithms and fuzzy string comparison algorithms would be a plus. Python would also be useful for work on Geordi proper.
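For a feel of what the fuzzy string comparison side might look like, here is a small Python sketch using only the standard library's `difflib.SequenceMatcher`. It does not talk to Geordi or MusicBrainz at all: the `artist`, `title` and `id` keys, the equal weighting of the two fields, and the 0.85 threshold are assumptions chosen for illustration, and a real client would probably want a better normalization and scoring scheme.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't lower the score."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_matches(item, candidates, threshold=0.85):
    """Score candidates against an unmatched item and keep the plausible ones.

    `item` and each candidate are dicts with hypothetical "artist" and "title" keys.
    Returns (score, candidate) pairs sorted from best to worst.
    """
    scored = []
    for candidate in candidates:
        score = (similarity(item["artist"], candidate["artist"])
                 + similarity(item["title"], candidate["title"])) / 2
        if score >= threshold:
            scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


unmatched = {"artist": "Example Artist", "title": "Some Album"}
candidates = [
    {"id": "a", "artist": "example  artist", "title": "Some album"},
    {"id": "b", "artist": "Different Band", "title": "Unrelated"},
]
possible = best_matches(unmatched, candidates)
```

Keeping the scoring function separate from wherever the items and candidates come from is one way to stay generic enough to reuse the matcher for other data sets later.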

Proposals

About proposals

Before you dive in and send a proposal to us through Google, it's a good idea to take some time and learn about the MusicBrainz community. At MusicBrainz we pride ourselves on having a strong community - most of us know each other in some way, and some of us know each other face to face from development summits.

A good way to get a feel for this would be to lurk around in IRC, or to talk about your proposals on the mailing lists.