Year in review
After an exciting and action-packed 2011, 2012 was a little calmer. Since our last major schema update for the Next Generation Schema, we've spent a considerable amount of time making further improvements and ironing out the kinks from that major release. Our development team pushed out updates weekly for a number of months, before falling back to a more reasonable pace of approximately twice a month. After two years with no significant updates, this is a welcome change!
In January we held a mini MusicBrainz summit in London to help people and companies there who were interested in MusicBrainz find out more about how to work with us. For the next couple of months we put our heads down working towards our next schema change release. Then, May was an action-packed month! On May 9th, Universal Music UK launched the Artist Gateway using MBIDs. This was the first time that a record label embraced our data and produced a very slick-looking site showcasing their own artists.
At the very end of 2011 we established a schema change release schedule of mid May and mid October. This release schedule allows our data consumers more time to plan the work that they need to do in order to prepare for our schema changes. We had our first regularly scheduled schema change release in May and we finished off May by moving from an unclear Public Domain definition to the well-defined CC0 license that spells out exactly how our data can be used.
MusicBrainz Picard version 1.0 was released in June (and version 1.1 was released in September). Thank you Lukáš Lalinský (luks), Michael Wiencek (bitmap), and Wieland Hoffmann (mineo) for all your hard work this year!
In July we announced that we're working on a revamp of our edit system, very creatively called the New Edit System. In August we held our first online hack weekend, where Oliver Charles added support for the recently added cover art to our home page, Robert Kaye created the new Changed MBIDs data feed, and Ian McEwen created a new IRC bot for our IRC channels. In September we hired Ian McEwen as a full-time engineer for MusicBrainz, and in October we launched the Cover Art Archive, which we'll talk about more in a moment.
Cover Art Archive
The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz whose goal is to make cover art images available to everyone in an organised and convenient way. The Cover Art Archive was announced to the public in October 2012, but has been collecting images since May 2012. In the 8 months since May we have amassed a collection of just under 140,000 pieces of cover art and are currently receiving between 400 and 600 new pieces each day!
We look forward to the day when the Cover Art Archive is seen as the de facto source of cover art images, leveling the playing field for all consumers!
8tracks uses MusicBrainz data to power their online radio stations, which are curated by humans, not algorithms. The Echo Nest has been using MusicBrainz data for quite a few years now in all of their offerings, including Project Rosetta Stone, which allows the translation of IDs from one music provider to another. In 2012 the Echo Nest agreed to give back to MusicBrainz and provide us with some support. AOL uses MusicBrainz data to provide info on tracks played in Winamp, and on their online artist pages.
Google Summer of Code
We had five students for Google Summer of Code 2012: Alastair Porter (alastairp), Brandon LeBlanc (demosdemon), Daniel Bali (plaintext), Ian McEwen (ianmcorvidae), and Michael Wiencek (bitmap).
Alastair improved our collections system by fixing long-standing issues and adding new management features. Alastair was mentored by Oliver Charles (ocharles). Brandon worked on the MusicBrainz app for iOS and was mentored by Jamie McDonald (jdamcd). Daniel spent his summer using Splunk to dive into our server logs so that we can better understand our traffic patterns. Daniel was mentored by Robert Kaye (ruaok). Ian broke language barriers and took on internationalisation support. Ian was mentored by Nikki. This is Ian's second Summer of Code with MusicBrainz. Michael created the long-sought-after relationship editor and was mentored by Kuno Woudt (warp). This is Michael's second Summer of Code with MusicBrainz.
The 12th MusicBrainz summit was held in Barcelona, Spain, from November 9th to 11th. The summit was attended by 22 people, with 16 flying in from various parts of the world.
As in previous years, the summit proved once again to be a fantastic experience, not just for the quality of the discussions, but also for the level of interaction between participants. With the Saturday group meal, socialising at the apartment and continued discussions during breaks at the summit itself, it was great to see people chatting, laughing and generally having a great time. An overview of what was discussed is available on our blog.
We thank Brewster Kahle, Google, Music Kickup, Oliver Charles, and Spotify for sponsoring our 12th summit. We especially thank the Music Technology Group at Universitat Pompeu Fabra for both sponsoring the summit and being our local host. The summit would not have been possible without all of your contributions.
New Edit System (NES)
The Next Generation Schema (NGS) update in 2011 made many new features possible, but it did not touch our ageing edit system. We will be spending lots of development time in 2013 working on the New Edit System (NES), which will fix many of the weaknesses of our current edit system, such as not being able to group edits together or amend an edit once it's been submitted.
Oliver Charles talks more about what NES is and how it will be implemented on his blog.
Ian McEwen started coding on our new Geordi project, which aims to be a repository for third party data being contributed to MusicBrainz. Most third party data sources don't quite meet our quality standards or don't cleanly fit into MusicBrainz. Geordi is meant to help with this problem by providing a simple place for us to import new data and make it searchable by our community. Our community can then examine the data, determine its quality and establish a mapping between the data and the MusicBrainz database. Once such a mapping is in place, Geordi can help import data into MusicBrainz and keep track of what data has or has not been imported.
Our finances in 2012 are summarized by our Profit & Loss statement:
|Live Data Feed Licenses||$92,300.00|
|Tagger Affiliate Program||$14,875.68|
|CC Data License||$6,450.00|
|GSoC Summit Chocolate||$90.12|
For the first time in the existence of the MetaBrainz Foundation, we've finished a year in the red. Several vendors were quite behind in their payments to MetaBrainz and that caused a shortfall of $26,853.71.
In the course of 2012 we spent $1,547.59 on hardware and $18,775.00 on hosting costs for a total of $20,322.59. In the same period we served 5.0 billion hits, of which 4.73 billion were web service hits. In 2012 we had some users abusing our web service and we had to implement some aggressive throttling. Please note that the graph below is an aggregate of all attempted requests, which is greater than the 5.0 billion served responses.
For the purpose of calculating the cost of our web hits, we're only counting the web hits that were successful. With that in mind, our cost of 1 million hits was $4.06 and the cost of 1 million web service hits was $4.29. These costs nearly halved from the previous year, mainly because we spent very little on hardware in 2012, due to the large hardware donation at the end of 2011.
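As a rough check, the per-million costs can be recomputed from the figures quoted above. This is a minimal sketch that assumes the cost in question is hardware plus hosting only, and that the same total is divided by each hit count; the variable and function names are illustrative and do not come from the report.

```python
# Figures quoted in the report for 2012 (hit counts are rounded).
HARDWARE_USD = 1_547.59
HOSTING_USD = 18_775.00
TOTAL_HITS = 5.0e9          # all successfully served hits
WEB_SERVICE_HITS = 4.73e9   # the subset that were web service hits


def cost_per_million(total_cost_usd: float, hits: float) -> float:
    """Return the cost in USD of serving one million hits."""
    return total_cost_usd / (hits / 1_000_000)


# Assumption: only hardware and hosting count towards the cost of a hit.
infrastructure_usd = HARDWARE_USD + HOSTING_USD
cost_all_hits = cost_per_million(infrastructure_usd, TOTAL_HITS)
cost_ws_hits = cost_per_million(infrastructure_usd, WEB_SERVICE_HITS)

print(f"Infrastructure spend: ${infrastructure_usd:,.2f}")
print(f"Cost per million hits: ${cost_all_hits:.2f}")
print(f"Cost per million web service hits: ${cost_ws_hits:.2f}")
```

Because the hit counts here are rounded to billions, the web service figure lands within about a cent of the reported $4.29 rather than matching it exactly; the reported value was presumably computed from unrounded counts.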
In 2012 we spent $105,587.27 on salaries for developers, and our costs for administration and taxes were $77,602.43. We earned $92,300.00 from our Live Data Feed and $6,450.00 from Creative Commons licensed data for a total of $98,750.00, which remains nearly unchanged from last year. Our end user donations via PayPal came to $9,374.94, which is down from last year's $14,851.60 -- the drop was due to us not holding a specific fundraiser in 2012. Large donations added up to $71,954.40, including $40,000 from Google, $20,000 from AOL and $7,821.59 from an anonymous record label.
In 2012 our traffic continued to grow beyond our means of honoring all of the requests made of us. In early 2012 we started throttling some very demanding client applications and throughout 2012 continued to tweak our traffic shaping in an effort to make MusicBrainz available to as many people as possible:
This traffic graph conveys only the requests that MusicBrainz served from its own servers. However, VLC set up its own MusicBrainz installation, the MusicBrainz virtual machine allows anyone to have their own complete copy of MusicBrainz running on their own hardware, and quite a few companies have their own MusicBrainz setup. Considering how many installations of MusicBrainz exist in the world, the total number of hits that were served is much greater than what is shown in this graph. Sadly, we have no way of knowing how many MusicBrainz requests were handled in total in 2012, but we know it's considerably more than the 5 billion hits we served from our server farm.
In 2012, 20,540 editors made a total of 4,249,916 edits. The total number of editors is down from 23,637 in 2011, but the total number of edits has nearly doubled from the 2,247,150 made in 2011. In 2012 the number of bots making changes to our database increased, but by far the largest share of the work of editing MusicBrainz is still done by our community of editors. One more year down and still, MusicBrainz would be nothing without its editors! Thank you for another stellar year of editing!
At the end of 2012, MusicBrainz had 17 machines in service. From the top, going down:
- rika: User sandbox machine (mbsandbox.org)
- lolo: An extra front end web server
- scooby: ftp, system statistics, blog
- pino: MetaBrainz, MusicBrainz Classic, Cover Art Archive
- baron: Database failover server
- stimpy: off
- dexter: off
- cartman: Classic search server, index builder
- wiley: New catch all server: SVN, git, jira, wiki, trac, mail, backups
- Gir: network router (not visible)
- lenny/carl: Redundant network gateways
- tails: off currently
- asterix: web server
- astro: web server
- roobarb: search server
- pingu: web server
- dora: search server, memcached server
- totoro: database server
Not shown: Hobbes, continuous integration (jenkins), replicated search indexes
Throughout 2012 we retired old machines and deployed new machines from our large hardware donation at the end of 2011. MusicBrainz uses 8.91 Mbit/s of bandwidth on average (1.34 Mbit/s inbound, 7.32 Mbit/s outbound) and draws 29 Amps of current for a power consumption of about 3,190 Watts. MusicBrainz physically occupies 20U of space (half of a rack) at Digital West in San Luis Obispo, CA.
Words of appreciation
Many thanks to our editors, voters, peer reviewers, bug watchers and other members of our community -- without you MusicBrainz would not be what it is today!
We'd also like to thank our developers who pushed out dozens of releases of the site, Picard and our numerous developer libraries. All of your work is critical to enabling the MusicBrainz community to do its job.
We'd also like to thank our awesome Board of Directors, Brewster Kahle and the Internet Archive, all of the donors who contributed money and all of our customers. In particular we'd like to thank Google, AOL and an anonymous record label for the large donations made in 2012. These large contributions allow us to carry on with our goal of making MusicBrainz the most comprehensive music encyclopedia out there!
Thank you to everyone who contributed in 2012!