MetaBrainz:Annual Report/2012

From MusicBrainz Wiki

Year in review

After an exciting, action-packed 2011, 2012 was a little calmer. Since our last major schema update, the Next Generation Schema, we've spent a considerable amount of time making further improvements and ironing out the kinks from that major release. Our development team pushed out updates weekly for a number of months before settling into a more sustainable pace of approximately twice a month. After two years with no significant updates, this is a welcome change!

In January we held a mini MusicBrainz summit in London to help people and companies in London who were interested in MusicBrainz find out more about how to work with us. For the next couple of months we kept our heads down, working towards our next schema change release. Then May was an action-packed month! On May 9th, Universal Music UK launched the Artist Gateway using MBIDs. This was the first time that a record label had embraced our data, and the result was a very slick site showcasing its own artists.

At the very end of 2011 we established a schema change release schedule of mid-May and mid-October. This schedule gives our data consumers more time to plan the work they need to do to prepare for our schema changes. We had our first regularly scheduled schema change release in May, and we finished off May by moving from an unclear Public Domain definition to the well-defined CC0 license, which spells out exactly how our data can be used.

MusicBrainz Picard version 1.0 was released in June (and version 1.1 was released in September). Thank you Lukáš Lalinský (luks), Michael Wiencek (bitmap), and Wieland Hoffmann (mineo) for all your hard work this year!

In July we announced that we're working on a revamp of our edit system, very creatively called the New Edit System. In August we held our first online hack weekend, where Oliver Charles brought the recently added cover art to our home page, Robert Kaye created the new Changed MBIDs data feed and Ian McEwen created a new IRC bot for our IRC channels. In September we hired Ian McEwen as a full-time engineer for MusicBrainz, and in October we launched the Cover Art Archive.

Closing out the year, we held our annual MusicBrainz summit in Barcelona at the Music Technology Group at UPF.

Cover Art Archive

The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz whose goal is to make cover art images available to everyone in an organised and convenient way. The Cover Art Archive was announced to the public in October 2012, but has been collecting images since May 2012. In the 8 months since May we have amassed a collection of just under 140,000 pieces of cover art, and we are currently receiving between 400 and 600 new pieces each day!

We look forward to the day when the Cover Art Archive is seen as the de facto source of cover art images, leveling the playing field for all consumers!

New customers

We had three new customers in 2012: 8tracks, AOL, and The Echo Nest.

8tracks uses MusicBrainz data to power its online radio stations, which are curated by humans, not algorithms. The Echo Nest has been using MusicBrainz data for quite a few years now in all of its offerings, including Project Rosetta Stone, which allows IDs to be translated from one music provider to another. In 2012 The Echo Nest agreed to give back to MusicBrainz and provide us with some support. AOL uses MusicBrainz data to provide information on tracks played in Winamp and on its online artist pages.

Google Summer of Code

We had five students for Google Summer of Code 2012: Alastair Porter (alastairp), Brandon LeBlanc (demosdemon), Daniel Bali (plaintext), Ian McEwen (ianmcorvidae), and Michael Wiencek (bitmap).

Alastair improved our collections system by fixing long-standing issues and adding new management features. Alastair was mentored by Oliver Charles (ocharles). Brandon worked on the MusicBrainz app for iOS and was mentored by Jamie McDonald (jdamcd). Daniel spent his summer using Splunk to dive into our server logs so that we can better understand our traffic patterns. Daniel was mentored by Robert Kaye (ruaok). Ian broke language barriers and took on internationalisation support. Ian was mentored by Nikki. This is Ian's second Summer of Code with MusicBrainz. Michael created the long-sought-after relationship editor and was mentored by Kuno Woudt (warp). This is Michael's second Summer of Code with MusicBrainz.

MusicBrainz summit

The 12th MusicBrainz summit was held in Barcelona, Spain, from November 9 to 11. It was attended by 22 people, 16 of whom flew in from various parts of the world.

As in previous years, the summit once again proved to be a fantastic experience, not just for the quality of the discussions but also for the level of interaction between participants. With the Saturday group meal, socialising at the apartment and continued discussions during breaks at the summit itself, it was great to see people chatting, laughing and generally having a great time. An overview of what was discussed is available on our blog.

We thank Brewster Kahle, Google, Music Kickup, Oliver Charles, and Spotify for sponsoring our 12th summit. We especially thank the Music Technology Group at Universitat Pompeu Fabra for both sponsoring the summit and being our local host. The summit would not have been possible without all of your contributions.

Looking forward

New Edit System (NES)

The Next Generation Schema (NGS) update in 2011 made many new features possible, but it did not touch our ageing edit system. We will be spending lots of development time in 2013 on the New Edit System (NES), which will fix many of the weaknesses of our current edit system, such as not being able to group edits together or amend an edit once it has been submitted.

Oliver Charles talks more about what NES is and how it will be implemented on his blog.

Ingestion (Geordi)

Ian McEwen started coding our new Geordi project, which aims to be a repository for third-party data being contributed to MusicBrainz. Most third-party data sources don't quite meet our quality standards or don't cleanly fit into MusicBrainz. Geordi is meant to help with this problem by providing a simple place for us to import new data and make it searchable by our community. Our community can then examine the data, determine its quality and establish a mapping between the data and the MusicBrainz database. Once such a mapping is in place, Geordi can help import data into MusicBrainz and keep track of which data has or has not been imported.
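The report doesn't describe Geordi's internals. As an illustration only, here is a minimal, hypothetical Python sketch of the workflow described above: third-party records are staged and searchable, the community attaches an MBID mapping, and the tool tracks which mapped records still need importing. Every class, method and field name below is invented for this example and is not Geordi's actual schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ExternalRecord:
    """One item of third-party data held for review (hypothetical model)."""
    source: str                 # name of the contributing data provider
    external_id: str            # the provider's own identifier for the item
    data: dict                  # raw fields exactly as supplied
    mbid: Optional[str] = None  # MusicBrainz ID, once the community agrees on a mapping
    imported: bool = False      # whether this item has been imported into MusicBrainz


class Staging:
    """Hypothetical staging area keyed by (source, external_id)."""

    def __init__(self) -> None:
        self.records: Dict[Tuple[str, str], ExternalRecord] = {}

    def add(self, record: ExternalRecord) -> None:
        self.records[(record.source, record.external_id)] = record

    def search(self, term: str) -> List[ExternalRecord]:
        # Case-insensitive substring search over all raw field values.
        term = term.lower()
        return [r for r in self.records.values()
                if any(term in str(v).lower() for v in r.data.values())]

    def map_to_mbid(self, source: str, external_id: str, mbid: str) -> None:
        self.records[(source, external_id)].mbid = mbid

    def ready_to_import(self) -> List[ExternalRecord]:
        # Mapped but not yet imported: the "has or has not been imported" bookkeeping.
        return [r for r in self.records.values() if r.mbid is not None and not r.imported]

    def mark_imported(self, source: str, external_id: str) -> None:
        self.records[(source, external_id)].imported = True
```

A record only becomes importable after someone has mapped it, which mirrors the report's point that community review comes before anything reaches the MusicBrainz database.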

Finances

Our finances in 2012 are summarized by our Profit & Loss statement:

Income
Summit Sponsorship $1,000.00
PayPal Donations $9,374.94
General Donations $71,954.40
Consulting $2,500.00
Live Data Feed Licenses $92,300.00
Bank Credits $0.13
Bank Interest $74.09
Amazon Associates $829.54
Reimbursements $2,212.10
Tagger Affiliate Program $14,875.68
CC Data License $6,450.00
Total Income $199,358.78
Expenses
Officer Salary $69,999.96
Bank $845.00
PayPal $1,235.56
WePay $7.24
GSoC Summit Chocolate $90.12
Rent $4,056.00
Hardware $1,547.59
Travel $6,706.58
Internet $264.38
Marketing $1,000.00
Development $105,587.27
Gifts $37.70
Events $929.80
Hosting $18,775.00
Filing Fees $120.00
Software $488.00
Entertainment $1,297.33
Insurance $2,307.00
Accounting $3,315.49
Payroll Taxes $7,602.47
Total Expenses $226,212.49

For the first time in the existence of the MetaBrainz Foundation, we finished a year in the red. Several vendors were well behind in their payments to MetaBrainz, which caused a shortfall of $26,853.71 (total expenses of $226,212.49 against total income of $199,358.78).

In the course of 2012 we spent $1,547.59 on hardware and $18,775.00 on hosting, for a total of $20,322.59. In the same period we served 5.0 billion hits, of which 4.73 billion were web service hits. In 2012 some users abused our web service, and we had to implement some aggressive throttling. Please note that the traffic graph below aggregates all attempted requests, which exceed the 5.0 billion responses we actually served.

For the purpose of calculating the cost of our web hits, we count only successful hits. On that basis, 1 million hits cost us $4.06 and 1 million web service hits cost us $4.29. These costs nearly halved from the previous year, mainly because we spent very little on hardware in 2012 thanks to the large hardware donation at the end of 2011.
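The per-million figures follow from dividing the combined hardware and hosting spend by the number of successful hits, expressed in millions. The Python sketch below reproduces them using `decimal` for exact money arithmetic. One assumption to flag: the report doesn't say how it rounded. Truncating to whole cents reproduces both quoted figures, whereas ordinary rounding would give $4.30 for web service hits.

```python
from decimal import Decimal, ROUND_DOWN

# Figures from the 2012 Profit & Loss statement.
HARDWARE = Decimal("1547.59")
HOSTING = Decimal("18775.00")
TOTAL = HARDWARE + HOSTING  # combined hardware + hosting spend


def cost_per_million_hits(total: Decimal, hits_billions: Decimal) -> Decimal:
    """Cost of one million successful hits, truncated to whole cents (assumed method)."""
    millions = hits_billions * 1000  # 1 billion hits = 1,000 million hits
    return (total / millions).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


print(TOTAL)                                          # 20322.59
print(cost_per_million_hits(TOTAL, Decimal("5.0")))   # 4.06 (all hits)
print(cost_per_million_hits(TOTAL, Decimal("4.73")))  # 4.29 (web service hits)
```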

In 2012 we spent $105,587.27 on developer salaries, and our costs for administration and taxes (officer salary plus payroll taxes) were $77,602.43. We earned $92,300.00 from our Live Data Feed and $6,450.00 from Creative Commons licensed data, for a total of $98,750.00, which is nearly unchanged from last year. End-user donations via PayPal came to $9,374.94, down from last year's $14,851.60; the drop was due to our not holding a specific fundraiser in 2012. Large donations added up to $71,954.40, including $40,000 from Google, $20,000 from AOL and $7,821.59 from an anonymous record label.

Traffic

In 2012 our traffic continued to grow beyond our means of honoring every request made of us. In early 2012 we started throttling some very demanding client applications, and throughout the year we continued to tweak our traffic shaping in an effort to make MusicBrainz available to as many people as possible:

[Graph: MusicBrainz traffic in 2012]

This traffic graph shows only the requests made to the main http://musicbrainz.org servers.

Several organizations, such as VLC, have set up their own replicated copies of MusicBrainz. Given how many installations of MusicBrainz exist in the world, the total number of hits served is much greater than what this graph shows. Sadly, we have no way of knowing how many MusicBrainz requests were handled in total in 2012, but we know it's considerably more than the 5 billion hits we served from our server farm.
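The report doesn't say how the throttling of demanding clients was implemented. One common technique for per-client throttling is a token bucket: each client gets a bucket that refills at a fixed rate, and a request is served only if a token is available. The Python sketch below illustrates that general technique. The class names, the client key and the rate values are hypothetical, and this is not MusicBrainz's actual traffic shaping.

```python
class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # a new client starts with a full bucket
        self.last = None        # timestamp of the previous request, if any

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to elapsed time, never beyond capacity.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token on this request
            return True
        return False            # out of tokens: the request is throttled


class Throttle:
    """Keeps one bucket per client key (hypothetically an IP address or user agent)."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client: str, now: float) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity)
            self.buckets[client] = bucket
        return bucket.allow(now)


throttle = Throttle(rate=1.0, capacity=1.0)
print(throttle.allow("heavy-client", 0.0))  # True: first request is served
print(throttle.allow("heavy-client", 0.5))  # False: bucket has not refilled yet
print(throttle.allow("other-client", 0.5))  # True: other clients are unaffected
print(throttle.allow("heavy-client", 1.5))  # True: refilled after a second
```

Keying buckets per client means one aggressive application exhausts only its own allowance, which matches the stated goal of keeping the service available to as many people as possible.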

Top contributors

Top editors
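1. reosarevok: 288,689 edits
2. Senax: 128,455 edits
3. drsaunde: 104,881 edits
4. kaik: 94,718 edits
5. murdos: 92,016 edits
6. gswanjord: 85,843 edits
7. HibiscusKazeneko: 73,456 edits
8. mudcrow: 72,122 edits
9. rochusw: 63,692 edits
10. monxton: 59,109 edits
11. fmera: 55,761 edits
12. Sekhmetouserapis: 54,607 edits
13. dimpole: 53,400 edits
14. SultS: 51,580 edits
15. m___ah: 50,426 edits
16. RocknRollArchivist: 49,544 edits
17. ListMyCDs.com: 46,541 edits
18. nikki: 40,795 edits
19. mat813: 39,266 edits
20. MrH: 36,363 edits
21. pankkake: 36,265 edits
22. salo.rock: 34,573 edits
23. Hawke: 28,217 edits
24. chabreyflint: 27,643 edits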
Top voters
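1. pankkake: 103,003 votes
2. mudcrow: 83,578 votes
3. MrH: 75,480 votes
4. chabreyflint: 58,593 votes
5. mat813: 56,957 votes
6. murdos: 49,297 votes
7. viper666: 34,028 votes
8. reosarevok: 33,934 votes
9. salo.rock: 32,301 votes
10. Ataki: 32,271 votes
11. ListMyCDs.com: 29,343 votes
12. PhantomOTO: 28,093 votes
13. Jazzy Jarilith: 27,521 votes
14. fmera: 24,868 votes
15. monxton: 21,741 votes
16. Jokipii: 21,234 votes
17. sbontrager: 21,164 votes
18. MeinDummy: 20,424 votes
19. symphonick: 16,355 votes
20. ianmcorvidae: 15,980 votes
21. gswanjord: 14,946 votes
22. nikki: 14,155 votes
23. drsaunde: 12,705 votes
24. mihhkel: 11,486 votes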

In 2012, 20,540 editors made a total of 4,249,916 edits. The number of editors is down from 23,637 in 2011, but the number of edits nearly doubled from 2,247,150 in 2011. In 2012 the number of bots making changes to our database increased, but by far the largest share of the work of editing MusicBrainz is still done by our community of editors. Another year down, and MusicBrainz would still be nothing without its editors! Thank you for another stellar year of editing!

Server farm

[Photo: the MusicBrainz server farm]

At the end of 2012, MusicBrainz had 17 machines in service. From top to bottom:

  • rika: User sandbox machine (mbsandbox.org)
  • lolo: An extra front end web server
  • scooby: ftp, system statistics, blog
  • pino: MetaBrainz, MusicBrainz Classic, Cover Art Archive
  • baron: Database failover server
  • [empty]
  • stimpy: off
  • dexter: off
  • cartman: Classic search server, index builder
  • wiley: New catch-all server: SVN, git, jira, wiki, trac, mail, backups
  • Gir: Network router (not visible)
  • lenny: Redundant network gateway
  • carl: Redundant network gateway
  • tails: off currently
  • asterix: web server
  • astro: web server
  • roobarb: search server
  • pingu: web server
  • dora: search server, memcached server
  • totoro: database server

Not shown: Hobbes, continuous integration (jenkins), replicated search indexes

Throughout 2012 we retired old machines and deployed new machines from our large hardware donation at the end of 2011. MusicBrainz uses 8.91 Mbps of bandwidth on average (1.34 Mbps inbound, 7.32 Mbps outbound) and draws 29 amps of current, for a power consumption of about 3,190 watts. MusicBrainz physically occupies 20U of space (half a rack) at Digital West in San Luis Obispo, CA.
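The power figure is simply current multiplied by supply voltage. The report gives only the current (29 A) and the result (about 3,190 W), so the nominal 110 V used below is an assumption: it is the voltage that links those two numbers. The sketch also assumes a power factor of 1, so it computes V × I directly.

```python
# Assumption: a nominal 110 V supply (not stated in the report) and a power factor of 1.
NOMINAL_VOLTAGE_V = 110


def power_watts(current_amps: float, voltage_v: float = NOMINAL_VOLTAGE_V) -> float:
    """Power drawn by the rack, P = V * I, under the assumptions above."""
    return current_amps * voltage_v


print(power_watts(29))  # 3190
```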

Words of appreciation

Many thanks to our editors, voters, peer reviewers, bug watchers and other members of our community; without you, MusicBrainz would not be what it is today!

We'd also like to thank our developers, who pushed out dozens of releases of the site, Picard and our numerous developer libraries. All of your work is critical to enabling the MusicBrainz community to do its job.

We'd also like to thank our awesome Board of Directors, Brewster Kahle and the Internet Archive, all of the donors who contributed money and all of our customers. In particular we'd like to thank Google, AOL and an anonymous record label for the large donations made in 2012. These large contributions allow us to carry on with our goal of making MusicBrainz the most comprehensive music encyclopedia out there!

Thank you to everyone who contributed in 2012!