MusicBrainz Summit/11/Session Notes


Attendees

  • Kuno Woudt (warp)
  • Pavan Chander (navap)
  • Rob Kaye (ruaok)
  • Nikki
  • Oliver Charles (ocharles)
  • Jamie McDonald (jdamcd)
  • Nicolás Tamargo (reosarevok)
  • CatCat
  • Per Øyvind Øygard (Wizzcat)
  • Paul Taylor (ijabz)
  • Mathias Kunter (mathiaskunter)
  • Hilbert Woudt (monedula)

Sponsor representatives

  • musiXmatch: Valerio Paolini
  • Last.fm: Adrian Woodhead (massdosage)
  • Google/Freebase: Micah Saul (micahsaul)
  • Zvooq: Andrey Popp (andreypopp)
  • BBC: Dave Evans (djce)

Customer introductions

Last.fm

  • They have had a lot of personnel changes over the last few years, but would like to re-establish a relationship with MB
  • Are looking to switch to NGS schema by the end of the year
  • They would like to use MBIDs internally to make communication between incoming data sets easier
  • Will consider sharing partial label feed/data
  • Might actually solve their artist disambiguation issue soon..ish

Zvooq

  • They are a Spotify competitor in Russia that focuses on music released worldwide

musiXmatch

  • They are a lyrics database
  • Worldwide license from Sony, Universal, EMI, Warner, BMG, Kobalt

Freebase

  • Freebase is a big data repository of various data sets covering movies, music, sports, people, locations, and others
  • http://freebase.com

BBC

  • They are looking to finally make the switch to NGS
  • Their music news website now uses ws/2
  • They outsource their album reviews and MB data entry to Unique Broadcasting Company

Discussions

Friday (Oct 14)

Single sign on & password security

Goals

  • Not storing plaintext passwords
  • Not having knowable (i.e. reversible) passwords
  • Not transmitting passwords in the clear
  • Single sign on

Questions

  • What specific password issues are we trying to solve?

Discussed proposals

  • Implement OpenID
  • Using digest authentication (still requires storing and transferring the clear text password)
  • Using SSL (requires updating web service libraries)
  • Using a separate LDAP server (password no longer in MB database and stored elsewhere, also allows for possible single sign on integration)

Conclusion: Use LDAP and phase in SSL to increase password security. Bonus: LDAP makes single sign on possible.
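
To make the first two goals concrete, here is a minimal Python sketch of salted, one-way password hashing. It is not something decided at the summit: the choice of PBKDF2, the iteration count, and the function names are assumptions made for illustration, and an LDAP server would normally do the equivalent internally.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, not a summit decision


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). Only these two values are stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

The stored key cannot be turned back into the password, which covers "not storing plaintext" and "not knowable". Protecting the password in transit still needs SSL.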

Saturday (Oct 15)

Cover art archive

  • Universal is considering handing over their entire cover art archive to us
  • Labels actually don't own copyright on cover art
  • There are potential messy legal issues to using cover art
  • The Internet Archive functions as a library and can act as a 'cover art shelter' for us
  • Possible process:
    • A release's MBID can be used to receive a cover art image
    • If you know a release's MBID you can do a GET and receive a cover art image
    • Track cover art uploads by user and also use regular voting process
    • Images will be provided in a hi-res version (~15 MB) and a low-res version (500 px)
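
The "GET by MBID" idea could look roughly like the Python sketch below. The host name, the path layout, and the size names are all hypothetical; the notes only say that a release's MBID is enough to fetch an image in one of two sizes.

```python
import uuid

# Hypothetical base URL and path layout; the notes do not specify either.
BASE_URL = "https://coverart.example.org"
SIZES = {"hi-res", "low-res"}


def cover_art_url(release_mbid: str, size: str = "low-res") -> str:
    """Build the URL to GET for a release's cover art.

    MBIDs are UUIDs, so the value is parsed (and normalised) as one first.
    """
    mbid = str(uuid.UUID(release_mbid))  # raises ValueError if not a valid UUID
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"{BASE_URL}/release/{mbid}/{size}"
```

A client would then issue a plain HTTP GET against the returned URL. Which status code to return for a 'darkened' image is left open here, as it is in the questions below.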

Questions

  • Does the user have to upload JPEG or can the server transcode?
  • What status code will we return when a 'darkened' image exists but we're not allowed to display it?
  • If we get cover art from Universal, how do we match each image up with a release?
  • How do we handle a release group with many releases (i.e., do we use the same image?)
  • How do we handle multiple images (e.g. front, back, obi, liner notes, CD faces, etc.)

Cover_Art_Archive

Cover_Art_Wishlist

Data quality

  • See Sunday

Edit system

Goals

  • Allow grouping edits together and bulk submitting them
  • Allow editing an edit and resubmitting it without impacting the edit queue
  • Allow editing via the web service, eventually

BookBrainz

  • Oliver's pet project and testing ground for future MB framework changes
  • BB emulates git
    • It allows building a stack of changes and then submitting all of them together in one 'commit'
    • It takes a snapshot of the data at the time. We don't have that with historical edits so migrating old edits is a problem
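
The "stack of changes, one commit, snapshot of the data" idea can be sketched in Python as follows. This is not BookBrainz code; the `ChangeSet` class, its methods, and the dict-based data store are invented for illustration.

```python
import copy


class ChangeSet:
    """A stack of pending changes that is applied in one 'commit'."""

    def __init__(self):
        self.changes = []

    def stage(self, entity_id, field, value):
        # Staging does not touch the live data.
        self.changes.append((entity_id, field, value))

    def commit(self, data, history):
        # Keep a snapshot of the data as it was before this commit.
        history.append(copy.deepcopy(data))
        for entity_id, field, value in self.changes:
            data.setdefault(entity_id, {})[field] = value
        self.changes = []
```

Historical MB edits only record the change itself, not a snapshot like the one appended to `history`, which is why migrating them into such a model is a problem.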

Further reading regarding

Web service

  • Roll out 3scale and move all commercial users over to a pay2play system with different packages
  • Non-commercial users would use the free2play rate-limited system with the option of paying for better access

Audio fingerprinting

  • We all hate PUIDs and we need to move forward
  • Acoustid looks very promising: it's open source, file oriented, and has strong ties with MB
  • http://acoustid.org/
  • May be possible to bulk fingerprint some data sources

Concert support

  • Do we go with one provider or several?
    • Start with Songkick, but stay open to the option of different providers - especially to gain global coverage
  • Do we concentrate on future events or archived events?
    • Initially link to Songkick for future events
    • Create a new setlist entity for past events
    • Create a new venue entity
  • Need to consider Location; it would be useful for artists as well as for events.

Tracks vs recordings (vs works)

  • Similar to the remaster issue
  • Do we add further levels of abstraction?
    • No. We're already saturated with entities. We need better definitions
    • ...and we still haven't totally defined works
  • Do we count silence as a divergence point?

Service segregation

  • Announce the closing of trac (and all its tickets) and the deprecation of subversion
  • svn.musicbrainz.org will remain as an interface for the search server
  • Consider replacing gitweb with github in a more official capacity

Genres

  • A new field that is to be used specifically for genres
  • Features: autocomplete, canonical names,
  • Micah is offering genre data based on Wikipedia

Product offering

This is not a complete or final model and not official!
  • "Drug dealer" model - free the first time, get addicted, pay for easy further access
  • Data dumps (twice a week)
    • Public $100 (suggested)
    • CC-NC $250 (paying for commercial use of the NC-licensed data)
  • Live data feed ($/mth)
    • Twice-weekly $500
    • Daily $1500
    • Hourly $2500
  • Web service calls (flat fee)
    • 10K $10
    • 25K $20
    • 50K $30
    • 100K $50
  • Virtual machine
    • VM + Data $300
    • VM + Data + Search $400
  • Tagger Affiliate Program
    • TBD: Clarification of the scope of the program
    • TBD: Web service referral kickbacks

Sunday (Oct 16)

3rd party data set integration

  • Lyrics from musiXmatch
    • daily updates, but will start with weekly ones
    • updates will include all MB/mXm matched lyrics
    • lyrics can also be added from the edit interface
    • How do we best use their lyrics data?
      • Solution: Link to mxm via a lyrics icon in the tracklist and a proper link on the recording page
  • See also Monday

Tracklist/medium overhaul with video support

  • Videos are becoming increasingly common as a music release medium (e.g. iTunes)
  • Will require major schema changes and looking at the long term goals of MusicBrainz
  • Solution: Table the discussion for now, reopen in a different setting with developers

Group multiple release events (country+date) together

  • There is a need to group multiple releases together when they are exactly the same and were just released in different countries
  • Due to tradition, different countries/regions issue releases on different days of the week
  • Solution: Allow multiple release events per release when the label, barcode, and tracklist are the same
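
A rough Python sketch of that grouping rule, assuming releases are plain dicts with `label`, `barcode`, `tracklist`, `country`, and `date` keys (these names are illustrative, not the MB schema):

```python
from collections import defaultdict


def group_release_events(releases):
    """Merge releases that share label, barcode and tracklist into one entry
    that carries a list of (country, date) release events."""
    grouped = defaultdict(list)
    for r in releases:
        key = (r["label"], r["barcode"], tuple(r["tracklist"]))
        grouped[key].append((r["country"], r["date"]))
    return [
        {
            "label": label,
            "barcode": barcode,
            "tracklist": list(tracks),
            "events": sorted(events),
        }
        for (label, barcode, tracks), events in grouped.items()
    ]
```

Releases that differ in any of the three key fields stay separate, which matches the condition in the solution above.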

Date improvements

  • Unknown end date (dead/disbanded, but we don't exactly know when)
    • Solution: Add a column to the date table to specifically state that the entity is dead/disbanded, but we don't know when
  • Fuzzy dates (16th century composer edge cases)
    • Solution: Use a 'century' column
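
The two proposed columns could be modelled as in the Python sketch below. The class and field names are assumptions; the notes only call for an "ended but unknown when" flag and a century value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FuzzyPeriod:
    begin_year: Optional[int] = None
    end_year: Optional[int] = None
    ended: bool = False            # True = dead/disbanded, even if end_year is unknown
    century: Optional[int] = None  # e.g. 16 when only "16th century" is known

    def describe(self) -> str:
        if self.begin_year is not None:
            begin = str(self.begin_year)
        elif self.century is not None:
            begin = f"century {self.century}"
        else:
            begin = "unknown"
        if self.end_year is not None:
            end = str(self.end_year)
        elif self.ended:
            end = "unknown"  # ended, but we do not know when
        else:
            end = "present"
        return f"{begin} to {end}"
```

The separate `ended` flag is what distinguishes "still active" from "ended at an unknown time", which an empty end date alone cannot express.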

Data quality

  • User:Wizzcat/Data_Quality_Extension
  • http://wiki.xabbu.net/Data_quality
  • Current implementation of data quality has a bad name, is poorly defined, and isn't used
  • What do we want to solve?
    • Explicitly state that a release has been reviewed/verified
      • Solution: +1 / -1 votes that decay in weight over some function of time
    • Protect against ignorance (The White Album vs The Beatles)
      • Solution: Add a 'Protected' flag (i.e. edits expire by default)
    • Measure of completeness
      • Solution: "Completed as per liner notes" checkbox that is accessible via the WS
  • Conclusion: High quality is the protected flag, default quality is default, low quality goes away
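
The notes leave the decay as "some function of time". As one possible reading, the Python sketch below weights each +1 / -1 vote by an exponential half-life; the half-life value and the function name are assumptions.

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 365.0  # assumed: a vote loses half its weight per year


def review_score(votes, now, half_life_days=HALF_LIFE_DAYS):
    """Sum +1/-1 votes, each weighted by 0.5 ** (age / half_life).

    `votes` is an iterable of (value, cast_at) pairs.
    """
    score = 0.0
    for value, cast_at in votes:
        age_days = (now - cast_at).total_seconds() / 86400.0
        score += value * 0.5 ** (age_days / half_life_days)
    return score
```

With this weighting, a fresh +1, a one-year-old +1, and a two-year-old -1 add up to 1 + 0.5 - 0.25 = 1.25, so recent reviews dominate older ones.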

Release group attributes

  • Currently, 'remix' and 'soundtrack' are at the same meta level as 'album' or 'lp'
  • Conclusion: Postponed till a proposal can be drawn up

Reports

  • Improve the explanation that is shown at the top of each report's page
  • Improve report flow (e.g. ability to hide items from reports)
  • Allow marking an entry as 'done'
  • Default report list should filter out all entries marked as done with more than X votes
  • Allow viewing the report with the filtered out entries
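
The default filter and the "show everything" view could be sketched in Python like this. The `done_votes` key and the parameter names are hypothetical; X is left as a parameter because the notes do not fix a value.

```python
def visible_entries(entries, x, show_filtered=False):
    """Return the report entries to display.

    By default, entries marked as done by more than `x` votes are hidden;
    `show_filtered=True` returns the full list instead.
    """
    if show_filtered:
        return list(entries)
    return [e for e in entries if e["done_votes"] <= x]
```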

Site notifications + subscriptions

  • List all emails in a site inbox
  • Create a dynamic list of subscribed artists with open edits

Testing

  • As finances improve, employ a dedicated person that will lead the testing

Pagination

  • Filter on release group properties
  • Use infinite scroll
  • Be able to reorder, add, remove, and sort columns

Medium attributes (12" vinyl, 8 cm CD)

  • Switch from a hierarchical tree to attributes

Music dashboard

Instrument tree

  • Change from a tree to a graph
    • Flatten the graph into a tree and allow an instrument to have multiple parents
  • Add model support to the instrument tree
  • Importing Freebase data
    • How often do we sync the data?
    • How do we reconcile differences in data?
    • How often do deletes/merges/changes happen?
  • Going forward, if we need a new instrument we would add it to Freebase
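
Allowing multiple parents turns the instrument tree into a directed graph. A minimal Python sketch of walking such a graph follows; the sample instruments and the `parents` mapping are made up for illustration and are not taken from the MB or Freebase data.

```python
def ancestors(parents, instrument):
    """Return every instrument reachable by following parent links."""
    seen = set()
    stack = [instrument]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:  # also guards against accidental cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical sample data: one instrument with two parents.
PARENTS = {
    "electric guitar": ["guitar", "electric instrument"],
    "guitar": ["string instrument"],
}
```

Here `ancestors(PARENTS, "electric guitar")` collects both branches, which a strict tree with a single parent per instrument could not represent.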

Universal Music Group International

  • "I am very happy to declare Universal's support for MusicBrainz and its community" - Innovation Manager at Universal Music Group International

Release editor

  • Default tracklist page shows the advanced view
  • For new releases you see the add disc dialog
  • The track parser moves into the add disc dialog
  • There needs to be a way to reparse from the advanced view

Wiki

  • Remove unneeded extensions
  • Update to Ubuntu's MediaWiki package
  • Get the API working
  • Install wiki at /wiki/Article and then redirect to /Article
  • Write a wiki test suite

Monday (Oct 17)

Initial dates on release group

  • Last.fm would like to create 'best of the decade' lists and filter out data such as the 2009 re-release of The Beatles
  • Currently, release group dates match the date of the earliest release in that group, but in the case of re-releases we often only have data on the modern release and are missing (for example) the original '70s vinyl release
  • Solution: Add an editable initial date field at the release group level
    • The date field will default to empty because anyone who wants the group date can guess via its earliest release (like MB does now)
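
The fallback described above can be sketched in Python as below, assuming dates are ISO-style strings such as "1970-05-08" (which sort chronologically as text). The function name and the sample dates are illustrative only.

```python
def release_group_date(initial_date, release_dates):
    """Prefer the editable initial date; otherwise guess from the earliest release."""
    if initial_date:
        return initial_date
    known = [d for d in release_dates if d]  # skip releases without a date
    return min(known) if known else None
```

For a group that only has a modern re-release on record, an editor can fill in the initial date and consumers such as a 'best of the decade' list get the original year instead of the re-release year.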

musiXmatch

  • short description of musiXmatch expectations
  • feedback from MB Editors on musiXmatch contributions
  • Editors' willingness to help musiXmatch (IRC channel)
  • musiXmatch will report unexpected Edit Interface behaviour (for example Split Artists while adding a Release)
  • change usernames to make them easily identifiable (add customer name to username)
  • provide guidelines for interactions between MB Editors and external Editors


3rd party data set integration

  • How do we properly link to different data sets? (e.g., musiXmatch, soundunwound, last.fm, etc.)
    • Solution: Build a generic framework that allows us to import any external data set and reconcile it with the data we have
    • Use a second "integration database" that contains all raw data from external sources (label feeds, partners, etc.)
    • Import data into the main database with a de-duplication script, but do not remove any of the original raw data (this allows further parsing in the future)
    • Also look into Google Refine for manual reconciliation: http://code.google.com/p/google-refine/
  • A long term goal is to create an editing API that we can gradually open up to our data partners and the ecosystem
    • This will allow partners like Zvooq to edit data on their website, but feed the changes back to the rest of the MB ecosystem
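
A toy Python version of the "integration database plus de-duplicating import" idea is sketched below. The in-memory list and dict stand in for the two databases, and matching on a normalised artist/title pair is an assumption; real reconciliation would need far more than this.

```python
def normalize(text):
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(text.casefold().split())


def import_records(raw_store, main_db, source, records):
    """Keep every raw record, but add only unseen items to the main database.

    Returns the number of new main-database entries.
    """
    added = 0
    for record in records:
        raw_store.append({"source": source, **record})  # raw data is never discarded
        key = (normalize(record["artist"]), normalize(record["title"]))
        if key not in main_db:
            main_db[key] = {
                "artist": record["artist"],
                "title": record["title"],
                "sources": [],
            }
            added += 1
        main_db[key]["sources"].append(source)
    return added
```

Because the raw store keeps everything, the de-duplication rule can be improved later and the import re-run without going back to the partners.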

Feature prioritization

Feature (votes)

  1. Edit system (9)
  2. Group multiple release events together (6)
  3. Data quality (6)
  4. 3rd party data set integration (5)
  5. Single sign on & password security (5)
  6. Instrument tree (4)
  7. Genres (4)
  8. Medium attributes (4)
  9. Release group attributes (3)
  10. Music dashboard (2)
  11. Tracklist/medium overhaul with video support (1)
  12. Pagination (1)
  13. Site notifications (1)
  14. Report improvements (1)
  15. Date improvements (0)
  16. Auto-editor elections (0)
  17. Full classical support