History:Development/XML Web Service/Version 1

From MusicBrainz Wiki
Revision as of 23:24, 18 April 2006 by RobertKaye (talk | contribs) (Added a couple of examples from Kevin McGrail (Imported from MoinMoin))

Status: Beta testing in progress. Please do not change unless you are told to!




Introduction

The web service discussed in this document is an interface to the MusicBrainz database which contains a huge amount of music metadata, all maintained by the MusicBrainz community. It is aimed at developers of media players, CD rippers, taggers and other applications requiring music metadata. The service's architecture follows the REST design principles. Interaction with the web service is done using HTTP and all content is served in a simple but flexible XML format.

This document first describes how the data in MusicBrainz is organized. Users who already have experience in using the website can safely skip this section and start with the specification sections.

The MusicBrainz Metadata Model

MusicBrainz uses an object oriented schema to model music metadata. The main classes are artist, release and track, each with a different set of attributes and relations. Apart from their traditional relations (artists have releases, releases contain tracks), a more powerful schema was introduced (sometimes called AdvancedRelationships).

It allows users to link an object of one class to an object of any other class (URLs are permitted, too). Many different link types exist (see AdvancedRelationshipType for a list), which can be used to specify the artist who did background vocals on a release or track, who is married to whom, where an artist's official homepage is, and a lot more. Those links themselves may have attributes, with their semantics depending on the link type.

The following sections discuss the main classes in more detail.

The Artist Class

In MusicBrainz, artists always have a unique ID, a name and a SortName. If the artist name isn't unique, a disambiguation comment is used to provide more information about the artist (see IdenticallyNamedArtists). Additionally, an artist may be flagged as Person or Group, and it can have a begin date and an end date. For persons, these are the dates of birth and death; for groups, they are the dates of founding and dissolution.

An artist can have any number of releases and relations to other artists, releases, tracks and URLs.

The Release Class

All releases have a unique ID, a title and one or more tracks. Each release has a type ("Album", "Compilation", "Single" etc.), a status ("Official", "Promotion", "Bootleg") and language information. A release may also have additional release information, which is represented as a list of (country, date) tuples, also called release events.

A common use case is to look up a release using a DiscID generated from an Audio CD's table of contents (TOC). A release can have any number of DiscIDs (including none), mostly due to different pressings. In rare cases, where two different CDs have the same TOC, a DiscID may map to more than one release.

Relations are used to link a release to other releases, artists, tracks and URLs.

The Track Class

Tracks have a unique ID, a title and one main artist. They may also have a duration attribute indicating the play time. There can be any number of PUIDs, which are used to look up tracks. PUIDs are audio fingerprints generated from music files, but they are not unique, so a PUID can be associated with many tracks.

As with the other classes, a track object can have any number of relations to artists, releases, tracks and URLs.
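The three classes described above could be mirrored in client code roughly as follows. This is only an illustrative sketch: the class and field names are assumptions made for this example, not names defined by the web service or by any client library, and relations are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Artist:
    id: str                               # MBID
    name: str
    sort_name: str
    disambiguation: Optional[str] = None  # used when the name is not unique
    type: Optional[str] = None            # "Person" or "Group"
    begin_date: Optional[str] = None      # birth / founding
    end_date: Optional[str] = None        # death / dissolution


@dataclass
class Track:
    id: str
    title: str
    artist: Artist                        # exactly one main artist
    duration: Optional[int] = None        # play time, if known
    puids: List[str] = field(default_factory=list)


@dataclass
class Release:
    id: str
    title: str
    tracks: List[Track]                   # one or more tracks
    type: Optional[str] = None            # e.g. "Album"
    status: Optional[str] = None          # e.g. "Official"
    # (country, date) tuples, also called release events
    release_events: List[Tuple[str, str]] = field(default_factory=list)
    disc_ids: List[str] = field(default_factory=list)  # may be empty


artist = Artist(id="c0b2500e-0cef-4130-869d-732b23ed9df5",
                name="Tori Amos", sort_name="Amos, Tori")
track = Track(id="d6118046-407d-4e06-a1ba-49c399a4c42f",
              title="Silent All These Years", artist=artist)
```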

The URL Schema

All MusicBrainz objects (artists, releases, tracks) are modeled as resources. Resources have unique URLs and can be accessed using standard HTTP. Each resource is also part of a collection. This is a special resource which represents all objects of a type.

For this version of the web service, the http://musicbrainz.org/ws/1/ namespace has been reserved. It is further structured like this:

http://musicbrainz.org/ws/1/artist/ Collection of all artists
http://musicbrainz.org/ws/1/artist/MBID An individual artist
http://musicbrainz.org/ws/1/release/ Collection of all releases
http://musicbrainz.org/ws/1/release/MBID An individual release
http://musicbrainz.org/ws/1/track/ Collection of all tracks
http://musicbrainz.org/ws/1/track/MBID An individual track

Basically, there are two different ways to access MusicBrainz data. If you know the MBID (a globally unique identifier assigned to each object in the database), you can request the resource directly. To access the artist "Tori Amos" for example, the resource http://musicbrainz.org/ws/1/artist/c0b2500e-0cef-4130-869d-732b23ed9df5 may be used.

Another option is to use the artist collection. Since this collection is huge, it is infeasible to request all of it and then extract the data you need. Instead, collections support filters, which allow you to limit the amount of data based on some criteria. For example, you can use a filter to request only artists with the name "Tori Amos": http://musicbrainz.org/ws/1/artist/?type=xml&name=Tori+Amos. The filters supported depend on the collection and are described below.

In REST, HTTP methods are used to create (PUT), retrieve (GET), modify (POST) and delete (DELETE) resources. The most important method for this web service is GET, which returns a representation for the requested resource. Several different representations are possible, but at this point only the XML format discussed later in this document is supported.

By default, the web service only returns a basic representation of a resource. Additional information can be requested using the inc parameter, which depends on the resource. If you want to request a release including all tracks and additional release information for example, you can use this URL: http://musicbrainz.org/ws/1/release/02232360-337e-4a3f-ad20-6cdd4c34288c?type=xml&inc=tracks+release-events
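As a rough illustration of how such a URL is put together, the following sketch builds the address of an individual resource. The helper name resource_url and the constant WS_ROOT are made up for this example; only the URL layout and the type and inc parameters come from this document.

```python
from urllib.parse import urlencode

WS_ROOT = "http://musicbrainz.org/ws/1/"


def resource_url(entity, mbid, inc=()):
    """Build the URL of an individual artist, release or track resource."""
    params = {"type": "xml"}  # currently the only supported representation
    if inc:
        # Several inc values are joined with spaces; urlencode() writes
        # each space as '+', which gives the form shown in the text.
        params["inc"] = " ".join(inc)
    return WS_ROOT + entity + "/" + mbid + "?" + urlencode(params)


print(resource_url("release", "02232360-337e-4a3f-ad20-6cdd4c34288c",
                   inc=["tracks", "release-events"]))
```

Called without inc, the helper yields the URL of the basic representation.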

The following sections discuss the parameters available for each type of resource. The type parameter is required for all web service queries:

type Selects the representation of the resource. Currently only xml is supported. This is mandatory!

The inc parameter is only allowed for individual resources (but not for collections):

inc A list of space separated values (each space is written as + in the URL, as in inc=tracks+release-events) describing how much detail should be included in the output. If there is no inc parameter, just the basic data for a resource is returned. For artists that would be name, sort-name, and disambiguation.

The limit parameter is supported for all resource collections (but not for individual resources):

limit An integer value defining how many entries should be returned. Only values between 1 and 100 (both inclusive) are allowed. If not given, this defaults to 25.

Note that multiple parameters with the same name are not permitted.
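The rules for collections can be sketched in a similar way. The helper below is hypothetical (its name and signature are not part of the web service); it adds the mandatory type parameter, checks limit against the allowed range on the client side, and uses keyword arguments so that no parameter name can appear twice.

```python
from urllib.parse import urlencode

WS_ROOT = "http://musicbrainz.org/ws/1/"


def collection_url(entity, limit=None, **filters):
    """Build a filtered query URL for the artist, release or track collection."""
    params = {"type": "xml"}
    params.update(filters)  # Python keyword arguments cannot repeat a name
    if limit is not None:
        if not 1 <= limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        params["limit"] = limit
    # Without a limit parameter the server falls back to its default of 25.
    return WS_ROOT + entity + "/?" + urlencode(params)


print(collection_url("artist", name="Tori Amos"))
```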

The following HTTP status codes are used:

200 OK
  • Resource retrieved successfully.
400 Bad Request
  • Syntactically invalid MBID requested.
  • Invalid parameter value (e.g. an invalid inc tag).
  • Missing required parameter (e.g. type not set).
401 Unauthorized
  • The client requested a resource which requires authentication via HTTP Digest Authentication.
  • If sent even though a user name and password were given: the user name and/or password are incorrect.
404 Not Found
  • Wrong web service prefix. /ws/ would be correct for the MusicBrainz server.
  • Invalid version number. Only version 1 is currently supported.
  • Invalid entity name. Only artist, release, track, or user are permitted.
  • Resource not found. There is no resource with this ID (maybe it was merged or deleted).
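A client might translate these status codes into exceptions along the following lines. All class and function names here are hypothetical and chosen only for this sketch; the mapping itself simply follows the status codes listed above.

```python
class WebServiceError(Exception):
    """Base class for the hypothetical error types in this sketch."""


class BadRequestError(WebServiceError):
    """400: invalid MBID, invalid parameter value, or missing parameter."""


class AuthenticationError(WebServiceError):
    """401: authentication required, or wrong user name/password."""


class NotFoundError(WebServiceError):
    """404: wrong prefix, version or entity name, or no such resource."""


_ERRORS = {400: BadRequestError, 401: AuthenticationError, 404: NotFoundError}


def check_status(code):
    """Return normally for 200 OK; raise a matching error otherwise."""
    if code == 200:
        return
    # Codes outside the documented set fall back to the generic base class.
    raise _ERRORS.get(code, WebServiceError)("HTTP status %d" % code)
```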

artist resources

Parameters for http://musicbrainz.org/ws/1/artist/MBID:

inc Supported: 'aliases', 'sa-'*, 'va-'*, 'artist-rels', 'release-rels', 'track-rels', 'url-rels'

To get an artist's releases, the 'sa-' and 'va-' prefixes (for single artist and various artist releases, respectively) together with the desired release type have to be used. For example, the release tag sa-Album requests single artist albums, while va-Bootleg requests various artists bootlegs. Multiple tags may be used, so inc=sa-Compilation+sa-Official is valid and returns all official compilations by that artist (AND conjunctions are used for release types!).

Note: An inc parameter may contain either 'sa-' tags or 'va-' tags, but not both.
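The tag construction can be sketched as below, reading the note above as "sa- and va- tags cannot be mixed in one inc parameter" (an interpretation, not something this document spells out). The function name is hypothetical; taking a single prefix argument makes it impossible to mix the two.

```python
def artist_release_tags(prefix, release_types):
    """Return inc tags such as 'sa-Album' for one prefix and several types."""
    if prefix not in ("sa", "va"):
        raise ValueError("prefix must be 'sa' or 'va'")
    return [prefix + "-" + release_type for release_type in release_types]


tags = ["aliases"] + artist_release_tags("sa", ["Compilation", "Official"])
# In the URL the space separated list is written with '+' between values.
print("inc=" + "+".join(tags))
```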

Parameters for http://musicbrainz.org/ws/1/artist/:

name Fetch a list of artists with a matching name
limit The maximum number of artists returned. Defaults to 25, the maximum allowed value is 100.

release resources

Parameters for http://musicbrainz.org/ws/1/release/MBID:

inc Supported: 'artist', 'counts', 'release-events', 'discs', 'tracks', 'artist-rels', 'release-rels', 'track-rels', 'url-rels'

Parameters for http://musicbrainz.org/ws/1/release/:

title Fetch a list of releases with a matching title
discid Fetch all releases matching the given DiscID
artist The returned releases should match the given artist name
artistid The returned releases should match the given artist ID (36 character ASCII representation). If this is given, the artist parameter is ignored.
releasetypes The returned releases must match all of the given release types. This is a list of space separated values like Official, Bootleg, Album, Compilation, etc.
limit The maximum number of releases returned. Defaults to 25, the maximum allowed value is 100.

For the releasetypes parameter, the MusicBrainz release type and release status values are used (see AlbumAttribute). Note that the current MusicBrainz server only supports one release type and one release status value per release. Because the values given in releasetypes are combined with AND, a query such as releasetypes=Live+Compilation cannot match anything.
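A client could catch such unsatisfiable combinations before sending a query. The sketch below is an assumption-laden example: the function name is invented, and the two sets contain only the values mentioned in this document, so they are incomplete (see AlbumAttribute for the real list). Unknown values are passed through unchecked.

```python
# Only the values named in this document; both sets are incomplete.
RELEASE_TYPES = {"Album", "Compilation", "Single", "Live"}
RELEASE_STATUSES = {"Official", "Promotion", "Bootleg"}


def releasetypes_value(values):
    """Join values for the releasetypes filter, rejecting impossible combinations."""
    types = [v for v in values if v in RELEASE_TYPES]
    statuses = [v for v in values if v in RELEASE_STATUSES]
    if len(types) > 1 or len(statuses) > 1:
        # All values are ANDed, so two types (or two statuses) match nothing.
        raise ValueError("use at most one release type and one release status")
    return " ".join(values)  # spaces become '+' once URL-encoded


print(releasetypes_value(["Official", "Album"]))
```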

track resources

Parameters for http://musicbrainz.org/ws/1/track/MBID:

inc Supported: 'artist', 'releases', 'puids', 'artist-rels', 'release-rels', 'track-rels', 'url-rels'

Parameters for http://musicbrainz.org/ws/1/track/:

title Fetch a list of tracks with a matching title
artist The returned tracks have to match the given artist name.
release The returned tracks have to match the given release title.
duration The length of the track in milliseconds
tracknum The track number
artistid The artist's MBID. If this is given, the artist parameter is ignored.
releaseid The release's MBID. If this is given, the release parameter is ignored.
puid The returned tracks have to match the given PUID.
limit The maximum number of tracks returned. Defaults to 25, the maximum allowed value is 100.

PUID submission works using POST on the collection of tracks. The parameters are sent URL-encoded, that is, with a content type of application/x-www-form-urlencoded.

Parameters for posting to http://musicbrainz.org/ws/1/track/:

client The ID of the client software submitting the PUIDs. This has to be the application's name and version number, not that of a client library (client libraries should use HTTP's User-Agent header)! The required format: application-version, where version must not contain a - character.
puid A (TrackId, PUID) pair, separated by a single space character. Both the TrackId and the PUID are in their 36 character ASCII representation. This parameter may appear multiple times.

Users have to be logged in to submit PUIDs. This is independent from the website login and works via HTTP Digest Authentication. The realm is 'musicbrainz.org'.
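The following sketch prepares such a submission request without sending it. The function name, the client string example-0.1 and the all-zero PUID are placeholders invented for this example; the parameter names, the content type and the target URL are taken from the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request

TRACK_COLLECTION = "http://musicbrainz.org/ws/1/track/"


def puid_submission(client, pairs):
    """Prepare (but do not send) a POST request submitting (track ID, PUID) pairs."""
    # Splitting at the last '-' guarantees the version part contains no '-'.
    application, sep, version = client.rpartition("-")
    if not (application and sep and version):
        raise ValueError("client must have the form application-version")
    fields = [("client", client)]
    # puid may appear several times: one "trackid puid" pair per occurrence.
    fields += [("puid", track_id + " " + puid) for track_id, puid in pairs]
    body = urlencode(fields).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(TRACK_COLLECTION, data=body, headers=headers)


request = puid_submission(
    "example-0.1",  # placeholder application name and version
    [("d6118046-407d-4e06-a1ba-49c399a4c42f",
      "00000000-0000-0000-0000-000000000000")],  # placeholder PUID
)
print(request.get_method(), request.data.decode("ascii"))
```

Actually sending the request would additionally require HTTP Digest Authentication for the realm musicbrainz.org, for example through urllib.request.HTTPDigestAuthHandler.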

Examples

Below are a few user contributed examples to illustrate the use of this web service:

To get the Official XTC Releases that best match the album title "Wasp Star Apple Venus Volume 2", this would be the query: http://musicbrainz.org/ws/1/release/?type=xml&artist=XTC&releasetypes=Official&limit=10&title=Wasp+Star+Apple+Venus+Vol+2

The track listing can then be retrieved by taking the release ID from the result and querying for its tracks: http://musicbrainz.org/ws/1/release/6d931ac2-e389-4e99-8a01-1da65162c372?type=xml&inc=tracks

The XML Format

To represent web service responses, which are basically representations of some part of the MusicBrainz database, a new XML format has been developed. It is easy to read, powerful and extensible. Unfortunately, there is no documentation on it yet, but some examples and a Relax NG schema are available via subversion:

svn co http://svn.musicbrainz.org/mmd-schema/trunk mmd-schema

This is also available via the trac source browser.

Bugs in the schema should be reported on the IRC channel or posted to mb-devel. Please note that we're not going to make major changes to the format; only remaining mistakes will be corrected.

IDs and Types

The IDs and types used in the XML format are URIs. To keep the transmission overhead low, all URIs in the MusicBrainz namespace may be used in their relative form. So if a track's fully qualified id is http://musicbrainz.org/track/d6118046-407d-4e06-a1ba-49c399a4c42f, it may be shortened to d6118046-407d-4e06-a1ba-49c399a4c42f in the XML. Note that this shortening is only allowed for URIs from the MusicBrainz namespace.

The following rules apply to create a fully qualified URI from a relative one:

  • The id attributes:
    • artist: http://musicbrainz.org/artist/relative URI
    • release: http://musicbrainz.org/release/relative URI
    • track: http://musicbrainz.org/track/relative URI
  • The type attributes:
    • artist: http://musicbrainz.org/ns/mmd-1.0#relative URI
    • release: http://musicbrainz.org/ns/mmd-1.0#relative URI (for each relative URI in the list)

Due to their large number, relations are in a namespace on their own to avoid clashes:

  • Various relation attributes:
    • type: http://musicbrainz.org/ns/rel-1.0#relative URI
    • attributes: http://musicbrainz.org/ns/rel-1.0#relative URI (for each relative URI in the list)
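The expansion rules above can be sketched as a few small helpers. The function names are hypothetical, a value is treated as already absolute if it contains "://", and splitting type lists on whitespace is an assumption about how the list is written in the XML.

```python
MMD_NS = "http://musicbrainz.org/ns/mmd-1.0#"
REL_NS = "http://musicbrainz.org/ns/rel-1.0#"


def _absolute(value, prefix):
    # Values that already carry a scheme are fully qualified; leave them alone.
    return value if "://" in value else prefix + value


def absolute_id(entity, value):
    """Expand an id attribute of an artist, release or track element."""
    return _absolute(value, "http://musicbrainz.org/" + entity + "/")


def absolute_type(value):
    """Expand a type attribute of an artist or release element."""
    return _absolute(value, MMD_NS)


def absolute_type_list(values):
    """Expand each entry of a type list (assumed to be whitespace separated)."""
    return [absolute_type(v) for v in values.split()]


def absolute_relation_type(value):
    """Expand a relation type or relation attribute."""
    return _absolute(value, REL_NS)


print(absolute_id("track", "d6118046-407d-4e06-a1ba-49c399a4c42f"))
```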

Note: Don't confuse the URIs, especially the id URIs, with URLs. The URIs are just names and should not be used to query data from the server. However, they are in a permanent format which will always be valid and can easily be transformed into URLs. Example:

The following is an absolute, permanent MusicBrainz artist identifier which is the preferred representation. Shorter representations may be used for storing IDs in file tags or databases.

    http://musicbrainz.org/artist/c0b2500e-0cef-4130-869d-732b23ed9df5

This one is a URL created from the URI above, using a simple transformation. It can be used to request data from the MusicBrainz server via the web service. URLs may change over time; the URIs will not.

    http://musicbrainz.org/ws/1/artist/c0b2500e-0cef-4130-869d-732b23ed9df5
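The transformation from identifier to web service URL could look like this. The function name ws_url is invented for this sketch; the rule it applies is simply the one visible in the two addresses above (insert ws/1/ after the host name).

```python
MB_PREFIX = "http://musicbrainz.org/"


def ws_url(uri, version=1):
    """Turn a permanent artist/release/track URI into a web service URL."""
    if not uri.startswith(MB_PREFIX):
        raise ValueError("not a MusicBrainz URI: " + uri)
    entity, _, mbid = uri[len(MB_PREFIX):].partition("/")
    if entity not in ("artist", "release", "track") or not mbid:
        raise ValueError("unexpected URI layout: " + uri)
    return "%sws/%d/%s/%s" % (MB_PREFIX, version, entity, mbid)


print(ws_url("http://musicbrainz.org/artist/c0b2500e-0cef-4130-869d-732b23ed9df5"))
```

Note that an actual request still needs the mandatory type=xml parameter appended to this URL.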

Using the XML format for other applications

The XML format uses URIs for several IDs and types. The artist element, for example, has a type attribute which accepts a URI. Users of the format may use the URIs defined by MusicBrainz or use one from their own namespace. This example uses a definition from MusicBrainz:

    <artist id="c0b2500e-0cef-4130-869d-732b23ed9df5" type="http://musicbrainz.org/ns/mmd-1.0#Group"/>

If an application needs "Orchestra" as an artist type, a different namespace may be used:

    <artist id="c0b2500e-0cef-4130-869d-732b23ed9df5" type="http://example.org/ext-7.2#Orchestra"/>

This method may be used in all places where the schema accepts an anyURI datatype. As mentioned earlier, there is a special rule for all URIs defined by MusicBrainz: They may be relative, as you can see in the examples above. The complete artist ids have the form http://musicbrainz.org/artist/c0b2500e-0cef-4130-869d-732b23ed9df5 and the type attribute in the first example could also be written as Group, without the namespace prefix.

To extend the format even further, the schema has several extension points (see def_extension) which allow adding arbitrary XML elements from a user-defined namespace. Using the mmd namespace or no namespace at all for such an element is not permitted. There are no namespace restrictions inside that element, however, and unlimited nesting is possible, too.

If your private namespace is http://example.org/ext-9.1# and you want to add data from a rating system, for example, it could be coded like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#" xmlns:ext="http://example.org/ext-9.1#">
    <track id="d6118046-407d-4e06-a1ba-49c399a4c42f">
        <title>Silent All These Years</title>
        <duration>253466</duration>
        <ext:rating value="9"/>
    </track>
</metadata>

Even more complicated things, like nested tags, are possible. Note that the em element doesn't belong to the ext namespace.

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#" xmlns:ext="http://example.org/ext-9.1#">
    <track id="d6118046-407d-4e06-a1ba-49c399a4c42f">
        <title>Silent All These Years</title>
        <duration>253466</duration>
        <ext:annotation>This is a <em>very</em> nice song.</ext:annotation>
    </track>
</metadata>

This is still valid according to the schema, but inside the extension elements, only well-formedness can be checked.
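For readers who want to process such documents, here is a minimal sketch that reads the rating example from above with Python's standard xml.etree.ElementTree module. The prefixes mmd and ext in the NS mapping are local choices made for this example; only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used for lookups below.
NS = {
    "mmd": "http://musicbrainz.org/ns/mmd-1.0#",
    "ext": "http://example.org/ext-9.1#",
}

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#" xmlns:ext="http://example.org/ext-9.1#">
    <track id="d6118046-407d-4e06-a1ba-49c399a4c42f">
        <title>Silent All These Years</title>
        <duration>253466</duration>
        <ext:rating value="9"/>
    </track>
</metadata>
"""

root = ET.fromstring(DOCUMENT)
track = root.find("mmd:track", NS)
title = track.find("mmd:title", NS).text
duration = int(track.find("mmd:duration", NS).text)
# Extension elements live in their own namespace; find() returns None
# when a document does not contain them.
rating = track.find("ext:rating", NS)

print(track.get("id"), title, duration,
      rating.get("value") if rating is not None else None)
```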

A Note to Application Developers

The PythonMusicBrainz2 package is the reference implementation of a client library for this web service. It has been designed to be as simple to use as possible but still provides access to all parts of the service. If you are planning to write bindings for another programming language, you are encouraged to follow the programming model, object oriented schema, and terminology of PythonMusicBrainz2 as far as it makes sense for your language.



Development Notes

This section has to be integrated into the spec at some point.

Use Cases

This new web service will have the following use cases:

  • Retrieve artist (via mbid)/album (via mbid/cdid)/track (via mbid)
    • Each of these should return a minimal amount of data and have options to return more detailed data:
      • artist: optional list of albums
      • album: optional artist info, cdids, release info, list of tracks
      • track: optional artist info
  • Retrieve a list of tracks that match a given PUID
    • same arguments that apply to track info should be used here
  • Full text search of the DB (via lucene query)
  • Login to MB
  • Submit PUIDs (after login)
  • Check donation status of user (for showing pop-ups in Picard)
  • Lookup track (like MBQ_FileLookup)

Collection filtering issues

The exact collection filtering hasn't been defined yet. I think tying query fields together using logical OR is probably the best choice. Lucene will automatically move the best matches to the top of the result list. However, there should be one exception: If IDs are given (artistid, releaseid, discid, puid), then the results should always match those IDs. An example of how a filter on the track collection could be built:

AND artistid AND releaseid AND ( title OR duration OR ... )

If both artist and artistid are given, the artist parameter should be ignored (as with releaseid and release). The lucene query syntax is never used. That means, you can always put data read from file tags into your queries without having to escape special characters.

Discussion

  • When using ?name=... it says it retrieves resources with a matching name (or title). What kind of "matching"?
  • How exactly does ?duration=... work - does it support range or fuzzy matching for example?
  • If multiple filter arguments are given, how do they combine? Is it allowed to repeat a filter argument (e.g. ?artist=X&artist=Y) ?
  • AFAICT nowhere does it say what exactly all the ?inc options do. I can guess most of them, in a vague sort of way, but for example is ?inc=artist-rel artist releases or artist relationships (or something else)? If these have already been defined, then a pointer to that definition would be helpful.