History:How Advanced Relationships Works

Advanced MusicBrainz Data Relationships

Status: This is the (slightly updated) original proposal of AdvancedRelationships. It explains quite well how AR works. The AdvancedRelationshipTypes in this text are however outdated proposals.



Notes:

  • Artists will have to be split into persons and performers. The details are not clear yet.
  • In this proposal albums are named releases. Note, however, that this is different from the ReleaseGroups proposal.

The interesting bit of this page is "So what can we represent using this model?", so please skip on ahead if this first bit is too technical and boring.

Relationships

Please help brainstorm about missing relationships that we should keep track of in MB. These relationship types as well as the relationships themselves should be user moderatable.

What a Relationship Is

Relationships would primarily be between exactly two "things", where things can be any of the existing data we keep, e.g. contributor, release, track. Also there will be new "things", e.g. URL, and maybe we should even include other stuff such as TRM and discid, and, let's be brave, moderator and moderation.

There will be relationships between more than two things, but that gets really complex quite soon, and you can achieve a lot just by sticking to two.

Relationship Ordering

Two-entity relationships will then be defined in database tables called "link_A_B", where A and B are the types of things being linked together, and A does not sort alphabetically after B.

Examples:

  • link_release_release
  • link_contributor_release
  • link_contributor_contributor
  • link_contributor_track

but not:

  • link_track_release (that's the wrong way round)
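The naming rule above can be sketched as a small helper. This is only an illustration in Python, assuming entity type names are plain lowercase strings; the function name `link_table_name` is hypothetical and not part of any MusicBrainz code.

```python
def link_table_name(type_a: str, type_b: str) -> str:
    """Return the link table name for two entity types.

    The two types are ordered so that the first never sorts
    alphabetically after the second.
    """
    first, second = sorted([type_a, type_b])
    return f"link_{first}_{second}"


# The "wrong way round" pair is normalised to the allowed order.
print(link_table_name("track", "release"))
print(link_table_name("contributor", "contributor"))
```

Whichever order the caller supplies, a given pair of entity types always maps to exactly one table.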

About Link Types

Each link will have a "link type", describing in what way "thing A" is linked to "thing B". The name of the link type should observe the "direction" of the link, e.g. you think of the link type always as "A link-type B", not "B link-type A".

Example:

  • link_contributor_release might have a link type of "performed" (contributor performed release) but not "performed by" (since "contributor performed by release" is wrong).
  • link_track_track - to represent cover versions, we should pick EITHER "is a cover of" OR "is covered by". Note that, whichever we pick, it's clear which of "A" and "B" is the older/newer track in each case - for "is a cover of", A is newer than B; and for "is covered by", it's the opposite. -- djce
  • linguistic usage note: maybe it's my UK english but I would never say song A "is covered by" song B. I would either say "song A is covered by band B" or "song A is the original version of song B". ( I might say "covered as" if the title was changed but only if I also referenced the band name, which may be implicit in this system. e.g. He's Gonna Step On You Again by John Kongos was covered as Step On by The Happy Mondays.) -- bawjaws
  • The issue of covers is also much more complicated than this model allows for. See WhatIsACover.
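To make the "A link-type B" reading concrete, here is a minimal Python sketch. The class name `Link` and its fields are assumptions made for illustration; the track titles come from the example in the note above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directed link, always read left to right: a <link_type> b."""
    a: str
    link_type: str
    b: str

    def phrase(self) -> str:
        # The link type name must make sense in this direction only.
        return f"{self.a} {self.link_type} {self.b}"


cover = Link("Step On", "is a cover of", "He's Gonna Step On You Again")
print(cover.phrase())
```

With "is a cover of" as the chosen link type, A is always the newer track and B the older one, so no second link type is needed for the opposite direction.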

So what can we represent using this model?

This is the interesting bit :-)

For each of the pairs of entity types, what link types could we create?

link_contributor_contributor

  • "is a member of". contributors could be created which are collaborations of two or more other contributors, e.g. "vs", "feat." etc. Those collaboration contributors would maybe be marked as such (i.e. a "is collaboration" yes/no flag). The individual contributors could be linked to the collaboration contributor with a "is a member of" link.

Example: "ATB vs Faithless". "ATB" is-a-member-of "ATB vs Faithless". Ditto Faithless. Then it's easy to find relationships between contributors.

(2004-08-29 -- TarragonAllen) VersusMeansDifferentThings in this context: two artists collaborating, one artist remixing another, or two tracks mixed together to make a "new" song (usually by a third artist). All three of these things can be mapped using other links. "Versus" should not be treated as anything particularly special, as it's not - it's just a different way that a link between artists or tracks is expressed. A simpler example of how links between artists can be expressed in different ways would be the track "Under Pressure", which is a collaboration between Queen and David Bowie. This can be expressed as "Queen & David Bowie" or as "Queen (feat. David Bowie)". In both cases the link is a collaboration, but they are expressed differently. How can we retain this type of "expressed as" information without creating new collaboration types for every new combination?

Does there need to be an "is collaboration" yes/no flag? Surely if a contributor has other contributors linked to it via "is a member of" then it is a collaboration. If it doesn't, it isn't. Also, is this only for special cases, e.g. versus mixes or guest vocalists, or would this generalise so that Neil Young "is a member of" Buffalo Springfield & CSNY as well as Neil Young and Crazy Horse? Then does the relationship recurse so that each individual band member "is a member of" Crazy Horse, which (as a unit) "is a member of" Neil Young and Crazy Horse, or do they have to be linked directly? And to go totally RockFamilyTrees, does "is a member of" not need to be linked to a timespan (possibly discontiguous--if that is a real word--i.e. people leaving and joining again)? -- bawjaws
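The two questions in this comment (deriving the flag from the links, and recursing through nested groups) can be sketched as follows. This is an illustrative Python sketch only: the list layout and function names are assumptions, timespans are left out, and the "Billy Talbot" row is added here purely to show the recursion.

```python
# Each pair is (member, group), read as: member "is a member of" group.
MEMBER_OF = [
    ("ATB", "ATB vs Faithless"),
    ("Faithless", "ATB vs Faithless"),
    ("Neil Young", "Buffalo Springfield"),
    ("Neil Young", "Neil Young and Crazy Horse"),
    ("Crazy Horse", "Neil Young and Crazy Horse"),
    ("Billy Talbot", "Crazy Horse"),  # added for illustration, not in the text
]


def is_collaboration(contributor, links=MEMBER_OF):
    """Derived flag: true if any contributor is linked as a member of it."""
    return any(group == contributor for _, group in links)


def all_groups(contributor, links=MEMBER_OF):
    """Follow "is a member of" links recursively and collect every group."""
    found = set()
    pending = [contributor]
    while pending:
        current = pending.pop()
        for member, group in links:
            if member == current and group not in found:
                found.add(group)
                pending.append(group)
    return found


print(is_collaboration("ATB vs Faithless"))
print(sorted(all_groups("Billy Talbot")))
```

Under this reading, individual members only need a direct link to their own band; membership of the larger unit follows from the chain of links.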

Don't think of having flags in these kinds of relationships. We're going to allow the moderators to create arbitrary new relationships between the basic data elements as described in ArbitraryURLFeature. Thus you should think of classifying collaboration as one link type that indicates cooperation and one that does not. For example, you could define cooperation as follows:

cooperation

  • guestArtist
    • shortTerm
    • mediumTerm
  • cooperation
  • versus
  • memberOf
    • pastMember
    • currentMember

To express a current member of a band you'd have a contributor-contributor link with the link type "cooperation/memberOf/currentMember". We will need to think up the basic starting relationships to give it some thought-out structure to start with, but moderators can tune and expand the relationship types that MB can capture. -- Ruaok
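A slash-separated link type such as "cooperation/memberOf/currentMember" can be checked against the hierarchy listed above. The following Python sketch assumes the tree is held as nested dictionaries; `LINK_TYPES`, `is_valid_link_type` and `is_a` are hypothetical names used only for this illustration.

```python
# Hypothetical nested-dict form of the link type tree listed above.
LINK_TYPES = {
    "cooperation": {
        "guestArtist": {"shortTerm": {}, "mediumTerm": {}},
        "cooperation": {},
        "versus": {},
        "memberOf": {"pastMember": {}, "currentMember": {}},
    }
}


def is_valid_link_type(path, tree=LINK_TYPES):
    """Return True if every segment of a slash-separated path is in the tree."""
    node = tree
    for segment in path.split("/"):
        if segment not in node:
            return False
        node = node[segment]
    return True


def is_a(path, ancestor):
    """Return True if `path` equals `ancestor` or lies beneath it."""
    return path == ancestor or path.startswith(ancestor + "/")


print(is_valid_link_type("cooperation/memberOf/currentMember"))
print(is_a("cooperation/memberOf/currentMember", "cooperation/memberOf"))
```

One possible benefit of the path form is that a query for "cooperation/memberOf" can match both past and current members without listing each subtype.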

  • influenced by (better expressed like role/event?)

(why might this be better expressed this way? "cont. A was influenced by cont. B" seems fine to me - djce)

link_release_release

  • sequence (CDs in boxset)

Mmmm, that seems about-face to me. How about having a "meta" release which represents the whole box set, then having a "is part of" relationship? e.g. "Remasters II (disc 2) is part of Remasters II". The only thing missing from that is any idea of sequence - the sequence is encoded within the release name (as it is in the current schema). Is that a bad thing? - djce

  • edition (e.g. to link the 12-track version of some album to the 12-track-plus-two-bonus-tracks version of it). Not sure in that case what the link would be... in the case of many editions of the same release, you'd need to either nominate one release as the "master" release, and link all the others to it as "editions"; or, create another "meta-release", and link the various editions to the meta-release as "is an edition of". Or something. - djce

link_track_track

  • Covered by (I'd prefer the opposite link, "A is a cover of B" - picky details :) - djce)
  • Remixed by (ditto)
  • Sampled (e.g. "'Ice Ice Baby' samples 'Under Pressure'")

link_contributor_release

  • various roles (see below)

link_release_track

  • includes (as in, "release X includes track Y" ? Where does the track sequence number go? - djce)

Track sequence numbers are the example I couldn't remember. I see a few possible solutions:

  1. Not convert the current album-track relationships into the new model. But given that we have two cases of needing more attribute data, this is probably not an acceptable solution
  2. Allow a link to carry a type, and one data element (date, int or string, or one each)
  3. Allow a link to carry an arbitrary amount of data.

Allowing a link to carry data can also make the events possible, since each event requires at least a date. The other alternative is to break the data down into more atomic chunks. RDF is a good example of how arbitrary relationships can be expressed with only one data construct, namely the triple. But the number of simple relationships goes up significantly if you express complex relationships in simple terms. Allowing links to carry data makes the data structure more complex, but it keeps the number of relationships in the DB tables to a reasonable limit.

Ruaok
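The trade-off described above can be illustrated with a rough Python sketch. It assumes option 3 (a link carrying arbitrary data) and compares it with the same information broken into triples; `DataLink`, `to_triples`, the "link1" identifier and the sequence number 3 are all made up for this example, and the triples are plain tuples rather than real RDF.

```python
from dataclasses import dataclass, field


@dataclass
class DataLink:
    """Option 3: a typed link that may carry arbitrary attribute data."""
    a: str
    link_type: str
    b: str
    data: dict = field(default_factory=dict)


def to_triples(link, link_id):
    """Express the same link using only (subject, predicate, object) triples."""
    triples = [
        (link_id, "from", link.a),
        (link_id, "type", link.link_type),
        (link_id, "to", link.b),
    ]
    # Every attribute carried by the link becomes one more triple.
    for key, value in link.data.items():
        triples.append((link_id, key, value))
    return triples


includes = DataLink("release X", "includes", "track Y", {"sequence": 3})
for triple in to_triples(includes, "link1"):
    print(triple)
```

One row in a link table corresponds to four triples here, and each extra attribute adds another, which is the growth in simple relationships that the paragraph warns about.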

link_contributor_track

  • various roles (see below)

Roles

  • band
  • conductor
  • composer
  • soloist
  • vocals
  • lead guitar
  • audio engineer
  • Could link types be URIs? This would allow for an extensible system... --Scottm

not sure what you're getting at... care to elaborate? --djce

Sure. Basically, you need a way to identify different types of relationships. You could create some relation_types constants file, but that is hard to extend. On the other hand, if you allow URIs to represent relationship types, you could include new types really quickly by adding URIs. For the more advanced side of things, by placing an RDF file at the address pointed to by the URI, a type could be self-describing, with information about how to display itself, for instance. There are many possibilities for a super-extensible system based upon advanced relationships. Basically, I want the average technically advanced user to be able to add an advanced relationship. In the future, a stand-alone server could be created to host new types of relationships on other servers. --Scottm

Kinda. The web interface will allow moderators to create new link types. So if someone decides that they need to define a field with shoe-size for a contributor, they can go ahead and do that. In other words, the community gets to decide what MB keeps track of, not the admins. As part of the RDF schema definition we will have a mapping of each of these roles to a unique URI, so that each of these roles can be expressed in RDF. --Ruaok

Events and other non-link data

e.g. Joined band, band formed, album released, band member dies, concert, track released

We will need a separate "event" entity which contains things like the event type and at least one date and maybe some prose, and then link other entities to that event. Example: event A is type=band-split date=whenever-it-was; contributor-event links contributor "Spice Girls" to event A with a link type of, err, "subject" (since the named contributor is the subject of the event).

Better example:

  • event "B": type="band formed"
  • contributor-event link: event="event B" link-type="subject" cont.="Audioslave"
  • contributor-event link: event="event B" link-type="initial member" cont.="Chris Cornell"
  • contributor-event link: event="event B" link-type="initial member" cont.="Tom Morello"
  • contributor-event link: event="event B" link-type="initial member" cont.="Tim Commerford"
  • contributor-event link: event="event B" link-type="initial member" cont.="Brad Wilk"
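The "band formed" example above might be modelled along these lines. This is a minimal Python sketch, not a schema proposal: the `Event` class and its methods are assumptions, and the date is left unset because the example does not give one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """An event entity; other entities are attached through typed links."""
    event_type: str
    date: Optional[str] = None                 # not given in the example
    links: list = field(default_factory=list)  # (link_type, contributor) pairs

    def link(self, link_type, contributor):
        self.links.append((link_type, contributor))

    def contributors(self, link_type):
        """Return the contributors attached with the given link type."""
        return [c for t, c in self.links if t == link_type]


event_b = Event("band formed")
event_b.link("subject", "Audioslave")
for name in ["Chris Cornell", "Tom Morello", "Tim Commerford", "Brad Wilk"]:
    event_b.link("initial member", name)

print(event_b.contributors("subject"))
print(event_b.contributors("initial member"))
```

Because the event is its own entity, any number of contributors can be attached to it while each individual link still joins exactly two things.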

Just as a note, these could all be formatted as RDF triples. A lot of the relationship information could easily be placed into an RDF database. Unfortunately, though, many implementations are not up to the job.

URL Links

In order to also have URL links we should add the following tables: URL, artist_url, album_url, track_url, url_type as described in ArbitraryURLFeature (see 'table structures by djce' section)

Events to model

  • Band formed/disbanded
    • Date
  • Member joined/left/died
    • Member, date
  • Album/single release
    • album, date
  • Concert
    • city, venue, date
  • Concert tour start/end
    • concert name, date
  • Artist sues label
    • label, date, outcome
  • Artist wins award
    • Type of award, date

(more later, I'm tired -- Ruaok)

This seems a perfect place for FOAF (http://rdfweb.org/foaf/) Things done with FOAF:

-- Svanzoest

Aliases

Additional thoughts triggered by

This is what I think we want to model:

A Person has a RealName. A Person can have one or more Aliases. A Person can play one or more Roles under an Alias. By creating Role relationships against Aliases we can model things like (there are probably some factual mistakes in these examples):

Alias                   Person
"Eminem"'s real name is "Marshall Mathers"

Release                 Role          Alias
"The Slim Shady LP" was "written by"  "Eminem"
"The Slim Shady LP" was "produced by" "Dr. Dre"

    Role                 Recording                Alias
The "mix engineering" on "Bad Meets Evil" was by "Mr. B"

Implementation might be easier if there's always an alias which is the same as Person:RealName so that Role relationships are always made against Alias (and not Person for a Person with no Aliases) e.g.

Role               Release                Alias
"Art direction" on "The Slim Shady LP" by "Mark LeRoy"

Alias                       Person
"Mark LeRoy"'s real name is "Mark LeRoy"

Paul
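The Person/Alias/Role model described above could look roughly like this in Python. All class and function names are hypothetical, and the data is copied from the examples in this section (which, as noted, may contain factual mistakes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    real_name: str


@dataclass(frozen=True)
class Alias:
    name: str
    person: Person


@dataclass(frozen=True)
class RoleLink:
    role: str
    target: str   # release or recording title
    alias: Alias  # roles always point at an alias, never directly at a person


def default_alias(person):
    """Give a person an alias equal to their real name."""
    return Alias(person.real_name, person)


eminem = Alias("Eminem", Person("Marshall Mathers"))
mark = default_alias(Person("Mark LeRoy"))

roles = [
    RoleLink("written by", "The Slim Shady LP", eminem),
    RoleLink("art direction", "The Slim Shady LP", mark),
]


def people_for(target, links):
    """Resolve every alias credited on a target back to a real name."""
    return {link.alias.person.real_name for link in links if link.target == target}


print(sorted(people_for("The Slim Shady LP", roles)))
```

Since every person has at least the default alias, the lookup never needs a special case for people who have no other aliases.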