MusicBrainz Database/Schema: Difference between revisions

From MusicBrainz Wiki
Jump to navigationJump to search
(→‎Primary entities: adding 'series')
No edit summary
Line 103: Line 103:


=== tag table, & the *_tag and _tag_raw tables ===
=== tag table, & the *_tag and _tag_raw tables ===
All but the ''area'' and ''url'' primary entities (i.e. ''artist'', ''label'', ''place'', ''recording'', ''release'', ''release_group'', & ''work'') have ''_tag'' and ''_tag_raw'' tables, with the same structure. Both the ''_tag'' and ''_tag_raw'' tables contain two foreign keys, linked to the associated entity and to the ''tag'' table. The ''_raw_tag'' tables contain a foreign key, '''editor''', which specifies who added the tag, while the ''_tag'' tables instead contain a '''count''' of how many times a tag is applied to a particular entity, and a '''last_updated''' timestamp.
All but the ''area'' and ''url'' primary entities (i.e. ''artist'', ''label'', ''place'', ''recording'', ''release'', ''release_group'', & ''work'') have ''_tag'' and ''_tag_raw'' tables, with the same structure. Both the ''_tag'' and ''_tag_raw'' tables contain two foreign keys, linked to the associated entity and to the ''tag'' table. The ''_raw_tag'' tables contain a foreign key, '''editor''', which specifies who added the tag, while the ''_tag'' tables instead contain a '''count''' of how many times a tag is applied to a particular entity, and a '''last_updated''' timestamp. For privacy reasons, the "_raw" tables aren't included in the database dumps.


The ''tag'' table contains the actual names of the tags, and a ''ref_count'' (which is not actually updated anymore, and should be removed).
The ''tag'' table contains the actual names of the tags, and a ''ref_count'' (which is not actually updated anymore, and should be removed).
Line 111: Line 111:


=== *_rating_raw & *_meta tables ===
=== *_rating_raw & *_meta tables ===
There are 5 ''_rating_raw'' tables, one for each of the primary entities that can be rated: ''artist'', ''label'', ''recording'', ''release_group'', & ''work''. The ''_rating_raw'' tables all have the same structure, consisting of the actual rating, expressed as an integer between 0 and 100, and two foreign keys, one linked to the associated entity, and the other linked to the ''editor'' table, to specifying who made the rating.
There are 5 ''_rating_raw'' tables, one for each of the primary entities that can be rated: ''artist'', ''label'', ''recording'', ''release_group'', & ''work''. The ''_rating_raw'' tables all have the same structure, consisting of the actual rating, expressed as an integer between 0 and 100, and two foreign keys, one linked to the associated entity, and the other linked to the ''editor'' table, to specifying who made the rating. For privacy reasons, the "_raw" tables aren't included in the database dumps.


The aggregate rating of each entity is stored in the corresponding ''_meta'' table, in the '''rating''' field. There is also a '''rating_count''' field that specifies how many ratings have been entered for the entity.
The aggregate rating of each entity is stored in the corresponding ''_meta'' table, in the '''rating''' field. There is also a '''rating_count''' field that specifies how many ratings have been entered for the entity.

Revision as of 11:30, 28 September 2017

Primary entities

Each primary entity has a main table with the same name as the entity, containing its basic data.

Area

A country, region, city or the like.

Areas that can be used for filling in the Release country field of releases are listed, by ID, in the country_area table.

Artist

An artist is generally a musician, a group of musicians, or another music professional (composer, engineer, illustrator, producer, etc.)

Event

An event refers to an organised event which people can attend, and is relevant to MusicBrainz. Generally this means live performances, like concerts and festivals.

Label

Labels represent mostly (but not only) imprints.

Place

A venue, studio or other place where music is performed, recorded, engineered, etc.

Recording

Represents a unique mix or edit. Has title, artist credit, duration, list of PUIDs and ISRCs. Examples (all are different Recordings):

  • Album version of the track "Into the Blue" by "Moby"
  • Remix "Into the Blue (Buzz Boys Main Room Mayhem mix)" by "Moby"
  • Remix "Into the Blue (Underground mix)" by "Moby"

Release

Real-world release object you can buy in your music store. It has release date and country, list of catalog number and label pairs, packaging type and release status. Examples:

  • 1984 US release of "The Wall" by "Pink Floyd", release on label "Columbia Records" with catalog number "C2K 36183" and UPC "074643618328", it's an official release and comes with two CDs in jewel case.

Release group

Represents an abstract "album" (or "single", or "EP") entity. Technically it's a group of releases, with a specified type. Examples:

Series

A series is a sequence of separate release groups, releases, recordings, works or events with a common theme. The theme is usually prominent in the branding of the entities in the series and the individual entities will often have been given a number indicating the position in the series.

URL

This entity represents a URL pointing to a resource external to MusicBrainz, i.e. an official homepage, a site where music can be acquired, an entry in another database, etc.

Work

One layer above recordings ("song", "composition", etc.). While a recording represents audio data, a work represents the composition behind the recording. Relationships are used to link recordings and works.

  • Song "Into the Blue" by "Moby" -- all the recordings listed above will be linked to this object

Secondary entities

Each secondary entity has a main table with the same name as the entity, containing its basic data.

Artist credit

List of artists, variations of artist names and pieces of text to join the artist names. Examples:

Medium

This entity represents a piece of media, included in a release. It contains information about the format of the media, its position in the release, an optional title, and most importantly, a list of tracks. CD TOCs are attached to mediums, not releases or release_groups.

Examples:

  • CD1 of the 1984 US release of "The Wall" by "Pink Floyd"
  • CD2 of the 2005 UK release of "Aerial" by "Kate Bush", named "A Sky of Honey"

Track

This entity is not visible to users on its own, only in the context of a release. It has an MBID, and contains a link to a recording, a title, artist credit and position on its associated medium.

Schema

The SQL scripts that create the schema can be found in our source code repository.

This schema diagram shows the core database tables and relationships between them. Core entities are blue, mostly-static lists are yellow, and external identifiers are red. There is also a higher resolution schema diagram.

ngs.png

*_alias tables

The primary entities area, artist, label, place and work have _alias tables, all of which have the same structure. They contain the alternate names for instances of those entities.

annotation table, & the *_annotation tables

All the primary entities except url (i.e. area, artist, label, place, recording, release, release_group, & work) has a corresponding _annotation table that links entities of that type to entries in the main annotation table which contains the actual text of the annotation, along with the changelog and the identity of the editor who created it.

edit table & the edit_* tables

Edits to the database are stored in the edit table, with the edit notes/comments stored in edit_note. The other edit_ tables are used to link edits to the entity they modify; there is one table for each of the 9 primary entities (area, artist, label, place, recording, release, release_group, url, & work). They all have the same structure: just two foreign keys, one to the entity table and the other to edit table.

*_ipi & *_isni tables

The primary entities artist and label can have multiple IPI and ISNI codes attached to them. These tables are used to store those mappings. They contain a foreign key pointing to the entity, the actual value of the code, and edits_pending and created fields with the usual meaning.

tag table, & the *_tag and _tag_raw tables

All but the area and url primary entities (i.e. artist, label, place, recording, release, release_group, & work) have _tag and _tag_raw tables, with the same structure. Both the _tag and _tag_raw tables contain two foreign keys, linked to the associated entity and to the tag table. The _raw_tag tables contain a foreign key, editor, which specifies who added the tag, while the _tag tables instead contain a count of how many times a tag is applied to a particular entity, and a last_updated timestamp. For privacy reasons, the "_raw" tables aren't included in the database dumps.

The tag table contains the actual names of the tags, and a ref_count (which is not actually updated anymore, and should be removed).

*_gid_redirect tables

There are 10 _gid_redirect tables, one for each of the 9 primary entities (area, artist, label, place, recording, release, release_group, url & work) plus one for tracks. They are used to redirect one MBID to another when entities are merged.

*_rating_raw & *_meta tables

There are 5 _rating_raw tables, one for each of the primary entities that can be rated: artist, label, recording, release_group, & work. The _rating_raw tables all have the same structure, consisting of the actual rating, expressed as an integer between 0 and 100, and two foreign keys, one linked to the associated entity, and the other linked to the editor table, to specifying who made the rating. For privacy reasons, the "_raw" tables aren't included in the database dumps.

The aggregate rating of each entity is stored in the corresponding _meta table, in the rating field. There is also a rating_count field that specifies how many ratings have been entered for the entity.

The release_group_meta table also contains other fields, and the release entity, even though it cannot be rated, still has a release_meta table, which is used to store other data.

*_type tables

There are 12 _type tables, which are simply mappings between strings and ID numbers, representing various sets of types; areas, artists, labels, places, and works each have a _type and an _alias_type table. release_groups have release_group_primary_type & release_group_secondary_type tables. Recordings, releases and urls do not have _type tables.

Relationship table structure

ngs-ars.png

l_* tables

There are tables for every possible combination of the 9 primary entities (area, artist, label, place, recording, release, release_group, url & work) all prefixed with l_ and all with the same format. Two of them are shown in the diagram. They contain a field, edits_pending that is a count of pending changes to the AR, a last_updated field, and three foreign keys: link that points back to the associated entry in the link table, and entity0 and entity1 that point to the associated entry in the corresponding entity table (i.e. artist, recording, url, etc.).

There are two tables that help to avoid unnecessary duplication:

link & link_attribute tables

The link table contains the begin and end date info, and the link_type foreign key field that specifies what kind of AR it is. It also has a count of how many other attributes that particular link has in the attribute_count field, and a created field that specifies when it was created.

Each AR attribute either applies to a particular AR or it doesn't. The link_attribute table stores this information, having a record for each attribute_type (a foreign key field for link_attribute_type) of each link.

link_type, link_attribute_type, & link_type_attribute_type tables

The AR types and attributes are defined in these three tables, shown at the top of the diagram. They can only be modified by the AR editors.

The link_type table defines the types of ARs available. AR types are arranged in a number of trees, for ease of finding. This tree structure is expressed with the parent and child_order fields; parent is the id of the parent AR type, or null if it's at the root, and child_order orders the children of a given parent AR type. Each AR type has a unique uuid, stored in the gid field, for use in permalinks and external applications. The link between a particular AR type and the corresponding l_ table is formed by the entity_type0 and entity_type1 fields.

The attributes are themselves defined in the link_attribute_type table. Like AR types, attributes form a number of trees (the vast majority of them are individual musical instruments). Besides the parent and child_order fields shared with the link_type table, the link_attribute_table also has a root field, showing the root of the tree that the attribute is part of. Attributes also have names and descriptions which appear in various places where they are displayed, as well as gids and a last_updated timestamp.

The link_type_attribute_type table specifies what attributes can be applied to particular types of ARs; it has the necessary foreign key fields (link_type and attribute_type) and it also specifies how many instances of the attribute (or one of its children) can be added to the particular AR type in the min and max fields. Currently, most of them allow the attributes to merely be present or absent, while a few allow any number of copies of the attribute, or none. The "creative commons licensed download" attribute has to be included exactly once, while the instrument attribute (of the instrument type AR), requires at least one instance.

Cover Art Archive table structure

cover art.png

The Cover Art Archive table structure is fairly simple. The cover_art table stores the actual cover art and associations to edits. art_type stores the acceptable cover art types, and cover_art_type links the two together. release_group_cover_art links a release group to the release whose cover art should represent the release group.

This image also shows the links to tables in the main diagram (as well as the edit table), but not their full schemas; please see other diagrams or the real schema specification for details.

There is one view not shown, which is the index_listing view -- this makes for a slightly nicer interface than joining the tables manually, by providing an array of cover art types and easy pointers for is_front and is_back. Otherwise it largely resembles the cover_art table.

CDStubs table structure

cdstubs.png

The CDStubs table structure consists of only 3 tables: cdtoc_raw, release_raw and track_raw. The release_raw table contains basic metadata about the stub release, such as the artist and title.

The individual tracks of each stub release are in the track_raw table, which holds the track title and its track number (in the sequence column). A track's artist can be different than the artist of the release, so there's an artist column which can optionally contain the name of the track's artist.

Each CDStub needs an associated Disc ID, which is stored in the cdtoc_raw table, together with the track count of the disc, as well as the leadout and track offset (Disc ID Calculation contains more information about the latter two).

Undocumented tables

The following tables have not yet been documented in this page. Help is gratefully appreciated!

CREATE TABLE application

CREATE TABLE artist_deletion

CREATE TABLE artist_credit_name (

CREATE TABLE autoeditor_election
CREATE TABLE autoeditor_election_vote

CREATE TABLE cdtoc

CREATE TABLE editor
CREATE TABLE editor_language (
CREATE TABLE editor_preference
CREATE TABLE editor_subscribe_artist
CREATE TABLE editor_subscribe_artist_deleted
CREATE TABLE editor_subscribe_collection
CREATE TABLE editor_subscribe_label
CREATE TABLE editor_subscribe_label_deleted
CREATE TABLE editor_subscribe_editor

CREATE TABLE gender (

CREATE TABLE iso_3166_1 (
CREATE TABLE iso_3166_2 (
CREATE TABLE iso_3166_3 (

CREATE TABLE isrc
CREATE TABLE iswc (

CREATE TABLE label_deletion

CREATE TABLE language

CREATE TABLE link_creditable_attribute_type (
CREATE TABLE link_attribute_credit (

CREATE TABLE editor_collection
CREATE TABLE editor_collection_release
CREATE TABLE editor_oauth_token
CREATE TABLE editor_watch_preferences
CREATE TABLE editor_watch_artist
CREATE TABLE editor_watch_release_group_type
CREATE TABLE editor_watch_release_status

CREATE TABLE medium_cdtoc
CREATE TABLE medium_format

CREATE TABLE replication_control

CREATE TABLE release_country (
CREATE TABLE release_unknown_country (
CREATE TABLE release_coverart
CREATE TABLE release_label (
CREATE TABLE release_packaging
CREATE TABLE release_status

CREATE TABLE release_group_secondary_type_join (

CREATE TABLE script
CREATE TABLE script_language

CREATE TABLE tag_relation

CREATE TABLE medium_index

CREATE TABLE vote

CREATE TABLE work_attribute_type (
CREATE TABLE work_attribute_type_allowed_value (
CREATE TABLE work_attribute (