MusicBrainz Database/Schema: Difference between revisions
(→Work: Advanced Realtionships should be used to link recording and work.) |
(→Event Art Archive table structure: Underline the repetition with the CAA section) |
||
(121 intermediate revisions by 20 users not shown) | |||
Line 1: | Line 1: | ||
{{Work in progress}} |
|||
== Objects == |
|||
This page is intended for developers querying the MusicBrainz database directly through '''PostgreSQL'''. |
|||
=== Artist === |
|||
== Overview == |
|||
Same as our current [[Artist|artist]] entity, but with new "country" and "gender" attributes. |
|||
The SQL scripts that create the schema can be found in our [https://github.com/metabrainz/musicbrainz-server/tree/master/admin/sql source code repository]. |
|||
=== Artist Credit === |
|||
The database is structured around primary [[Entity|entities]] which can be edited, searched for, referred to by MBID, and linked to each other through [[Relationship|relationships]] or foreign key constraints in some case. |
|||
List of artists, variations of artist names and pieces of text to join the artist names. Examples: |
|||
Secondary entities cannot be linked through relationships, instead they are directly linked through foreign key constraints only. |
|||
Each primary entity type has a main table sometimes accompanied by complementary tables of which a few are unique but most are common. |
|||
* The first diagram below shows the main tables for primary entity types and unique tables connecting these tables, that is, beyond relationships. The main tables are highlighted. All tables are shortened to focus on foreign key constraints. |
|||
* "''[[Artist:0383dadf-2a4e-4d10-a46a-e9e041da8eb3|Queen]] & [[Artist:5441c29d-3602-4898-b1a1-b77fa23b8e50|David Bowie]]''" -- two artists ("Queen" and "David Bowie"), no name variations, joined with " & " |
|||
* The second diagram additionally shows the main tables for primary entity types and tables unique to some of these, that is, beyond aliases, annotations, edits, redirects, relationships, ratings, tags. |
|||
* "''[[Artist:86e2e2ad-6d1b-44fd-9463-b6683718a1cc|Jean-Michel Jarre]]*''" -- one artist ("Jean Michel Jarre"), name variation "Jean-Michel Jarre" |
|||
* The third diagram shows all the tables but is too dense to just distinguish foreign key constraints for example, thus the many following sections. |
|||
* "''[[Artist:4d78d8f5-f2e4-4eaa-86b2-952307aabd9f|Tracy W. Bush]], [[Artist:ef8d7a94-64a3-4362-b81f-b5999fb246de|Derek Duke]], [[Artist:7fd43bc8-b5ff-45ed-8ce5-1b0c17114a9e|Jason Hayes]] and [[Artist:2465d7da-7da3-42ea-8440-e701775d856b|Glenn Stafford]]''" -- four artists, no name variations, joined with ", ", ", ", ", " and " and ". |
|||
[[Image:entity_network_overview.svg|x600px|class=zoomable|Overview of tables connecting core entities]] |
|||
=== Release Group === |
|||
[[Image:entity_network_details.svg|x600px|class=zoomable|Tables specifically related to core entities]] |
|||
[[Image:soup.svg|x600px|class=zoomable|All tables (exhaustive but indigestible)]] |
|||
== Primary entities == |
|||
Represents an abstract "album" entity. Technically it's a group of releases, with a specified type. The type is the same as current [[Release Type|release type]]. Examples: |
|||
Each primary entity has a main table with the same name as the entity, containing its basic data. |
|||
=== [[Area]] === |
|||
* Single "''Under Pressure''" by "''[[Artist:0383dadf-2a4e-4d10-a46a-e9e041da8eb3|Queen]] & [[Artist:5441c29d-3602-4898-b1a1-b77fa23b8e50|David Bowie]]''" |
|||
* Album "''The Wall''" by "''[[Artist:83d91898-7763-47d7-b03b-b92132375c47|Pink Floyd]]''" |
|||
{| |
|||
=== Release === |
|||
|[[Image:area_entity_details.svg|60px|class=zoomable|Tables for the “area” core entity type properties]] |
|||
|A country, region, city or the like. |
|||
Areas that can be used for filling in the ''Release country'' field of releases are listed, by ID, in the <code>country_area</code> table. |
|||
Real-world release object you can buy in your music store. It has release date and country, list of catalog number and label pairs, [[Next Generation Schema/Release Packaging|packaging type]] and release status. Example: |
|||
The table <code>area_containment</code> is [https://github.com/metabrainz/musicbrainz-server/blob/88e3d5ce55179aa23e9deccf171e4e1a77efe46e/INSTALL.md#build-materialized-tables materialized (m)]. |
|||
* 1984 US release of "''The Wall''" by "''[[Artist:83d91898-7763-47d7-b03b-b92132375c47|Pink Floyd]]''", release on label "''Columbia Records''" with catalog number "C2K 36183" and UPC "074643618328", it's an official release and comes with two CDs in jewel case. |
|||
|} |
|||
=== |
=== [[Artist]] === |
||
{| |
|||
Piece of media, included in a release. Contains information about the format, position in the release and an optional title. Has attached CD TOCs. Example: |
|||
|[[Image:artist_entity_details.svg|60px|class=zoomable|Tables for the “artist” core entity type properties]] |
|||
|An artist is generally a musician, a group of musicians, or another music professional (composer, engineer, illustrator, producer, etc.) |
|||
The tables <code>artist_release</code> and <code>artist_release_group</code> are [https://github.com/metabrainz/musicbrainz-server/blob/88e3d5ce55179aa23e9deccf171e4e1a77efe46e/INSTALL.md#build-materialized-tables materialized (m)]. |
|||
* CD1 of the 1984 US release of "''The Wall''" by "''Pink Floyd''" |
|||
|} |
|||
* CD2 of the 2005 UK release of "''Aerial''" by "''Kate Bush''", named "''A Sky of Honey''" |
|||
=== |
=== [[Event]] === |
||
{| |
|||
Simple list of tracks. Tracklists are used by mediums. |
|||
|[[Image:event_entity_details.svg|60px|class=zoomable|Tables for the “event” core entity type properties]] |
|||
|An event refers to an organised event which people can attend, and is relevant to MusicBrainz. Generally this means live performances, like concerts and festivals. |
|||
|} |
|||
=== |
=== [[Genre]] === |
||
{| |
|||
This object is not visible to users on it's own, only in the context of a tracklist. It contains a link to a recording, title, artist credit and the position on the tracklist. |
|||
|[[Image:genre_entity_details.svg|60px|class=zoomable|Tables for the “genre” core entity type properties]] |
|||
|A genre is a descriptor for the style and conventions followed by a piece of music. |
|||
|} |
|||
=== |
=== [[Instrument]] === |
||
{| |
|||
Represents unique audio data. Has title, artist credit, duration, list of [[PUID]]s and [[ISRC]]s. Examples (all are different Recordings): |
|||
|[[Image:instrument_entity_details.svg|60px|class=zoomable|Tables for the “instrument” core entity type properties]] |
|||
|Instruments are devices created or adapted to make musical sounds. We also list common instrument groupings (such as string quartet) as instruments. |
|||
|} |
|||
=== [[Label]] === |
|||
{| |
|||
|[[Image:label_entity_details.svg|60px|class=zoomable|Tables for the “label” core entity type properties]] |
|||
|Labels represent mostly (but not only) imprints. |
|||
|} |
|||
=== [[Place]] === |
|||
{| |
|||
|[[Image:place_entity_details.svg|60px|class=zoomable|Tables for the “place” core entity type properties]] |
|||
|A venue, studio or other place where music is performed, recorded, engineered, etc. |
|||
|} |
|||
=== [[Recording]] === |
|||
{| |
|||
|[[Image:recording_entity_details.svg|60px|class=zoomable|Tables for the “recording” core entity type properties]] |
|||
|Represents a unique mix or edit. Has title, artist credit, duration, list of [[ISRC]]s. Examples (all are different Recordings): |
|||
* Album version of the track "''Into the Blue''" by "''Moby''" |
* Album version of the track "''Into the Blue''" by "''Moby''" |
||
Line 49: | Line 87: | ||
* Remix "''Into the Blue (Underground mix)''" by "''Moby''" |
* Remix "''Into the Blue (Underground mix)''" by "''Moby''" |
||
The table <code>recording_first_release_date</code> is [https://github.com/metabrainz/musicbrainz-server/blob/88e3d5ce55179aa23e9deccf171e4e1a77efe46e/INSTALL.md#build-materialized-tables materialized (m)]. |
|||
=== Work === |
|||
|} |
|||
=== [[Release]] === |
|||
One layer above recordings ("song", "composition", etc.). While recording represents audio data, work represents the composition behind the recording. Advanced Realtionships should be used to link recording and work. |
|||
{| |
|||
|[[Image:release_entity_details.svg|60px|class=zoomable|Tables for the “release” core entity type properties]] |
|||
|Real-world release object you can buy in your music store. It has release date and country, list of catalog number and label pairs, [[Release/Packaging|packaging type]] and release status. Examples: |
|||
* 1984 US release of "The Wall" by "[[Artist:83d91898-7763-47d7-b03b-b92132375c47|Pink Floyd]]", release on label "Columbia Records" with catalog number "C2K 36183" and UPC "074643618328", it's an official release and comes with two CDs in jewel case. |
|||
The tables <code>artist_release</code> and <code>release_first_release_date</code> are [https://github.com/metabrainz/musicbrainz-server/blob/88e3d5ce55179aa23e9deccf171e4e1a77efe46e/INSTALL.md#build-materialized-tables materialized (m)]. |
|||
|} |
|||
=== [[Release Group|Release group]] === |
|||
{| |
|||
|[[Image:release_group_entity_details.svg|60px|class=zoomable|Tables for the “release group” core entity type properties]] |
|||
|Represents an abstract "album" (or "single", or "EP") entity. Technically it's a group of releases, with a specified type. Examples: |
|||
* Single "Under Pressure" by "[[Artist:0383dadf-2a4e-4d10-a46a-e9e041da8eb3|Queen]] & [[Artist:5441c29d-3602-4898-b1a1-b77fa23b8e50|David Bowie]]" |
|||
* Album "The Wall" by "[[Artist:83d91898-7763-47d7-b03b-b92132375c47|Pink Floyd]]" |
|||
The table <code>artist_release_group</code> is [https://github.com/metabrainz/musicbrainz-server/blob/88e3d5ce55179aa23e9deccf171e4e1a77efe46e/INSTALL.md#build-materialized-tables materialized (m)]. |
|||
|} |
|||
=== [[Series]] === |
|||
{| |
|||
|[[Image:series_entity_details.svg|60px|class=zoomable|Tables for the “series” core entity type properties]] |
|||
|A series is a sequence of separate release groups, releases, recordings, works or events with a common theme. The theme is usually prominent in the branding of the entities in the series and the individual entities will often have been given a number indicating the position in the series. |
|||
|} |
|||
=== [[URL]] === |
|||
{| |
|||
|[[Image:url_entity_details.svg|60px|class=zoomable|Tables for the “URL” core entity type properties]] |
|||
|This entity represents a URL pointing to a resource external to MusicBrainz, i.e. an official homepage, a site where music can be acquired, an entry in another database, etc. |
|||
|} |
|||
=== [[Work]] === |
|||
{| |
|||
|[[Image:work_entity_details.svg|60px|class=zoomable|Tables for the “work” core entity type properties]] |
|||
|One layer above ''recording''s ("song", "composition", etc.). While a ''recording'' represents audio data, a ''work'' represents the composition behind the ''recording''. Relationships are used to link ''recording''s and ''work''s. |
|||
* Song "''Into the Blue''" by "''Moby''" -- all the recordings listed above will be linked to this object |
* Song "''Into the Blue''" by "''Moby''" -- all the recordings listed above will be linked to this object |
||
|} |
|||
== Secondary entities == |
|||
Each secondary entity has a main table with the same name as the entity, containing its basic data. |
|||
=== [[Artist Credit|Artist credit]] === |
|||
List of artists, variations of artist names and pieces of text to join the artist names. Examples: |
|||
* "[[Artist:0383dadf-2a4e-4d10-a46a-e9e041da8eb3|Queen]] & [[Artist:5441c29d-3602-4898-b1a1-b77fa23b8e50|David Bowie]]" -- two artists ("Queen" and "David Bowie"), no name variations, joined with " & " |
|||
* "[[Artist:86e2e2ad-6d1b-44fd-9463-b6683718a1cc|Jean-Michel Jarre]]" -- one artist ("Jean Michel Jarre"), name variation "Jean-Michel Jarre" |
|||
* "[[Artist:4d78d8f5-f2e4-4eaa-86b2-952307aabd9f|Tracy W. Bush]], [[Artist:ef8d7a94-64a3-4362-b81f-b5999fb246de|Derek Duke]], [[Artist:7fd43bc8-b5ff-45ed-8ce5-1b0c17114a9e|Jason Hayes]] and [[Artist:2465d7da-7da3-42ea-8440-e701775d856b|Glenn Stafford]]" -- four artists, no name variations, joined with commas and an "and". |
|||
=== [[Medium]] === |
|||
This entity represents a piece of media, included in a release. It contains information about the format of the media, its position in the release, an optional title, and most importantly, a list of ''track''s. CD TOCs are attached to ''medium''s, not ''release''s or ''release_group''s. |
|||
'''Examples''': |
|||
* CD1 of the 1984 US release of "The Wall" by "Pink Floyd" |
|||
* CD2 of the 2005 UK release of "Aerial" by "Kate Bush", named "A Sky of Honey" |
|||
=== Track === |
|||
This entity is not visible to users on its own, only in the context of a release. It has an MBID, and contains a link to a recording, a title, artist credit and position on its associated medium. |
|||
== Entity complementary data == |
|||
=== Aliases === |
|||
All the primary entities except URL (i.e. area, artist, event, genre, instrument, label, place, recording, release, release group, series and work) have <code>*_alias</code> tables, all of which have the same structure. They contain alternate names for instances of those entities. |
|||
=== Annotations === |
|||
All the primary entities except URL (i.e. area, artist, event, genre, instrument, label, place, recording, release, release group, series and work) have a corresponding <code>*_annotation</code> table that links entities of that type to entries in the main <code>annotation</code> table which contains the actual text of the annotation, along with the changelog and the identity of the editor who created it. |
|||
=== Edits === |
|||
Edits to the database are stored in <code>edit</code> table, with the edit notes/comments stored in <code>edit_note</code>. The 13 other <code>edit_*</code> tables are used to link edits to the entity they modify; there is one such table for each primary entity type. They all have the same structure: just two foreign keys, one to the entity table and the other to <code>edit</code> table. |
|||
=== IPIs and ISNIs === |
|||
The primary entities ''artist'' and ''label'' can have multiple [[IPI]] and [[ISNI]] codes attached to them. These tables <code>*_ipi</code> and <code>*_isni</code> are used to store those mappings. They contain a foreign key pointing to the entity, the actual value of the code, and <code>edits_pending</code> and ''created'' fields with the usual meaning. |
|||
=== MBID redirects === |
|||
There are 15 <code>*_gid_redirect</code> tables, one for each of the 13 primary entity types, plus ones for ''track'' and ''artist_credit''. They are used to redirect one MBID to another when entities are merged. |
|||
=== Ratings === |
|||
There is one <code>*_rating_raw</code> table for each of the primary entities that can be rated: artist, event, label, place, recording, release group and work. The <code>*_rating_raw</code> tables all have the same structure, consisting of the actual rating, expressed as an integer between 0 and 100, and two foreign keys, one linked to the associated entity, and the other linked to the <code>editor</code> table, to specifying who made the rating. For privacy reasons, the <code>*_rating_raw</code> tables aren't included in the database dumps. |
|||
The aggregate rating of each entity is stored in the corresponding <code>_meta</code> table, in the '''rating''' field. There is also a '''rating_count''' field that specifies how many ratings have been entered for the entity. |
|||
The <code>release_group_meta</code> table also contains other fields, and the ''release'' entity, even though it cannot be rated, still has a <code>release_meta</code> table, which is used to store other data. |
|||
=== Tags === |
|||
All primary entities but the genre and URL (i.e. area, artist, event, instrument, label, place, recording, release, release group, series and work) have <code>*_tag</code> and <code>*_tag_raw</code> tables, with the same structure. These tables contain two foreign keys, linked to the associated entity and to the <code>tag</code> table. The <code>*_tag_raw</code> tables contain a foreign key, '''editor''', which specifies who added the tag, while the <code>*_tag</code> tables instead contain a '''count''' of how many times a tag is applied to a particular entity, and a '''last_updated''' timestamp. For privacy reasons, the <code>*_tag_raw</code> tables aren't included in the database dumps. |
|||
The <code>tag</code> table contains the actual names of the tags, and a ''ref_count'' indicating how often the tag has been used. |
|||
=== Types === |
|||
<code>*_type</code> tables are simply mappings between strings and ID numbers, representing various sets of types. Areas, artists, events, instruments, labels, places, series and works are the only primary entities with each have a <code>*_type</code> and an <code>*_alias_type</code> table. Release groups have <code>release_group_primary_type</code> and <code>release_group_secondary_type</code> tables. Genres, recordings and releases only have <code>*_alias_type</code> tables, and URLs do not have <code>*_type</code> tables at all. |
|||
== Relationship table structure == |
|||
The first below diagram shows minimal foreign keys needed to define a relationship between two entities. [https://musicbrainz.org/relationships/artist-work Artist-Work] is just an example of a relationship type. The main table is highlighted. All other tables are shortened to focus on foreign key constraints. |
|||
The second diagram complementarily shows tables for the detailed representation of a link. The main tables are highlighted. |
|||
[[Image:relationship_overview.svg|x300px|class=zoomable|Tables for relationship connections]] |
|||
[[Image:relationship_details.svg|x300px|class=zoomable|Tables for relationship properties]] |
|||
=== l_* tables === |
|||
There are tables for every possible combination of the primary entities (''area'', ''artist'', ''event'', ''instrument'', ''genre'', ''label'', ''place'', ''recording'', ''release'', ''release_group'', ''series'', ''url'' and ''work'') all prefixed with ''l_'' and all with the same format. Two of them are shown in the diagram. They contain a field, '''edits_pending''' that is a count of pending changes to the AR, a '''last_updated''' field, and three foreign keys: '''link''' that points back to the associated entry in the ''link'' table, and '''entity0''' and '''entity1''' that point to the associated entry in the corresponding entity table (i.e. '''artist''', '''recording''', '''url''', etc.). |
|||
There are two tables that help to avoid unnecessary duplication: |
|||
=== link & link_attribute tables === |
|||
The ''link'' table contains the begin and end date info, and the '''link_type''' foreign key field that specifies what kind of AR it is. It also has a count of how many other attributes that particular link has in the '''attribute_count''' field, and a '''created''' field that specifies when it was created. |
|||
Each AR attribute either applies to a particular AR or it doesn't. The ''link_attribute'' table stores this information, having a record for each '''attribute_type''' (a foreign key field for ''link_attribute_type'') of each '''link'''. |
|||
=== link_type, link_attribute_type, & link_type_attribute_type tables === |
|||
The AR types and attributes are defined in these three tables, shown at the top of the diagram. They can only be modified by the AR editors. |
|||
The ''link_type'' table defines the types of ARs available. AR types are arranged in a number of trees, for ease of finding. This tree structure is expressed with the '''parent''' and '''child_order''' fields; '''parent''' is the '''id''' of the parent AR type, or null if it's at the root, and '''child_order''' orders the children of a given parent AR type. Each AR type has a unique uuid, stored in the ''gid'' field, for use in permalinks and external applications. The link between a particular AR type and the corresponding ''l_'' table is formed by the '''entity_type0''' and '''entity_type1''' fields. |
|||
The attributes are themselves defined in the ''link_attribute_type'' table. Like AR types, attributes form a number of trees (the vast majority of them are individual musical instruments). Besides the '''parent''' and '''child_order''' fields shared with the ''link_type'' table, the ''link_attribute_table'' also has a '''root''' field, showing the root of the tree that the attribute is part of. Attributes also have '''name'''s and '''description'''s which appear in various places where they are displayed, as well as '''gid'''s and a '''last_updated''' timestamp. |
|||
The ''link_type_attribute_type'' table specifies what attributes can be applied to particular types of ARs; it has the necessary foreign key fields ('''link_type''' and '''attribute_type''') and it also specifies how many instances of the attribute (or one of its children) can be added to the particular AR type in the '''min''' and '''max''' fields. Currently, most of them allow the attributes to merely be present or absent, while a few allow any number of copies of the attribute, or none. The "creative commons licensed download" attribute has to be included exactly once, while the ''instrument'' attribute (of the ''instrument'' type AR), requires at least one instance. |
|||
== Cover Art Archive table structure == |
|||
{| |
|||
|[[Image:cover_art_details.svg|x300px|class=zoomable|Tables for cover art]] |
|||
|The [[Cover Art Archive]] table structure is fairly simple. The ''cover_art'' table stores the actual cover art and associations to edits. ''art_type'' stores the acceptable cover art types, and ''cover_art_type'' links the two together. ''release_group_cover_art'' links a release group to the release whose cover art should represent the release group. |
|||
This image also shows the links to tables in the main diagram (as well as the ''edit'' table), but not their full schemas; please see other diagrams or the real schema specification for details. |
|||
There is one view not shown, which is the ''index_listing'' view -- this makes for a slightly nicer interface than joining the tables manually, by providing an array of cover art types and easy pointers for is_front and is_back. Otherwise it largely resembles the cover_art table. |
|||
|} |
|||
== Event Art Archive table structure == |
|||
{| |
|||
|[[Image:event_art_details.svg|x300px|class=zoomable|Tables for event art]] |
|||
|The [[Event Art Archive]] table structure is even more simple. The ''event_art'' table stores the actual event art and associations to edits. ''art_type'' stores the acceptable event art types, and ''event_art_type'' links the two together. |
|||
This image also shows the links to tables in the main diagram (as well as the ''edit'' table), but not their full schemas; please see other diagrams or the real schema specification for details. |
|||
There is one view not shown, which is the ''index_listing'' view -- this makes for a slightly nicer interface than joining the tables manually, by providing an array of event art types and easy pointers for is_front and is_back. Otherwise it largely resembles the ''event_art'' table. |
|||
|} |
|||
== CDStubs table structure == |
|||
{| |
|||
|[[Image:cdstub_details.svg|x300px|class=zoomable|Tables for CD stub]] |
|||
|The [[CDStub]]s table structure consists of only 3 tables: cdtoc_raw, release_raw and track_raw. |
|||
The ''release_raw'' table contains basic metadata about the stub release, such as the artist and title. |
|||
The individual tracks of each stub release are in the ''track_raw'' table, which holds the track title and its track number (in the ''sequence'' column). A track's artist can be different than the artist of the release, so there's an ''artist'' column which can optionally contain the name of the track's artist. |
|||
Each CDStub needs an associated [[Disc ID]], which is stored in the ''cdtoc_raw'' table, together with the track count of the disc, as well as the leadout and track offset ([[Disc_ID_Calculation#Audio_CD|Disc ID Calculation]] contains more information about the latter two). |
|||
=== Label === |
|||
|} |
|||
== Frequent use cases == |
|||
Same as our current [[Label|label]] entity. |
|||
== |
=== Tagging audio files === |
||
{| |
|||
|[[Image:for_tagging_audio_files.svg|x100px|class=zoomable|Tables for tagging audio files]] |
|||
|When roughly tagging audio files with basic metadata, you will mainly be looking for: |
|||
* Primary entity types: artist, recording, release, release group |
|||
* Secondary entity types: artist credit, medium, track |
|||
|} |
|||
=== Finding song authors === |
|||
This diagrams shows the core database tables and relationships between them: |
|||
{| |
|||
|[[Image:for_finding_song_authors.svg|x100px|class=zoomable|Tables for finding song authors]] |
|||
|When retrieving the authors (composer, lyricist…) of a recorded song, you will mainly be looking for: |
|||
* Primary entity types: artist, recording, work |
|||
* Secondary entity types: artist credit |
|||
* Relationship types: artist-work, recording-work |
|||
|} |
|||
== Undocumented tables == |
|||
[[Image:ngs.png|500px|border]] |
|||
The following tables have not yet been documented in this page. Help is gratefully appreciated! |
|||
<pre> |
|||
Because we are adding new core entities, and there are plans for adding more, ARs between them need to be optimized. Only the absolutely necessary information is in l_*_* tables, everything else is shared: |
|||
CREATE TABLE artist_credit_name |
|||
CREATE TABLE cdtoc |
|||
[[Image:ngs-ars.png|300px|border]] |
|||
CREATE TABLE gender |
|||
== Data Dictionary == |
|||
CREATE TABLE iso_3166_1 |
|||
* artist = @id + @gid + name + sort name + (type) + (begin date) + (end date) + (gender) + (country) + (comment) + {artist alias} |
|||
CREATE TABLE iso_3166_2 |
|||
* artist alias = @id + name |
|||
CREATE TABLE iso_3166_3 |
|||
* artist credit = @id + 1{artist + name + (join phrase)} |
|||
* label = @id + @gid + name + sort name + (label code) + (type) + (begin date) + (end date) + (country) + (comment) + {label alias} |
|||
* label alias = @id + name |
|||
* release group = @id + @gid + artist credit + name + (type) + (comment) + {release} |
|||
* release = @id + @gid + artist credit + name + (barcode) + (date) + (country) + (status) + (packaging) + (language) + (script) + {label + catalog number} + {medium} + (comment) |
|||
* medium = @id + position + (format) + (name) + tracklist |
|||
* tracklist = @id + 1{recording + name + artist credit + length + position} |
|||
* recording = @id + @gid + artist credit + name + length + (comment) + {puid} + {isrc} |
|||
* work = @id + @gid + artist credit + name + {work alias} |
|||
* work alias = @id + name |
|||
* date = (year) + (month) + (day) |
|||
CREATE TABLE isrc |
|||
== Data conversion == |
|||
CREATE TABLE iswc |
|||
CREATE TABLE language |
|||
See [[Next_Generation_Schema/Migration_Notes]]. |
|||
CREATE TABLE link_creditable_attribute_type |
|||
For [[Advanced_Relationships|ARs]] see [[Next Generation Schema/Release_Relationships_Conversion]] and [[Next Generation Schema/Track Relationships Conversion]] |
|||
CREATE TABLE link_attribute_credit |
|||
CREATE TABLE medium_cdtoc |
|||
Some edge cases are compiled in [[Next Generation Schema/Conversion_Edge_Cases]]. |
|||
CREATE TABLE medium_format |
|||
CREATE TABLE medium_index |
|||
== Code Changes == |
|||
CREATE TABLE release_country |
|||
CREATE TABLE release_unknown_country |
|||
CREATE TABLE release_coverart |
|||
CREATE TABLE release_label |
|||
CREATE TABLE release_packaging |
|||
CREATE TABLE release_status |
|||
CREATE TABLE release_group_secondary_type_join |
|||
TODO |
|||
CREATE TABLE replication_control |
|||
http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html |
|||
CREATE TABLE script |
|||
=== Caching === |
|||
CREATE TABLE script_language |
|||
CREATE TABLE tag_relation |
|||
* memcache: |
|||
** MBID are cached in an additional layer to avoid duplication, instead of UUID -> object, there is UUID -> rowid, and then rowid -> object. |
|||
** Used to store: |
|||
*** artist credits |
|||
*** artists |
|||
*** labels |
|||
*** links |
|||
*** (maybe release groups?) |
|||
* Local in-process cache: |
|||
** Very fast access, but we have no control over invalidating it in all processes, so there needs to be low TTL for all cached items. |
|||
** Used to store: |
|||
*** release group types |
|||
*** release statuses |
|||
*** release packaging types |
|||
*** medium formats |
|||
*** artist types |
|||
*** label types |
|||
*** countries |
|||
*** scripts |
|||
*** languages |
|||
CREATE TABLE work_attribute_type |
|||
== User Interface == |
|||
CREATE TABLE work_attribute_type_allowed_value |
|||
CREATE TABLE work_attribute |
|||
</pre> |
|||
The following tables are not publicly available for download thus are not relevant in this page. |
|||
* [[Next Generation Schema/Release Editor]] |
|||
* [[Next_Generation_Schema/Edit_User_Interface]] |
|||
<pre> |
|||
== Web Service == |
|||
CREATE TABLE application |
|||
CREATE TABLE autoeditor_election |
|||
* [[Next Generation Schema/XML]] |
|||
CREATE TABLE autoeditor_election_vote |
|||
CREATE TABLE edit_note_change |
|||
* [[Next Generation Schema/SearchServerXML]] |
|||
CREATE TABLE editor |
|||
* [[LinkedBrainz]] |
|||
CREATE TABLE editor_collection |
|||
CREATE TABLE editor_collection_release |
|||
CREATE TABLE editor_language |
|||
CREATE TABLE editor_oauth_token |
|||
CREATE TABLE editor_preference |
|||
CREATE TABLE editor_subscribe_artist |
|||
CREATE TABLE editor_subscribe_artist_deleted |
|||
CREATE TABLE editor_subscribe_collection |
|||
CREATE TABLE editor_subscribe_label |
|||
CREATE TABLE editor_subscribe_label_deleted |
|||
CREATE TABLE editor_subscribe_editor |
|||
CREATE TABLE editor_subscribe_series |
|||
CREATE TABLE editor_subscribe_series_deleted |
|||
CREATE TABLE unreferenced_row_log |
|||
== Roadmap == |
|||
CREATE TABLE vote |
|||
See [[Next Generation Schema/Roadmap]]. |
|||
</pre> |
|||
[[Category:Development]] |
[[Category:Development]] [[Category:To Be Reviewed]] [[Category:WikiDocs Page]] |
Latest revision as of 15:11, 28 June 2024
Status: This page is a work in progress. |
This page is intended for developers querying the MusicBrainz database directly through PostgreSQL.
Overview
The SQL scripts that create the schema can be found in our source code repository.
The database is structured around primary entities which can be edited, searched for, referred to by MBID, and linked to each other through relationships or foreign key constraints in some case. Secondary entities cannot be linked through relationships, instead they are directly linked through foreign key constraints only. Each primary entity type has a main table sometimes accompanied by complementary tables of which a few are unique but most are common.
- The first diagram below shows the main tables for primary entity types and unique tables connecting these tables, that is, beyond relationships. The main tables are highlighted. All tables are shortened to focus on foreign key constraints.
- The second diagram additionally shows the main tables for primary entity types and tables unique to some of these, that is, beyond aliases, annotations, edits, redirects, relationships, ratings, tags.
- The third diagram shows all the tables but is too dense to just distinguish foreign key constraints for example, thus the many following sections.
Primary entities
Each primary entity has a main table with the same name as the entity, containing its basic data.
Area
A country, region, city or the like.
Areas that can be used for filling in the Release country field of releases are listed, by ID, in the The table |
Artist
An artist is generally a musician, a group of musicians, or another music professional (composer, engineer, illustrator, producer, etc.)
The tables |
Event
An event refers to an organised event which people can attend, and is relevant to MusicBrainz. Generally this means live performances, like concerts and festivals. |
Genre
A genre is a descriptor for the style and conventions followed by a piece of music. |
Instrument
Instruments are devices created or adapted to make musical sounds. We also list common instrument groupings (such as string quartet) as instruments. |
Label
Labels represent mostly (but not only) imprints. |
Place
A venue, studio or other place where music is performed, recorded, engineered, etc. |
Recording
Represents a unique mix or edit. Has title, artist credit, duration, list of ISRCs. Examples (all are different Recordings):
The table |
Release
Real-world release object you can buy in your music store. It has release date and country, list of catalog number and label pairs, packaging type and release status. Examples:
The tables |
Release group
Represents an abstract "album" (or "single", or "EP") entity. Technically it's a group of releases, with a specified type. Examples:
The table |
Series
URL
This entity represents a URL pointing to a resource external to MusicBrainz, i.e. an official homepage, a site where music can be acquired, an entry in another database, etc. |
Work
Secondary entities
Each secondary entity has a main table with the same name as the entity, containing its basic data.
Artist credit
List of artists, variations of artist names and pieces of text to join the artist names. Examples:
- "Queen & David Bowie" -- two artists ("Queen" and "David Bowie"), no name variations, joined with " & "
- "Jean-Michel Jarre" -- one artist ("Jean Michel Jarre"), name variation "Jean-Michel Jarre"
- "Tracy W. Bush, Derek Duke, Jason Hayes and Glenn Stafford" -- four artists, no name variations, joined with commas and an "and".
Medium
This entity represents a piece of media, included in a release. It contains information about the format of the media, its position in the release, an optional title, and most importantly, a list of tracks. CD TOCs are attached to mediums, not releases or release_groups.
Examples:
- CD1 of the 1984 US release of "The Wall" by "Pink Floyd"
- CD2 of the 2005 UK release of "Aerial" by "Kate Bush", named "A Sky of Honey"
Track
This entity is not visible to users on its own, only in the context of a release. It has an MBID, and contains a link to a recording, a title, artist credit and position on its associated medium.
Entity complementary data
Aliases
All the primary entities except URL (i.e. area, artist, event, genre, instrument, label, place, recording, release, release group, series and work) have *_alias
tables, all of which have the same structure. They contain alternate names for instances of those entities.
Annotations
All the primary entities except URL (i.e. area, artist, event, genre, instrument, label, place, recording, release, release group, series and work) have a corresponding *_annotation
table that links entities of that type to entries in the main annotation
table which contains the actual text of the annotation, along with the changelog and the identity of the editor who created it.
Edits
Edits to the database are stored in edit
table, with the edit notes/comments stored in edit_note
. The 13 other edit_*
tables are used to link edits to the entity they modify; there is one such table for each primary entity type. They all have the same structure: just two foreign keys, one to the entity table and the other to edit
table.
IPIs and ISNIs
The primary entities artist and label can have multiple IPI and ISNI codes attached to them. These tables *_ipi
and *_isni
are used to store those mappings. They contain a foreign key pointing to the entity, the actual value of the code, and edits_pending
and created fields with the usual meaning.
MBID redirects
There are 15 *_gid_redirect
tables, one for each of the 13 primary entity types, plus ones for track and artist_credit. They are used to redirect one MBID to another when entities are merged.
Ratings
There is one *_rating_raw
table for each of the primary entities that can be rated: artist, event, label, place, recording, release group and work. The *_rating_raw
tables all have the same structure, consisting of the actual rating, expressed as an integer between 0 and 100, and two foreign keys, one linked to the associated entity, and the other linked to the editor
table, to specifying who made the rating. For privacy reasons, the *_rating_raw
tables aren't included in the database dumps.
The aggregate rating of each entity is stored in the corresponding _meta
table, in the rating field. There is also a rating_count field that specifies how many ratings have been entered for the entity.
The release_group_meta
table also contains other fields, and the release entity, even though it cannot be rated, still has a release_meta
table, which is used to store other data.
Tags
All primary entities but the genre and URL (i.e. area, artist, event, instrument, label, place, recording, release, release group, series and work) have *_tag
and *_tag_raw
tables, with the same structure. These tables contain two foreign keys, linked to the associated entity and to the tag
table. The *_tag_raw
tables contain a foreign key, editor, which specifies who added the tag, while the *_tag
tables instead contain a count of how many times a tag is applied to a particular entity, and a last_updated timestamp. For privacy reasons, the *_tag_raw
tables aren't included in the database dumps.
The tag
table contains the actual names of the tags, and a ref_count indicating how often the tag has been used.
Types
*_type
tables are simply mappings between strings and ID numbers, representing various sets of types. Areas, artists, events, instruments, labels, places, series and works are the only primary entities with each have a *_type
and an *_alias_type
table. Release groups have release_group_primary_type
and release_group_secondary_type
tables. Genres, recordings and releases only have *_alias_type
tables, and URLs do not have *_type
tables at all.
Relationship table structure
The first below diagram shows minimal foreign keys needed to define a relationship between two entities. Artist-Work is just an example of a relationship type. The main table is highlighted. All other tables are shortened to focus on foreign key constraints. The second diagram complementarily shows tables for the detailed representation of a link. The main tables are highlighted.
l_* tables
There are tables for every possible combination of the primary entities (area, artist, event, instrument, genre, label, place, recording, release, release_group, series, url and work) all prefixed with l_ and all with the same format. Two of them are shown in the diagram. They contain a field, edits_pending that is a count of pending changes to the AR, a last_updated field, and three foreign keys: link that points back to the associated entry in the link table, and entity0 and entity1 that point to the associated entry in the corresponding entity table (i.e. artist, recording, url, etc.).
There are two tables that help to avoid unnecessary duplication:
link & link_attribute tables
The link table contains the begin and end date info, and the link_type foreign key field that specifies what kind of AR it is. It also has a count of how many other attributes that particular link has in the attribute_count field, and a created field that specifies when it was created.
Each AR attribute either applies to a particular AR or it doesn't. The link_attribute table stores this information, having a record for each attribute_type (a foreign key field for link_attribute_type) of each link.
link_type, link_attribute_type, & link_type_attribute_type tables
The AR types and attributes are defined in these three tables, shown at the top of the diagram. They can only be modified by the AR editors.
The link_type table defines the types of ARs available. AR types are arranged in a number of trees, for ease of finding. This tree structure is expressed with the parent and child_order fields; parent is the id of the parent AR type, or null if it's at the root, and child_order orders the children of a given parent AR type. Each AR type has a unique uuid, stored in the gid field, for use in permalinks and external applications. The link between a particular AR type and the corresponding l_ table is formed by the entity_type0 and entity_type1 fields.
The attributes are themselves defined in the link_attribute_type table. Like AR types, attributes form a number of trees (the vast majority of them are individual musical instruments). Besides the parent and child_order fields shared with the link_type table, the link_attribute_table also has a root field, showing the root of the tree that the attribute is part of. Attributes also have names and descriptions which appear in various places where they are displayed, as well as gids and a last_updated timestamp.
The link_type_attribute_type table specifies what attributes can be applied to particular types of ARs; it has the necessary foreign key fields (link_type and attribute_type) and it also specifies how many instances of the attribute (or one of its children) can be added to the particular AR type in the min and max fields. Currently, most of them allow the attributes to merely be present or absent, while a few allow any number of copies of the attribute, or none. The "creative commons licensed download" attribute has to be included exactly once, while the instrument attribute (of the instrument type AR), requires at least one instance.
Cover Art Archive table structure
The Cover Art Archive table structure is fairly simple. The cover_art table stores the actual cover art and associations to edits. art_type stores the acceptable cover art types, and cover_art_type links the two together. release_group_cover_art links a release group to the release whose cover art should represent the release group.
This image also shows the links to tables in the main diagram (as well as the edit table), but not their full schemas; please see other diagrams or the real schema specification for details. There is one view not shown, which is the index_listing view -- this makes for a slightly nicer interface than joining the tables manually, by providing an array of cover art types and easy pointers for is_front and is_back. Otherwise it largely resembles the cover_art table. |
Event Art Archive table structure
The Event Art Archive table structure is even more simple. The event_art table stores the actual event art and associations to edits. art_type stores the acceptable event art types, and event_art_type links the two together.
This image also shows the links to tables in the main diagram (as well as the edit table), but not their full schemas; please see other diagrams or the real schema specification for details. There is one view not shown, which is the index_listing view -- this makes for a slightly nicer interface than joining the tables manually, by providing an array of event art types and easy pointers for is_front and is_back. Otherwise it largely resembles the event_art table. |
CDStubs table structure
The CDStubs table structure consists of only 3 tables: cdtoc_raw, release_raw and track_raw.
The release_raw table contains basic metadata about the stub release, such as the artist and title. The individual tracks of each stub release are in the track_raw table, which holds the track title and its track number (in the sequence column). A track's artist can be different than the artist of the release, so there's an artist column which can optionally contain the name of the track's artist. Each CDStub needs an associated Disc ID, which is stored in the cdtoc_raw table, together with the track count of the disc, as well as the leadout and track offset (Disc ID Calculation contains more information about the latter two). |
Frequent use cases
Tagging audio files
Finding song authors
Undocumented tables
The following tables have not yet been documented in this page. Help is gratefully appreciated!
CREATE TABLE artist_credit_name CREATE TABLE cdtoc CREATE TABLE gender CREATE TABLE iso_3166_1 CREATE TABLE iso_3166_2 CREATE TABLE iso_3166_3 CREATE TABLE isrc CREATE TABLE iswc CREATE TABLE language CREATE TABLE link_creditable_attribute_type CREATE TABLE link_attribute_credit CREATE TABLE medium_cdtoc CREATE TABLE medium_format CREATE TABLE medium_index CREATE TABLE release_country CREATE TABLE release_unknown_country CREATE TABLE release_coverart CREATE TABLE release_label CREATE TABLE release_packaging CREATE TABLE release_status CREATE TABLE release_group_secondary_type_join CREATE TABLE replication_control CREATE TABLE script CREATE TABLE script_language CREATE TABLE tag_relation CREATE TABLE work_attribute_type CREATE TABLE work_attribute_type_allowed_value CREATE TABLE work_attribute
The following tables are not publicly available for download thus are not relevant in this page.
CREATE TABLE application CREATE TABLE autoeditor_election CREATE TABLE autoeditor_election_vote CREATE TABLE edit_note_change CREATE TABLE editor CREATE TABLE editor_collection CREATE TABLE editor_collection_release CREATE TABLE editor_language CREATE TABLE editor_oauth_token CREATE TABLE editor_preference CREATE TABLE editor_subscribe_artist CREATE TABLE editor_subscribe_artist_deleted CREATE TABLE editor_subscribe_collection CREATE TABLE editor_subscribe_label CREATE TABLE editor_subscribe_label_deleted CREATE TABLE editor_subscribe_editor CREATE TABLE editor_subscribe_series CREATE TABLE editor_subscribe_series_deleted CREATE TABLE unreferenced_row_log CREATE TABLE vote