User:JimDeLaHunt/ArchivePage/Exploring AdvancedRelationship statistics

From MusicBrainz Wiki

This page was my notes from a project to understand the AdvancedRelationships implementation, and to collect some statistics of how the Relationships were used. Originally it was on User:JimDeLaHunt. I'm no longer working on this, and the notes are archived here. User:JimDeLaHunt 03:08, 28 October 2011 (UTC)

Exploring AdvancedRelationship statistics

My goal here is to understand the facts of how AdvancedRelationships are actually being used in MusicBrainz now. Since the statistics pages don't appear to provide counts of ARs, and the search pages don't let you search ARs, I'm using SQL statements to look through the actual database tables implementing ARs in my standalone copy of the MB server, and generate specific statistics.

First results: JimDeLaHunt/AdvancedRelationshipsCensus. Interesting! --JimDeLaHunt 2008-01-26

  • Understand how to list the database tables and browse them.
    • The MusicBrainzDatabase The MusicBrainz database is built on the Postgres relational database engine. Postgres project site is http://postgresql.org/ . A nice Postgres book, Practical PostgreSQL by John Worsley and Joshua Drake of Command Prompt, Inc. is freely available online (under the Open Publication License, v1.0) at http://www.commandprompt.com/ppbook/book1 . It helped me, as a novice to both Postgres and SQL. More books listed at http://www.postgresql.org/docs/books/ . --JimDeLaHunt 2008-01-20
    • The psql tool is command-line client for browsing the database and executing SQL statements. A copy is in the MB code tree at admin/psql. Instructions in the Practical PostgreSQL book and from man psql. Within psql, the \? command gives help on psql commands, and \h gives help on SQL. \q is the command to quit psql. --JimDeLaHunt 2008-01-20
    • In the MB server environment, you need to supply a user name for the database server, and a database name. A straightforward way to do this is via the option psql -U username databasename. You can find the username in cgi-bin/DBDefs.pm. Look there for "MusicBrainz::Server::Database". I won't put the actual username in this page, because the serious student can look it up, and I don't want to make it too easy for vandals. I'll leave out the "-U username databasename" options in examples below, but you should include it. --JimDeLaHunt 2008-01-20
    • To get psql to write results from interactive commands to files, use the command \o PATH to send psql's output to the Linux pathname (relative to the current working directory). Use the command \o with no parameter to return output to the display. --JimDeLaHunt 2008-01-20
    • By default psql formats output nicely into aligned columns with vertical bars separating them, and a pager to give you a page at a time. To get comma-separated values instead, use this sequence of commands \pset format unaligned (\a for short), \pset fieldsep , (\f , for short), \pset pager. You can use \t instead of comma if you want tab-separated instead of comma-separated files. \pset recordsep '\n' sets the record separator to a newline. \H generates HTML table syntax. --JimDeLaHunt 2008-01-20
    • To generate output in wiki format, use these commands: \a \f || \pset recordsep '||\n||' \pset pager. --JimDeLaHunt 2008-01-20
  • Current DatabaseSchema and list of tables
    • The psql command "\l" lists all databases on the server. So does the command line "psql -l". It doesn't tell you much more than you know already from cgi-bin/DBDefs.pm. --JimDeLaHunt 2008-01-20
    • The psql command "\dt" lists all tables in the database. An alternate command that gets just the table names is select tablename from pg_tables where schemaname = 'public' order by tablename;. "\dt+" shows all tables with the Postgres COMMENT field include. --JimDeLaHunt 2008-01-20
    • When I tried this in the 20071014 server code, I got 90 tables. 30 of these are the tables with names beginning "l_" and "lt_", which are for AdvancedRelationships. Compare these 90 to 25 tables in the DatabaseSchema diagram. The table names are listed below. --JimDeLaHunt 2008-01-20
tablename in MB database
select tablename from pg_tables where schemaname = 'public' order by tablename;
Pending PendingData album
album_amazon_asin album_cdtoc albumjoin
albummeta albumwords annotation
artist artist_relation artist_tag
artistalias artistwords automod_election
automod_election_vote cdtoc clientversion
country currentstat editor_subscribe_editor
gid_redirect historicalstat l_album_album
l_album_artist l_album_label l_album_track
l_album_url l_artist_artist l_artist_label
l_artist_track l_artist_url l_label_label
l_label_track l_label_url l_track_track
l_track_url l_url_url label
label_tag labelalias labelwords
language link_attribute link_attribute_type
lt_album_album lt_album_artist lt_album_label
lt_album_track lt_album_url lt_artist_artist
lt_artist_label lt_artist_track lt_artist_url
lt_label_label lt_label_track lt_label_url
lt_track_track lt_track_url lt_url_url
moderation_closed moderation_note_closed moderation_note_open
moderation_open moderator moderator_preference
moderator_subscribe_artist moderator_subscribe_label puid
puid_stat puidjoin puidjoin_stat
release release_tag replication_control
script script_language stats
tag track track_tag
trackwords trm trm_stat
trmjoin trmjoin_stat url
vote_closed vote_open wordlist
(90 rows)
  • Understanding the AR implementation.
    • AdvancedRelationships describe relationships between two entities: Artist, Label, Track, or Release (called "album" here, probably for historical reasons). They are grouped into relationship classes. They can have attributes, such as instrument or vocal part, modifiers like "co-" and "additional", and dates. See the wikidocs starting with AdvancedRelationships for more information. --JimDeLaHunt 2008-01-20
    • The page http://musicbrainz.org/edit/relationships/link_type_roots.html lists the AdvancedRelationshipTypes, starting with the 15 combinations of entity types. Perhaps it's generated from the database? --JimDeLaHunt 2008-01-27
    • The page http://musicbrainz.org/edit/relationships/link_attrs.html lists the attributes of ARs, e.g. "co-" or "executive" or an instrument name. Perhaps it's generated from the database too? --JimDeLaHunt 2008-01-27
    • There are 32 tables that appear to implement the AdvancedRelationships. The main 30 are listed in the table below. --JimDeLaHunt 2008-01-20 Two further tables, "link_attribute" and "link_attribute_type", are covered below. --JimDeLaHunt 2008-01-21
Tables implementing AdvancedRelationships in MB database
select tablename from pg_tables where schemaname = 'public' and (tablename ~ '^l_' or tablename ~ '^lt_');
l_album_album l_album_artist l_album_label l_album_track l_album_url
l_artist_artist l_artist_label l_artist_track l_artist_url
l_label_label l_label_track l_label_url
l_track_track l_track_url
l_url_url
lt_album_album lt_album_artist lt_album_label lt_album_track lt_album_url
lt_artist_artist lt_artist_label lt_artist_track lt_artist_url
lt_label_label lt_label_track lt_label_url
lt_track_track lt_track_url
lt_url_url
(30 rows)
    • Attributes of these tables are summarised below. (Taking l_album_album and lt_album_album as examples.) The l_ tables appear to implement the AR itself. The lt_ tables appear to document the values of link_type in the l_ tables. --JimDeLaHunt 2008-01-21
musicbrainz_db=> \d l_album_album
                              Table "public.l_album_album"
   Column   |     Type      |                         Modifiers
------------+---------------+------------------------------------------------------------
 id         | integer       | not null default nextval('l_album_album_id_seq'::regclass)
 link0      | integer       | not null default 0
 link1      | integer       | not null default 0
 link_type  | integer       | not null default 0
 begindate  | character(10) |
 enddate    | character(10) |
 modpending | integer       | not null default 0
[index and constraint information omitted --ed.]

musicbrainz_db=> \d lt_album_album
                                     Table "public.lt_album_album"
     Column      |          Type          |                          Modifiers
-----------------+------------------------+-------------------------------------------------------------
 id              | integer                | not null default nextval('lt_album_album_id_seq'::regclass)
 parent          | integer                | not null
 childorder      | integer                | not null default 0
 mbid            | character(36)          | not null
 name            | character varying(255) | not null
 description     | text                   | not null
 linkphrase      | character varying(255) | not null
 rlinkphrase     | character varying(255) | not null
 attribute       | character varying(255) | default ''::character varying
 modpending      | integer                | not null default 0
 shortlinkphrase | character varying(255) | not null default ''::character varying
 priority        | integer                | not null default 0
[index and constraint information omitted --ed.]
    • All 15 of the tables whose names begin with "l_" have the same set of attributes. Similarly, all 15 of the tables beginning with "lt_" have the same set of attributes. You can tell this by inspection, repeatedly using \d l_album_album \d l_album_artist etc. Or you can just run this query: select nspname, count(relname), attname from pg_class, pg_attribute, pg_namespace where attrelid = relfilenode and relnamespace = pg_namespace.oid and relname ~ '^lt*_.*[^qy]$' group by nspname, attname order by attname;. The result is a list of 23 attribute names, which are exactly those attributes used in either the 'lt_' or 'l_' tables. Each attribute name occurs either 15 or 30 times in the set of tables. This confirms that the tables have the same attributes. --JimDeLaHunt 2008-01-21
    • Two further tables appear to implement the attributes of the ARs (e.g. co- for co-produced, soprano saxophone for performed, etc.) Each row in the "link_attribute" table stores one attribute for one AR. The name and description of the attributes are stored in rows of the "link_attribute_type" table. The are summarised below. --JimDeLaHunt 2008-01-21
musicbrainz_db=> \d link_attribute                                    Table "public.link_attribute"
     Column     |         Type          |                          Modifiers
----------------+-----------------------+-------------------------------------------------------------
 id             | integer               | not null default nextval('link_attribute_id_seq'::regclass)
 attribute_type | integer               | not null default 0
 link           | integer               | not null default 0
 link_type      | character varying(32) | not null default ''::character varying
[index and constraint information omitted --ed.]

musicbrainz_db=> \d link_attribute_type                                   Table "public.link_attribute_type"
   Column    |          Type          |                            Modifiers
-------------+------------------------+------------------------------------------------------------------
 id          | integer                | not null default nextval('link_attribute_type_id_seq'::regclass)
 parent      | integer                | not null
 childorder  | integer                | not null default 0
 mbid        | character(36)          | not null
 name        | character varying(255) | not null
 description | text                   | not null
 modpending  | integer                | not null default 0
[index and constraint information omitted --ed.]
    • This query illustrates the connection between the tables: select a.id, a.attribute_type, l.name, a.link, a.link_type from link_attribute as a, link_attribute_type as l where link = 82689 and a.attribute_type = l.id; --JimDeLaHunt 2008-01-21
    • @@@ continue from here
  • Query list of ARs of various kinds
  • Which ARs can describe either Tracks and Releases? How often are they used?