User:JimDeLaHunt: Difference between revisions

From MusicBrainz Wiki
Jump to navigationJump to search
(→‎Setting up a private server: log: Move all content off to :User:JimDeLaHunt/ArchivePage/Setting up a private server: log)
(→‎Exploring AdvancedRelationship statistics: Move all content off to :User:JimDeLaHunt/ArchivePage/Exploring AdvancedRelationship statistics.)
Line 30: Line 30:
==Exploring AdvancedRelationship statistics==
==Exploring AdvancedRelationship statistics==


This content is now archived at [[:User:JimDeLaHunt/ArchivePage/Exploring AdvancedRelationship statistics]]. [[User:JimDeLaHunt|JimDeLaHunt]] 03:20, 28 October 2011 (UTC)
My goal here is to understand the facts of how [[Advanced Relationships|AdvancedRelationships]] are actually being used in [[MusicBrainz]] now. Since the statistics pages don't appear to provide counts of ARs, and the search pages don't let you search ARs, I'm using SQL statements to look through the actual database tables implementing ARs in my standalone copy of the MB server, and generate specific statistics.

First results: [[User:JimDeLaHunt/AdvancedRelationshipsCensus|JimDeLaHunt/AdvancedRelationshipsCensus]]. Interesting! --JimDeLaHunt 2008-01-26
* Understand how to list the database tables and browse them.
** The [[MusicBrainz Database|MusicBrainzDatabase]] The [[MusicBrainz]] database is built on the Postgres relational database engine. Postgres project site is [http://postgresql.org/ http://postgresql.org/] . A nice Postgres book, ''Practical PostgreSQL'' by John Worsley and Joshua Drake of [http://www.commandprompt.com/ Command Prompt, Inc.] is freely available online (under the Open Publication License, v1.0) at [http://www.commandprompt.com/ppbook/book1 http://www.commandprompt.com/ppbook/book1] . It helped me, as a novice to both Postgres and SQL. More books listed at [http://www.postgresql.org/docs/books/ http://www.postgresql.org/docs/books/] . --JimDeLaHunt 2008-01-20
** The '''psql''' tool is command-line client for browsing the database and executing SQL statements. A copy is in the MB code tree at '''admin/psql'''. Instructions in the ''Practical PostgreSQL'' book and from '''man psql'''. Within psql, the '''\?''' command gives help on psql commands, and '''\h''' gives help on SQL. '''\q''' is the command to quit psql. --JimDeLaHunt 2008-01-20
** In the MB server environment, you need to supply a user name for the database server, and a database name. A straightforward way to do this is via the option '''psql -U username databasename'''. You can find the username in '''cgi-bin/DBDefs.pm'''. Look there for "[[MusicBrainz]]::Server::Database". I won't put the actual username in this page, because the serious student can look it up, and I don't want to make it too easy for vandals. I'll leave out the "-U username databasename" options in examples below, but you should include it. --JimDeLaHunt 2008-01-20
** To get psql to write results from interactive commands to files, use the command '''\o PATH''' to send psql's output to the Linux pathname (relative to the current working directory). Use the command '''\o''' with no parameter to return output to the display. --JimDeLaHunt 2008-01-20
** By default psql formats output nicely into aligned columns with vertical bars separating them, and a pager to give you a page at a time. To get comma-separated values instead, use this sequence of commands '''\pset format unaligned''' ('''\a''' for short), '''\pset fieldsep ,''' ('''\f ,''' for short), '''\pset pager'''. You can use '''\t''' instead of comma if you want tab-separated instead of comma-separated files. '''\pset recordsep '\n' ''' sets the record separator to a newline. '''\H''' generates HTML table syntax. --JimDeLaHunt 2008-01-20
** To generate output in wiki format, use these commands: '''\a \f || \pset recordsep '||\n||' \pset pager'''. --JimDeLaHunt 2008-01-20

* Current [[Database Schema|DatabaseSchema]] and list of tables
** The psql command "'''\l'''" lists all databases on the server. So does the command line "psql -l". It doesn't tell you much more than you know already from cgi-bin/DBDefs.pm. --JimDeLaHunt 2008-01-20
** The psql command "'''\dt'''" lists all tables in the database. An alternate command that gets just the table names is '''select tablename from pg_tables where schemaname = 'public' order by tablename;'''. "'''\dt+'''" shows all tables with the Postgres COMMENT field include. --JimDeLaHunt 2008-01-20
** When I tried this in the 20071014 server code, I got 90 tables. 30 of these are the tables with names beginning "l_" and "lt_", which are for [[Advanced Relationships|AdvancedRelationships]]. Compare these 90 to 25 tables in the [[Database Schema|DatabaseSchema]] diagram. The table names are listed below. --JimDeLaHunt 2008-01-20

{| border="1"
|-
| '''tablename in MB database'''
|-
| select tablename from pg_tables where schemaname = 'public' order by tablename;
|-
| Pending || PendingData || album
|-
| album_amazon_asin || album_cdtoc || albumjoin
|-
| albummeta || albumwords || annotation
|-
| artist || artist_relation || artist_tag
|-
| artistalias || artistwords || automod_election
|-
| automod_election_vote || cdtoc || clientversion
|-
| country || currentstat || editor_subscribe_editor
|-
| gid_redirect || historicalstat || l_album_album
|-
| l_album_artist || l_album_label || l_album_track
|-
| l_album_url || l_artist_artist || l_artist_label
|-
| l_artist_track || l_artist_url || l_label_label
|-
| l_label_track || l_label_url || l_track_track
|-
| l_track_url || l_url_url || label
|-
| label_tag || labelalias || labelwords
|-
| language || link_attribute || link_attribute_type
|-
| lt_album_album || lt_album_artist || lt_album_label
|-
| lt_album_track || lt_album_url || lt_artist_artist
|-
| lt_artist_label || lt_artist_track || lt_artist_url
|-
| lt_label_label || lt_label_track || lt_label_url
|-
| lt_track_track || lt_track_url || lt_url_url
|-
| moderation_closed || moderation_note_closed || moderation_note_open
|-
| moderation_open || moderator || moderator_preference
|-
| moderator_subscribe_artist || moderator_subscribe_label || puid
|-
| puid_stat || puidjoin || puidjoin_stat
|-
| release || release_tag || replication_control
|-
| script || script_language || stats
|-
| tag || track || track_tag
|-
| trackwords || trm || trm_stat
|-
| trmjoin || trmjoin_stat || url
|-
| vote_closed || vote_open || wordlist
|-
| (90 rows)
|}
* Understanding the AR implementation.
** [[Advanced Relationships|AdvancedRelationships]] describe relationships between two entities: Artist, Label, Track, or Release (called "album" here, probably for historical reasons). They are grouped into relationship classes. They can have attributes, such as instrument or vocal part, modifiers like "co-" and "additional", and dates. See the wikidocs starting with [[Advanced Relationships|AdvancedRelationships]] for more information. --JimDeLaHunt 2008-01-20
** The page [http://musicbrainz.org/edit/relationships/link_type_roots.html http://musicbrainz.org/edit/relationships/link_type_roots.html] lists the [[Advanced Relationship Type|AdvancedRelationshipType]]s, starting with the 15 combinations of entity types. Perhaps it's generated from the database? --JimDeLaHunt 2008-01-27
** The page [http://musicbrainz.org/edit/relationships/link_attrs.html http://musicbrainz.org/edit/relationships/link_attrs.html] lists the attributes of ARs, e.g. "co-" or "executive" or an instrument name. Perhaps it's generated from the database too? --JimDeLaHunt 2008-01-27
** There are 32 tables that appear to implement the [[Advanced Relationships|AdvancedRelationships]]. The main 30 are listed in the table below. --JimDeLaHunt 2008-01-20 Two further tables, "<code><nowiki>link_attribute</nowiki></code>" and "<code><nowiki>link_attribute_type</nowiki></code>", are covered below. --JimDeLaHunt 2008-01-21

{| border="1"
|-
| '''Tables implementing [[Advanced Relationships|AdvancedRelationships]] in MB database'''
|-
| select tablename from pg_tables where schemaname = 'public' and (tablename ~ '<code><nowiki>^</nowiki></code>l_' or tablename ~ '<code><nowiki>^</nowiki></code>lt_');
|-
| l_album_album || l_album_artist || l_album_label || l_album_track || l_album_url
|-
| || l_artist_artist || l_artist_label || l_artist_track || l_artist_url
|-
| || || l_label_label || l_label_track || l_label_url
|-
| || || || l_track_track || l_track_url
|-
| || || || || l_url_url
|-
| lt_album_album || lt_album_artist || lt_album_label || lt_album_track || lt_album_url
|-
| || lt_artist_artist || lt_artist_label || lt_artist_track || lt_artist_url
|-
| || || lt_label_label || lt_label_track || lt_label_url
|-
| || || || lt_track_track || lt_track_url
|-
| || || || || lt_url_url
|-
| (30 rows)
|}
*
** Attributes of these tables are summarised below. (Taking <code><nowiki>l_album_album</nowiki></code> and <code><nowiki>lt_album_album</nowiki></code> as examples.) The <code><nowiki>l_</nowiki></code> tables appear to implement the AR itself. The <code><nowiki>lt_</nowiki></code> tables appear to document the values of <code><nowiki>link_type</nowiki></code> in the <code><nowiki>l_</nowiki></code> tables. --JimDeLaHunt 2008-01-21

<pre>musicbrainz_db=> \d l_album_album
Table "public.l_album_album"
Column | Type | Modifiers
------------+---------------+------------------------------------------------------------
id | integer | not null default nextval('l_album_album_id_seq'::regclass)
link0 | integer | not null default 0
link1 | integer | not null default 0
link_type | integer | not null default 0
begindate | character(10) |
enddate | character(10) |
modpending | integer | not null default 0
[index and constraint information omitted --ed.]

musicbrainz_db=> \d lt_album_album
Table "public.lt_album_album"
Column | Type | Modifiers
-----------------+------------------------+-------------------------------------------------------------
id | integer | not null default nextval('lt_album_album_id_seq'::regclass)
parent | integer | not null
childorder | integer | not null default 0
mbid | character(36) | not null
name | character varying(255) | not null
description | text | not null
linkphrase | character varying(255) | not null
rlinkphrase | character varying(255) | not null
attribute | character varying(255) | default ''::character varying
modpending | integer | not null default 0
shortlinkphrase | character varying(255) | not null default ''::character varying
priority | integer | not null default 0
[index and constraint information omitted --ed.]
</pre>
*
** All 15 of the tables whose names begin with "<code><nowiki>l_</nowiki></code>" have the same set of attributes. Similarly, all 15 of the tables beginning with "<code><nowiki>lt_</nowiki></code>" have the same set of attributes. You can tell this by inspection, repeatedly using '''\d l_album_album \d l_album_artist''' etc. Or you can just run this query: <code><nowiki>select nspname, count(relname), attname</nowiki></code> <code><nowiki>from pg_class, pg_attribute, pg_namespace</nowiki></code> <code><nowiki>where attrelid = relfilenode</nowiki></code> <code><nowiki>and relnamespace = pg_namespace.oid </nowiki></code> <code><nowiki>and relname ~ '^lt*_.*[^qy]$'</nowiki></code> <code><nowiki>group by nspname, attname order by attname;</nowiki></code>. The result is a list of 23 attribute names, which are exactly those attributes used in either the 'lt_' or 'l_' tables. Each attribute name occurs either 15 or 30 times in the set of tables. This confirms that the tables have the same attributes. --JimDeLaHunt 2008-01-21
** Two further tables appear to implement the attributes of the ARs (e.g. co- for co-produced, soprano saxophone for performed, etc.) Each row in the "<code><nowiki>link_attribute</nowiki></code>" table stores one attribute for one AR. The name and description of the attributes are stored in rows of the "<code><nowiki>link_attribute_type</nowiki></code>" table. The are summarised below. --JimDeLaHunt 2008-01-21

<pre>musicbrainz_db=> \d link_attribute Table "public.link_attribute"
Column | Type | Modifiers
----------------+-----------------------+-------------------------------------------------------------
id | integer | not null default nextval('link_attribute_id_seq'::regclass)
attribute_type | integer | not null default 0
link | integer | not null default 0
link_type | character varying(32) | not null default ''::character varying
[index and constraint information omitted --ed.]

musicbrainz_db=> \d link_attribute_type Table "public.link_attribute_type"
Column | Type | Modifiers
-------------+------------------------+------------------------------------------------------------------
id | integer | not null default nextval('link_attribute_type_id_seq'::regclass)
parent | integer | not null
childorder | integer | not null default 0
mbid | character(36) | not null
name | character varying(255) | not null
description | text | not null
modpending | integer | not null default 0
[index and constraint information omitted --ed.]
</pre>
*
** This query illustrates the connection between the tables: <code><nowiki>select a.id, a.attribute_type,</nowiki></code> <code><nowiki>l.name,</nowiki></code> <code><nowiki>a.link, a.link_type</nowiki></code> <code><nowiki>from link_attribute as a,</nowiki></code> <code><nowiki>link_attribute_type as l</nowiki></code> <code><nowiki>where link = 82689 and a.attribute_type = l.id;</nowiki></code> --JimDeLaHunt 2008-01-21
** @@@ continue from here

* Query list of ARs of various kinds
* Which ARs can describe either Tracks and Releases? How often are they used?


==comments==
==comments==

Revision as of 03:20, 28 October 2011

JimDeLaHunt

Jim DeLaHunt [ MB: Editor:Jim DeLaHunt | IRC: none yet | Wiki: JimDeLaHunt | Last.fm: none yet | jdlh.com ]
I live in Vancouver, Canada. Our family has a collection of some 500 CDs which are gradually being ripped, encoded, and tagged. Most are opera and classical, but there are other genres in there too.... I really care about having accurate metadata on my opera tracks, because that's an important part of having it show up well in the music players.
more bio... Languages: English, 日本語, etwas Deutsch, un peu de Français, un poco de español.

Tip: Use --~~~~ in wikitext; when you save the page, the system substitutes a nice name and timestamp.

What I'm seized of at MusicBrainz

  • Get good metadata tags for the CD collection I'm ripping.
  • Figure out how enter metadata for my CD collection efficiently.
  • Improving documentation of existing features. Write howto documents recording what I've learned.
  • Maybe write some better data entry UIs or bots to systematically clean up existing data.
  • Look at the ObjectModel and notions like work (composition) as applicable to ClassicalMusic, OperaMusic and MusicalTheatre.
  • Figure out what MusicBrainz is trying to be: "This site is intended to be a free, on-line encyclopedia of music information." (GeneralFAQ), but that still leaves me with questions.
  • Is MB attempting to build a database of true facts and relationships about music separate from particular recordings, and/or a database of facts and relationships about recordings, and/or a service that produces useful metadata tags for digital music files?

Setting up a private server: log

This content is now archived at User:JimDeLaHunt/ArchivePage/Setting up a private server: log. JimDeLaHunt 03:15, 28 October 2011 (UTC)

Exploring AdvancedRelationship statistics

This content is now archived at User:JimDeLaHunt/ArchivePage/Exploring AdvancedRelationship statistics. JimDeLaHunt 03:20, 28 October 2011 (UTC)

comments

Jim, re the last bullet in the "What I'm seized of at MusicBrainz" section, I'd suggest the answer is yes, yes, and yes, but the last only as a side benefit, not a goal within itself. DeleteWhenCooked :) -- BrianSchweitzer 20:15, 19 January 2008 (UTC)

Jim, this was very helpful. My problem was at the end of the whole process I could not see any Releases, only artists and songs. I used MBServer-20070401-2 and mbdump.tar.bz2 from 24-May-2008 -- Trent George 2008-05-27 Template:lang en-native Template:lang ja-fluent Template:lang de-some Template:lang fr-some Template:lang es-some