User:JimDeLaHunt
JimDeLaHunt
Jim DeLaHunt [ MB: Editor:Jim DeLaHunt | IRC: none yet | Wiki: JimDeLaHunt | Last.fm: none yet | jdlh.com ] |
I live in Vancouver, Canada. Our family has a collection of some 500 CDs which are gradually being ripped, encoded, and tagged. Most are opera and classical, but there are other genres in there too.... I really care about having accurate metadata on my opera tracks, because that's an important part of having it show up well in the music players. |
more bio... Languages: English, 日本語, etwas Deutsch, un peu de Français, un poco de español. |
Tip: Use --~~~~
in wikitext; when you save the page, the system substitutes a nice name and timestamp.
What I'm seized of at MusicBrainz
- Get good metadata tags for the CD collection I'm ripping.
- Figure out how enter metadata for my CD collection efficiently.
- Improving documentation of existing features. Write howto documents recording what I've learned.
- Interesting stats at JimDeLaHunt/AdvancedRelationshipsCensus . --JimDeLaHunt 2008-01-26
- Maybe write some better data entry UIs or bots to systematically clean up existing data.
- Look at the ObjectModel and notions like work (composition) as applicable to ClassicalMusic, OperaMusic and MusicalTheatre.
- Essay: NGS 'Works' should help cut CSG Gordian Knot (2010/December)
- Figure out what MusicBrainz is trying to be: "This site is intended to be a free, on-line encyclopedia of music information." (GeneralFAQ), but that still leaves me with questions.
- Is MB attempting to build a database of true facts and relationships about music separate from particular recordings, and/or a database of facts and relationships about recordings, and/or a service that produces useful metadata tags for digital music files?
Setting up a private server: log
This content is now archived at User:JimDeLaHunt/ArchivePage/Setting up a private server: log. JimDeLaHunt 03:15, 28 October 2011 (UTC)
Exploring AdvancedRelationship statistics
My goal here is to understand the facts of how AdvancedRelationships are actually being used in MusicBrainz now. Since the statistics pages don't appear to provide counts of ARs, and the search pages don't let you search ARs, I'm using SQL statements to look through the actual database tables implementing ARs in my standalone copy of the MB server, and generate specific statistics.
First results: JimDeLaHunt/AdvancedRelationshipsCensus. Interesting! --JimDeLaHunt 2008-01-26
- Understand how to list the database tables and browse them.
- The MusicBrainzDatabase The MusicBrainz database is built on the Postgres relational database engine. Postgres project site is http://postgresql.org/ . A nice Postgres book, Practical PostgreSQL by John Worsley and Joshua Drake of Command Prompt, Inc. is freely available online (under the Open Publication License, v1.0) at http://www.commandprompt.com/ppbook/book1 . It helped me, as a novice to both Postgres and SQL. More books listed at http://www.postgresql.org/docs/books/ . --JimDeLaHunt 2008-01-20
- The psql tool is command-line client for browsing the database and executing SQL statements. A copy is in the MB code tree at admin/psql. Instructions in the Practical PostgreSQL book and from man psql. Within psql, the \? command gives help on psql commands, and \h gives help on SQL. \q is the command to quit psql. --JimDeLaHunt 2008-01-20
- In the MB server environment, you need to supply a user name for the database server, and a database name. A straightforward way to do this is via the option psql -U username databasename. You can find the username in cgi-bin/DBDefs.pm. Look there for "MusicBrainz::Server::Database". I won't put the actual username in this page, because the serious student can look it up, and I don't want to make it too easy for vandals. I'll leave out the "-U username databasename" options in examples below, but you should include it. --JimDeLaHunt 2008-01-20
- To get psql to write results from interactive commands to files, use the command \o PATH to send psql's output to the Linux pathname (relative to the current working directory). Use the command \o with no parameter to return output to the display. --JimDeLaHunt 2008-01-20
- By default psql formats output nicely into aligned columns with vertical bars separating them, and a pager to give you a page at a time. To get comma-separated values instead, use this sequence of commands \pset format unaligned (\a for short), \pset fieldsep , (\f , for short), \pset pager. You can use \t instead of comma if you want tab-separated instead of comma-separated files. \pset recordsep '\n' sets the record separator to a newline. \H generates HTML table syntax. --JimDeLaHunt 2008-01-20
- To generate output in wiki format, use these commands: \a \f || \pset recordsep '||\n||' \pset pager. --JimDeLaHunt 2008-01-20
- Current DatabaseSchema and list of tables
- The psql command "\l" lists all databases on the server. So does the command line "psql -l". It doesn't tell you much more than you know already from cgi-bin/DBDefs.pm. --JimDeLaHunt 2008-01-20
- The psql command "\dt" lists all tables in the database. An alternate command that gets just the table names is select tablename from pg_tables where schemaname = 'public' order by tablename;. "\dt+" shows all tables with the Postgres COMMENT field include. --JimDeLaHunt 2008-01-20
- When I tried this in the 20071014 server code, I got 90 tables. 30 of these are the tables with names beginning "l_" and "lt_", which are for AdvancedRelationships. Compare these 90 to 25 tables in the DatabaseSchema diagram. The table names are listed below. --JimDeLaHunt 2008-01-20
tablename in MB database | ||
select tablename from pg_tables where schemaname = 'public' order by tablename; | ||
Pending | PendingData | album |
album_amazon_asin | album_cdtoc | albumjoin |
albummeta | albumwords | annotation |
artist | artist_relation | artist_tag |
artistalias | artistwords | automod_election |
automod_election_vote | cdtoc | clientversion |
country | currentstat | editor_subscribe_editor |
gid_redirect | historicalstat | l_album_album |
l_album_artist | l_album_label | l_album_track |
l_album_url | l_artist_artist | l_artist_label |
l_artist_track | l_artist_url | l_label_label |
l_label_track | l_label_url | l_track_track |
l_track_url | l_url_url | label |
label_tag | labelalias | labelwords |
language | link_attribute | link_attribute_type |
lt_album_album | lt_album_artist | lt_album_label |
lt_album_track | lt_album_url | lt_artist_artist |
lt_artist_label | lt_artist_track | lt_artist_url |
lt_label_label | lt_label_track | lt_label_url |
lt_track_track | lt_track_url | lt_url_url |
moderation_closed | moderation_note_closed | moderation_note_open |
moderation_open | moderator | moderator_preference |
moderator_subscribe_artist | moderator_subscribe_label | puid |
puid_stat | puidjoin | puidjoin_stat |
release | release_tag | replication_control |
script | script_language | stats |
tag | track | track_tag |
trackwords | trm | trm_stat |
trmjoin | trmjoin_stat | url |
vote_closed | vote_open | wordlist |
(90 rows) |
- Understanding the AR implementation.
- AdvancedRelationships describe relationships between two entities: Artist, Label, Track, or Release (called "album" here, probably for historical reasons). They are grouped into relationship classes. They can have attributes, such as instrument or vocal part, modifiers like "co-" and "additional", and dates. See the wikidocs starting with AdvancedRelationships for more information. --JimDeLaHunt 2008-01-20
- The page http://musicbrainz.org/edit/relationships/link_type_roots.html lists the AdvancedRelationshipTypes, starting with the 15 combinations of entity types. Perhaps it's generated from the database? --JimDeLaHunt 2008-01-27
- The page http://musicbrainz.org/edit/relationships/link_attrs.html lists the attributes of ARs, e.g. "co-" or "executive" or an instrument name. Perhaps it's generated from the database too? --JimDeLaHunt 2008-01-27
- There are 32 tables that appear to implement the AdvancedRelationships. The main 30 are listed in the table below. --JimDeLaHunt 2008-01-20 Two further tables, "
link_attribute
" and "link_attribute_type
", are covered below. --JimDeLaHunt 2008-01-21
Tables implementing AdvancedRelationships in MB database | ||||
select tablename from pg_tables where schemaname = 'public' and (tablename ~ '^ l_' or tablename ~ '^ lt_');
| ||||
l_album_album | l_album_artist | l_album_label | l_album_track | l_album_url |
l_artist_artist | l_artist_label | l_artist_track | l_artist_url | |
l_label_label | l_label_track | l_label_url | ||
l_track_track | l_track_url | |||
l_url_url | ||||
lt_album_album | lt_album_artist | lt_album_label | lt_album_track | lt_album_url |
lt_artist_artist | lt_artist_label | lt_artist_track | lt_artist_url | |
lt_label_label | lt_label_track | lt_label_url | ||
lt_track_track | lt_track_url | |||
lt_url_url | ||||
(30 rows) |
-
- Attributes of these tables are summarised below. (Taking
l_album_album
andlt_album_album
as examples.) Thel_
tables appear to implement the AR itself. Thelt_
tables appear to document the values oflink_type
in thel_
tables. --JimDeLaHunt 2008-01-21
- Attributes of these tables are summarised below. (Taking
musicbrainz_db=> \d l_album_album Table "public.l_album_album" Column | Type | Modifiers ------------+---------------+------------------------------------------------------------ id | integer | not null default nextval('l_album_album_id_seq'::regclass) link0 | integer | not null default 0 link1 | integer | not null default 0 link_type | integer | not null default 0 begindate | character(10) | enddate | character(10) | modpending | integer | not null default 0 [index and constraint information omitted --ed.] musicbrainz_db=> \d lt_album_album Table "public.lt_album_album" Column | Type | Modifiers -----------------+------------------------+------------------------------------------------------------- id | integer | not null default nextval('lt_album_album_id_seq'::regclass) parent | integer | not null childorder | integer | not null default 0 mbid | character(36) | not null name | character varying(255) | not null description | text | not null linkphrase | character varying(255) | not null rlinkphrase | character varying(255) | not null attribute | character varying(255) | default ''::character varying modpending | integer | not null default 0 shortlinkphrase | character varying(255) | not null default ''::character varying priority | integer | not null default 0 [index and constraint information omitted --ed.]
-
- All 15 of the tables whose names begin with "
l_
" have the same set of attributes. Similarly, all 15 of the tables beginning with "lt_
" have the same set of attributes. You can tell this by inspection, repeatedly using \d l_album_album \d l_album_artist etc. Or you can just run this query:select nspname, count(relname), attname
from pg_class, pg_attribute, pg_namespace
where attrelid = relfilenode
and relnamespace = pg_namespace.oid
and relname ~ '^lt*_.*[^qy]$'
group by nspname, attname order by attname;
. The result is a list of 23 attribute names, which are exactly those attributes used in either the 'lt_' or 'l_' tables. Each attribute name occurs either 15 or 30 times in the set of tables. This confirms that the tables have the same attributes. --JimDeLaHunt 2008-01-21 - Two further tables appear to implement the attributes of the ARs (e.g. co- for co-produced, soprano saxophone for performed, etc.) Each row in the "
link_attribute
" table stores one attribute for one AR. The name and description of the attributes are stored in rows of the "link_attribute_type
" table. The are summarised below. --JimDeLaHunt 2008-01-21
- All 15 of the tables whose names begin with "
musicbrainz_db=> \d link_attribute Table "public.link_attribute" Column | Type | Modifiers ----------------+-----------------------+------------------------------------------------------------- id | integer | not null default nextval('link_attribute_id_seq'::regclass) attribute_type | integer | not null default 0 link | integer | not null default 0 link_type | character varying(32) | not null default ''::character varying [index and constraint information omitted --ed.] musicbrainz_db=> \d link_attribute_type Table "public.link_attribute_type" Column | Type | Modifiers -------------+------------------------+------------------------------------------------------------------ id | integer | not null default nextval('link_attribute_type_id_seq'::regclass) parent | integer | not null childorder | integer | not null default 0 mbid | character(36) | not null name | character varying(255) | not null description | text | not null modpending | integer | not null default 0 [index and constraint information omitted --ed.]
-
- This query illustrates the connection between the tables:
select a.id, a.attribute_type,
l.name,
a.link, a.link_type
from link_attribute as a,
link_attribute_type as l
where link = 82689 and a.attribute_type = l.id;
--JimDeLaHunt 2008-01-21 - @@@ continue from here
- This query illustrates the connection between the tables:
- Query list of ARs of various kinds
- Which ARs can describe either Tracks and Releases? How often are they used?
comments
Jim, re the last bullet in the "What I'm seized of at MusicBrainz" section, I'd suggest the answer is yes, yes, and yes, but the last only as a side benefit, not a goal within itself. DeleteWhenCooked :) -- BrianSchweitzer 20:15, 19 January 2008 (UTC)
Jim, this was very helpful. My problem was at the end of the whole process I could not see any Releases, only artists and songs. I used MBServer-20070401-2 and mbdump.tar.bz2 from 24-May-2008 -- Trent George 2008-05-27 Template:lang en-native Template:lang ja-fluent Template:lang de-some Template:lang fr-some Template:lang es-some