User:JimDeLaHunt: Difference between revisions

From MusicBrainz Wiki
Jump to navigationJump to search
(fix formatting (Imported from MoinMoin))
(AR tables (Imported from MoinMoin))
Line 166: Line 166:
| select tablename from pg_tables where schemaname = 'public' order by tablename;
| select tablename from pg_tables where schemaname = 'public' order by tablename;
|-
|-
| Pending || Pending<code><nowiki></nowiki></code>Data || album
| Pending || PendingData || album
|-
|-
| album_amazon_asin || album_cdtoc || albumjoin
| album_amazon_asin || album_cdtoc || albumjoin
Line 228: Line 228:
| (90 rows)
| (90 rows)
|}
|}
* Understanding the AR implementation
* Understanding the AR implementation.
** [[Advanced Relationships|AdvancedRelationships]] describe relationships between two entities: Artist, Label, Track, or Release (called "album" here, probably for historical reasons). They are grouped into relationship classes. They can have attributes, such as instrument or vocal part, modifiers like "co-" and "additional", and dates. See the wikidocs starting with [[Advanced Relationships|AdvancedRelationships]] for more information. --JimDeLaHunt 2008-01-20
** There are 30 tables that appear to have implement the [[Advanced Relationships|AdvancedRelationships]]. They are :

{| border="1"
|-
| '''Tables implementing [[Advanced Relationships|AdvancedRelationships]] in MB database'''
|-
| select tablename from pg_tables where schemaname = 'public' and (tablename ~ '<code><nowiki>^</nowiki></code>l_' or tablename ~ '<code><nowiki>^</nowiki></code>lt_');
|-
| l_album_album || l_album_artist || l_album_label || l_album_track || l_album_url
|-
| || l_artist_artist || l_artist_label || l_artist_track || l_artist_url
|-
| || || l_label_label || l_label_track || l_label_url
|-
| || || || l_track_track || l_track_url
|-
| || || || || l_url_url
|-
| lt_album_album || lt_album_artist || lt_album_label || lt_album_track || lt_album_url
|-
| || lt_artist_artist || lt_artist_label || lt_artist_track || lt_artist_url
|-
| || || lt_label_label || lt_label_track || lt_label_url
|-
| || || || lt_track_track || lt_track_url
|-
| || || || || lt_url_url
|-
| (30 rows)
|}
* Query list of ARs of various kinds
* Query list of ARs of various kinds
* Which ARs can describe either Tracks and Releases? How often are they used?
* Which ARs can describe either Tracks and Releases? How often are they used?

Revision as of 09:10, 21 January 2008

JimDeLaHunt

Jim DeLaHunt [ MB: Editor:Jim DeLaHunt | IRC: none yet | Wiki: JimDeLaHunt | Last.fm: none yet | jdlh.com ]
I live in Vancouver, Canada. Our family has a collection of some 500 CDs which are gradually being ripped, encoded, and tagged. Most are opera and classical, but there are other genres in there too.... I really care about having accurate metadata on my opera tracks, because that's an important part of having it show up well in the music players.
more bio...

What I'm seized of at MusicBrainz

  • Get good metadata tags for the CD collection I'm ripping.
  • Figure out how enter metadata for my CD collection efficiently.
  • Improving documentation of existing features. Write howto documents recording what I've learned.
  • Maybe write some better data entry UIs or bots to systematically clean up existing data.
  • Look at the ObjectModel and notions like work (composition) as applicable to ClassicalMusic, OperaMusic and MusicalTheatre.
  • Figure out what MusicBrainz is trying to be: "This site is intended to be a free, on-line encyclopedia of music information." (GeneralFAQ), but that still leaves me with questions.
  • Is MB attempting to build a database of true facts and relationships about music separate from particular recordings, and/or a database of facts and relationships about recordings, and/or a service that produces useful metadata tags for digital music files?

Setting up a private server: log

I'm trying to set up a private copy of the MusicBrainz server for development work. Here's a log of my actions and lessons learned. I hope this is helpful for other MusicBrainz users.

  • Follow instructions at VirtualMusicBrainzServer . --JimDeLaHunt 2008-01-18
    • Downloading ftp://ftp.musicbrainz.org/pub/musicbrainz/MBServer-20070401-2.tar.bz2 only took me about 5 minutes. --JimDeLaHunt 2008-01-18
    • Don't need to use 7-Zip to open archive; default archive utility for Mac OS 10.5.1 did just fine with both .bz2 and .tar. Took only 2-5 minutes. --JimDeLaHunt 2008-01-18
    • Add the string ".vmwarevm" to the end of the virtual machine directory name. This makes it into a directory bundle, which Mac OS treats as a single file for most purposes. This documented off-handedly in a VMWare blog posting from April 2007. --JimDeLaHunt 2008-01-18 16:00
    • More on "Start the virtual machine in VMWare". Run VMWare Fusion application. File... Open.... In File Open dialogue, navigate to folder MBServer-20070401-2/MBServer-20070401/ where you opened the archive. Select file MBServer-20070401.vmx in that folder. The virtual machine boots up. A bootloader prompt offers to two configurations; pick the one that doesn't offer "safe mode". The boot sequence runs in text mode. Finally a "brainzvm login:" prompt appears. --JimDeLaHunt 2008-01-18 00:55h
    • Cannot update VMWare Tools, since the virtual machine has no CD-ROM configured. See also note on VirtualMusicBrainzServer by Jugdish 2007-08-22 00:31:09. For now, I'm doing nothing about this. The VMWare tools improve mouse and graphics support, but we may not use those at all for this server. --JimDeLaHunt 2008-01-18
    • VirtualMusicBrainzServer step 2 (password change) works fine. --JimDeLaHunt 2008-01-18
    • If virtual machine has "bridged" instead of "NAT" network, then it gets its own IP address, different from that of the host machine. That would work well in my environment, where I have a private subnet. However, it fails the connectivity test in VirtualMusicBrainzServer step 3. Thus: Select menu item Virtual Machine... Settings... Network... "Share the host's internet connection (NAT)", not "Connect directly to the physical network (Bridged)".
    • VirtualMusicBrainzServer step 3 (check network connection) through 6 (logout) work fine, if virtual machine uses NAT. --JimDeLaHunt 2008-01-18
    • My virtual machine came up with the wrong time set. I fixed it using a command like "sudo date -s 2008-01-18" --JimDeLaHunt 2008-01-18
  • VirtualMusicBrainzServer, "Loading data" section
    • VirtualMusicBrainzServer step 1 (connect to virtual machine) works fine. Tip for those who don't know ssh: the command line is "ssh mbserver@<ip address of VM>". --JimDeLaHunt 2008-01-18
    • Step 2 (download the latest MB data dump). If you, like me, didn't know Lynx, here's a quick overview. Lynx is a web browser that operates in text mode. Use the up-arrow and down-arrow keys to move the selection from link to link. The right-arrow follows the currently-selected link. Type "d" to download the currently-selected link. Lynx gives me data rates of 800 KiB/sec, which seems pretty good. Overall this step took me 45 minutes. I stored files to the default directory, which was /home/mbserver. --JimDeLaHunt 2008-01-18
    • Step 3 ("svn up" for latest code). Followed the two commands given. Took about 5 seconds to run. Printed 9 lines beginning with "U", followed by message, "Updated to revision 9649." --JimDeLaHunt 2008-01-18
    • At about this time I lost all my ssh connections. It turns out that the IP address of the mbserver virtual machine had changed. I'm not sure quite why; perhaps it had to do with using NAT instead of Bridged network connections. --JimDeLaHunt 2008-01-18
    • Step 4, Stop apache, drop the empty musicbrainz database, worked fine.
    • Step 4 part 3, import the dumps. failed. When I tried to import the dumps, I got the error message: "Pre-checking /home/mbserver/mbdump-artistrelation.tar.bz2", "Schema sequence mismatch - codebase is 8, /home/mbserver/mbdump-artistrelation.tar.bz2 is 9". This error is mentioned in DebianVMWareDatabase, but the cause they give is that the Perl scripts are out of date. However, the "svn up" command should have fixed that, right? I'm "at revision 9649." Needs investigation. Per 2007-12-02 Forums post I editted /home/mbserver/svn/mb_server/cgi-bin/DBDefs.pm and updated the schema number to "sub DB_SCHEMA_SEQUENCE { 9 }". That got me further, but then I ran into the same snag mentioned later in that forum post. It refers to an Oct 2007 post to mb-datafeeds. I'll look at this next. --JimDeLaHunt 2008-01-18
  • Update MB server code to 20071014 version, following the instructions in http://lists.musicbrainz.org/pipermail/musicbrainz-datafeed/2007-October/000022.html , dated 2007-10-14.
    • cd to ~/svn . Run this command: svn co http://svn.musicbrainz.org/mb_server/branches/RELEASE_20071014-BRANCH mb_server-20071014 . Note that there is no space between "branches/" and "RELEASE", even though the musicbrainz-datafeed message appears to indicate a space. This command took me some 11 minutes to run. --JimDeLaHunt 2008-01-18
    • step 3 (copy new config files) worked as described --JimDeLaHunt 2008-01-18
    • step 4a (edit config files). I did a diff between the new and old config files DBDefs.pm and startup.pl . The differences I saw that seemed like they were based on the virtual machine:
99,100c110,111
< sub WEB_SERVER             { "brainzvm" }
< sub RDF_SERVER             { "brainzvm" }
---
> sub WEB_SERVER             { "www.musicbrainz.example.com" }
> sub RDF_SERVER             { "rdf.musicbrainz.example.com" }
118c129
< sub SMTP_SECRET_CHECKSUM { "s3kr1t" }
---
> sub SMTP_SECRET_CHECKSUM { "" }
183c194
< sub DB_STAGING_SERVER { "virtual" }
---
> sub DB_STAGING_SERVER { "development" }
    • step 4c (Test your configuration) worked as expected. --JimDeLaHunt 2008-01-19
    • Skip steps 5-9. They have to do with upgrading existing data in the database, and we're going to load a fresh dump of data in the database. --JimDeLaHunt 2008-01-19
  • VirtualMusicBrainzServer, "Loading data" section, continued
    • Step 4 part 1, Stop apache, not needed; apache already stopped. --JimDeLaHunt 2008-01-19
    • Step 4 part 2, drop the empty musicbrainz database, worked fine. --JimDeLaHunt 2008-01-19
    • Step 4 part 3, import the dumps. This time it ran successfully. It took 1 hr 27 minutes to run. Note: the script thoughtfully prints timestamps at the start of many lines of output, so it's easy to see how long it takes to run. --JimDeLaHunt 2008-01-19
    • Step 4 part 4 (restart the http server). Failed, error: "apache-perl: could not open document config file /etc/apache-perl/conf.d/vh_httpd.conf". Diagnosis: that's a symlink to "/home/mbserver/svn/mb_server/admin/vh_httpd.conf". In that dir is a "vh_httpd.conf.default". --JimDeLaHunt 2008-01-19
    • updating admin/vh_httpd.conf. There are many changes from the VM's copy to the 20071014 server copy, but the two lines that matter are marked with a TODO comment. I put the "ServerName brainzvm" and "ServerAlias rdf.brainzvm" into the new vh_httpd.conf file. --JimDeLaHunt 2008-01-19
<       # TODO: Enter the name of your server.  You should have two names
<       # available: one for web browsing (HTML), and one for web services (RDF).
<     ServerName brainzvm
<     ServerAlias rdf.brainzvm
9,13c4,8
>       # TODO: Enter the name of your server.  You should have two names
>       # available: one for web browsing (HTML), and one for web services (RDF).
>       ServerName www.musicbrainz.example.com
>       ServerAlias rdf.musicbrainz.example.com
>
    • Update CPAN module configuration for the virtual machine. Ref CPAN module doc, "configuation" section. As root, run "perl -MCPAN -e shell", giving a CPAN interactive session. Command "o conf urllist" gives the list of mirror sites which this CPAN will consult. In the virtual machine as shipped these were "mirrors.playboy.com" (a host name not recognised) and "mirrors.kernel.org" (error messages about it being 820 days old). I changed this with the following commands: "o conf urllist shift", to get rid of mirrors.playboy.com; followed by "o conf urllist unshift ftp://cpan.llarian.net/pub/CPAN/", to put a different mirror in the top spot; followed by "o conf urllist" to confirm the new URLList. --JimDeLaHunt 2008-01-19
    • Step 4 part 4 (restart the http server). Failed again, error below. Nub seems to be "compilation aborted at /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html line 29". This line is "use JSON;". Suspect JSON module not installed in virtual machine. As root, run "perl -MCPAN -e 'install JSON'". @@@ --JimDeLaHunt 2008-01-19
brainzvm:~# /etc/init.d/apache-perl start
Configuration syntax error detected, not starting/reloading...
Preloading 230 components
[Sat Jan 19 12:18:01 2008] [error] Error during compilation of /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html:\nCan't locate JSON.pm in @INC (@INC contains: /home/httpd/musicbrainz/mb_server/cgi-bin /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl /usr/local/lib/perl/5.8.7 /usr/local/share/perl/5.8.7 . /etc/apache-perl/ /etc/apache-perl/lib/perl) at /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html line 29.\nBEGIN failed--compilation aborted at /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html line 29.\n\n\nStack:\n  [/usr/share/perl5/HTML/Mason/Interp.pm:445]\n  [/usr/share/perl5/HTML/Mason/Interp.pm:283]\n  [/usr/share/perl5/HTML/Mason/Interp.pm:203]\n  [/usr/share/perl5/HTML/Mason/Interp.pm:155]\n  [/usr/share/perl5/Class/Container.pm:329]\n  [/usr/share/perl5/Class/Container.pm:53]\n  [/usr/share/perl5/HTML/Mason/ApacheHandler.pm:639]\n  [/home/httpd/musicbrainz/mb_server/cgi-bin/MusicBrainz/Server/Mason.pm:109]\n  [/home/httpd/musicbrainz/mb_server/cgi-bin/MusicBrainz/Server/Mason.pm:136]\nCompilation failed in require at /home/httpd/musicbrainz/mb_server/admin/startup.pl line 114.\nCompilation failed in require at (eval 9) line 1.\n
Syntax error on line 32 of /etc/apache-perl/conf.d/vh_httpd.conf:
Error during compilation of /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html:
Can't locate JSON.pm in @INC (@INC contains: /home/httpd/musicbrainz/mb_server/cgi-bin /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl /usr/local/lib/perl/5.8.7 /usr/local/share/perl/5.8.7 . /etc/apache-perl/ /etc/apache-perl/lib/perl) at /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html line 29.
BEGIN failed--compilation aborted at /home/httpd/musicbrainz/mb_server/htdocs/edit/albumlanguage/getlists.html line 29.

Stack:
  [/usr/share/perl5/HTML/Mason/Interp.pm:445]
  [/usr/share/perl5/HTML/Mason/Interp.pm:283]
  [/usr/share/perl5/HTML/Mason/Interp.pm:203]
  [/usr/share/perl5/HTML/Mason/Interp.pm:155]
  [/usr/share/perl5/Class/Container.pm:329]
  [/usr/share/perl5/Class/Container.pm:53]
  [/usr/share/perl5/HTML/Mason/ApacheHandler.pm:639]
  [/home/httpd/musicbrainz/mb_server/cgi-bin/MusicBrainz/Server/Mason.pm:109]
  [/home/httpd/musicbrainz/mb_server/cgi-bin/MusicBrainz/Server/Mason.pm:136]
Compilation failed in require at /home/httpd/musicbrainz/mb_server/admin/startup.pl line 114.
Compilation failed in require at (eval 9) line 1.

 failed!
brainzvm:~#
    • Step 4 part 4 (restart the http server). This time it worked. --JimDeLaHunt 2008-01-19
mbserver@brainzvm:~$ su -
Password:
brainzvm:~# /etc/init.d/apache-perl start
Starting apache-perl 1.3 web server...Preloading 230 components
.
brainzvm:~# exit
logout
    • Step 5 (check web service of virtual machine's MB server). http://ipaddress/ gives Error 403 "Forbidden You don't have permission to access / on this server." I can traceroute and ssh to this IP address. --JimDeLaHunt 2008-01-19.
    • Problem proved to be that the root directory of the MusicBrainz local site had owner and group of mbserver, and it needed to be www-data. Executing the command chown www-data:www-data mb_server fixed the problem. No need even to stop and restart apache. http://ipaddress/ gives an MB website. --JimDeLaHunt 2008-01-20
brainzvm:/home/mbserver/svn# ls -l
total 8
drwxrwxrwx 7 mbserver mbserver 4096 2008-01-19 02:46 mb_server
drwxrwxrwx 7 www-data www-data 4096 2007-04-03 09:04 mb_server-from-VM

# Note: at this point, server gives HTTP error 403 for every page.

brainzvm:/home/mbserver/svn# chown www-data:www-data mb_server
brainzvm:/home/mbserver/svn# ls -l
total 8
drwxrwxrwx 7 www-data www-data 4096 2008-01-19 02:46 mb_server
drwxrwxrwx 7 www-data www-data 4096 2007-04-03 09:04 mb_server-from-VM

# Note: at this point, server now works correctly; no more error 403.

Exploring AdvancedRelationship statistics

My goal here is to understand the facts of how AdvancedRelationships are actually being used in MusicBrainz now. Since the statistics pages don't appear to provide counts of ARs, and the search pages don't let you search ARs, I'm using SQL statements to look through the actual database tables implementing ARs in my standalone copy of the MB server, and generate specific statistics.

  • Understand how to list the database tables and browse them.
    • The MusicBrainzDatabase The MusicBrainz database is built on the Postgres relational database engine. Postgres project site is http://postgresql.org/ . A nice Postgres book, Practical PostgreSQL by John Worsley and Joshua Drake of Command Prompt, Inc. is freely available online (under the Open Publication License, v1.0) at http://www.commandprompt.com/ppbook/book1 . It helped me, as a novice to both Postgres and SQL. More books listed at http://www.postgresql.org/docs/books/ . --JimDeLaHunt 2008-01-20
    • The psql tool is command-line client for browsing the database and executing SQL statements. A copy is in the MB code tree at admin/psql. Instructions in the Practical PostgreSQL book and from man psql. Within psql, the \? command gives help on psql commands, and \h gives help on SQL. \q is the command to quit psql. --JimDeLaHunt 2008-01-20
    • In the MB server environment, you need to supply a user name for the database server, and a database name. A straightforward way to do this is via the option psql -U username databasename. You can find the username in cgi-bin/DBDefs.pm. Look there for "MusicBrainz::Server::Database". I won't put the actual username in this page, because the serious student can look it up, and I don't want to make it too easy for vandals. I'll leave out the "-U username databasename" options in examples below, but you should include it. --JimDeLaHunt 2008-01-20
    • To get psql to write results from interactive commands to files, use the command \o PATH to send psql's output to the Linux pathname (relative to the current working directory). Use the command \o with no parameter to return output to the display. --JimDeLaHunt 2008-01-20
    • By default psql formats output nicely into aligned columns with vertical bars separating them, and a pager to give you a page at a time. To get comma-separated values instead, use this sequence of commands \pset format unaligned (\a for short), \pset fieldsep , (\f , for short), \pset pager. You can use \t instead of comma if you want tab-separated instead of comma-separated files. \pset recordsep '\n' sets the record separator to a newline. \H generates HTML table syntax. --JimDeLaHunt 2008-01-20
    • To generate output in wiki format, use these commands: \a \f || \pset recordsep '||\n||' \pset pager. --JimDeLaHunt 2008-01-20
  • Current DatabaseSchema and list of tables
    • The psql command "\l" lists all databases on the server. So does the command line "psql -l". It doesn't tell you much more than you know already from cgi-bin/DBDefs.pm. --JimDeLaHunt 2008-01-20
    • The psql command "\dt" lists all tables in the database. An alternate command that gets just the table names is select tablename from pg_tables where schemaname = 'public' order by tablename;. "\dt+" shows all tables with the Postgres COMMENT field include. --JimDeLaHunt 2008-01-20
    • When I tried this in the 20071014 server code, I got 90 tables. 30 of these are the tables with names beginning "l_" and "lt_", which are for AdvancedRelationships. Compare these 90 to 25 tables in the DatabaseSchema diagram. The table names are listed below. --JimDeLaHunt 2008-01-20
tablename in MB database
select tablename from pg_tables where schemaname = 'public' order by tablename;
Pending PendingData album
album_amazon_asin album_cdtoc albumjoin
albummeta albumwords annotation
artist artist_relation artist_tag
artistalias artistwords automod_election
automod_election_vote cdtoc clientversion
country currentstat editor_subscribe_editor
gid_redirect historicalstat l_album_album
l_album_artist l_album_label l_album_track
l_album_url l_artist_artist l_artist_label
l_artist_track l_artist_url l_label_label
l_label_track l_label_url l_track_track
l_track_url l_url_url label
label_tag labelalias labelwords
language link_attribute link_attribute_type
lt_album_album lt_album_artist lt_album_label
lt_album_track lt_album_url lt_artist_artist
lt_artist_label lt_artist_track lt_artist_url
lt_label_label lt_label_track lt_label_url
lt_track_track lt_track_url lt_url_url
moderation_closed moderation_note_closed moderation_note_open
moderation_open moderator moderator_preference
moderator_subscribe_artist moderator_subscribe_label puid
puid_stat puidjoin puidjoin_stat
release release_tag replication_control
script script_language stats
tag track track_tag
trackwords trm trm_stat
trmjoin trmjoin_stat url
vote_closed vote_open wordlist
(90 rows)
  • Understanding the AR implementation.
    • AdvancedRelationships describe relationships between two entities: Artist, Label, Track, or Release (called "album" here, probably for historical reasons). They are grouped into relationship classes. They can have attributes, such as instrument or vocal part, modifiers like "co-" and "additional", and dates. See the wikidocs starting with AdvancedRelationships for more information. --JimDeLaHunt 2008-01-20
    • There are 30 tables that appear to have implement the AdvancedRelationships. They are :
Tables implementing AdvancedRelationships in MB database
select tablename from pg_tables where schemaname = 'public' and (tablename ~ '^l_' or tablename ~ '^lt_');
l_album_album l_album_artist l_album_label l_album_track l_album_url
l_artist_artist l_artist_label l_artist_track l_artist_url
l_label_label l_label_track l_label_url
l_track_track l_track_url
l_url_url
lt_album_album lt_album_artist lt_album_label lt_album_track lt_album_url
lt_artist_artist lt_artist_label lt_artist_track lt_artist_url
lt_label_label lt_label_track lt_label_url
lt_track_track lt_track_url
lt_url_url
(30 rows)
  • Query list of ARs of various kinds
  • Which ARs can describe either Tracks and Releases? How often are they used?
    • @@@ continue from here

comments

Jim, re the last bullet in the "What I'm seized of at MusicBrainz" section, I'd suggest the answer is yes, yes, and yes, but the last only as a side benefit, not a goal within itself. DeleteWhenCooked :) -- BrianSchweitzer 20:15, 19 January 2008 (UTC)