MusicBrainz Database/Download: Difference between revisions

From MusicBrainz Wiki
m (Use better annotated image so I can remove duplicates)
(→‎Introduction: Replace “Setup” with a link to MusicBrainz_Server/Setup, still mention mbdata alternative, drop archived mbzdb)

Revision as of 14:58, 20 May 2020

Introduction

Please read the MusicBrainz Database product page and the database schema documentation if you are not familiar with the MusicBrainz Database.

The data dumps are made available in a format that can be loaded into a local instance of PostgreSQL using a local instance of MusicBrainz Server. See MusicBrainz Server Setup for how to do it. Instructions and scripts are provided to download the data.

If you are interested in keeping the data in sync with MusicBrainz using our live data feed, make sure to enable replication.

Alternatively, if you do not need a local MusicBrainz website and web service, you can use mbdata, which includes replication without the rest of MusicBrainz Server.

Download

The data dumps are available for download via HTTP, FTP, or rsync at the following places:

File Descriptions

Each data dump snapshot provided over FTP includes a number of different files. Depending on your use cases, you may or may not require all of them. Here's a rundown of what they contain:

The Basics

If you're only looking for music metadata, you can start here with the basics. These files should help you get everything you need to replicate the core catalog.

If you're looking for more advanced, or more analytical data, you should still have a look at these basics, but make sure to also see the Advanced section below.

ASC files

Each `.asc` file contains the PGP signature for its respective file. You can use these to verify the files after you've downloaded them.

In order to verify the downloads, you must first fetch the MusicBrainz public key:

$ gpg --recv-keys C777580F
gpg: requesting key C777580F from hkp server keys.gnupg.net
gpg: /home/kevin/.gnupg/trustdb.gpg: trustdb created
gpg: key C777580F: public key "MusicBrainz (MusicBrainz data dump signing key) <support@musicbrainz.org>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)

Now you can verify the GPG signatures. For example, if you download the SHA256SUMS file and its signature:

$ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/SHA256SUMS
...
$ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/SHA256SUMS.asc
...
# now you can run:
$ gpg --verify SHA256SUMS.asc SHA256SUMS
gpg: Signature made Sat 18 Jul 2015 03:10:45 AM UTC using RSA key ID C777580F
gpg: Good signature from "MusicBrainz (MusicBrainz data dump signing key) <support@musicbrainz.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: D5E6 3B4B DCCE 1956 4294  8684 B8FC 2375 C777 580F


Note: If you don't use `gpg` very frequently, and haven't marked the key as trusted (or marked any other key as trusted), you'll see the above warning that the key is not certified. It doesn't mean that the signature is invalid, just that `gpg` won't be convinced that the source of the key you received is authentic until you tell it that you think it is.
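One way to gain confidence in the key, independent of `gpg`'s trust database, is to compare the full fingerprint that `gpg --verify` prints against a copy obtained from a source you already trust. The short key ID (`C777580F`) is only the last eight hex digits of the fingerprint, so matching the whole fingerprint is the stronger check. The following is a minimal Python sketch of that comparison. The function names are hypothetical, and the `EXPECTED` value is simply copied from the transcript above; you would need to confirm it through a trusted channel.

```python
def normalize_fingerprint(text):
    """Strip all whitespace and upper-case a fingerprint or key ID."""
    return "".join(text.split()).upper()


def same_fingerprint(a, b):
    """True if two fingerprints are identical once spacing and case are ignored."""
    return normalize_fingerprint(a) == normalize_fingerprint(b)


def matches_key_id(fingerprint, key_id):
    """True if key_id is the trailing part of fingerprint (as a short key ID is)."""
    return normalize_fingerprint(fingerprint).endswith(normalize_fingerprint(key_id))


# Assumption: copied from the gpg output above, not independently verified here.
EXPECTED = "D5E6 3B4B DCCE 1956 4294  8684 B8FC 2375 C777 580F"

if __name__ == "__main__":
    print(matches_key_id(EXPECTED, "C777580F"))
```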

MD5SUMS and SHA256SUMS

These files contain the checksums for the hosted files. You can run `md5sum` and `sha256sum` on the downloaded .tar.bz2 files to validate the checksums:

$ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/mbdump-stats.tar.bz2
$ sha256sum mbdump-stats.tar.bz2
5ad5de5c6804c6c937729382f7a0db50f46dc9ae0a4a143e7720fb1d4bbbfeba  mbdump-stats.tar.bz2

You can also verify the checksum of all downloaded files at once.

$ sha256sum -c SHA256SUMS
mbdump-cdstubs.tar.bz2: OK
mbdump-cover-art-archive.tar.bz2: OK
mbdump-derived.tar.bz2: OK
mbdump-documentation.tar.bz2: OK
sha256sum: mbdump-edit.tar.bz2: No such file or directory
mbdump-edit.tar.bz2: FAILED open or read
mbdump-editor.tar.bz2: OK
mbdump-stats.tar.bz2: OK
mbdump-wikidocs.tar.bz2: OK
mbdump.tar.bz2: OK
sha256sum: WARNING: 1 listed file could not be read

If you did not download a specific file, you can ignore the error regarding this file.
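If you would rather have files you did not download reported separately from real checksum failures, the same check can be scripted. The following is a minimal Python sketch, assuming the `SHA256SUMS` file uses the usual `sha256sum` layout of one `<hex digest>  <file name>` pair per line and sits in the same directory as the downloaded archives. The function names `sha256_of` and `verify_sums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file):
    """Check every file listed in a SHA256SUMS-style file.

    Returns a dict mapping each listed file name to "OK", "FAILED",
    or "MISSING" (the file was not downloaded).
    """
    sums_file = Path(sums_file)
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes "*" in binary mode
        target = sums_file.parent / name
        if not target.exists():
            results[name] = "MISSING"
        elif sha256_of(target) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results


if __name__ == "__main__":
    for name, status in verify_sums("SHA256SUMS").items():
        print(f"{name}: {status}")
```

Recent versions of GNU coreutils also accept `sha256sum -c --ignore-missing SHA256SUMS`, which skips listed files that are not present.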

mbdump.tar.bz2

This is the core MusicBrainz database, including the tables for Artist, Release, Recording, etc.

Most normal catalog use cases only require this database, and the derived data.

mbdump-derived.tar.bz2

The derived data consists of annotations, user ratings, user tags, and search indexes. Combining this with the core database should cover most music-metadata-related use cases.

More Advanced Features

mbdump-edit.tar.bz2

This is the complete edit history for the core database. If you want to see how metadata has evolved, make sure to grab this dump in addition to the core.

The history includes things like open and closed edits, edit notes, votes, and auto-editor elections. It does not include information about the people who made the edits. For that information, you'll need the next item as well.

mbdump-editor.tar.bz2

This dump includes non-personal user data about the people who made the edits recorded in the edit dump above.

mbdump-cdstubs.tar.bz2

The CD Stub data is described over on its dedicated page. As mentioned over there, the stubs are submitted anonymously, and are treated as an untrusted source of data, separate from the core database.

mbdump-stats.tar.bz2

Metadata about the metadata (very meta!). The statistics database includes things that you might find over at http://musicbrainz.org/statistics.


Licenses

The license and contents of each file are described below.

Public Domain

The following database dumps are distributed under the CC0 license, which effectively places the data in the public domain:

  • mbdump.tar.bz2
  • mbdump-cdstubs.tar.bz2

Creative Commons

The following database dumps are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license:

  • mbdump-derived.tar.bz2
  • mbdump-edit.tar.bz2
  • mbdump-editor.tar.bz2
  • mbdump-stats.tar.bz2
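If you script your downloads, it can help to encode the two lists above as a lookup table, for example to fetch only the CC0 files. The following is a minimal Python sketch. The constant and function names are hypothetical, and the table covers only the files this page lists, so any other file in a snapshot is treated as having an unknown license.

```python
CC0 = "CC0"
CC_BY_NC_SA = "CC BY-NC-SA 3.0"

# License of each dump file, as listed on this page.
DUMP_LICENSES = {
    "mbdump.tar.bz2": CC0,
    "mbdump-cdstubs.tar.bz2": CC0,
    "mbdump-derived.tar.bz2": CC_BY_NC_SA,
    "mbdump-edit.tar.bz2": CC_BY_NC_SA,
    "mbdump-editor.tar.bz2": CC_BY_NC_SA,
    "mbdump-stats.tar.bz2": CC_BY_NC_SA,
}


def dumps_under(license_name):
    """Return the sorted names of dump files distributed under license_name."""
    return sorted(
        name for name, lic in DUMP_LICENSES.items() if lic == license_name
    )


def license_of(filename):
    """Return the license of a dump file, or None if this page does not list it."""
    return DUMP_LICENSES.get(filename)


if __name__ == "__main__":
    print(dumps_under(CC0))
```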