- 1 Introduction
- 2 Download
- 3 File Descriptions
- 3.1 The Basics
- 3.2 More Advanced features
- 4 Licenses
The data dumps are made available in a format that can be loaded into a local instance of PostgreSQL using a local instance of MusicBrainz Server. See MusicBrainz Server Setup for how to do it. Instructions and scripts are provided to download the data.
If you are interested in keeping the data in sync with MusicBrainz using our live data feed, make sure to enable replication.
Alternatively, if you are not interested in having a local MusicBrainz website and web service, you can use mbdata, which includes replication without the rest of MusicBrainz Server.
The data dumps are available for download via HTTP, FTP, or rsync at the following locations:
- http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/ (US mirror - OR) - also supports ftp
- https://data.metabrainz.org/pub/musicbrainz/data/fullexport/ (EU mirror - DE) - also supports ftp
- https://mirrors.dotsrc.org/MusicBrainz/data/fullexport/ (EU mirror - DK) - also supports http and ftp
- rsync://rsync.osuosl.org/musicbrainz/data/fullexport/ (US mirror - OR)
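The verification examples further down this page fetch files from a URL made of a mirror base, a timestamped snapshot directory, and a file name. A minimal sketch of a helper that builds such URLs is shown here. It is not an official script: `dump_file_url`, `BASE`, and `SNAPSHOT` are hypothetical names, and the snapshot directory is the 2015 example used elsewhere on this page, so substitute a current one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MusicBrainz): build the URL of one file
# inside a dump snapshot directory on a given mirror.
dump_file_url() {
    local base="$1" snapshot="$2" file="$3"
    # ${base%/} drops one trailing slash so the result has no "//" in the path.
    printf '%s/%s/%s\n' "${base%/}" "$snapshot" "$file"
}

BASE="http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport"
SNAPSHOT="20150718-003933"   # example snapshot from this page; replace as needed

# Print the URLs for the checksum list, its signature, and two dump files.
for f in SHA256SUMS SHA256SUMS.asc mbdump.tar.bz2 mbdump-derived.tar.bz2; do
    dump_file_url "$BASE" "$SNAPSHOT" "$f"
done
```

The sketch only prints URLs. To download them, the output of the loop could be piped to `wget --continue --input-file=-`, which reads the list of URLs from standard input.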
Each data dump snapshot provided over FTP includes a number of different files. Depending on your use cases, you may or may not require all of them. Here's a rundown of what they contain:
If you're only looking for music metadata, you can start here with the basics. These files should help you get everything you need to replicate the core catalog.
If you're looking for more advanced, or more analytical data, you should still have a look at these basics, but make sure to also see the Advanced section below.
Each `.asc` file contains the PGP signature for the file it is named after. You can use these signatures to verify the authenticity of the files after you've downloaded them.
In order to verify the downloads, you must first fetch the MusicBrainz public key:
    $ gpg --recv-keys C777580F
    gpg: requesting key C777580F from hkp server keys.gnupg.net
    gpg: /home/kevin/.gnupg/trustdb.gpg: trustdb created
    gpg: key C777580F: public key "MusicBrainz (MusicBrainz data dump signing key) <firstname.lastname@example.org>" imported
    gpg: no ultimately trusted keys found
    gpg: Total number processed: 1
    gpg:               imported: 1  (RSA: 1)
Now you can verify the GPG signatures. For example, if you download the SHA256SUMS file and its signature:
    $ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/SHA256SUMS
    ...
    $ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/SHA256SUMS.asc
    ...
    # now you can run:
    $ gpg --verify SHA256SUMS.asc SHA256SUMS
    gpg: Signature made Sat 18 Jul 2015 03:10:45 AM UTC using RSA key ID C777580F
    gpg: Good signature from "MusicBrainz (MusicBrainz data dump signing key) <email@example.com>"
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg:          There is no indication that the signature belongs to the owner.
    Primary key fingerprint: D5E6 3B4B DCCE 1956 4294 8684 B8FC 2375 C777 580F
Note: If you don't use `gpg` very frequently, and haven't marked the key as trusted (or marked any other key as trusted), you'll see the above warning that the key is not certified. It doesn't mean that the signature is invalid, just that `gpg` won't be convinced that the source of the key you received is authentic until you tell it that you think it is.
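Before deciding whether to trust the key, one option is to compare the fingerprint of the key you imported with the fingerprint printed in the `gpg --verify` output above. The following is a rough sketch under that assumption. `normalize_fpr` and `fpr_matches` are hypothetical helper names, and the `gpg` call is skipped if `gpg` or the key is not available.

```shell
#!/usr/bin/env bash
# Fingerprint as printed in the gpg --verify output on this page.
EXPECTED_FPR="D5E6 3B4B DCCE 1956 4294 8684 B8FC 2375 C777 580F"

# Hypothetical helper: remove whitespace and upper-case a fingerprint.
normalize_fpr() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: succeed only if both fingerprints are identical
# once spacing and letter case are ignored.
fpr_matches() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}

# Read the imported key's fingerprint in machine-readable form:
# in --with-colons output, "fpr" records carry the fingerprint in field 10.
if command -v gpg >/dev/null 2>&1; then
    actual=$(gpg --with-colons --fingerprint C777580F 2>/dev/null |
             awk -F: '$1 == "fpr" { print $10; exit }')
    if [ -n "$actual" ]; then
        if fpr_matches "$actual" "$EXPECTED_FPR"; then
            echo "fingerprint matches"
        else
            echo "fingerprint MISMATCH"
        fi
    fi
fi
```

A match only shows that the local key is the one this page describes. Whether to mark it as trusted remains your own decision.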
MD5SUMS and SHA256SUMS
These files contain the checksums for the hosted files. You can run `md5sum` and `sha256sum` on the downloaded .tar.bz2 files to validate the checksums:
    $ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/fullexport/20150718-003933/mbdump-stats.tar.bz2
    $ sha256sum mbdump-stats.tar.bz2
    5ad5de5c6804c6c937729382f7a0db50f46dc9ae0a4a143e7720fb1d4bbbfeba  mbdump-stats.tar.bz2
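Rather than comparing the printed digest with the checksum file by eye, the matching line can be passed to `sha256sum --check`. The following is a minimal sketch, assuming GNU coreutils and file names without spaces. `verify_one` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check one downloaded file against its entry in a
# checksum list (defaults to SHA256SUMS in the current directory).
verify_one() {
    local file="$1" sums="${2:-SHA256SUMS}"
    # Select the entry for this file (text mode "name" or binary mode "*name")
    # and let sha256sum check it from standard input. If no entry matches,
    # sha256sum receives no checksum lines and exits with a non-zero status.
    awk -v f="$file" '$2 == f || $2 == ("*" f)' "$sums" | sha256sum --check -
}

# Example call, guarded so the sketch does nothing when the files are absent.
if [ -f SHA256SUMS ] && [ -f mbdump-stats.tar.bz2 ]; then
    verify_one mbdump-stats.tar.bz2
fi
```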
You can also verify the checksum of all downloaded files at once.
    $ sha256sum -c SHA256SUMS
    mbdump-cdstubs.tar.bz2: OK
    mbdump-cover-art-archive.tar.bz2: OK
    mbdump-derived.tar.bz2: OK
    mbdump-documentation.tar.bz2: OK
    sha256sum: mbdump-edit.tar.bz2: No such file or directory
    mbdump-edit.tar.bz2: FAILED open or read
    mbdump-editor.tar.bz2: OK
    mbdump-stats.tar.bz2: OK
    mbdump-wikidocs.tar.bz2: OK
    mbdump.tar.bz2: OK
    sha256sum: WARNING: 1 listed file could not be read
If you did not download a specific file, you can ignore the error regarding this file.
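If you would rather not see errors for files you skipped, recent versions of GNU `sha256sum` (coreutils 8.25 and later) accept `--ignore-missing`. The following is a minimal sketch under that assumption. `verify_present` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: verify only the listed files that are present.
verify_present() {
    # --ignore-missing skips entries whose file does not exist;
    # --quiet suppresses the per-file "OK" lines, so only failures are printed.
    sha256sum --check --ignore-missing --quiet "${1:-SHA256SUMS}"
}

# Example call, guarded so the sketch does nothing without a checksum list.
if [ -f SHA256SUMS ]; then
    verify_present SHA256SUMS && echo "all downloaded files verified"
fi
```

Note that with `--ignore-missing`, `sha256sum` still exits with an error if none of the listed files is present, since nothing was verified.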
`mbdump.tar.bz2` is the core MusicBrainz database, including the tables for Artist, Release, Recording, etc.
Most normal catalog use cases require only this database and the derived data.
The derived data (`mbdump-derived.tar.bz2`) consists of annotations, user ratings, user tags, and search indexes. Combining this with the core database should cover most music-metadata-related use cases.
More Advanced features
`mbdump-edit.tar.bz2` is the complete edit history for the core database. If you want to see how metadata has evolved, make sure to grab this dump in addition to the core.
The history includes things like open and closed edits, edit notes, votes, and auto-editor elections. It does not include information about the people who made the edits. For that information, you'll need the next item as well.
`mbdump-editor.tar.bz2` contains a table of non-personal user data about the people who made the edits enumerated in the edit history above.
The CD Stub data (`mbdump-cdstubs.tar.bz2`) is described on its dedicated page. As mentioned there, the stubs are submitted anonymously and are treated as an untrusted source of data, separate from the core database.
Metadata about the metadata (very meta!). The statistics database (`mbdump-stats.tar.bz2`) includes things that you might find over at http://musicbrainz.org/statistics.
`mbdump-cover-art-archive.tar.bz2` includes the tables that show connections between MusicBrainz and the Cover Art Archive (keep in mind it does not include the actual images in the archive).
This dump includes the tables that show connections between MusicBrainz and the Event Art Archive (keep in mind it does not include the actual images in the archive). Since the Event Art Archive is not yet in use, there's currently no data here.
`mbdump-documentation.tar.bz2` includes the tables that specify which relationships in the database are used as examples for each relationship type, as well as the specific guidelines for each relationship type, when available.
`mbdump-wikidocs.tar.bz2` contains the wikidocs_index table, which records which revision is transcluded for each wiki page covered by our WikiDocs system.
The license and contents of each file are described below.
The following database dumps are distributed under the CC0 license, which effectively places the data in the public domain:
The following database dumps are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license: