Development/JSON Data Dumps

Revision as of 15:40, 13 March 2023

Introduction

As an alternative to the database dumps in PostgreSQL format, most of the MusicBrainz data is also accessible via separate data dumps in JSON format.

Download

The data dumps are available for download via HTTP, FTP or rsync at the following places:

File Descriptions

Each JSON dump snapshot provided over FTP includes a number of different files. Depending on your use cases, you may or may not require all of them. Here's a rundown of what they contain:

tar.xz files

The snapshot contains a separate tar.xz file for each of the MusicBrainz entity types, except for URLs. The compressed file contains the following:

mbdump/entity

The mbdump/ folder contains one file named after the relevant entity type. This file contains one entity per line, in the format returned by our JSON web service. Each entry includes all the data that can usually be requested from the web service with inc= parameters (except for those that need a subquery): aliases, annotations, genres, ratings (where applicable to the entity type) and tags. It also includes most relationships. Two kinds of relationships are not included:

  • Relationships for works linked to a release or recording. To get these you will need to find the appropriate work in the work dump.
  • Relationships for recordings linked to a release, if the release has more than 500 recordings. This applies only to a very small number of very large releases, but for these cases, you will only be able to find those relationships by finding the other side of the relationship (the artist, label, recording, etc. that is linked to the recording in question via a relationship).

For most entity types, the file does contain all the relevant entities; for recordings, though, the recording dump contains only standalone recordings. All other recordings can be found in the release dump, under the appropriate release(s).
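Since most recordings live inside release entries, collecting them means walking each release. The following is a minimal sketch, assuming the release JSON nests recordings as release["media"][i]["tracks"][j]["recording"] (the layout of the JSON web service as far as we know; check a sample line before relying on it). The function names are illustrative and not part of any MusicBrainz tooling.

```python
from typing import Dict, Iterable, Iterator


def iter_release_recordings(release: dict) -> Iterator[dict]:
    """Yield every recording nested under one parsed release entry."""
    # Assumed layout: release["media"][i]["tracks"][j]["recording"].
    for medium in release.get("media") or []:
        for track in medium.get("tracks") or []:
            recording = track.get("recording")
            if recording is not None:
                yield recording


def unique_recordings(releases: Iterable[dict]) -> Dict[str, dict]:
    """Collect recordings from many releases, keyed by recording id."""
    # The same recording can appear on several releases, so keep one copy per id.
    seen: Dict[str, dict] = {}
    for release in releases:
        for recording in iter_release_recordings(release):
            seen.setdefault(recording["id"], recording)
    return seen
```

Keying by id keeps one copy of each recording, at the cost of holding all of them in memory; for a full release dump you may prefer to write them out as you go.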

Keep in mind that for some entity types (such as releases) this file will be several GB in size. Don't expect to be able to open it directly with a text editor or similar software.
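Because each line is a self-contained JSON document, the file can be processed one entity at a time without unpacking the archive or loading it into memory. A minimal sketch using only the Python standard library follows; it assumes the member is stored under the exact name mbdump/<entity>, and the function name iter_entities is illustrative.

```python
import json
import tarfile
from typing import Iterator


def iter_entities(archive_path: str, entity: str) -> Iterator[dict]:
    """Yield one parsed entity per line of mbdump/<entity> inside a tar.xz dump."""
    # "r|xz" reads the archive as a forward-only stream, so nothing is
    # unpacked to disk and only one line is held in memory at a time.
    with tarfile.open(archive_path, mode="r|xz") as archive:
        for member in archive:
            if member.name != f"mbdump/{entity}":
                continue
            stream = archive.extractfile(member)
            for raw_line in stream:
                if raw_line.strip():
                    yield json.loads(raw_line)
            return  # the entity file was found and fully read
```

For example, iterating over iter_entities("area.tar.xz", "area") yields one dict per area. Decompression still has to pass over the whole file, so expect a full read of the larger dumps to take a while.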

COPYING files

The license info for the data (see details below).

Other files

The other files in the package are a README and several metadata files: JSON_DUMPS_SCHEMA_NUMBER, REPLICATION_SEQUENCE and SCHEMA_SEQUENCE (indicating the sequence numbers at the time of the dump for the JSON dumps schema, the replicated MusicBrainz data and the main MusicBrainz database schema respectively) and TIMESTAMP (when the dump was created).
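If you track which snapshot you have imported, the metadata files can be read straight from the archive. This is a rough sketch, assuming each metadata file is a small plain-text member of the tar.xz whose base name matches the names listed above; read_dump_metadata is an illustrative name.

```python
import posixpath
import tarfile
from typing import Dict

METADATA_FILES = (
    "JSON_DUMPS_SCHEMA_NUMBER",
    "REPLICATION_SEQUENCE",
    "SCHEMA_SEQUENCE",
    "TIMESTAMP",
)


def read_dump_metadata(archive_path: str) -> Dict[str, str]:
    """Return the stripped text of each metadata file found in the archive."""
    found: Dict[str, str] = {}
    with tarfile.open(archive_path, mode="r|xz") as archive:
        for member in archive:
            name = posixpath.basename(member.name)
            if member.isfile() and name in METADATA_FILES:
                found[name] = archive.extractfile(member).read().decode("utf-8").strip()
            # Stop early once everything is found, to avoid decompressing the data file.
            if len(found) == len(METADATA_FILES):
                break
    return found
```

The early exit only helps if the metadata members precede the large mbdump/ file in the archive, which is an assumption; otherwise the whole archive is decompressed once.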

ASC files

All of the `.asc` files contain the PGP signatures for their respective files. You can use these to verify the files after you've downloaded them.

In order to verify the downloads, you must first fetch the MusicBrainz public key:

$ gpg --recv-keys C777580F
gpg: requesting key C777580F from hkp server keys.gnupg.net
gpg: /home/kevin/.gnupg/trustdb.gpg: trustdb created
gpg: key C777580F: public key "MusicBrainz (MusicBrainz data dump signing key) <support@musicbrainz.org>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)

Now you can verify the signatures. For example, if you download area.tar.xz and its signature file:

$ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/json-dumps/20200229-001001/area.tar.xz
...
$ wget http://ftp.musicbrainz.org/pub/musicbrainz/data/json-dumps/20200229-001001/area.tar.xz.asc
...
# now you can run:
$ gpg --verify area.tar.xz.asc area.tar.xz
gpg: Signature made Sat 18 Jul 2015 03:10:45 AM UTC using RSA key ID C777580F
gpg: Good signature from "MusicBrainz (MusicBrainz data dump signing key) <support@musicbrainz.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: D5E6 3B4B DCCE 1956 4294  8684 B8FC 2375 C777 580F


Note: If you don't use `gpg` very frequently, and haven't marked the key as trusted (or marked any other key as trusted), you'll see the above warning that the key is not certified. It doesn't mean that the signature is invalid, just that `gpg` won't be convinced that the source of the key you received is authentic until you tell it that you think it is.
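If you download dumps from a script, the same check can be automated. The sketch below simply shells out to the gpg command shown above; it assumes gpg is on the PATH and that the signing key has already been imported. It relies on gpg exiting with status 0 for a good signature, which to our knowledge holds even when the "not certified" warning is printed. The function names are illustrative.

```python
import subprocess
from typing import List


def build_verify_command(signature_path: str, data_path: str) -> List[str]:
    """Build the same command line as the manual `gpg --verify` call."""
    return ["gpg", "--verify", signature_path, data_path]


def is_signature_good(signature_path: str, data_path: str) -> bool:
    """Run gpg and report whether it accepted the signature."""
    result = subprocess.run(
        build_verify_command(signature_path, data_path),
        capture_output=True,
        text=True,
    )
    # gpg writes its report to stderr; the exit status carries the verdict.
    return result.returncode == 0
```

A call such as is_signature_good("area.tar.xz.asc", "area.tar.xz") returns False for a bad or missing signature, so a download script can refuse to import the file.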

Licenses

The license for the dumped data is described below.

Public Domain

Most of the content in the JSON dumps is licensed under the CC0 license (http://creativecommons.org/publicdomain/zero/1.0/), which effectively places the data in the public domain.

Creative Commons

The following parts of each item in the dumps are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license (http://creativecommons.org/licenses/by-nc-sa/3.0/):

  • annotation
  • genres
  • ratings
  • tags
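If you only want to work with the CC0 portion, one approach is to drop the parts listed above from every entry before storing it. The following is a hedged sketch: it assumes the JSON keys carry exactly the names in the list, whereas the actual key names in the dump may differ (for example a singular form), so check a sample line first. It removes the keys at every nesting level, since nested entities can carry their own tags or genres.

```python
from typing import Any, FrozenSet

# Key names taken from the list above; treat them as an assumption about the JSON.
CC_BY_NC_SA_KEYS: FrozenSet[str] = frozenset({"annotation", "genres", "ratings", "tags"})


def strip_restricted(value: Any, restricted_keys: FrozenSet[str] = CC_BY_NC_SA_KEYS) -> Any:
    """Return a copy of a parsed entity without the CC BY-NC-SA parts."""
    if isinstance(value, dict):
        return {
            key: strip_restricted(item, restricted_keys)
            for key, item in value.items()
            if key not in restricted_keys
        }
    if isinstance(value, list):
        return [strip_restricted(item, restricted_keys) for item in value]
    return value  # strings, numbers, booleans and None pass through unchanged
```

Passing a different restricted_keys set lets you adjust the filter once you have confirmed the real key names.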