User:JonnyJD/DiscID

Revision as of 14:44, 25 September 2013 by JonnyJD (talk | contribs) (add summary/evaluation)

There was some discussion at the 13th MusicBrainz Summit about the issues with Disc IDs and their possible removal. I want to summarize the issues and benefits of disc ID usage here.

Purpose of Disc IDs / TOCs

TOC

A TOC is a set of sector offsets (times) for a specific pressing of a CD. A release can have multiple TOCs attached, and the (primary) track times of the release should be the times from one of those TOCs. This way multiple pressings are grouped in one release. Releases can be found by TOC, though currently only a fuzzy search by TOC is available (see below).
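To illustrate how sector offsets translate into track times, here is a minimal sketch. It assumes the usual audio CD rate of 75 sectors per second and a TOC given as the lead-out offset plus one start offset per track. The function name is made up for this example, and the rounding (integer division to milliseconds) is an assumption, not necessarily what the server does. It also ignores the extra gap before a data track on multi-session discs.

```python
SECTORS_PER_SECOND = 75  # audio CD sectors (frames) per second


def track_lengths_ms(leadout_offset, track_offsets):
    """Return one length in milliseconds per track.

    A track ends where the next track starts; the last track ends
    at the lead-out offset.
    """
    ends = list(track_offsets[1:]) + [leadout_offset]
    return [
        (end - start) * 1000 // SECTORS_PER_SECOND
        for start, end in zip(track_offsets, ends)
    ]


# Three short tracks starting at sectors 150, 225 and 375, lead-out at 450.
print(track_lengths_ms(450, [150, 225, 375]))
```

Two pressings whose offsets differ by only a few sectors give nearly the same times, which is why several TOCs can sit on one release.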

Disc ID

A disc ID is basically a hash of the TOC, so you can generate a disc ID from the TOC, but not a TOC from the disc ID. On MusicBrainz these disc IDs are used as identifiers for a specific TOC.
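The one-way relation between TOC and disc ID can be sketched as follows. This follows my understanding of the published disc ID calculation: the first and last track numbers and 100 offsets (lead-out first, unused slots zero) are written as uppercase hex, hashed with SHA-1, and base64-encoded with `.`, `_` and `-` in place of `+`, `/` and `=`. Treat those details as assumptions and use libdiscid for real work. The function name is made up for this example.

```python
import base64
import hashlib


def disc_id_from_toc(first_track, last_track, leadout_offset, track_offsets):
    """Compute a disc ID string from TOC data (illustrative sketch)."""
    # Slot 0 holds the lead-out offset, slots 1..99 the track offsets.
    offsets = [0] * 100
    offsets[0] = leadout_offset
    for number, offset in zip(range(first_track, last_track + 1), track_offsets):
        offsets[number] = offset

    # Track numbers as 2 hex digits, every offset as 8 hex digits.
    text = "%02X%02X" % (first_track, last_track)
    text += "".join("%08X" % offset for offset in offsets)

    digest = hashlib.sha1(text.encode("ascii")).digest()
    # URL-friendly base64 variant: '+' -> '.', '/' -> '_', '=' -> '-'
    encoded = base64.b64encode(digest, altchars=b"._").decode("ascii")
    return encoded.replace("=", "-")


toc_offsets = [150, 44942, 61305, 72755, 96360, 130485,
               147315, 164275, 190702, 205412, 220437]
print(disc_id_from_toc(1, 11, 242457, toc_offsets))
```

Because the hash covers every offset, changing a single sector offset yields a completely different ID. That is why an exact disc ID match is strict, while a TOC can also be compared fuzzily.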


Disc Lookup

The disc ID is used as a short identifier to look up a release "by disc". Taggers expect the returned release to match what they see as the disc: same number of tracks etc. Lookup by TOC is possible in theory, but currently isn't as exact (due to fuzzy lookup, see below) and isn't widely supported (also see below).

Issues with (or because of) Disc IDs

Pregap Tracks

The ability to attach pregap tracks directly to the release is a long-requested feature, tracked in MBS-967. You can't add a pregap track as track 0 when a Disc ID is attached to the release (the tracklist is locked). Removing the Disc ID could work, but then the release could only be found by clients with TOC lookup support (see below). Removing Disc IDs altogether would extend the problem to all releases.

The additional "track" would confuse many tagger tools, so the pregap track should only be included when explicitly requested, not as a normal track.

Correcting Times

When disc IDs are attached, the track times can't be set manually. Setting times to one of the disc IDs is the only option.

I haven't yet grasped in which cases no correct disc ID is available while a better source for the correct times exists. The notes mention Video CDs, which possibly shouldn't get disc IDs attached in the first place (when there are no audio tracks)? Otherwise I don't see why we should mess around with the length of data tracks. --JonnyJD (talk) 14:24, 25 September 2013 (UTC)

Client support with and without full TOC

The main client library for disc ID support is libdiscid:

  • submission url provided includes DiscID and TOC, to save the TOC on the server
  • web service url provided also includes both, but is outdated (WS/1)
  • only the disc ID can be gathered directly with the API (as of 0.5.2)
  • there will be an upcoming release (0.6.0) with a "toc string API" (LIB-41)

The web service does allow lookup by disc ID and, if that doesn't match, a fuzzy lookup by TOC. The results of a match by disc ID and of a fuzzy match by TOC look completely different (see the comment in LMB-36). A syntactically valid (though not necessarily existing) disc ID is always required, and a match by TOC is currently always fuzzy.
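As a rough sketch of what such a request looks like, the snippet below builds a lookup URL that carries both the disc ID and the TOC. The `/ws/2/discid/` path and the `toc` query parameter (first track, last track, lead-out offset, then the track offsets) reflect my reading of the WS/2 documentation and should be treated as assumptions. The function name and the placeholder disc ID are made up for this example. Only the URL is built, nothing is sent.

```python
from urllib.parse import quote, urlencode

WS2_ROOT = "https://musicbrainz.org/ws/2"  # assumed web service root


def discid_lookup_url(disc_id, first_track, last_track, leadout_offset, track_offsets):
    """Build a disc lookup URL with the TOC attached for the fuzzy fallback."""
    numbers = [first_track, last_track, leadout_offset, *track_offsets]
    toc = " ".join(str(number) for number in numbers)
    # urlencode turns the spaces into '+', the separator used in TOC strings.
    query = urlencode({"toc": toc})
    return "%s/discid/%s?%s" % (WS2_ROOT, quote(disc_id, safe=""), query)


# Placeholder ID: syntactically plausible, not a real disc.
placeholder_id = "X" * 27 + "-"
print(discid_lookup_url(placeholder_id, 1, 2, 450, [150, 225]))
```

A client using this shape of request still has to handle two different result formats, depending on whether the disc ID or the TOC matched.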

libmusicbrainz is technically able to work with a TOC lookup, but doing so isn't straightforward due to how the web service works (see above and LMB-36).

python-musicbrainzngs can only look up by disc ID. There is an outstanding ticket for (fuzzy) lookup by TOC.

Usage statistics

(still left to gather)

Summary/Evaluation/TLDNR

Disc IDs are only IDs, so they are not important data in themselves. The important data is in the TOCs. However, the Disc IDs also aren't the problem, since the locks are in place because of the TOCs. If lookup worked directly by TOC, we would still have the same issues: confused taggers (if pregap tracks were implemented as normal tracks) or inconsistent data (if track times bore no relation to the actual TOCs).

Additionally, the client structure is based on lookup by Disc ID, not by TOC directly. Removing Disc IDs will probably break lots of lookup applications (statistics not gathered as of writing, though).