How PUIDs Work
Music Analysis vs Fingerprinting
MusicIP makes two processes available: MusicAnalysis and AudioFingerprinting. The PUIDs themselves are just IDs, not fingerprints.
Music Analysis
Before a PUID is available for MusicBrainz or Picard to use, Music Analysis must have been performed on a track. MusicAnalysis uses up to 10 minutes of the track and examines all sorts of things. This is the secret sauce that makes MusicIP tick, and that allows the MusicIP mixer (aka MusicMagicMixer or MMM) to generate playlists of similar music. This is never going to be open sourced. Music analysis takes a while (about 80% of the file's playing time).
To generate a new PUID, you must analyze a track fully. Currently you have to use either the MusicIP mixer or MusicIP's genpuid command-line utility to do this. The result of this analysis is submitted to the MusicDNS service and is used by the MusicDNS server to do fuzzy matching. This data is closed source, patented, and even secret (the closed source MusicIP mixer sends the data to a closed source server, and the data is never made public). The only thing that becomes public is the Portable Unique IDentifier (PUID), which is a 128-bit ID of the respective analysis data on the MusicDNS server.
There is a 24-hour latency on the MusicIP side before new PUIDs become available. This is an artifact of the server architecture, which is optimized to do large numbers of lookups efficiently. (In practice, the latency is currently less than 12 hours.) This should disappear at some point in the future, but not until the MusicIP server architecture is updated.
MusicAnalysis cannot be integrated into the PicardTagger, because the process is closed source and Picard is GPLed.
Audio Fingerprinting and PUIDs
Fingerprinting is a much smaller process - it analyzes about 2 minutes of the track using the open source libofa library to calculate an AudioFingerprint and should take 2-3 seconds per regular-sized track.
With this fingerprint data, you can only do a "lookup" on the MusicDNS web-service, which returns a PUID if a sufficiently close match has been submitted via the MusicAnalysis described above. If MusicAnalysis has not been performed on the track by someone else, no PUID will be found for the track. MusicIP provides free fingerprint lookup services for official MusicBrainz projects and other open source projects.
Note that the PUID is just an arbitrary ID and has no relation to the fingerprint data (except for its relation within the MusicDNS server). This means you cannot generate a new PUID to insert into the database from the fingerprinting process. A new PUID can only be allocated by MusicDNS as a result of the detailed MusicAnalysis process. For the technically minded, consider a fingerprint to be a key that can be used to query for a value, being the PUID.
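The key/value relationship can be illustrated with a toy model. This is only a sketch: `ToyMusicDNS`, `submit_analysis` and `lookup` are hypothetical names, not part of any MusicIP API, and the real server matches fuzzily on secret analysis data rather than on an exact dictionary key. What it shows is that a lookup is read-only, and that only the analysis path allocates a new ID.

```python
import uuid


class ToyMusicDNS:
    """Hypothetical in-memory stand-in for the MusicDNS server (not a real API)."""

    def __init__(self):
        # fingerprint (key) -> PUID (value)
        self._puid_by_fingerprint = {}

    def submit_analysis(self, fingerprint):
        """Model full MusicAnalysis: the only path that allocates a new PUID."""
        if fingerprint not in self._puid_by_fingerprint:
            # uuid4 is used here only as a convenient random 128-bit value.
            self._puid_by_fingerprint[fingerprint] = str(uuid.uuid4())
        return self._puid_by_fingerprint[fingerprint]

    def lookup(self, fingerprint):
        """Model a fingerprint lookup: read-only, returns None when unknown."""
        return self._puid_by_fingerprint.get(fingerprint)


server = ToyMusicDNS()
print(server.lookup("fp-track-a"))           # None: nobody has analyzed it yet
puid = server.submit_analysis("fp-track-a")  # analysis allocates the PUID
print(server.lookup("fp-track-a") == puid)   # True: lookups now succeed
```

Because the PUID is random with respect to the fingerprint, there is no way to compute it client-side; it can only be fetched from the server.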
How Picard uses PUIDs
Assume to start with that the tracks Picard is processing do not have MB IDs previously saved in their tags (otherwise they would automatically get matched by Picard without fingerprinting/scanning and the following would not happen).
1. The user selects an unmatched file and clicks the Scan button.
2. Picard puts the unmatched file back into the 'new' folder and calculates the fingerprint of the file.
3. Picard looks up this fingerprint on the MusicDNS server, trying to find a PUID corresponding to the file's fingerprint.
   - If the fingerprint yields a match, Picard receives a PUID. This will only ever happen if MusicAnalysis (above) has already been performed on the file by someone else.
   - If the fingerprint does not yield a match, Picard receives nothing and can do nothing more with this track until someone performs MusicAnalysis on the file (see above).
4. If a PUID was received, Picard now looks up this PUID in MusicBrainz to find any matching tracks. This relies on a previous user having used Picard (or a similar MB script) to say "this MB track off this MB release matches this MusicIP PUID".
   - If it gets a match, Picard retrieves the release metadata from MB and moves the track to the right-hand pane. The process stops here.
   - If it does not get a match, the PUID stays associated with the file in memory, but the file stays in the left-hand pane. Now the user must manually match the track using another mechanism (e.g. Cluster/Lookup or manual search on the website).
5. Once the file is matched to a track, the user can choose to Submit the PUIDs back to MB, thus helping future users at step 4.
   Note that PUID submission currently has some issues with Picard, and PUID submission is not viewed by the development community as one of Picard's primary functions.
6. When the file is saved, the PUID is saved into the metadata tags of the file and can be used for future lookup or submission.
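The two lookups in the scan process above can be sketched as a small decision function. This is not Picard's code: the function name, the outcome labels and the two dictionaries standing in for the MusicDNS and MusicBrainz servers are all assumptions made for illustration.

```python
def scan(fingerprint, musicdns_puids, musicbrainz_tracks):
    """Return (outcome, puid, track_ids) for one unmatched file.

    musicdns_puids:     dict fingerprint -> PUID (toy stand-in for MusicDNS)
    musicbrainz_tracks: dict PUID -> list of MB track IDs (toy stand-in for MB)
    """
    # First lookup: fingerprint -> PUID on the MusicDNS side.
    puid = musicdns_puids.get(fingerprint)
    if puid is None:
        # Nobody has run MusicAnalysis on this track yet.
        return ("no-puid", None, [])

    # Second lookup: PUID -> tracks on the MusicBrainz side.
    track_ids = musicbrainz_tracks.get(puid, [])
    if not track_ids:
        # The PUID exists but no user has linked it to an MB track;
        # the file has to be matched manually.
        return ("puid-unlinked", puid, [])

    return ("matched", puid, track_ids)


# Hypothetical sample data.
musicdns = {"fp-a": "puid-1", "fp-b": "puid-2"}
musicbrainz = {"puid-1": ["track-9"]}

for fp in ("fp-a", "fp-b", "fp-c"):
    print(fp, scan(fp, musicdns, musicbrainz)[0])
```

The three outcomes correspond to the three ways a scan can end: matched automatically, PUID found but manual matching needed, or no PUID at all.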
PUID vs. TRM and the switch to Picard
While this topic is highly subjective, there appear to be common questions about why the switch to Picard and PUID has been undertaken by the MB development teams. Some of the relevant issues are:
TRM fingerprinting involved closed source code owned by Relatable. While MusicAnalysis is closed source MusicIP code, the act of fingerprinting (used by Picard) can be open source. Both systems are dependent on a server for fingerprint lookups. Relatable's TRM server is run in the MusicBrainz hosting environment, whereas Picard utilizes MusicIP's public servers.
MusicBrainz TRM usage was plagued by large numbers of collisions, with quite different tracks getting the same ID. It is hoped/believed that collisions will be much rarer with MusicIP's analysis technology. Current anecdotal evidence seems to point to this being the case, as do the stats.
MusicBrainz TRM usage was also plagued by large numbers of different TRMs (many useless and unique to a particular user) being generated to represent the same track. This created additional load on the servers and made fixing incorrectly matched TRMs difficult for editors to manage. The technology behind MusicIP's analysis is hoped to perform better and generate fewer duplicates.
The relationship with MusicIP is a partnership. Both organizations have community input. MusicIP are using MusicBrainz's data and MusicBrainz (users) are using MusicIP's fingerprint lookup service. This is a healthy relationship to be in.
Discussion
Therefore, if we really want to populate the database with PUIDs, MB needs a simple command-line (or drop-files-here) tool that does all this. We should harness the past matching activity of our users. --DonRedman
It should be easy enough to create a script that:
Traverses your music files, and finds everything with no PUID
Fingerprint all the matching tracks, then submit the PUIDs
Analyse the remaining tracks to generate PUIDs
Re-run the script 24-hours later, then your entire music collection will have PUIDs.
I intend to do this to my collection once the command-line analyser ships. I should probably make a start on fingerprinting stuff with libofa before then, but it hasn't hit Debian yet. --MartinRudat
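A rough sketch of the triage part of such a script follows. It deliberately works on tags that have already been read into a dictionary, so no tagging library is assumed. The tag name `musicip_puid`, the function name and the `lookup_puid` callable (standing in for "fingerprint the file and query MusicDNS") are assumptions for illustration.

```python
def triage(collection, lookup_puid):
    """Split files without a PUID into 'found by lookup' and 'needs analysis'.

    collection:  dict path -> dict of tags (hypothetical, already read from disk)
    lookup_puid: callable path -> PUID or None (stand-in for fingerprint + lookup)
    """
    found, needs_analysis = {}, []
    for path, tags in sorted(collection.items()):
        if tags.get("musicip_puid"):
            continue  # already has a PUID, nothing to do
        puid = lookup_puid(path)
        if puid:
            found[path] = puid           # cheap path: fingerprint lookup worked
        else:
            needs_analysis.append(path)  # expensive path: full analysis needed
    return found, needs_analysis


# Hypothetical collection and a fake lookup that only knows one file.
collection = {
    "a.mp3": {"musicip_puid": "puid-1"},
    "b.mp3": {},
    "c.mp3": {},
}
known = {"b.mp3": "puid-2"}
found, needs_analysis = triage(collection, known.get)
print(found)
print(needs_analysis)
```

Files that end up in `needs_analysis` would be handed to the analyser, and the same triage re-run after the server-side latency has passed.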
It should also be easy (hopefully) to create a script that:
Traverses your music files and finds everything with a PUID and a MB TrackID.
Submits these pairs to MusicBrainz.
Such a script could populate MB with the PUID--MBID pairs which are needed for PUIDs to become really useful. --DonRedman
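The pair-collection step might look something like the sketch below. As before, tags are assumed to be already loaded into dictionaries; the tag names `musicip_puid` and `musicbrainz_trackid` and the function name are assumptions, and the actual submission to MusicBrainz is left out entirely.

```python
def collect_pairs(collection):
    """Return (pairs, skipped) for a tagged collection.

    collection: dict path -> dict of tags (hypothetical, already read from disk)
    pairs:      list of (MB track ID, PUID) tuples ready for submission
    skipped:    dict path -> human-readable reason the file was left out
    """
    pairs, skipped = [], {}
    for path, tags in sorted(collection.items()):
        track_id = tags.get("musicbrainz_trackid")
        puid = tags.get("musicip_puid")
        # Report each missing piece separately so the reason is unambiguous.
        missing = [
            name
            for name, value in (("MB track ID", track_id), ("PUID", puid))
            if not value
        ]
        if missing:
            skipped[path] = "missing " + " and ".join(missing)
        else:
            pairs.append((track_id, puid))
    return pairs, skipped


# Hypothetical sample data.
collection = {
    "a.mp3": {"musicbrainz_trackid": "track-1", "musicip_puid": "puid-1"},
    "b.mp3": {"musicbrainz_trackid": "track-2"},
    "c.mp3": {"musicip_puid": "puid-3"},
}
pairs, skipped = collect_pairs(collection)
print(pairs)
for path, reason in skipped.items():
    print(path, reason)
```

Keeping the skip reasons distinct makes it clear whether a file lacks an MB ID, a PUID, or both.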
It seems that the script prints "No MB ID" for files which either have no MB ID or for which no PUID was found, which confused me for a while. Perhaps two separate error messages are appropriate? --foolip
I have run into at least one case (I'm generating PUIDs for everything in my library) where Picard does not generate a PUID and the genpuid tool does return a PUID. Not sure what might be different about how Picard handles generating the PUID that might account for this exception.
See http://bugs.musicbrainz.org/ticket/2839 -- LukasLalinsky 2007-08-16 16:53:45