Difference between revisions of "User:Ianmcorvidae/Recordings"

From MusicBrainz Wiki

Revision as of 05:54, 21 December 2012

This is a draft and will change. Please do not link it to the general discussion page.

Levels

AcoustID

First, a note: AcoustID is quite out of scope for this discussion except in terms of imagining what benefits it might lend. MusicBrainz does not control AcoustID; luks will look at whatever new system is created and do what he thinks is best (and I'm quite confident that if any of us make suggestions, he will be right, and we will be wrong).

That being said, it is valuable to consider AcoustID's role in such a new system. So: what is AcoustID good at? The fingerprinting process extracts "chroma features" -- i.e. combinations of pitch (in a very Western, 12-note sense), time, and intensity -- covering the first two minutes of audio as of Chromaprint 0.6, and 30 seconds before that[1]. It then applies a filtering process to turn this chroma image into a series of numbers; the filters were trained by machine learning[2] on luks' collection (as I recall from IRC; no citation for that). The resulting fingerprint must then be assigned to an AcoustID (sometimes a new one), which happens by way of a comparison process I don't deeply understand[3], but which compares the two fingerprints bitwise (or part of the track[4]) along with the track lengths (the latter because the fingerprint itself never covers more than two minutes). These are then matched to (at present, of course) recordings.
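The core of that comparison step can be sketched roughly as follows. This is only an illustration of the idea, not the actual acoustid_compare.c logic (which also searches over offsets/alignments[3]): a Chromaprint fingerprint is a sequence of 32-bit integers, so a crude similarity score is the fraction of matching bits, gated by a track-length check. The function names and the threshold values (max_length_diff, min_score) are invented for this example.

```python
def bit_similarity(fp_a, fp_b):
    """Fraction of identical bits over the overlapping portion of two
    fingerprints, each a list of 32-bit integers."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    # XOR leaves a 1 bit wherever the fingerprints disagree.
    diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(fp_a[:n], fp_b[:n]))
    return 1.0 - diff_bits / (32.0 * n)

def could_match(fp_a, len_a, fp_b, len_b,
                max_length_diff=7.0, min_score=0.5):
    """Toy match decision: track lengths are compared separately because
    the fingerprint itself only covers (at most) the first two minutes."""
    if abs(len_a - len_b) > max_length_diff:
        return False
    return bit_similarity(fp_a, fp_b) >= min_score
```

Identical fingerprints score 1.0, fully inverted ones score 0.0, and anything in between reflects how many bits of the chroma-derived integers agree -- which is why two audibly similar tracks (a remaster, say) can still land on either side of whatever threshold the server uses.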

So: AcoustID is good at comparing/differentiating things that differ in when certain notes are played at a given intensity, with tolerance levels tuned for differentiating within luks' collection. Historically we've known it not to do very well with things like karaoke versions versus the original, alternate lyrics over the same backing, or remasters. There are also some cases where multiple AcoustIDs will be assigned despite similarity, due to the fuzziness of the algorithm.

In most of the proposed mix/master/track systems, therefore, AcoustIDs probably distinguish best among mixes. In systems with only recording/track, aside from the path-of-least-resistance benefits, they probably distinguish best among recordings. However, in neither case is AcoustID on its own sufficient evidence for defending either a merge or a split, though it is somewhat stronger for defending splits.

References

[1] chromaprint_06: http://oxygene.sk/2011/12/chromaprint-0-6-released/
[2] chromaprint_overview
[3] acoustid_compare: https://github.com/lalinsky/acoustid-server/blob/master/postgresql/acoustid_compare.c#L119
[4] inside_acoustid: http://oxygene.sk/2011/12/inside-the-acoustid-server/