History:Better Release Recognition

From MusicBrainz Wiki
Revision as of 18:54, 15 March 2006 by ImportUser (talk) ((Imported from MoinMoin))

Better Album Recognition

Because many tracks appear on multiple albums, MusicBrainz still has a lot of latitude in deciding which album to assign a track to, even assuming it has uniquely identified the track.

Proposed Heuristics

  • Pick the oldest album first
    • This would be a good way to favour original releases over compilations and re-issues. This requires good year of release data.
  • Minimize number of albums in this batch
    • If you analyse a batch of songs that are all from one album, MB currently tends to spread them across a range of different albums. This rule would push MB to find one album that fits most of the current batch of tracks.
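The first heuristic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Release` type and `pick_oldest` function are hypothetical names, not part of MusicBrainz or the tagger, and releases with no known year are simply ranked last because the heuristic cannot place them.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    """Hypothetical candidate release for one identified track."""
    title: str
    year: Optional[int]  # None when the year of release is unknown


def pick_oldest(candidates: list[Release]) -> Release:
    """Return the candidate with the earliest known year.

    Releases without a year sort after all dated releases.
    """
    if not candidates:
        raise ValueError("no candidate releases")
    return min(candidates, key=lambda r: (r.year is None, r.year or 0))


candidates = [
    Release("Greatest Hits", 1998),
    Release("Original Album", 1975),
    Release("Movie Soundtrack", None),
]
print(pick_oldest(candidates).title)  # Original Album
```

The sketch shows why the heuristic depends on good year-of-release data: a release with a missing year can never win, even if it is in fact the original.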

The next generation PicardTagger is exactly about this.

Some sort of mechanism - raw audio data aside - is needed to help decide where songs should ultimately be placed. Ideally, an option for "make complete albums" would be excellent. I had a ton of album directories, and the tagger definitely majorly broke the shit out of them.

Proposed method: create a list of all albums that contain any song being tagged. Full albums are made first, then albums missing one song, then albums missing two, and so on.

I suspect songs are attributed to the latest album on which they were released. Perhaps reversing this would significantly fix the problem by itself: the 20 soundtracks under the sun released since the song originally came out would then be ignored.

I'm going to try to do something with the SDK, but I really am not looking forward to building my own client just to fix this one Achilles heel of the greatest program I've ever used. ~Myren
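Myren's proposed method (complete albums first, then albums missing one song, then two, and so on) might look roughly like the following. It is a sketch, not how the tagger works: `build_albums` and the track identifiers are made up for illustration, and ties between albums with the same number of missing songs are broken alphabetically, which is an arbitrary assumption.

```python
def build_albums(batch, albums):
    """Assign batch tracks to albums, most complete albums first.

    batch:  iterable of track ids being tagged
    albums: dict mapping album title -> list of track ids on that album
    Returns a dict mapping album title -> sorted list of assigned tracks.
    """
    remaining = set(batch)
    # Candidate list: every album that contains any song being tagged.
    candidates = {title: set(tracks) for title, tracks in albums.items()
                  if remaining & set(tracks)}
    result = {}
    largest = max((len(t) for t in candidates.values()), default=0)
    # Pass 0 makes full albums, pass 1 albums missing one song, and so on.
    for missing in range(largest):
        for title in sorted(candidates):
            tracks = candidates[title]
            have = tracks & remaining
            if have and len(tracks) - len(have) == missing:
                result[title] = sorted(have)
                remaining -= have
    return result


albums = {
    "Studio Album": ["a", "b", "c", "d"],
    "Best Of": ["a", "e", "f"],
    "Soundtrack": ["b", "x", "y", "z"],
}
print(build_albums({"a", "b", "c", "d", "e"}, albums))
# {'Studio Album': ['a', 'b', 'c', 'd'], 'Best Of': ['e']}
```

In this example the four studio tracks stay together, and only the leftover track "e" falls through to the compilation.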

MusicBrainz users (using myself as my only example) use it for what I see as two distinct purposes:

  • tagging full albums, either their own or ones downloaded
  • tagging single files, generally downloaded from p2p, etc

For each purpose MB should be able to look at the batch submitted and tag, or suggest tags, along slightly different tendencies, as Myren suggested. For the first one it should be more conscious of the already existing tags on the files. Most files you rip yourself are going to be tagged already, assuming you use a common tool for ripping and encoding, and most of the full albums I've downloaded are mostly tagged correctly already.

For tagging single files, users (myself anyway) would want to follow the rules that Myren pointed out, that is, preferring the older album over newer or compilation releases. ~Arcterex

For the case of identifying groups of songs as all being ripped from the same album, a useful heuristic would be to use an existing track number ID3 tag (perhaps the tagger already does this?). While this won't help much in the case of multiple releases of the same album with slightly different titles (box set, etc.) it would eliminate miscategorization from compilation releases. @alex
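As a rough illustration of the track-number heuristic, the sketch below keeps only those candidate releases where the song actually sits at the position given by the file's existing track number tag. The function name and the sample track lists are hypothetical, and titles are compared by exact equality for simplicity.

```python
def filter_by_track_number(track_title, tagged_number, releases):
    """Return the releases where track_title is at position tagged_number.

    releases: dict mapping release title -> ordered list of track titles
    tagged_number: 1-based track number read from the file's existing tag
    """
    matches = []
    for release, tracklist in releases.items():
        in_range = 1 <= tagged_number <= len(tracklist)
        if in_range and tracklist[tagged_number - 1] == track_title:
            matches.append(release)
    return matches


releases = {
    "Original Album": ["Intro", "Hit Single", "Ballad"],
    "Best Of Compilation": ["Ballad", "Other Song", "Hit Single"],
}
print(filter_by_track_number("Hit Single", 2, releases))  # ['Original Album']
```

As the comment above notes, two releases of the same album with identical track order would both survive this filter, so it narrows the choice rather than settling it.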

Since the current version already appears to find songs using a percentage value which is partly based on ID3 identification, why not just weight certain fields differently for the two different types of lookup? For example, album lookup mode would favor track # and album name matches very highly. ~thomasf
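One way to picture this suggestion is a per-mode weight table. Everything here is an assumption made for illustration: the mode names, the field names, and the weights (integers that sum to 100 per mode) are invented, and fields are compared by exact equality rather than by whatever similarity measure the tagger really uses.

```python
# Hypothetical weights, in percentage points, for the two lookup modes.
WEIGHTS = {
    "album":  {"title": 20, "artist": 20, "album": 30, "track_number": 30},
    "single": {"title": 50, "artist": 40, "album": 10, "track_number": 0},
}


def match_score(file_tags, candidate, mode):
    """Sum the weights of all fields on which the file and candidate agree."""
    score = 0
    for field, weight in WEIGHTS[mode].items():
        value = file_tags.get(field)
        if value is not None and value == candidate.get(field):
            score += weight
    return score


file_tags = {"title": "Song", "artist": "Band",
             "album": "First LP", "track_number": 3}
original = {"title": "Song", "artist": "Band",
            "album": "First LP", "track_number": 3}
compilation = {"title": "Song", "artist": "Band",
               "album": "Best Of", "track_number": 7}

print(match_score(file_tags, original, "album"))      # 100
print(match_score(file_tags, compilation, "album"))   # 40
print(match_score(file_tags, compilation, "single"))  # 90
```

With these made-up weights, the compilation is a poor match in album mode but still a strong match in single-file mode, which is the behaviour the comment asks for.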

I've been trying to suss this one out for some time. I would love for the tagger to be able to sort this one out, as I kind of have to do it myself, and with a large number of MP3s by a particular artist and a huge, complex discography it becomes almost impossible for a human to do systematically. I have difficulty even trying to sort the algorithm out in my head, but basically what it should be aiming for is to make as few albums as possible out of a given number of MP3s by an artist and to tag those albums as fully as possible, taking the artist's complete discography into consideration. A starting point is to look at the tracks a user has which are unique to a particular album, meaning those albums *have* to be chosen, then try to fill those albums as much as possible. What to do with what's left over is where I get confused; I can't work out how to go about it systematically in order to achieve the desired goal. And even the desired goal gets fuzzy sometimes. Are two complete EPs 'better' than one album two thirds complete?

I hope someone is working on this, as it would be the icing on the cake for the PicardTagger. I believe that any present album or track number tag should be ignored when this 'albumization' feature is being used though, or at least have the option to ignore it, and not affect the results. ~benjitz
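The starting point benjitz describes can be sketched as follows. The first two steps follow the comment: albums holding a track found nowhere else are forced, then filled as far as possible. The handling of leftovers is an assumption added here, since the comment leaves it open: the sketch greedily picks whichever album covers the most remaining tracks, breaking ties alphabetically. That is a common set-cover heuristic and is not guaranteed to give the fewest albums. `albumize` and the sample discography are hypothetical.

```python
def albumize(batch, discography):
    """Group batch tracks into as few albums as a greedy pass can manage.

    batch:       iterable of track ids owned by the user
    discography: dict mapping album title -> set of track ids
    Returns (assignment, leftover): album -> sorted tracks, and the
    sorted list of tracks that appear on no known album.
    """
    remaining = set(batch)
    assignment = {}

    def take(album):
        got = discography[album] & remaining
        assignment[album] = sorted(got)
        remaining.difference_update(got)

    # Steps 1 and 2: albums holding a unique track are forced, then filled.
    forced = set()
    for track in remaining:
        holders = [a for a, tracks in discography.items() if track in tracks]
        if len(holders) == 1:
            forced.add(holders[0])
    for album in sorted(forced):
        take(album)

    # Step 3 (assumption): greedily cover whatever is left over.
    while remaining:
        best = max(sorted(discography),
                   key=lambda a: len(discography[a] & remaining))
        if not discography[best] & remaining:
            break  # the rest is on no known album
        take(best)
    return assignment, sorted(remaining)


discography = {
    "Album One": {"a", "b", "c", "d"},
    "Compilation": {"a", "e", "h"},
    "EP": {"c", "e"},
    "Single A": {"p", "q"},
    "Single B": {"p", "q", "r"},
}
print(albumize({"a", "b", "c", "e", "h", "p", "q"}, discography))
# ({'Album One': ['a', 'b', 'c'], 'Compilation': ['e', 'h'], 'Single A': ['p', 'q']}, [])
```

The sketch counts only how many tracks an album covers, so it does not answer the question of whether two complete EPs are 'better' than one album two thirds complete; that would need a completeness term in the ranking.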