History:Better Release Recognition

From MusicBrainz Wiki
(Quite old. Might even delete this... luks? anything worth saving here? (Imported from MoinMoin))

Revision as of 08:39, 15 March 2009

Status: This Page is Glorious History!

The content of this page is either bit-rotted, has lost its reason to exist because new features have been implemented in MusicBrainz, describes something that never made it in (or made it in a different way), or is meant to store information and memories about our Glorious Past. We still keep this page to honor the brave editors who, during the prehistoric times (prehistoric for you, newcomer!), struggled hard to build a better present and dreamed of an even better future. We also keep it for archival purposes, because it may still contain crazy thoughts and ideas that could be reused someday. If you're not into looking at either the past or the future, you should disregard this page's content entirely and look for up-to-date documentation elsewhere.

Better Release Recognition

Because many tracks appear on multiple releases, even if MusicBrainz uniquely identifies a track, it still has a lot of latitude in deciding which release to assign it to.

Proposed Heuristics

  • Pick the oldest release first
    • This would be a good way to favour original releases over compilations and re-issues. This requires good year of release data.
  • Minimize number of releases in this batch
    • If you analyse a batch of songs that all come from one release, MB currently tends to assign them across a range of different releases. This rule would push MB to find one release that fits most of the current batch of tracks.

The next-generation PicardTagger is exactly about this.
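The two heuristics above can be combined in a small sketch. It assumes, hypothetically, that every file in the batch has already been matched to a list of candidate releases and that a year of release is known for each candidate; `Release`, `assign_releases` and the sample data are illustrative names, not MusicBrainz or Picard API. Each file goes to the candidate release that could account for the most files in the batch, and ties are broken in favour of the oldest release.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    # Hypothetical minimal release record: an ID and a year of release.
    release_id: str
    year: int


def assign_releases(candidates: dict[str, list[Release]]) -> dict[str, Release]:
    """Map each file to one release: prefer releases shared by many files in
    the batch (fewer releases overall), then the oldest release."""
    # Count how many files in the batch each release could account for.
    coverage = Counter(r for releases in candidates.values() for r in set(releases))
    assignment = {}
    for filename, releases in candidates.items():
        # Highest batch coverage first; on a tie, the earliest year wins.
        assignment[filename] = min(releases, key=lambda r: (-coverage[r], r.year))
    return assignment


# Illustrative batch: three files ripped from the same (hypothetical) album.
album = Release("album", 1991)
hits = Release("hits", 1999)
soundtrack = Release("soundtrack", 2003)
batch = {
    "01.mp3": [album, hits, soundtrack],
    "02.mp3": [album, hits],
    "03.mp3": [album],
}
print({f: r.release_id for f, r in assign_releases(batch).items()})
# {'01.mp3': 'album', '02.mp3': 'album', '03.mp3': 'album'}
```

Scoring per file keeps the sketch simple; it does not guarantee the globally smallest set of releases, only that widely shared releases win over one-off compilations.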

Some sort of mechanism, raw audio data aside, is needed to help decide where songs should ultimately be placed. Ideally, a "make complete releases" option would be excellent. I had a ton of release directories, and the tagger really broke the shit out of them.

Proposed method: create a list of all releases that contain any of the songs being tagged. Full releases are made first, then releases with all but one song, then all but two, and so on.
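The "full releases first, then all but one song, then all but two" ordering could look roughly like this. It assumes, hypothetically, that the complete track list of every candidate release is available, here as a plain dict from release name to a set of track IDs; `rank_by_completeness` and `albumize` are illustrative names. Releases are ranked by how many of their tracks are missing from the batch, and each batch track is claimed by the first release in that order that contains it.

```python
def rank_by_completeness(batch, tracklists):
    """Order candidate releases by the number of their tracks missing from the
    batch: complete releases (0 missing) first, then all-but-one, and so on."""
    batch = set(batch)
    # Only releases sharing at least one track with the batch are candidates.
    candidates = [name for name, tracks in tracklists.items() if tracks & batch]
    # Fewest missing tracks first; sorted() is stable, so ties keep input order.
    return sorted(candidates, key=lambda name: len(tracklists[name] - batch))


def albumize(batch, tracklists):
    """Assign each batch track to the most complete release that contains it."""
    assignment = {}
    for name in rank_by_completeness(batch, tracklists):
        for track in sorted(tracklists[name] & set(batch)):
            # A track already claimed by a more complete release stays there.
            assignment.setdefault(track, name)
    return assignment


# Hypothetical track lists keyed by release name.
tracklists = {
    "Album": {"a", "b", "c"},
    "Compilation": {"a", "x", "y", "z"},
    "Single": {"b", "d"},
}
print(rank_by_completeness(["a", "b", "c"], tracklists))
# ['Album', 'Single', 'Compilation']
print(albumize(["a", "b", "c", "z"], tracklists))
```

A track that only exists on a less complete release (such as `z` above) still ends up on that release, so nothing in the batch is dropped.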

I suspect songs are attributed to the latest release they appeared on. Reversing this might go a long way toward fixing the problem by itself: every one of the 20 soundtracks under the sun released since the song originally came out would then be ignored.

I'm going to try to do something with the SDK, but I'm really not looking forward to building my own client just to fix this one Achilles' heel of the greatest program I've ever used. ~Myren

MusicBrainz users (using myself as my only example) use it for what I see as two distinct purposes:

  • tagging full releases, either their own or ones downloaded
  • tagging single files, generally downloaded from P2P networks, etc.

For each purpose, MB should be able to look at the submitted batch and tag it, or suggest tags, following slightly different tendencies, as Myren suggested. For the first purpose, it should pay more attention to the tags already present on the files. Most files you rip yourself will already be tagged if you use a common ripping and encoding tool, and most of the full releases I've downloaded are already tagged mostly correctly.

For tagging single files, users (myself anyway) would want to follow the rules Myren pointed out, that is, preferring the older release over newer releases and compilations. ~Arcterex

When identifying a group of songs as all ripped from the same release, a useful heuristic would be to use an existing ID3 track number tag (perhaps the tagger already does this?). While this won't help much when there are several editions of the same release with slightly different titles (box sets, etc.), it would eliminate miscategorization into compilation releases. @alex
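A sketch of the track number check, assuming (hypothetically) that each file carries an ID3 track number and that the track's position on each candidate release is known. Candidates whose position does not match the tag are dropped; if none match, all candidates are kept, so a wrong or missing tag never empties the list. `filter_by_track_number` and the data layout are illustrative.

```python
from typing import Optional


def filter_by_track_number(tag_track: Optional[int], positions: dict[str, int]) -> list[str]:
    """Keep only candidate releases on which the track sits at the tagged position.

    positions maps a candidate release name to the track's position on it.
    """
    if tag_track is None:
        # No track number tag: nothing to filter on.
        return list(positions)
    matching = [name for name, pos in positions.items() if pos == tag_track]
    # Fall back to every candidate if the tag matches none of them.
    return matching or list(positions)


# Hypothetical positions of one song on three releases.
positions = {"Original Album": 4, "Greatest Hits": 11, "Movie Soundtrack": 7}
print(filter_by_track_number(4, positions))  # ['Original Album']
```

As noted above, this cannot separate editions that share a track order, since the song sits at the same position on each of them.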

Since the current version already appears to match songs using a percentage score that is partly based on ID3 tags, why not weight certain fields differently for the two types of lookup? For example, release lookup mode would weight track number and release name matches very highly. ~thomasf
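One possible reading of thomasf's suggestion: compute the match percentage as a weighted sum of per-field similarities, and switch the weights according to the lookup mode. The field names, the weight values, and the use of `difflib.SequenceMatcher` as the string similarity are all assumptions for illustration, not the tagger's actual scoring.

```python
from difflib import SequenceMatcher

# Hypothetical per-mode weights (each set sums to 1.0); release mode leans
# heavily on track number and release name.
WEIGHTS = {
    "single": {"title": 0.5, "artist": 0.4, "release": 0.05, "tracknumber": 0.05},
    "release": {"title": 0.2, "artist": 0.2, "release": 0.3, "tracknumber": 0.3},
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_percent(tags: dict[str, str], candidate: dict[str, str], mode: str) -> float:
    """Weighted match score (0-100) between a file's tags and a candidate track."""
    score = 0.0
    for field, weight in WEIGHTS[mode].items():
        # Fields missing on either side simply contribute nothing.
        if field in tags and field in candidate:
            score += weight * similarity(tags[field], candidate[field])
    return 100 * score


tags = {"title": "Song", "artist": "Band", "release": "Debut", "tracknumber": "3"}
compilation_track = {"title": "Song", "artist": "Band", "release": "Best Of", "tracknumber": "12"}
for mode in ("single", "release"):
    print(mode, round(match_percent(tags, compilation_track, mode), 1))
```

With identical title and artist, the compilation track still scores highly in single mode but drops sharply in release mode, which is the behaviour the suggestion asks for.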

I've been trying to suss this one out for some time. I would love for the tagger to be able to sort it out, as I currently have to do it myself, and with a large number of MP3s by an artist with a huge, complex discography, it becomes almost impossible for a human to do systematically. I have difficulty even working the algorithm out in my head, but basically it should aim to make as few releases as possible out of a given set of MP3s by an artist, and to tag those releases as fully as possible, taking the artist's complete discography into consideration. A starting point is to look at the tracks the user has that are unique to a particular release, meaning those releases *have* to be chosen, and then to fill those releases as much as possible. What to do with what's left over is where I get confused: I can't work out how to go about it systematically in order to achieve the desired goal. And even the desired goal gets fuzzy sometimes. Are two complete EPs 'better' than one release that is two-thirds complete?
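benjitz's starting point can be sketched as a set-cover style procedure: any release that is the only candidate for some file must be chosen, those releases are filled with every file they can take, and the leftovers are then covered greedily by whichever release accounts for the most remaining files. The greedy step is the standard set-cover approximation, swapped in here for the "what's left over" part; it does not settle the "two complete EPs versus one two-thirds-complete release" question. The data layout and the name `choose_releases` are hypothetical.

```python
def choose_releases(candidates: dict[str, set[str]]) -> dict[str, str]:
    """Assign files to few releases.

    candidates maps each file to the set of releases it could belong to.
    Files with no candidates are left unassigned.
    """
    assignment: dict[str, str] = {}
    # Step 1: a release that is the only option for some file is forced.
    forced = {next(iter(rels)) for rels in candidates.values() if len(rels) == 1}
    # Step 2: fill the forced releases with every file they can take.
    for f, rels in candidates.items():
        chosen = rels & forced
        if chosen:
            assignment[f] = min(chosen)  # deterministic pick if several fit
    # Step 3: greedily cover the leftovers with the release matching most of them.
    remaining = {f for f, rels in candidates.items() if rels and f not in assignment}
    while remaining:
        counts: dict[str, int] = {}
        for f in remaining:
            for r in candidates[f]:
                counts[r] = counts.get(r, 0) + 1
        # Most remaining files first; alphabetical order breaks ties.
        best = max(sorted(counts), key=lambda r: counts[r])
        for f in [f for f in remaining if best in candidates[f]]:
            assignment[f] = best
            remaining.discard(f)
    return assignment


# Hypothetical batch: a.mp3 only exists on EP1, so EP1 is forced.
batch = {
    "a.mp3": {"EP1"},
    "b.mp3": {"EP1", "Best Of"},
    "c.mp3": {"Album", "Best Of"},
    "d.mp3": {"Album", "Best Of"},
    "e.mp3": {"Album", "Soundtrack"},
}
print(choose_releases(batch))
```

Here the batch ends up on two releases (EP1 and Album), and the compilation is never used.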

I hope someone is working on this, as it would be the icing on the cake for the PicardTagger. I believe, though, that any existing release or track number tags should be ignored when this 'albumization' feature is used, or that there should at least be an option to ignore them so they do not affect the results. ~benjitz