History:Amazon Matching

From MusicBrainz Wiki
Status: This Page is Glorious History!

The content of this page either is bit-rotted, or has lost its reason to exist due to some new features having been implemented in MusicBrainz, or maybe just described something that never made it in (or made it in a different way), or possibly is meant to store information and memories about our Glorious Past. We still keep this page to honor the brave editors who, during the prehistoric times (prehistoric for you, newcomer!), struggled hard to build a better present and dreamed of an even better future. We also keep it for archival purposes because it possibly still contains crazy thoughts and ideas that may be reused someday. If you're not into looking at either the past or the future, you should just disregard this page's content entirely and look for an up-to-date documentation page elsewhere.

The Amazon Matcher is a process that searches Amazon's catalog for releases in the MusicBrainz database and records their URLs. This allows the website to display release cover art, and also provides the links that allow you to buy the release just by clicking on it. See also ASIN.

Please note: The Amazon Matcher process is no longer running! To add cover art to a release, you can use the AmazonRelationshipType or link to a URL at one of the CoverArtSites. To change incorrect cover art, HowToChangeCoverArt will give you some hints.

2004-06-01: Andy Grundman has submitted an update to the Amazon Matcher that should match a much higher percentage of releases. It contains track name matching as well as various artist support. This page will be updated to reflect the current state of the matcher after this new version goes live. For now, the Missed Matches section below has been updated to show which releases are now matched by the new script.

Algorithm

The current algorithm for matching MB releases to Amazon releases is as follows:

  1. Download all the releases for a given artist from Amazon
  2. Pass 1: For each release in MusicBrainz:
    1. Tokenize both release names: convert to lowercase, remove accents, punctuation and whitespace
    2. Compare the release names and store their similarity
    3. Pick the release with the highest similarity rating, as long as it is above 80%
  3. Pass 2 (try chopping () and [] to get a better match): For each release in MusicBrainz:
    1. Remove anything appended in () from the MB release names
    2. Remove anything appended in () or [] from the Amazon release names
    3. Tokenize the chopped release names and store their similarity
    4. Pick the release with the highest similarity rating, as long as it is above 80%
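
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual matcher script (which was written in Perl). The page does not say which similarity measure was used, so `difflib.SequenceMatcher` is an assumption here, as are the function names `tokenize`, `chop` and `best_match`. The sketch also assumes that pass 2 is only tried when pass 1 finds nothing above the threshold, which the page leaves as an open question.

```python
import re
import unicodedata
from difflib import SequenceMatcher

THRESHOLD = 0.80  # "above 80%" in the steps above


def tokenize(name):
    """Lowercase, strip accents, and drop punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents if c.isalnum())


def chop(name, brackets):
    """Remove trailing (...) groups, and also [...] groups if brackets is True."""
    if brackets:
        pattern = r"\s*(\([^)]*\)|\[[^\]]*\])\s*$"
    else:
        pattern = r"\s*\([^)]*\)\s*$"
    previous = None
    while previous != name:  # repeat until nothing more is removed
        previous, name = name, re.sub(pattern, "", name)
    return name


def similarity(a, b):
    """Assumed similarity measure: difflib ratio of the tokenized names."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def best_match(mb_name, amazon_names):
    """Return (amazon_name, score) for the best candidate, or None."""
    # First iteration is pass 1 (names as given), second is pass 2 (chopped).
    for chop_first in (False, True):
        scored = []
        for amazon_name in amazon_names:
            if chop_first:
                score = similarity(chop(mb_name, False), chop(amazon_name, True))
            else:
                score = similarity(mb_name, amazon_name)
            scored.append((score, amazon_name))
        if scored:
            score, amazon_name = max(scored)
            if score > THRESHOLD:
                return amazon_name, score
    return None


print(best_match("Legal Man", ["Legal Man [CD-SINGLE]", "Lazy Line Painter Jane"]))
```

With this particular similarity measure, "Legal Man" against "Legal Man [CD-SINGLE]" scores below 80% in pass 1 and only matches once the trailing tag is chopped in pass 2.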

Question: is pass #2 really done for each release in musicbrainz (as stated above), or is it only done for those left unmatched after the first pass? This could explain a couple of the mis-matches below.

Todo items

  • Select the best image server based on which Amazon store a user has selected
    • NOTE: some release covers are not on all image servers -RJ (And in my experience 'imports' usually have lower quality scans than the same release in its local amazon, though why this should be I have no idea - bawjaws)
  • Match various artist releases

This is not perfect, but it does get a reasonable number of matches. If you have observed things that should've matched, but didn't, please add them to the Missed Matches list below. Also, if some intrepid Perl hacker would like to try tuning the matching script, I would appreciate that!

  • This seems as good a place as any to note that the addition of BarCodes to the database would make this kind of matching much easier, more efficient, and more reliable --MatthewExon

Previously Missed, Now Matching

These previously missed matches are now matched successfully by the latest version!

"Strangeitude", dfdb6572-97d0-4852-a4e4-a5f55f27711b, "Strangeitude", B000000QGN
"1492 - Conquest of Paradise", 7b249234-0aa3-45e9-aa70-3043d5fc28f2, "1492: Conquest of Paradise - Original Motion Picture Soundtrack [SOUNDTRACK]", B000002IUK
"Please Please Me", ade577f6-6087-4a4f-8e87-38b0f8169814, "Please Please Me", B000002UA9
"Fold Your Hands Child, You Walk Like a Peasant", 94a5439b-f067-4b1d-9d4f-024dad95bdf6, "Fold Your Hands Child, You Walk Like a Peasant", B00004T8ZB
"Legal Man", 788604c8-74fc-4235-ae68-e414f2f1c475, "Legal Man [CD-SINGLE]", B00004SWH2
"Dog on Wheels", 64fd5312-24db-4c4f-b1a4-9b7e33a64f98, "Dog on Wheels / State I Am in / String Bean [CD-SINGLE] [IMPORT]", B000007WND
"3.. 6.. 9 Seconds of Light", 64989a07-675f-4b29-8c19-7d68528550f6, "3-6-9 Seconds of Light [CD-SINGLE] [EP] [IMPORT]", B000007WNC
"Storytelling", 6d1d433e-709b-4c6b-8d09-7e8b845be806, "Storytelling [SOUNDTRACK]", B00005OM56
"Tactical Neural Implant", 872b430a-170b-42d8-9942-bfe1fe96447e, "Tactical Neural Implant", B000007U3A
"Analogue Bubblebath IV", ca6dbf75-970b-4ff7-8c82-c8baf263ca50, "Analogue Bubblebath 4 [EP]", B00000FEOX
"The Day The World Went Away", ae8d1dea-e9b6-4018-b4c6-755ff43553ed, "Day the World Went Away [CD-SINGLE]", B00000JNIR
"Light", 04c29e01-ef17-4e48-bb05-1737fd8b65e4, "Light [CD-SINGLE]", B000003RIV
"Cascade", f09e2b42-b4c6-47ba-b2c9-4160855d880a, "Cascade", B000005LB3
"Euphoria (Firefly)", 7a51b9fb-363d-4fb1-90e4-17b986aa2732, "Euphoria [CD-SINGLE]", B000005DD3
"Analogue Bubblebath", f4e39fbf-743c-4186-bd23-2a5be5365551, "Analogue Bubblebath [CD-SINGLE]", B000000GRN
"06:21:03:11 Up Evil", a6d4018b-244a-4041-ba0a-2aae6cd7cb3b, "06:21:03:11 Up Evil", B0000028ZU

Missed Matches

MB Album Name, MB Album Id, Amazon Album Name, Amazon ASIN

"Classics", ff0dff59-9a2a-4498-ad9f-09915b91ba8a, "Classics", B00005Y1TM

Not matched because "Classics" should be an artist release under Aphex Twin and not a VA release.

"The Perfect Drug Versions", a4db9744-347f-47f5-a4bd-394fde23831c, "Perfect Drug [CD-SINGLE]", B000001Y7W

Probably not matched because the track names differ too much.

"Silent Hill 2", 72ea51fd-0d61-48fc-ba16-e1ae178b408d, "Silent Hill V.2 [IMPORT]", B00005NO3D
  • Not matched because Amazon lists the artist as "Game Music". This is not a VA release though, so I am not sure how best to match this one.

Incorrectly Matched

"Final Fantasy VII Original Soundtrack (disc 1)" (64a20811-f819-4f0b-b305-7ffbf127ab64)
"Final Fantasy VII Original Soundtrack (disc 2)" (6131c8f5-ccb3-4156-b925-1c2f2a12ed20)
"Final Fantasy VII Original Soundtrack (disc 3)" (78a11ee2-42d7-4035-9463-fdf9fe4640c3)
"Final Fantasy VII Original Soundtrack (disc 4)" (62dd9e39-28b5-4d59-8380-087f9f2d42b1)
"Final Fantasy VIII Original Soundtrack (disc 1)" (1c82c54c-58e2-46e3-8a53-23185af40795)
"Final Fantasy VIII Original Soundtrack (disc 2)" (0827d683-933b-431d-a97e-dcb71d3bc3a4)
"Final Fantasy VIII Original Soundtrack (disc 3)" (bed52222-4ba1-4e83-b514-34f017e44f46)
"Final Fantasy VIII Original Soundtrack (disc 4)" (1d11fa6b-2b33-41bc-a16b-dff27b44c394)

These releases are showing the cover art for the Final Fantasy IX Original Soundtrack. The Final Fantasy VII soundtrack should have the cover art from ASIN B000038I2O and the Final Fantasy VIII one should have ASIN B00003CK5N.

"Led Zeppelin IV" in MB is matched to "Led Zeppelin II" in Amazon.

Peter Gabriel (1978) - the "scratch" album (8e66ea2b-b57b-47d9-8df0-df4630aeb8e5) is pointing to Amazon's Peter Gabriel (1977) - the "car" album.

"The Best of James" (http://musicbrainz.org/album/5575f9cd-3a0d-4bf1-b4b3-ffce44ea1806.html) is pointing to Amazon's "The Best of James Taylor".

Other Matching Problems

'Guerrilla' by the Super Furry Animals (http://www.musicbrainz.org/showalbum.html?albumid=94127) matches to an album in Amazon called 'Guerrilla [import]' that doesn't have an image. But there is an album called just 'Guerrilla' that does have an image. The only things I can think of that would stop it matching according to the algorithm above are that A) you can't currently buy it from Amazon as they are out of stock, and B) the import copy now has a higher 'popularity' rank than the non-import, as it is still available for purchase.

The album 'The Charlatans' (http://www.musicbrainz.org/showalbum.html?albumid=42524) by the band also called 'The Charlatans' (who are also known in the States as 'The Charlatans UK') retrieves the image for the album 'The Charlatans' by the American group called 'The Charlatans' (listed in Amazon as 'The Charlatans (1960's)'). A bit of an odd corner case, I know, but I'm surprised that it can get the other albums by this artist correct and get this one wrong. The real album/image is listed as both 'Charlatans [import]' and 'The Charlatans [UK]' by the artist 'Charlatans UK'.

Many of the albums by 'The Tragically Hip' have the cover art show up fine; however, if using a store like amazon.ca, following the buy link takes you to the [IMPORT] album (which is listed as unavailable) rather than the Canadian release (the two are identical AFAIK). For example, the ASIN for "Fully Completely" in Canada is B000002OMP (which is invalid in the US), not B00000IJRC, which is fine in the US store but is the [IMPORT] version in Canada. I suspect this is a problem with other artists too (but I haven't found any specific ones).

"With The Beatles" (a91b9173-b958-401b-9551-b15db0e7bc5d/B000002UAC) retrieves the cover for "The Beatles (The White Album)", ASIN B000002UAX.

The self-titled first album by "Creedence Clearwater Revival" (6da15b06-b848-487c-a74a-af8fe26f1069) retrieves the cover of a best-of compilation called "The Best of Creedence Clearwater Revival" (ASIN B000006XV2).

This: http://musicbrainz.org/album/8dac0482-cc08-4a45-82be-899604becbcb.html is mis-matched because of the decision we made not to include text like "Music From The Motion Picture" in soundtrack titles.

It should be: http://tinyurl.com/34lsq

Things may have changed since the previous comment, but at the moment the album is mismatched because there are four different Bullitt albums in Amazon: "Bullitt (1968 Film) [SOUNDTRACK]", "Bullitt (1968 Film) [SOUNDTRACK] [IMPORT]", "Bullitt (Music from the Motion Picture) [SOUNDTRACK] [IMPORT] [ORIGINAL RECORDING REMASTERED]" and "Bullitt (Music Recreated from and Inspired by the Motion Picture) [SOUNDTRACK] [IMPORT] [ORIGINAL RECORDING REMASTERED]". The second and third of these appear to match the tracklist of the album in MusicBrainz linked to above. Since the algorithm outlined above discards everything in brackets for the second pass, all of these match equally, and I assume one of the four contenders is then chosen at random. The absence or presence of "Music from the motion picture" etc. is therefore a red herring in this case, though it probably does apply in the "Conquest of Paradise" case listed above.
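
The tie described in the previous paragraph can be reproduced with a small sketch. It assumes that the Amazon tags (SOUNDTRACK, IMPORT and so on) appeared in square brackets in the original titles, and that "appended" text means trailing () or [] groups; the helper name `chop` is hypothetical and not taken from the matcher script.

```python
import re


def chop(name):
    """Strip trailing (...) and [...] groups, as pass 2 does for Amazon names."""
    pattern = r"\s*(\([^)]*\)|\[[^\]]*\])\s*$"
    previous = None
    while previous != name:  # repeat until nothing more is removed
        previous, name = name, re.sub(pattern, "", name)
    return name


amazon_titles = [
    "Bullitt (1968 Film) [SOUNDTRACK]",
    "Bullitt (1968 Film) [SOUNDTRACK] [IMPORT]",
    "Bullitt (Music from the Motion Picture) [SOUNDTRACK] [IMPORT] "
    "[ORIGINAL RECORDING REMASTERED]",
    "Bullitt (Music Recreated from and Inspired by the Motion Picture) "
    "[SOUNDTRACK] [IMPORT] [ORIGINAL RECORDING REMASTERED]",
]

# Every candidate collapses to the same chopped name, so the names alone
# cannot separate them in the second pass.
chopped = {chop(title) for title in amazon_titles}
print(chopped)
```

Under these assumptions all four titles reduce to "Bullitt", so any name-only similarity score gives them the same rating.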

Releases with the same name but extra tracks don't match correctly. There are three different Weezer (Green) albums that have the same name and share the first 10 tracks, but the UK release has 11 tracks and the Japanese release has 12 tracks. See http://musicbrainz.org/showalbum.html?albumid=56450. They all match to ASIN B00005ICAW, but the 12-track release should be ASIN B00005B7U2 and the 11-track release should be B00005JHYM.

This release is missing a match: http://www.musicbrainz.org/album/502cf184-caaa-4c77-ab81-87ff38c30c34.html. Amazon has a 'dead page duplicate' for the album. It should point to: http://www.amazon.co.uk/exec/obidos/ASIN/B00004XN08/

Albums http://www.musicbrainz.org/album/7cee1d42-14f7-47e2-988c-14c46d55e162.html and http://www.musicbrainz.org/album/55ff080e-6fd1-4e2e-872f-8eff966bcb7d.html are mis-matched. The only correct MB album for that Amazon match is http://www.musicbrainz.org/album/3900cff7-3334-4007-8eeb-29307c25a8ed.html