User:Jokipii

From MusicBrainz Wiki
Revision as of 11:44, 23 May 2012 by Jokipii (talk | contribs) (stats update)
Antti Jokipii [MB: Jokipii and operator of Jokipii_bot | IRC: Jokipii | Wiki: Jokipii | Last.fm: AnttiJokipii]

I am currently trying to improve linking between MusicBrainz and Discogs. I have both databases installed on PostgreSQL. The bot code can be found at musicbrainz-bot, and the code that builds the Discogs database from the monthly XML dumps can be found at discogs-xml2db.

Userscripts

Here is a userscript that makes voting on Discogs links easier.

Bot queue

Set descriptions and number of links

In Progress

Release links identified by match on normalized catalog number, release name, linked label, format, same number of tracks, same release country, and same release year.
* 9860
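
The matching rule above can be sketched as a comparison key. This is only an illustration, not the code in musicbrainz-bot: the `ReleaseRow` fields, the `normalize_catno` rule (uppercase, keep only ASCII letters and digits), and the idea that label, format and country have already been mapped to shared values are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class ReleaseRow(NamedTuple):
    # Hypothetical flattened release row; not the real MB or Discogs schema.
    catalog_number: str
    name: str
    linked_label: int          # assumed: shared ID for an already linked label
    release_format: str        # assumed: already mapped to a common vocabulary
    track_count: int
    country: Optional[str]     # assumed: already mapped to a common code
    year: Optional[int]


def normalize_catno(catno: str) -> str:
    """Assumed normalization: uppercase, then drop everything but A-Z and 0-9."""
    return re.sub(r"[^0-9A-Z]", "", catno.upper())


def match_key(row: ReleaseRow) -> tuple:
    """Combine every criterion from the set description into one key."""
    return (
        normalize_catno(row.catalog_number),
        row.name,
        row.linked_label,
        row.release_format,
        row.track_count,
        row.country,
        row.year,
    )


def is_candidate(mb: ReleaseRow, discogs: ReleaseRow) -> bool:
    """Two releases are link candidates only if all criteria agree."""
    return match_key(mb) == match_key(discogs)
```

Only the catalog number is normalized here; every other field must agree exactly, which keeps the rule conservative.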

Not Started

Artist Discogs links
* Exact name match. The artist has track(s) on one or more already linked various-artists release(s), and all tracks found that way point to the same artist at Discogs.
* Example 
* 16349
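
The "all tracks point to the same artist" condition is a unanimity check. A minimal sketch, assuming the Discogs artist IDs found for one MusicBrainz artist have already been collected into a list (the function name and the input shape are hypothetical):

```python
from typing import Iterable, Optional


def unanimous_discogs_artist(candidate_ids: Iterable[int]) -> Optional[int]:
    """Return the Discogs artist ID only if every track agrees on it."""
    distinct = set(candidate_ids)
    if len(distinct) == 1:
        return distinct.pop()
    # No evidence at all, or conflicting evidence: do not propose a link.
    return None
```

For example, `unanimous_discogs_artist([42, 42, 42])` yields `42`, while a list that mixes two different IDs yields `None`, so ambiguous artists are skipped rather than guessed.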
Advanced relationships between releases and artists where both are already linked to Discogs
* Producer (hand-made example): 74164
* Mastered: 27970
* and certainly many more in other relationship classes
Artist name with an exact (case-insensitive) match; the artist is a member of groups with Discogs links, and all groups found that way have the same Discogs artist as a member.
* 8692
Artist (type:group) name with an exact (case-insensitive) match; the group has members with Discogs links, and all members found that way are also marked as members in the Discogs entry.
* 2083
Artist that has a Discogs link, has no type (person/group) set, and has multiple members in Discogs (indicating type:group)
* 1309
Artist that has a Discogs link, has no type (person/group) set, and has a Discogs realname without the characters "&,/+" or the word "and" (indicating type:person)
* 1699
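
The two artist-type heuristics above could be combined roughly as follows. This is a sketch under assumptions: `guess_artist_type`, `member_count` and `realname` are hypothetical names, and checking the group rule before the person rule is a choice made for the example, not something the queue descriptions specify.

```python
import re
from typing import Optional

# Characters that suggest several people share one "realname" field.
_GROUP_CHARS = set("&,/+")
# The word "and" on its own, not as part of a name such as "Anderson".
_AND_WORD = re.compile(r"\band\b", re.IGNORECASE)


def guess_artist_type(member_count: int, realname: Optional[str]) -> Optional[str]:
    """Guess 'group' or 'person' from Discogs data, or None if undecided."""
    if member_count > 1:
        return "group"
    if (
        realname
        and not (_GROUP_CHARS & set(realname))
        and not _AND_WORD.search(realname)
    ):
        return "person"
    return None
```

Matching "and" as a whole word keeps names such as "Sandy Anderson" from being rejected by accident.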

Done

Artist Discogs links
* Exact name match. The artist has release(s) with Discogs links, and all releases found that way point to the same artist at Discogs.
Artist types based on disambiguation comment
Artist country based on disambiguation comment
Artist country based on Discogs profile text
Release links identified by exact match on catalog number, release name, linked label, format, same number of tracks and same release country.

Bot programming tasks

  • Merge bot code into musicbrainz-bot (Done)
  • Start using discogs-xml2db to produce Discogs database (Done)
  • Better documentation
  • Map Discogs credits <-> MB Advanced relationships

Some stats

| | MusicBrainz Total | Discogs Total | Links (all these are not unique) | Percent done (compared to smaller total) | Sum of unique MusicBrainz releases connected to linked entities | Percent of all MusicBrainz releases | Sum of unique Discogs releases connected to linked entities | Percent of all Discogs releases |
| Releases: | 988301 | 2720810 | 171170 | 17% | | | | |
| Release groups: | 822442 | 365081 | 47387 | 13% | 106445 | 11% | 277207 | 10% |
| Artist: | 626598 | 2100250 | 110825 | 18% | 606737 | 61% | 1738474 | 64% |
| Label: | 55844 | 245988 | 16004 | 29% | 414436 | 42% (see note) | 1542803 | 57% |

note: In MB only 567392 releases have label information; the other 420909 do not.
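
The "Percent done (compared to smaller total)" column can be reproduced with a one-line calculation. The sketch below assumes the percentages are rounded to the nearest whole number; the function name is hypothetical.

```python
def percent_done(mb_total: int, discogs_total: int, links: int) -> int:
    """Links as a percentage of whichever database has fewer entities."""
    return round(100 * links / min(mb_total, discogs_total))
```

With the release group figures above, `percent_done(822442, 365081, 47387)` divides by the Discogs total because it is the smaller one and gives 13.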


| 2012-02-23 | MusicBrainz Total | Discogs Total | Links (all these are not unique) | Percent done (compared to smaller total) |
| Releases: | 1008061 | 2926422 | 182581 | 18% |
| Release Groups: | 839314 | 405891 | 73656 | 18% |
| Artists: | 644784 | 2251519 | 126210 | 20% |
| Labels: | 58038 | 300452 | 17141 | 30% |


| 2012-03-23 | MusicBrainz Total | Discogs Total | Links (all these are not unique) | Percent done (compared to smaller total) |
| Releases: | 1016640 | 2926422 | 200576 | 20% |
| Release Groups: | 846637 | 405891 | 80482 | 20% |
| Artists: | 651243 | 2251526 | 143468 | 22% |
| Labels: | 58872 | 300452 | 17322 | 29% |


| 2012-04-23 | MusicBrainz Total | Discogs Total | Links (all these are not unique) | Percent done (compared to smaller total) |
| Releases: | 1027422 | 3045567 | 225871 | 22% |
| Release Groups: | 854785 | 423610 | 90931 | 21% |
| Artists: | 658827 | 2328770 | 149744 | 23% |
| Labels: | 59775 | 323996 | 17529 | 29% |


| 2012-05-23 | MusicBrainz Total | Discogs Total | Links (all these are not unique) | Percent done (compared to smaller total) |
| Releases: | 1035849 | 3045567 | 229590 | 22% |
| Release Groups: | 860665 | 423610 | 94024 | 22% |
| Artists: | 664330 | 2328770 | 164656 | 25% |
| Labels: | 60456 | 323996 | 17949 | 30% |