Talk:Survival Of The Fittest

From MusicBrainz Wiki

Gavin Clarke wrote:

The main problem that I see with your proposed voting system is that no item of data is ever going to be marked as definitively 100% correct. For example, the band U2 could have three entries for the band name:

  • U2
  • U too
  • U-two

Each of these has a confidence measure based on the number of votes it has received. The default data picked up by the tagger and shown on the main pages of the site will be the one with the highest confidence rating.

Let's say the votes were distributed as follows:

  • U2: 10 votes
  • U too: 1 vote
  • U-two: 9 votes

This would mean that the (correct) U2 option was displayed by default, so most people would not question it and therefore no further votes would be cast. One (or perhaps 2) votes for U-two would make that the default option until a couple more people voted for U2.

You would end up in a situation where every piece of data in the database was only one or two votes away from being annoyingly changed to something that wasn't the definitive correct record. All of the data would have just enough votes to make it the default, because once an option is the default people have no need to vote on the record.
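To make the fragility concrete, here is a minimal Python sketch of the U2 example. The function names `default_option` and `votes_to_flip` are hypothetical, and the sketch assumes every vote counts equally and that a challenger must strictly exceed the leader to become the default (a tie is assumed to leave the current default in place).

```python
def default_option(votes):
    """Return the option with the most votes (first one wins a tie)."""
    return max(votes, key=votes.get)


def votes_to_flip(votes):
    """Extra votes the runner-up needs to strictly overtake the leader."""
    ranked = sorted(votes.values(), reverse=True)
    if len(ranked) < 2:
        return None  # nothing to compete with
    return ranked[0] - ranked[1] + 1


votes = {"U2": 10, "U too": 1, "U-two": 9}
print(default_option(votes))  # U2
print(votes_to_flip(votes))   # 2
```

Under these assumptions the default is two votes away from changing, which is the situation described above.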

Disclaimer: I may have completely misinterpreted the moderation/voting system you had in mind. If I did, then I'm sorry; please post clarifications as to what I have missed.

  • This is /not/ a problem if the moderators/voters communicate with each other, because then they would have a discussion on a forum/mailing list and agree that U2 is the correct spelling. The U-two voters would then change their vote and this would give U2 a 90% lead. After a month or so the garbage-collector would delete U too and U-two. DonRedman

RjMunro: Some sort of system for adding low-value votes to data, based simply on the fact that no one has voted against it for a period of time, would be necessary. These votes would be cast automatically by the server, perhaps after a certain number of hits on pages displaying the data. In this way, the "U2" above would gently creep away over time from the "U-two" proposal. Perhaps also when data is purged, people who voted on the "U too" option could be emailed and asked if they prefer U2 or U-two.
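A rough Python sketch of such server-cast votes follows. Everything numeric here is an assumption made for illustration: the weight of an automatic vote (`AUTO_VOTE_WEIGHT`), the number of page hits that earns one (`HITS_PER_AUTO_VOTE`), and the function names themselves are hypothetical and not part of any proposal.

```python
AUTO_VOTE_WEIGHT = 0.1     # assumed: much lighter than a human vote (1.0)
HITS_PER_AUTO_VOTE = 100   # assumed: one automatic vote per 100 unchallenged hits


def auto_votes(hits_since_last_objection):
    """Weight earned automatically from page hits with no vote against."""
    return (hits_since_last_objection // HITS_PER_AUTO_VOTE) * AUTO_VOTE_WEIGHT


def leader_score(human_votes, hits_since_last_objection):
    """Human votes plus the slow automatic creep for the displayed option."""
    return human_votes + auto_votes(hits_since_last_objection)


# "U2" with 10 human votes and 250 unchallenged hits pulls slightly
# further ahead of "U-two" with 9 votes.
print(leader_score(10, 250) > 9)
```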

A variation on this (which is now on the proposal page), if we go with votes carrying weight, is to have that weight decay over time for non-leading options. That way, if someone is still convinced their chosen answer is best, they could revisit the vote and reassign it their full karma. The purge point could simply be equivalent to the position a newbie's vote will decay to over a reasonable period of time.
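One possible reading of the decay idea, sketched in Python. The exponential half-life curve, the 30-day half-life, the 90-day "reasonable period" and a newbie karma of 1.0 are all assumptions chosen for illustration; the names `decayed_weight` and `should_purge` are hypothetical.

```python
def decayed_weight(karma, days_since_vote, half_life_days=30.0, is_leading=False):
    """Weight of a vote today; votes on the leading option do not decay."""
    if is_leading:
        return karma
    return karma * 0.5 ** (days_since_vote / half_life_days)


def should_purge(option_weight, newbie_karma=1.0, grace_days=90.0,
                 half_life_days=30.0):
    """Purge when an option weighs less than a newbie's vote would
    after decaying for the whole grace period."""
    purge_point = decayed_weight(newbie_karma, grace_days, half_life_days)
    return option_weight < purge_point


print(decayed_weight(4.0, 30))  # 2.0 (half the karma after one half-life)
print(should_purge(0.1))        # True (below the 0.125 purge point)
```

Re-casting the vote would simply reset `days_since_vote` to zero, restoring the voter's full karma.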

Robert Kaye

This system sounds usable for when there is a lot of contention between changes in the database. With a lot of contention you get a lot of votes and then your system shines. However, MusicBrainz doesn't seem to have as much contention (I've never seen an argument where people vacillate back and forth over a piece of data) as might be appropriate for your suggestion. I think we'd need an order of magnitude more users/voters to get this system to flow.

Also, I fear that fringe music tastes might be hard to enter into such a system if 'Proposals that have very few (or no one) backing them will be removed' -- we already have the problem of fringe music moderations starving for votes. I think this new system would make this worse.

RjMunro: Things are only removed if they are not in the lead. If they are in the lead, they stay there.

Robert Kaye: Survival of the fittest puts it into a completely different perspective, however. I think this system may work really well for subjective metadata, where there isn't a clearly 'correct' way of doing it. Two artist reviews bubbling to the top on survival of the fittest sounds quite nice.

Andy Baker

The great advantage I see from this approach is that it takes two current problems:

  • lack of voters
  • difficulty in using data until it has passed a vote

and lets them help solve each other. People vote just by using the data and people do more mods as they can be used straight away.

Survival by Data-Vote

This is sort of a compilation of ideas I've seen floating here and on the mailing list, and while I can't recall exactly who said what, most of this is NOT based on my own original ideas... credit to those to whom it belongs :)

Hopefully this will be of use to those trying to figure out a workable system.

System based on 'votes'. 'votes' can be made by hand OR by data-vote.

Anytime someone accesses the data AND Tags their MP3s using that data (so just looking up the data doesn't skew the votes), it is a 'data-vote'.

When someone asks for data on an MP3, they are given the current 'popular' choice, with a flag (if there are other choices) which they can click on to see.
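A small Python sketch of the two rules above: looking data up never counts, tagging with it does, and a lookup returns the popular choice plus a flag when alternatives exist. The class `TrackChoices` and its methods are hypothetical names, and giving a newly entered choice one starting vote is an assumption.

```python
from collections import Counter


class TrackChoices:
    """Competing metadata strings for one track, with data-vote counts."""

    def __init__(self, initial):
        # Assumption: whoever entered the data counts as its first vote.
        self.votes = Counter({initial: 1})

    def lookup(self):
        """Return (popular choice, flag for other choices); adds no vote."""
        popular = self.votes.most_common(1)[0][0]
        return popular, len(self.votes) > 1

    def tag_with(self, choice):
        """Record a data-vote: the user actually tagged a file with `choice`."""
        self.votes[choice] += 1


track = TrackChoices("Soundtrack - Saturday's Warrior - 10 - Line Upon Line")
print(track.lookup()[1])  # False (no other choices yet)
```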

After a certain number of votes for a new edit, the old edit disappears (say 15 'new' votes with no 'old' votes, or 30 'new' votes with only 5 'old' votes... y'all can figure out the ratio).
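One rule that satisfies both example figures, sketched in Python. The minimum of 15 new votes comes from the first example and the 6:1 ratio is read off the second (30 new to 5 old); treating them as a combined threshold is an assumption, as is the function name `old_edit_retired`.

```python
MIN_NEW_VOTES = 15   # from the "15 new, no old" example
NEW_PER_OLD = 6      # assumed ratio, read off "30 new, 5 old"


def old_edit_retired(new_votes, old_votes):
    """True when the new edit has enough votes, in absolute terms and
    relative to the old edit, for the old edit to disappear."""
    return new_votes >= MIN_NEW_VOTES and new_votes >= NEW_PER_OLD * old_votes


print(old_edit_retired(15, 0))  # True
print(old_edit_retired(30, 5))  # True
print(old_edit_retired(30, 6))  # False (old edit still has too much support)
```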

Case example of data-voting:

Someone looks up an MP3, 'Saturdays warrior - line upon line'.

Tagger returns "Soundtrack - Saturday's Warrior - 10 - Line Upon Line" with 1 TRM, but they think Doug Stewart wrote the music, and they know the style guide says that for soundtracks the Artist is the composer... so they enter a moderation changing it to "Doug Stewart - Saturday's Warrior - 10 - Line Upon Line". There is now 1 'vote' for each name.

A few people look up their MP3s, see the two choices and pick the second one, so it gets a few more votes, and the original is still there, with its 1 vote.

Then someone else comes along who knows that Lex de Azevedo actually wrote the music, so they enter a change. Now each person who comes looking gets the 2nd listing, but can check and see the others. As more and more people look up that song, they add votes to the correct one (or an incorrect one; this does rely a lot on people being willing to make sure they pick the right choice).

The leading choice will likely tend to get more votes, out of user laziness, but that can be counteracted to a certain extent by hand voting, particularly if hand votes get a higher value. (Perhaps, using one of the systems described at EditorRating, certain users' votes are 'stronger'.)
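A Python sketch of how heavier hand votes could offset lazy data-votes. The weight of 3 follows the "2 or 3 data-votes" figure suggested further down this page; the vote counts in the example and the names `weighted_score` and `leader` are made up for illustration.

```python
def weighted_score(data_votes, hand_votes, hand_weight=3):
    """Score of one choice: each hand vote counts as several data-votes."""
    return data_votes + hand_weight * hand_votes


def leader(options, hand_weight=3):
    """`options` maps choice -> (data_votes, hand_votes)."""
    return max(options,
               key=lambda c: weighted_score(*options[c], hand_weight))


# Hypothetical counts: A is the lazy default, B has two hand votes.
options = {"A": (7, 0), "B": (3, 2)}
print(leader(options))                 # B (7 against 3 + 3*2 = 9)
print(leader(options, hand_weight=1))  # A (7 against 3 + 2 = 5)
```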

Eventually choice #1 has its 1 vote, #2 has 6-7 votes and #3 has 30-35... (being able to see the data-vote count would be nice, then we can see how many others agree with a given option... if the count is close, we can choose to search the net to get more info... or if it is overwhelmingly in favor of one, we can see that too)

Eventually the bad choices will disappear as other choices assume a majority (if some choices maintain a certain percentage of data-votes, they would still be offered as a choice)
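A Python sketch of that pruning step. The 10% cut-off (`min_share`) is an arbitrary assumption, as is the function name `surviving_choices`; the counts in the example are picked from the ranges given above (1, 6-7 and 30-35 votes).

```python
def surviving_choices(votes, min_share=0.10):
    """Keep the leader plus every choice holding at least `min_share`
    of all data-votes; everything else disappears."""
    total = sum(votes.values())
    if total == 0:
        return list(votes)  # no votes yet, nothing to prune
    top = max(votes, key=votes.get)
    return [c for c, n in votes.items() if c == top or n / total >= min_share]


votes = {"#1": 1, "#2": 7, "#3": 32}
print(surviving_choices(votes))  # ['#2', '#3']
```

Here choice #1 holds 2.5% of the 40 votes and is dropped, while #2 (17.5%) is still offered next to the leader.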

An option to see notes would be nice too, as a low-vote-count choice may have a link explaining why it was given as a choice, which may inspire people to vote for it. Of course, a leading choice can do the same, especially for things which were changed to fit the style guide. Then people can see why it is "Pearl Jam - Lost Dogs (disc 2)" even though their disc has it as "Pearl Jam - Lost Dogs (Disc 2)", or "Depeche Mode - Exciter: The Limited Edition" rather than "Depeche Mode - Exciter (The Limited Edition)" (hopefully people making such changes add appropriate notes :)

At the same time as all this data-voting is going on, hand voting can be used to 'clean up' those items that may not get many data-votes, or items known by someone to be wrong. E.g., I have a lot of filk songs, and know a lot about them, as I have had to look up lots to add when MB didn't have the data. So when I see something that is obviously wrong, I'm more likely to go in and put a hand-vote (and note) against it. (Same reason we subscribe to artists: we like their stuff, we know their stuff, so we are interested in keeping their data clean.) So, since I'm taking the time to go in and hand-vote, that vote is going to count higher (say the weight of 2 or 3 data-votes). Also, those who take the time to help out the database by moderating are more likely to vote on things they know, which balances out those who may simply click on the first option offered when tagging their music.

  • At this point I think that I might want to delete an option (because I know it is wrong but people use it anyway). Could there be the possibility to vote no by hand on an option? Or is this proposal misunderstanding the whole SurvivalOfTheFittest concept completely? --DonRedman
  • I don't think that option will be necessary. Just vote on what is right, and what is wrong will slowly go away. An extension of what you are advocating is a system that lets you split your vote amongst a series of options that all seem to be more correct than any of the other options, when you are unsure as to which one is absolutely correct. This is a nice concept, but I think it would have little value in practice. It might be useful with SubjectiveData --RjMunro

(this may make it easier for screw-ups to go in and mess up the data.. but has there been a problem with intentional mess-ups?)

So, even though filk is not a particularly popular genre, it gets attention from those who count most... the people who LIKE it, and decisions are not reliant on a certain number of votes from the general population.

  • Data that is actually being used contributes, so no waiting for someone to moderate it, and possibly having the power to delete it with a single vote because no one else had voted yet.
  • New data is added automatically, with the assumption that if it is incorrect, those using that data will likely know it is wrong, and will fix it... then others can data-vote on the correct choice.
  • If you have hand-fixed something, you can use it then and there, by clicking the flag and picking your choice.

That's my take on it anyways, feel free to toss me examples you don't see working, and I'll try and sort them into this system... Or if you don't see how something would work... Or if you like the color of my shoelaces... :)

jinxkitten