History:Data Quality

From MusicBrainz Wiki
Revision summary: added hypotheses for how to handle change-data-quality-edits (imported from MoinMoin).

Revision as of 22:05, 28 February 2007

Data Quality and Editing Strictness

  • Status: This is work in progress as of Feb 2007.

The concept of data quality is based on the release locking suggestions in QualityAndQuantity and ReleaseLocking. After much discussion and even more time, the concept has undergone a number of changes and will now actually be implemented. This page describes the idea, and the DataQualityDiscussion page lets users chime in on its merits.

Goals

The data quality idea has the following goals:

  • Establish a method to determine the quality of an artist and the releases that belong to that artist. This provides consumers of MusicBrainz with a clue about the relative quality of the data in the database.
  • Provide fine-grained control over the effort required to edit the database and to vote on those edits.
  • Provide editors with a means to allow easier editing of data that is deemed to be of poor quality.
  • Provide editors with a means to make it harder to edit data that is considered to be of good quality.
  • Reduce the overall number of edits in the system by tailoring the requirements for passing an edit to each edit type.

End user feature changes

To accomplish these goals, this feature will allow editors to indicate the quality of a given artist. An artist can be of unknown, low, medium or high data quality. The data quality indicator determines how much effort is required to change the artist information or to add/remove albums from an artist. An artist with unknown or medium quality will require roughly the same effort that MusicBrainz currently requires to edit the database. An artist with low data quality will make it easier to add/remove albums or to change the artist information (name, sortname, aliases), and an artist with high data quality will require more effort to add/remove albums or to change the artist information. The data quality concept applies to releases in the same manner: changing a release with low data quality will be easier than changing a release with high data quality.

Each artist will have a new link in the edit bar: Change artist quality. This link will allow the user to select a new quality rating for the artist. Each album will have a similar link in its edit bar: Change release quality. As with the artist, this link allows the changing of the data quality rating for this release. Changing the quality rating for releases will also be a batch operation.

The daily artist subscription email will now inform users when the quality of an artist or a release belonging to that artist has been changed.

Data quality affects edit strictness

The quality rating for an artist/release will determine the following edit strictness values:

  • Edit voting duration (in days)
  • Number of unanimous votes to pass
  • Expire action: Accept, reject
  • EditTypes which are AutoEdits

Attention: All of this information must be presented to the user on the EditDetails page, since these values will now differ from edit to edit.

As a rough illustration the data quality levels could influence the edit strictness as follows:

                           | Normal or Unknown                          | High    | Low
Voting period              | 2 weeks (3 weeks if there are subscribers) | 2 weeks | 1 week
Yes votes required to pass | +1 (=1 more yes than no)                   | +3      | +1
Action on expiration       | accept                                     | reject  | accept
AutoEdits                  | see EditType                               | none    | All non-structural changes

(The table above needs to be replaced with a detailed table that lists all of the edit types and their associated edit strictness values)
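Until the detailed per-edit-type table exists, the rough illustration above could be encoded as a lookup keyed by quality level. The following is a minimal sketch, not MusicBrainz code: the names `DataQuality`, `Strictness`, `STRICTNESS`, `voting_period_days` and `has_enough_yes_votes` are hypothetical. It assumes that "+N" means "N more yes votes than no votes", as the Normal/Unknown cell states, and that the High and Low levels have no subscriber extension because the table mentions one only for Normal/Unknown. AutoEdit rules are left out.

```python
from dataclasses import dataclass
from enum import Enum


class DataQuality(Enum):
    UNKNOWN = "unknown"
    LOW = "low"
    NORMAL = "normal"  # called "medium" in the prose above
    HIGH = "high"


class ExpireAction(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass(frozen=True)
class Strictness:
    voting_days: int                   # voting period without subscribers
    voting_days_with_subscribers: int  # voting period if the artist has subscribers
    yes_margin: int                    # required (yes - no) for the edit to pass
    expire_action: ExpireAction


# Values transcribed from the rough illustration table; AutoEdit rules are omitted.
STRICTNESS = {
    DataQuality.UNKNOWN: Strictness(14, 21, 1, ExpireAction.ACCEPT),
    DataQuality.NORMAL: Strictness(14, 21, 1, ExpireAction.ACCEPT),
    DataQuality.HIGH: Strictness(14, 14, 3, ExpireAction.REJECT),
    DataQuality.LOW: Strictness(7, 7, 1, ExpireAction.ACCEPT),
}


def voting_period_days(dq: DataQuality, has_subscribers: bool) -> int:
    """Voting period in days for an edit at the given data quality level."""
    rules = STRICTNESS[dq]
    return rules.voting_days_with_subscribers if has_subscribers else rules.voting_days


def has_enough_yes_votes(dq: DataQuality, yes: int, no: int) -> bool:
    """True if yes votes exceed no votes by the level's required margin."""
    return yes - no >= STRICTNESS[dq].yes_margin
```

For example, `voting_period_days(DataQuality.NORMAL, has_subscribers=True)` returns 21, while `has_enough_yes_votes(DataQuality.HIGH, yes=3, no=1)` is False because a margin of 2 falls short of +3.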

Unresolved Issues

What kind of edit should be required to change the data quality? That's a difficult question. We can only make an educated guess and then see how it works out during beta tests.

Consider the following matrix:

legitimate raise    | illegitimate raise
legitimate lowering | illegitimate lowering

legitimate raise

User raises DQ legitimately. Any edits entered from this point on should be harder to apply. This means that once the raise-DQ-edit is applied, all pending edits should become harder to apply. This only works if the voting period for the raise-DQ-edit is not longer than that of the other edits at this quality level.
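The timing condition can be checked by comparing closing times. This is a hypothetical sketch (the function name and parameters are invented for illustration); it assumes both edits run for their full voting period and that, when both close at the same moment, the raise is processed first.

```python
from datetime import datetime, timedelta


def raise_reaches_pending_edit(
    raise_entered: datetime,
    raise_period: timedelta,
    edit_entered: datetime,
    edit_period: timedelta,
) -> bool:
    """True if the raise-DQ-edit closes no later than the pending edit.

    Assumes both edits run for their full voting period and that a raise
    closing at the same moment as the pending edit is processed first.
    """
    raise_closes = raise_entered + raise_period
    edit_closes = edit_entered + edit_period
    return raise_closes <= edit_closes
```

For a normal edit entered at the same moment as the raise, the check reduces to `raise_period <= edit_period`, which is the condition stated above: a one-week raise reaches a two-week edit, a three-week raise does not.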

If we want DataQuality to have meaning then raising it should not happen automatically. It should need some peer review. Do we trust a single user to judge the data quality (esp. if lowering it again is hard)? What about the possibility of "raise-data-quality spam" (like the "random votes" we once had)?

What are the motivations for, and the expected outcome of, raising the data quality? For subscribers: lowering their workload of watching silly edits. That is a long-term goal which can take time and needs some effort. For the casual user: honouring and protecting their own work, which ideally needs instant gratification. We need to find a balance between these interests.

illegitimate raise

User tries to raise DQ but gets voted down. All edits should be applied using low-DQ strictness at all times.

legitimate lowering

User lowers DQ legitimately. We suppose they enter legitimate edits right afterwards (or even just before). They will only be motivated to do that if this is easier than just entering the edits at the current level. Therefore, once the lower-DQ-edit has been accepted, all pending edits should be applied according to the new, easier rules. This means ModBot must apply pending edits according to their current DQ, not according to the DQ at the time the edit was entered!
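The key requirement is that ModBot looks up the data quality when it evaluates an edit, not when the edit is entered. A minimal Python sketch follows; `PendingEdit`, `passes_now`, `modbot_run` and the `YES_MARGIN` values (taken from the rough illustration table) are hypothetical and not ModBot's actual code.

```python
from dataclasses import dataclass

# Hypothetical yes-vote margins per level, taken from the rough illustration table.
YES_MARGIN = {"unknown": 1, "normal": 1, "low": 1, "high": 3}


@dataclass
class PendingEdit:
    edit_id: int
    artist_id: int
    dq_at_entry: str  # kept for display; deliberately ignored when evaluating
    yes: int = 0
    no: int = 0


def passes_now(edit: PendingEdit, current_dq: dict) -> bool:
    """Judge the edit against the artist's data quality at evaluation time."""
    level = current_dq[edit.artist_id]
    return edit.yes - edit.no >= YES_MARGIN[level]


def modbot_run(pending: list, current_dq: dict) -> list:
    """Return the ids of pending edits that pass under the current DQ."""
    return [edit.edit_id for edit in pending if passes_now(edit, current_dq)]
```

An edit entered at high DQ with a single yes vote fails while the artist is still high, but passes on the next run once a lower-DQ-edit has moved the artist to low, even though `dq_at_entry` still says "high".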

illegitimate lowering

User tries to lower DQ but gets voted down. All other edits entered should be applied or rejected according to the stricter rules. Also getting the lower-DQ-edit to pass should be considerably harder than getting one or two of the other edits to pass.

Conclusions

Remember: This is a completely unproven initial hypothesis that needs to be tested on both test and live data!

  • RaiseDataQualityEdit
    • Takes relatively few unanimous votes (~1).
    • The expire action is difficult to decide upon. It must be "reject" if we experience "raise-quality-spam", but we could probably start out with "apply" and only raise the strictness once we experience problems.
    • Its voting period is <= the voting period for edits in the old DQ.
  • LowerDataQualityEdit
    • Is hard to pass. It takes more unanimous votes than other edits in the old DQ (~1.5 to 2 times more).
    • Its expire action is reject.
    • Is quick to apply. Its voting period should be about half of the normal voting period for edits in the old DQ.
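These conclusions derive the rules for the two change-DQ-edits from the rules of the old DQ level. A hypothetical Python sketch of that derivation follows; `EditRules`, `raise_dq_rules` and `lower_dq_rules` are invented names. It uses the longest voting period the hypothesis allows for raises, writes the suggested starting "apply" as "accept" to match the table, and rounds the 1.5 to 2 times vote multiplier up to a whole vote.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EditRules:
    unanimous_votes: int
    voting_days: float
    expire_action: str  # "accept" or "reject"


def raise_dq_rules(old: EditRules) -> EditRules:
    """RaiseDataQualityEdit: ~1 unanimous vote, period no longer than the old DQ's.

    Uses the longest period the hypothesis allows and starts with an
    accepting expire action until raise-quality spam becomes a problem.
    """
    return EditRules(unanimous_votes=1, voting_days=old.voting_days, expire_action="accept")


def lower_dq_rules(old: EditRules, factor: float = 2.0) -> EditRules:
    """LowerDataQualityEdit: 1.5-2x the votes, half the period, reject on expiry."""
    if not 1.5 <= factor <= 2.0:
        raise ValueError("factor should be within the suggested 1.5 to 2 range")
    return EditRules(
        unanimous_votes=math.ceil(old.unanimous_votes * factor),
        voting_days=old.voting_days / 2,
        expire_action="reject",
    )
```

Taking the High column of the illustration table (+3, two weeks, reject) as the old rules, `lower_dq_rules` yields 6 unanimous votes over 7 days with the default factor, or 5 votes with `factor=1.5`.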

Attention: Lower-data-quality-edits must be extremely easy to track. There should be special feeds or subscriptions just for them. Ideas to raise awareness of such edits are:

  • If a change-DQ-edit is entered which would have consequences for the edit that I look at, I am informed of this fact. (probably hard to implement: When a normal edit is entered, check for open change-DQ-edits and add a ModNote. When a change-DQ-edit is entered, add such a note to all relevant pending edits. Uff)
  • A number of emails get sent to randomly chosen subscribers from a "quality watch" list.
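The ModNote idea in the first bullet can be sketched as a registry of open edits per artist or release that annotates in both directions. This is a hypothetical Python illustration: `Edit`, `OpenEdits` and target keys such as "artist:123" are invented for the example, and "relevant pending edits" is assumed to mean open edits on the same artist or release.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edit:
    edit_id: int
    target: str  # hypothetical key such as "artist:123"
    changes_dq: bool = False
    notes: list = field(default_factory=list)


class OpenEdits:
    """Tracks open edits per target and cross-annotates change-DQ-edits."""

    def __init__(self) -> None:
        self.by_target = defaultdict(list)

    def enter(self, edit: Edit) -> None:
        others = self.by_target[edit.target]
        if edit.changes_dq:
            # New change-DQ-edit: note it on every pending normal edit for the target.
            for other in others:
                if not other.changes_dq:
                    other.notes.append(f"Open change-DQ-edit #{edit.edit_id} may affect this edit.")
        else:
            # New normal edit: note every open change-DQ-edit for the same target.
            for other in others:
                if other.changes_dq:
                    edit.notes.append(f"Open change-DQ-edit #{other.edit_id} may affect this edit.")
        others.append(edit)

    def close(self, edit: Edit) -> None:
        """Remove an applied or rejected edit from the open set."""
        self.by_target[edit.target].remove(edit)
```

Indexing open edits by target keeps the work for each new edit proportional to the open edits on that artist or release, rather than requiring a scan of the whole edit queue.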