User:Wizzcat/Data Quality Extension

This page describes a proposal for an enhancement of the Data Quality attribute in MusicBrainz Server.


Currently, data quality is mostly used as a locking mechanism to prevent contested edits and to allow easier editing of future releases. I believe it can be extended to grade data quality more finely and to incentivise adding and confirming data. Checking data against liner notes can be very tedious, and as long as the editor has no good way of displaying the results, such an undertaking is unlikely to happen.


Data quality attributes should be added to works and recordings in addition to releases. The highest quality level will be complete, which should only be used after careful investigation of liners and other relevant sources. Unlike the lower quality levels, complete can be tagged by multiple users to give an increased confidence level. Note, however, that the investigation requirements do not decrease as the confidence level rises; every editor is expected to do a fairly thorough job. Functionally, a quality level of complete will behave much like high. The difference is that when an edit is entered against a complete entity, all editors who have tagged the entity as complete will be explicitly notified. If an edit to a complete entity passes, all tags will be removed and the data quality will return to high. Editors are free to re-tag after this, or to uphold their tag during the voting process if they find the edit has merit.

Edits that tag an entity as complete will be voted on as normal, and are strictly required to carry a comprehensive edit note listing which resources have been investigated.
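The tagging, notification and demotion behaviour described above could be modelled roughly as follows. This is only a sketch of the proposal, not MusicBrainz Server code: the `Entity` class, its method names and the `Quality` enum are all hypothetical, and voting is assumed to have happened before `tag_complete` or `apply_passed_edit` is called.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Quality(IntEnum):
    # Hypothetical level names and values, for illustration only.
    NORMAL = 1
    HIGH = 2
    COMPLETE = 3


@dataclass
class Entity:
    name: str
    quality: Quality = Quality.NORMAL
    # Editors whose "complete" tag edits have passed the vote.
    complete_tags: set[str] = field(default_factory=set)

    def tag_complete(self, editor: str) -> None:
        """Record an accepted 'complete' tag; several editors may tag one entity."""
        self.complete_tags.add(editor)
        self.quality = Quality.COMPLETE

    def confidence(self) -> int:
        """Confidence grows with the number of independent taggers."""
        return len(self.complete_tags)

    def editors_to_notify(self) -> list[str]:
        """Editors to notify when an edit is entered against this entity."""
        if self.quality == Quality.COMPLETE:
            return sorted(self.complete_tags)
        return []

    def apply_passed_edit(self) -> None:
        """A passed edit removes all tags and returns the entity to high."""
        if self.quality == Quality.COMPLETE:
            self.complete_tags.clear()
            self.quality = Quality.HIGH


work = Entity("Example Work", Quality.HIGH)
work.tag_complete("editor_a")
work.tag_complete("editor_b")
print(work.editors_to_notify())  # both taggers are notified of a new edit
work.apply_passed_edit()
print(work.quality.name, work.confidence())
```

An editor who re-tags after the edit has passed would simply call `tag_complete` again, which raises the entity back to complete with a confidence of one.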

Data quality affects each entity type somewhat differently, as described below:


For works, data quality will have two levels, normal and complete, and all works will start at normal. If, after investigation of liners and other relevant sources, the lyricist and/or composer credits are found to be correct, the work can be tagged as complete.


For recordings, data quality behaves similarly to works; the difference lies in which relations are covered by the data quality tag. The relations affected are exclusively artist-track production/performance credits. Track-track relations such as cover or samples from are unaffected by quality levels, and adding them will not cause quality demotion or notifications.
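The distinction between covered and uncovered relations can be expressed as a simple predicate. The sketch below assumes a hypothetical `Relationship` tuple and illustrative kind names; it does not reflect the actual relationship type names used by the server.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    # Hypothetical, simplified representation of a relation.
    source_type: str  # e.g. "artist" or "track"
    target_type: str  # e.g. "track"
    kind: str         # e.g. "performance", "production", "cover"


# Assumed labels for the credit kinds covered by the quality tag.
COVERED_KINDS = {"production", "performance"}


def affects_quality(rel: Relationship) -> bool:
    """True if adding or editing this relation should trigger
    notifications and a possible quality demotion."""
    return (
        rel.source_type == "artist"
        and rel.target_type == "track"
        and rel.kind in COVERED_KINDS
    )


print(affects_quality(Relationship("artist", "track", "performance")))
print(affects_quality(Relationship("track", "track", "cover")))
```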


For releases, the quality can only be raised after all tracks in the tracklist are linked to their respective works and recordings, and those entities already have the same quality level. This means that you cannot tag a release as complete if some of the tracks are missing credits. The data covered by quality for releases extends to the release title, tracklist, and release event. Note that there is some leeway in what is required of a complete release event. In some cases it might, for instance, be very hard or impossible to pin down a release date accurately. In such cases a release can still be tagged as complete, as long as in-depth investigation has been done and the deficiency is explicitly noted in the edit note. This does not mean that dates can be neglected, however, and for higher-profile artists the threshold for allowing such inaccuracies should be high.

Note that after a release has been tagged, it can be tagged by others, but this implies an investigation of the respective works and recordings as well as the release-specific attributes. Any edit to a work or recording will bubble up to release level, triggering notifications and, unless the edit is voted down, a lowering of quality.
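The release precondition and the bubbling behaviour could be checked along the following lines. This is a minimal sketch under stated assumptions: `Linked`, `Track` and both function names are hypothetical, a release is reduced to its tracklist, and treating an empty tracklist as ineligible is a choice made here rather than something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Linked:
    """A work or recording, reduced to the one flag that matters here."""
    name: str
    complete: bool = False


@dataclass
class Track:
    title: str
    work: Optional[Linked] = None
    recording: Optional[Linked] = None


def can_tag_release_complete(tracks: list[Track]) -> bool:
    """Every track must be linked to a work and a recording,
    and both must already be tagged complete."""
    return bool(tracks) and all(
        t.work is not None and t.work.complete
        and t.recording is not None and t.recording.complete
        for t in tracks
    )


def releases_affected_by(entity: Linked, releases: dict[str, list[Track]]) -> list[str]:
    """Releases to which an edit on the given work or recording bubbles up."""
    return sorted(
        title
        for title, tracks in releases.items()
        if any(t.work is entity or t.recording is entity for t in tracks)
    )


work_1 = Linked("Work 1", complete=True)
rec_1 = Linked("Recording 1", complete=True)
work_2 = Linked("Work 2", complete=True)

release_a = [Track("One", work_1, rec_1)]
release_b = [Track("Two", work_2, None)]  # recording link missing

print(can_tag_release_complete(release_a))
print(can_tag_release_complete(release_b))
print(releases_affected_by(work_1, {"Release A": release_a, "Release B": release_b}))
```

Editors who tagged any release returned by `releases_affected_by` would then be notified, and the release would be demoted if the edit passes.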

Note also that a reworking of release formats/packaging is more or less required before release completeness can be implemented, as a later change to them would render all previously tagged releases inaccurate.

Proposed Guidelines

An entity tagged as complete is assumed to contain all pertinent information that can plausibly be found. At a minimum, the sources for this include all liners, PROs where possible (for additional compositional credits), and any other sources that might be relevant, such as artist and label homepages. Note that to tag a release, all credits should be accurately represented. This means that releases containing Sub Optimal Credits should be left at High until the relation can be properly attributed. Editors have some discretion here for obscure or unclear credits, but an effort must be made to create a representation that is as close as possible.

Note that for older releases it might be hard to find reliable information, and such a release may be considered complete with much less information than a modern release. Lack of available liners is, however, not an acceptable excuse, as physical copies can in all probability be found.