Difference between revisions of "User:Nikki/Recording lengths 2"

From MusicBrainz Wiki

Latest revision as of 04:41, 27 February 2013

After the initial round of feedback, all but one of the people who responded supported always setting recording lengths automatically. We decided in a dev chat that we will always set them automatically, and we are now focusing on how to calculate the length. As in the initial round of feedback, the discussion on this page does not apply to standalone recordings.

Determining the length

The goals I think we have when determining the length are:

  • To use one of the actual track lengths.
  • To avoid anomalous values, whether that's from appended silence, clipping or erroneous data.
  • To avoid huge changes in lengths when data is added or removed (difficult with only one or two values however).

Given those goals:

  • The mean (0 votes previously) does not fit, since it will often not return one of the actual track lengths.
  • Sorting the releases and taking the length from the first one (0 votes previously) does not fit, since it does not avoid anomalous values and does not avoid huge changes in lengths when data is added or removed.
  • Using the shortest length (2 votes previously) does not fit, because it does not avoid all anomalous values (it works for anomalies that make the track longer, but not ones which make it shorter) and also does not avoid huge changes in lengths when data is added or removed.

The two which do fit are the median (3 votes previously) and the mode (4 votes previously). The median is problematic when there are an even number of values, since then you normally take the mean of the two values, which would not necessarily result in an actual track length (e.g. given 3:00 and 5:00, it would give 4:00). The mode is also problematic when there are multiple modes (e.g. again, 3:00 and 5:00, neither is more common than the other). We could however avoid that problem by instead taking the shortest of the middle values or most common values for the median and mode respectively.
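As a rough illustration of the "shortest of the middle values" and "shortest of the most common values" variations described above, here is a minimal Python sketch. The function names are hypothetical and lengths are assumed to be whole seconds purely for the example; it is not taken from any existing implementation.

```python
from collections import Counter


def lower_median(lengths):
    """Return the middle value, or the shorter of the two middle values."""
    ordered = sorted(lengths)
    # For an even count, (n - 1) // 2 is the lower of the two middle indexes.
    return ordered[(len(ordered) - 1) // 2]


def shortest_mode(lengths):
    """Return the most common value, breaking ties with the shortest one."""
    counts = Counter(lengths)
    highest = max(counts.values())
    return min(value for value, count in counts.items() if count == highest)


# Track lengths in seconds: 3:00 and 5:00.
lengths = [180, 300]
print(lower_median(lengths))   # 180, rather than the mean of 240
print(shortest_mode(lengths))  # 180
```

Both functions always return one of the input values, which is the first goal listed above.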

Votes and reasons for how the length should be determined

I think the median is better than the mode, because ...

  • If there are an equal number of different track lengths (e.g. {1:03, 1:04, 1:07, 1:08}) then all of them are the mode. Not very useful. However, the median could produce a decimal duration. Hawke (talk)
  • LordSputnik - The reasons that Hawke gave. However, if we think of it statistically, the difference between track lengths is an error. This error will typically be < 10 seconds. Because of this error, it makes no sense to quote the recording length to the same number of significant figures as the track length, so we should calculate the median length, then only show the recording length as an approximation, to the nearest 10 seconds.
  • Same reason as Hawke gave. You also have fewer options in the sub-optimal case (even number of tracks) with the median than you have with the mode (all track lengths are different). For the median I'd simply do: sorted_lengths[len(sorted_lengths) // 2], which I think is very simple for people to understand, and it produces more stable results than the mode, which can change from the shortest track to the longest as a result of adding just one additional track. Lukáš Lalinský (talk) 17:01, 26 February 2013 (UTC)
      • Murdos (talk)
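The two suggestions in the list above (indexing into the sorted lengths, and only showing the result to the nearest 10 seconds) could be sketched roughly as follows. The function names and the `step` parameter are assumptions made for this example, and lengths are taken to be whole seconds.

```python
def index_median(lengths):
    """Pick the element at index len // 2 of the sorted lengths."""
    sorted_lengths = sorted(lengths)
    # Integer division; for an even count this selects the upper middle value.
    return sorted_lengths[len(sorted_lengths) // 2]


def approximate(length_seconds, step=10):
    """Round a length in whole seconds to the nearest multiple of `step`."""
    # Adding half a step before the floor division rounds halves upwards.
    return (length_seconds + step // 2) // step * step


lengths = [63, 64, 67, 68]  # 1:03, 1:04, 1:07, 1:08 in seconds
picked = index_median(lengths)
print(picked)               # 67
print(approximate(picked))  # 70
```

Note that with an even number of values this indexing picks the longer of the two middle values, whereas the "choose the shortest" variation would pick the shorter one.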

I think the mode is better than the median, because ...

  • An actual track length. JonnyJD (talk) 03:12, 26 February 2013 (UTC)

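The median and the mode both suck! I think we should ...

  • OliverCharles (talk) 13:08, 26 February 2013 (UTC) Do both! I prefer using a real track length, so I'd put preference on the mode. However, in the case of length being multimodal, I'd just take the median, rounded to the nearest actual track length.
      • Kepstin (talk) 16:28, 26 February 2013 (UTC)
      • warp
      • That is called the Medoid http://en.wikipedia.org/wiki/Medoid (except the round down is non-standard) --JonnyJD (talk) 17:43, 26 February 2013 (UTC)
          • I could get behind that one. Hawke (talk)
  • LordSputnik: Alternative to using the median and rounding: https://moqups.com/LordSputnik/ddvkFg5B
      • or that one. Hawke (talk)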
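One way to combine the two measures, sketched here only as an illustration: prefer the mode when there is a single one, and otherwise fall back to the actual track length closest to the median. The function name is hypothetical, lengths are assumed to be in seconds, and resolving distance ties towards the shorter length is an assumption of this sketch.

```python
from collections import Counter
from statistics import median


def mode_or_nearest_to_median(lengths):
    """Return the unique mode, else the real length nearest the median."""
    counts = Counter(lengths)
    highest = max(counts.values())
    modes = [value for value, count in counts.items() if count == highest]
    if len(modes) == 1:
        return modes[0]
    # Several modes: fall back to the actual length closest to the median.
    middle = median(lengths)
    # Comparing (distance, value) breaks distance ties with the shorter length.
    return min(lengths, key=lambda value: (abs(value - middle), value))


print(mode_or_nearest_to_median([180, 180, 300]))    # 180 (single mode)
print(mode_or_nearest_to_median([63, 64, 67, 68]))   # 64 (median is 65.5)
```

Either branch returns one of the input lengths, so the result is always a real track length.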

I still don't care, just calculate it automatically somehow.

  • Nikki (talk) 02:48, 26 February 2013 (UTC)
  • Ianmcorvidae (talk) 03:22, 26 February 2013 (UTC) (as long as the "choose the shortest" variation is chosen in order to have an actual track length)
  • Reosarevok (talk) 13:03, 26 February 2013 (UTC)

Which lengths should be included?

In the initial round of feedback, some people suggested including only lengths from releases with disc IDs whenever any of the releases has disc IDs. Which track lengths should be included for determining the recording length?

I don't care

  • Reosarevok (talk) 13:03, 26 February 2013 (UTC)

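All track lengths

  • Ianmcorvidae (talk) 03:22, 26 February 2013 (UTC) (if we do something about disc IDs or official releases, IMO it should be weighting, not exclusion)
  • OliverCharles (talk) 13:03, 26 February 2013 (UTC)
  • Kepstin (talk) 16:28, 26 February 2013 (UTC)
  • LordSputnik
  • Lukáš Lalinský (talk) 17:03, 26 February 2013 (UTC)
  • Murdos (talk)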

If there are official releases, only include lengths from official releases

  • warp

If there are releases with disc IDs, only include lengths from releases with disc IDs

  • This, but what Ian said above: weighting, not exclusion. Hawke (talk)
  • Jesus2099 (talk) 16:57, 26 February 2013 (UTC) Lengths coming from discs are the only important and unbiased ones.
      • DiscIDs are getting less common though. --JonnyJD (talk) 17:54, 26 February 2013 (UTC)
  • AcoustIDs (duration estimated as with tracks above) would also be reasonable. Hawke (talk)

Weighted (Official/TOC)

  • "Add a track twice" to the list from which the mode/median is chosen when the release is official or has a Disc ID. Not more than twice when both are the case, though; that might be too much. --JonnyJD (talk) 04:37, 26 February 2013 (UTC)
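The "add a track twice" weighting could look roughly like the sketch below. The tuple layout `(length, is_official, has_disc_id)` and the function name are assumptions made for illustration; the resulting list would then be passed to whichever median or mode calculation is chosen.

```python
def weighted_lengths(tracks):
    """Expand (length, is_official, has_disc_id) tuples into a weighted list."""
    expanded = []
    for length, is_official, has_disc_id in tracks:
        # Official or disc ID counts twice; both together still count only twice.
        weight = 2 if (is_official or has_disc_id) else 1
        expanded.extend([length] * weight)
    return expanded


tracks = [
    (180, True, True),    # official with a disc ID: added twice, not three times
    (185, False, False),  # neither: added once
    (300, False, True),   # disc ID only: added twice
]
print(weighted_lengths(tracks))  # [180, 180, 185, 300, 300]
```

Because no length is excluded, this keeps every track in the calculation while letting official or disc-ID-backed lengths count for more.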