Talk:Sort Name Style: Difference between revisions

From MusicBrainz Wiki

Revision as of 07:54, 15 March 2009

SortNameStyle Discussion

Why does this need to be so complicated? If these rules are meant to apply in music players, it seems really backwards to try to implement this kind of scientifically correct way of sorting. When a user thinks of Eric Clapton he thinks of 'Eric Clapton' and not 'Clapton, Eric', and most certainly he thinks of 'Jimi Hendrix Experience' instead of 'Hendrix, Jimi, Experience' or whatnot.

When the purpose of the sorting is to present the artists in a user's music library in an easily browsable order, why wouldn't the 'intuitive' ordering be the best one? Most users aren't librarians or the like.

Of course it is debatable what this intuitive ordering would be, but going with the obvious would take us a long way: The name of the artist is exactly how the artist writes it. If Eric Clapton would prefer to be known as 'Clapton, Eric' I suppose he would be using the latter form. Articles like 'a' and 'the' should quite obviously be treated as articles and not parts of the sorted entity. Having 'The Doors' at T and not D is just nonsense.

One important thing to bear in mind is: what can really be implemented? It's not smart to establish a set of rules that cannot be used (or that require a tremendous amount of work to use). For example, dividing names into units like last name and first name, with all the minor details, seems to require user input for each entry (or some database), and no simple algorithm can be written. It can be argued that if a simple algorithm cannot be written, the sorting scheme is too complicated. I don't think users have any desire to recite a complicated and non-straightforward set of rules when they simply want to look for a band in a list.

This all applies primarily to music player interfaces. I suppose libraries and such may need a more precise sorting scheme.

OssiLehtinen 2008-10-08 15:25:34

I don't know that I understand your complaint, however. As can be seen in many of the cross-cultural discussions of names from (fill in the blank) culture/nation, what is intuitive to someone in one country as a treatment for a particular name can quite easily be an ordering based on a totally incomprehensible basis to someone else.

When you write, "If Eric Clapton would prefer to be known as 'Clapton, Eric' I suppose he would be using the latter form.", again, I'm not sure I understand - sortnames would, generally, never be used by any musician. They're not artist(or label, etc) names, they're sortnames - two different fields in the database. "I suppose libraries and such may need a more precise sorting scheme." - this is exactly what a sortname is...

"Articles like 'a' and 'the' should quite obviously be treated as articles and not parts of the sorted entity. Having 'The Doors' at T and not D is just nonsense." Again, this is situational, and assumes articles are easily identified as such, rather than confused with, say, durnames (or whatever those Dutch (or wherever it was) names were that caused such confusion 8 or 9 months ago on the lists). As for situational: while yes, "The Foo" should (and does, per the guidelines) sort as "Foo, The", '"The Foo" and "The Bar"' vs "The Foo and the Bar" are two quite different situations, and they sort differently for quite valid reasons, with internal consistency in the differing sortname for each of the two.

As for "what can really be implemented?", what, honestly, is so confusing about a simple set of rules for sortnames? Things like durnames, etc, will confuse, yes... but that simply suggests that we ought simply to not edit the sortname if we're not sure. If the Dutch use a special type of name, and you're not sure, perhaps tag the edit for someone who is Dutch to take a look, rather than simply editing and guessing? As for the other situation above (which always confuses), it's a temporary situation; eventually NGS will allow multi-artist attribution, and we'll not have to deal with that rule anymore. (yeah! :) ) -- BrianSchweitzer 13:17, 09 October 2008 (UTC)

Thanks for your comments Brian. I'll try to respond to some of them.

When you write, "If Eric Clapton...

I think you missed my point here. What I was saying was that from a usability point of view I don't think the sortname should differ from the artist name. I mean, as people know Eric Clapton as 'Eric Clapton' and not 'Clapton, Eric', they imho would naturally start looking for him at E. And now I'm really talking only about a sorted listing of artists given to a user to browse (e.g. in music players).

Again, this is situational, and assumes articles are easily identified as such...

I acknowledge the problem here. Also, the German Die should be treated differently from the English die, and so on. A quite nice solution, again regarding music players, was suggested in a bugzilla report concerning the GNOME Rhythmbox player's problematic sorting. Link: http://bugzilla.gnome.org/show_bug.cgi?id=133444#c19 Basically the suggestion is to do the sorting in a 'dumb' alphabetical manner with a (pre)defined list of words to exclude (e.g. the, a, die ...). To fix the anomalous situations, exceptions to the list would be added. This would probably work quite smoothly, as there aren't that many non-German artists starting with die, and so on.
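A minimal sketch of the 'dumb' scheme described in this comment, in Python. The function name `article_sort_key`, the default article list, and the exception entry "Die Hard Fans" are hypothetical and only illustrate the idea; this is not how Rhythmbox or MusicBrainz actually sort.

```python
def article_sort_key(name, articles=("the", "a", "die"), exceptions=frozenset()):
    """Return a case-insensitive sort key that skips one leading article.

    `articles` is the user's own exclusion list; `exceptions` holds full
    artist names that must be sorted exactly as written.
    """
    if name in exceptions:
        return name.casefold()
    first, _, rest = name.partition(" ")
    # Only strip the first word if something is left to sort on.
    if rest and first.casefold() in articles:
        return rest.casefold()
    return name.casefold()


# "Die Hard Fans" is a made-up English-language name used as an exception.
artists = ["The Doors", "Eric Clapton", "Die Toten Hosen",
           "Die Hard Fans", "A Perfect Circle"]
exceptions = frozenset({"Die Hard Fans"})

ordered = sorted(artists, key=lambda n: article_sort_key(n, exceptions=exceptions))
print(ordered)
```

The artist name itself is never rewritten; only the comparison key changes, so a user with a different language background could pass a different `articles` tuple without touching the stored data.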

As for "what can really be...

Here I think you describe the problem yourself. To know how to properly handle a Dutch name you have to be Dutch or ask a Dutch person. OK, that's fine when you enter it into some database, but the actual problem arises when a non-Dutch user tries to find it. It really gets too complicated if they have to get in contact with a Dutch person to find out where to look for it. From the international point of view I think it should be assumed that people might have music in various languages other than their native one.

This is why I really see 'dumb' alphabetical sorting as the only realistic and usable alternative with optional exclusions of articles and similar with added exceptions based on the human used Artist name. As you stated yourself, the national rules vary greatly and cannot be understood by people who don't speak those languages. Still it's perfectly common for the same people to listen to music from those foreign nations and languages. This makes it impossible to formulate a generic rule which in turn makes it impossible to establish a simple logic (i.e. algorithm) for people to follow and simultaneously take into account all these differences and the linguistic correctness.

What should be looked at is what is doable and usable and not what is 'right'. OssiLehtinen 2008-10-09 20:26:05

Well, but what you're describing is not what, traditionally, is the basis of the sortname concept - it's not a concept that originated at MusicBrainz, but is much older. If you google it, you'll find it implemented elsewhere as well, in libraries, at wiki, etc. Traditionally, in its simplest case, it is the last name, first name situation. I.e., Clapton, Eric, not Eric Clapton. Thus "Rita Marley" and "Bob Marley" would be found together in the CD store, while Janet Asimov would be next to Isaac Asimov on the shelf in the library - rather than being separated several shelves (or even aisles, depending upon the size of the collection) apart. Your argument is well taken, but that is why there exist two different fields - artist name and sort name. If someone simply wants to sort on simple first name, they can just as easily use the artist name field. But for all the discussed reasons, sort names cannot simply be programmatically created from the artist name field with certainty (if they could, we wouldn't have to store it, we could simply always derive it).

As for words to exclude, while in basic theory it would seem possible, in practice, it becomes quite easy to mis-identify such articles. (Take a look at the guess case code and all its workarounds for just this if you want an example of how this can quickly become a pain... and GC only handles 3 or 4 languages.) Just one example, from the list you gave, is the word die - it's an article in a few languages, but a noun in others - and perhaps a verb in some language as well for all I know. The problem isn't to have to ask a Dutch person how to find something; I think on the use of data end that implies a much greater degree of difficulty than would actually be the case. The real case is that all Dutch people with that same durname would be grouped together, while all non-Dutch people, for whom it is not a durname, would likely be sorted elsewhere. And whereas that sorting and grouping of Dutch artist names may not make sense to me as a native English speaker, it is still comprehensible - and it's much better than the possible alternate, where everything is sorted based on an assumption that every single artist has an "English" name, making it that much more difficult for the Dutch person to find that sorted Dutch artist - esp when you consider that it would be just that Dutch person who would most likely have several Dutch artists' music and be attempting to locate the music by those artists.

So-called "dumb" sorting is already entirely possible, merely by using the artist name field to populate your local sortname metadata fields; if you want merely basic article stripping, that can be done with some very basic taggerscript (or javascript, etc, depending upon your use of the data) - much more simply and 100% accurately than the reverse, were we to make the sortname field essentially a duplicate of the artist name field, only with basic reversed articles, leaving it to code to attempt to identify and fix the nit-picky language/situation-specific stuff.

-- BrianSchweitzer 17:47, 09 October 2008 (UTC)

Hmm, I must acknowledge that perhaps this isn't the right place for my concerns. What brought me here is that this sorting scheme is thought of as the only proper way of sorting artists in at least one important media player, which is the default in the Ubuntu distribution. In other words, the developers refuse to implement any other sorting scheme as they consider this to be the correct way, and as this cannot be implemented, nothing is done. To stray a bit in that direction, I must say that what they have is a noble principle of doing things the 'right way', but the result is that the sorting there looks quite unfinished atm, with no light at the end of the tunnel afaik. This is why I wanted to bring up the question of whether a simpler scheme might be at least almost as right in some cases, and as this (MusicBrainz) is the authority they pointed at, I figured this is where the discussion should be held.

I must clarify a bit regarding the exclusion stuff: The idea was that the list of excluded articles isn't the same for all the users. A Dutch person would have a different list from an English speaking person if the English list doesn't work well for the Dutch person. To emphasize, it wouldn't be assumed that every artist has an 'English' name.

Still a note on the taggerscript suggestion: Anything requiring running scripts of sorts on your music databases is too techy for an average computer user. If getting your music library to correct order requires some scripting it won't get done by 95% of the users. They don't know how to do it and don't need to know.

Yet again, perhaps this isn't the right place for this discussion. My intention wasn't to question whether this proper sorting is proper or not (although admittedly it may have seemed like that at points :) ). My main point is simply to argue that this isn't necessarily the best and only practice in every situation. Again to emphasize: It's not that this scheme shouldn't exist, but that not everyone should consider it the only option.

Anyways, thanks for your patience :) OssiLehtinen 2008-10-10 14:04:20

For taggerscript, I'm referencing what is built into Picard; I do get what you're saying, but it seems it'd be more effort for less benefit to non-programmatically have different sortnames for different language preferences (English users getting one, Dutch another, etc). But using PicardScripting, you could pretty easily overwrite the sortname field with the artist name field, then move such articles as you'd like to (The, A, etc) to the back with a non-complex $replace. Is it more complex than simply taking the sortname field? Sure. But it isn't very difficult, and it's just the kind of thing PicardScripting is intended to allow. -- BrianSchweitzer 15:10, 10 October 2008 (UTC)



BibTeX has a pretty complete sorting algorithm, and it defines a "von-part" of the name. Thus, IIRC, Manfred Albrecht Freiherr von Richthofen (the 'Red Baron') is sorted: Richthofen, Freiherr von, Manfred Albrecht. I assume that is what you mean by 'adjective'. The correct description is either "von-part" or aristocratic title or something like that. I think most sorting practices agree that this is a special part that has to be treated by itself. --DonRedman

  • This cannot be entirely true, at least for German usage. Ludwig van Beethoven for example has the sortname "Beethoven, Ludwig van". This is AFAIK the general usage for German aristocratic titles; so your example should read "Richthofen, Manfred Albrecht Freiherr von". --derGraph
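The "von-part" idea can be sketched in Python as follows. This is only a heuristic loosely modelled on the description above (lowercase words before the final word form the von-part); it is not BibTeX's actual implementation, and the names `split_name` and `german_style_sort_name` are hypothetical. The output format follows the "Beethoven, Ludwig van" form that derGraph describes.

```python
def split_name(full_name):
    """Split a name into (first, von, last).

    Heuristic: the von-part runs from the first to the last word that
    starts with a lowercase letter, not counting the final word.
    """
    tokens = full_name.split()
    lower = [i for i, t in enumerate(tokens[:-1]) if t[0].islower()]
    if not lower:
        # No lowercase particle: everything but the final word is the first name.
        return " ".join(tokens[:-1]), "", " ".join(tokens[-1:])
    start, end = lower[0], lower[-1] + 1
    return (" ".join(tokens[:start]),
            " ".join(tokens[start:end]),
            " ".join(tokens[end:]))


def german_style_sort_name(full_name):
    """Format as 'Last, First von', keeping the particle after the first name."""
    first, von, last = split_name(full_name)
    trailing = " ".join(part for part in (first, von) if part)
    return f"{last}, {trailing}" if trailing else last


print(german_style_sort_name("Ludwig van Beethoven"))
print(german_style_sort_name("Manfred Albrecht Freiherr von Richthofen"))
```

A capitalised particle (as in some Flemish names) is not detected by this heuristic, which is one reason a stored sort name is more reliable than a derived one.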


There actually are standardised names for most artists which have been determined by the wise seers at the Library of Congress (and are in the process of being harmonised with the British Library and other national libraries). These are called "name authority records" and can be conveniently searched online at: http://authorities.loc.gov/

According to them, the official names for the above are: "Muslimgauze (Musician)" and " 'N Sync (Musical group)". (Authorised names are in the "100's" fields, and known alternate names or aliases are found in the "600's" fields.) The others don't seem to have authority records..., yet.

Another scheme that librarians use, primarily for electronic records, is called the Dublin Core. (It's a way of adding information to HTML documents to identify the creator, etc.) I found a site about the Dublin Core which contains a very good description of how people who catalogue things for a living approach unknown names and non-standard characters. There's a PDF document or you can skip directly to an HTML version of page 9 ("Creator") in the Google cache.


What about fictitious artist names like Pete Namlook? It looks like a real person's name, but it isn't. It is, however, an alias for a person. --Zout 
  • I would treat it as a real name. -- WolfSong 13:48, 01 February 2006 (UTC)

I personally treat artists like ♪◆m599XGSMF6 under #2 (giving m599XGSMF6 as a sortname) whereas artists like (´・д・)ノ I have no idea about (Japan probably has some name for them...). --Nikki 


  • Is it counter-intuitive to sort bands that contain single artists under the band rather than the artist? It'd be a pretty weird record shop where Bob Dylan and The Band is in the "D" section, Jimi Hendrix is in the "H" section but the Jimi Hendrix experience is in the "J" section.
    • Which is probably why I've never liked this sorting scheme. I used to work in record store retail, and groups with a member's name were always sorted by the sort name of the member. So Dave Matthews Band would be under Matthews; Alan Parsons Project would be under Parsons. I do realize that the actual sort name will look ugly, but I don't think people generally "look" at sort names with any frequency. You're more likely to look at the "sorted" name (meaning how it looks in a list). The hurdle will be how to place the elements. Is it Hendrix Experience, The Jimi or Hendrix Experience, Jimi, The? The driver for that decision should be how the system (Picard and TaggerScript in general) will interpret it, not whether it is visually appealing to humans. -- WolfSong 17:14, 14 February 2006 (UTC)
      • totally agree, it seems odd that 'Jimi Hendrix' and 'Jimi Hendrix Experience' are not sorted together, it seems to defy the purpose of sortnames mo


hb- I’m not happy at all about any of the rules after #4. They are poorly written, poorly structured, and in my opinion, poorly thought out.

There is an on-going debate about the last (#9) item of rule #6, regarding handling of proper names within band names. The fact that rule #6 has 9 items really indicates that there are problems here.

It’s also debatable as to whether there -is- a debate on this issue. While several have spoken out against the current rule, no one has come to its defense or tried to explain the reasoning.

It really feels as though there are only a very few people who prefer to not sort the artist and leave the ArtistSortName tag identical to the ArtistName tag as much as possible. This defeats the idea of the overall dictum: which is that “Sortnames are –heavily- edited in order to sort all artists well” (emphasis added). Rule 6 item 9 is absolutely counter to this direction and its supporters need to speak up or be over-ruled in their silence.

ArtistSortNames do NOT have to be non-ugly. They have to be functional. The simple ArtistName tag is the display tag. Why bother to have two tags if we keep them the same despite the desire to actually “sort” our artists?

Rules 5 and 6 need to be re-written for clarity and process, and for reversal of item 9.

I propose that the existing first four rules remain as they are written.

After those, use the following rules:

5. ArtistNames that include natural separators like “and”, “with”, “&”, “vs.”, and “,” (comma) have already created, de facto multiple ordered sequential “parts”. Whether these ordered “parts” contain Collaborating Artists or not is immaterial. Maintain artist intent by keeping this order and performing the remaining sort naming accordingly –within- each “part”. If there are no separators, then treat the whole ArtistName as a single “part”. Each “part” follows the remaining rules independently from one another. Keep their order, and keep the separator between them without an additional comma.

6. Within each “part”, if present, pull out the full proper name of the non-fictional individual artist or member for whom the sort will be performed, e.g. “Jimi Hendrix” and “Dave Matthews” and “Alex Harvey”. If there is no proper non-fictional artist name or band member, then proceed to rule #7. Format the proper name portion of the “part” accordingly:

a. The proper name must be the individual artist or a band member. An ArtistName “George Washington” does not get sorted under “W” unless the individual is, or the band includes, a George Washington.

b. Regular names like “First-Name Last-Name” are sorted like “Last-Name, First-Name” with the addition of a comma. Example: "Eric Clapton" sorts as "Clapton, Eric".

c. For artist names with a nickname between the first name and last name, the nickname is treated as if it's part of the first name of the artist. Example: "Jean 'Toots' Thielemans" sorts as "Thielemans, Jean 'Toots'".

d. Leading Titles like “Dr.” and “DJ” and “MC” are moved after the person’s name with a preceding comma. Example: “Dr. Dre” sorts as “Dre, Dr.” and "DJ Tiësto" sorts as "Tiësto, DJ".

e. Trailing suffixes like “Sr.” or “Jr.” or “III” always remain at the end of Individual’s Name. Example: "Harry Connick, Jr." sorts as "Connick, Harry, Jr.". Though there usually is one, add no preceding comma where there is no comma already. Example “Dave Thomas III” sorts as “Thomas, Dave III”.

f. For artists whose last names start with an abbreviation, the last names are unabbreviated in the sort name. Example: "Rebecca St. James" sorts as "Saint James, Rebecca".

g. Removal of the proper name from the ArtistName “part” might leave a remaindered portion. In this case, a comma is added at the end of the proper name portion, and the remainder is kept as a new, single, whole entity and treated identically to a “part” under Rule 7. The result is concatenated at the end after the comma. Example: “The Jimi Hendrix Experience” has a remainder of “The Experience” and sorts as “Hendrix, Jimi, Experience, The” and “The Sensational Alex Harvey Band” leaves “The Sensational Band” and sorts as “Harvey, Alex, Sensational Band, The”.

7. ArtistName “parts” without proper names (including band names and remaindered portions from Rule 6) are handled accordingly:

a. Leading Articles, like “The” and “A” and “Los”, regardless of language, are moved to the end with a preceding comma. Example: “The Beatles” and “A Perfect Circle” and “Los Lobos” sort as “Beatles, The” and “Perfect Circle, A” and “Lobos, Los”.

b. Leading abbreviations are unabbreviated. Example: "St. Lunatics" sorts as "Saint Lunatics".

Examples:

The Jimi Hendrix Experience – Hendrix, Jimi, Experience, The

Dave Matthews Band – Matthews, Dave, Band

The Sensational Alex Harvey Band – Harvey, Alex, Sensational Band, The

Stevie Ray Vaughan and Double Trouble – Vaughan, Stevie Ray and Double Trouble

Roger Clyne and The Peacemakers – Clyne, Roger and Peacemakers, The

Hootie and the Blowfish – Hootie and Blowfish, The

Note that there is no need for a comma after “Stevie Ray” and “Roger” as the full proper names were parts without any remainder per rule 6g.

If the name had been “The Peacemakers and Roger Clyne” then keeping proper sequential order would yield “Peacemakers, The and Clyne, Roger”. Note that there is no need for a comma after the “The” following “Peacemakers”.

Note: There are two general rules for when we add a comma. First, in the case where we separate the proper name from its remainder. And more commonly, when we change the sequence of words, for example moving the leading “The” and swapping first name and last name. We don’t add a comma between the natural separators like “and” and “with” because we haven’t relocated any words and they work just fine on their own.

Script writers/programmers will notice that these rules lend themselves quite nicely to programmatically deriving ArtistSortName from ArtistName without having to use many exception cases, given a searchable table of non-fictional proper names.
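As a rough illustration of that claim, here is a partial Python sketch of the proposed rules 5, 6b, 6g and 7a only. Everything named here is hypothetical: the `known_people` set stands in for the "searchable table" of proper names, and titles, suffixes, abbreviations and the comma separator (rules 6c to 6f, 7b, and "," in rule 5) are left out. Capitalising the moved article is an assumption taken from the "Hootie and Blowfish, The" example.

```python
import re

ARTICLES = {"the", "a", "los"}  # leading articles named in rule 7a
# Rule 5 separators (the comma is deliberately not handled in this sketch).
SEPARATOR = re.compile(r"(\s+(?:and|with|vs\.)\s+|\s*&\s*)", re.IGNORECASE)


def move_article(text):
    """Rule 7a: move one leading article to the end after a comma."""
    first, _, rest = text.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first.capitalize()}"
    return text


def invert_person(person):
    """Rule 6b: 'First Last' becomes 'Last, First'."""
    given, _, family = person.rpartition(" ")
    return f"{family}, {given}" if given else family


def sort_part(part, known_people):
    """Rules 6 and 7 applied to a single 'part' of the artist name."""
    for person in sorted(known_people):
        match = re.search(r"\b" + re.escape(person) + r"\b", part)
        if match:
            # Rule 6g: whatever is left over is treated as its own part.
            remainder = " ".join((part[:match.start()] + part[match.end():]).split())
            inverted = invert_person(person)
            return f"{inverted}, {move_article(remainder)}" if remainder else inverted
    return move_article(part)


def derive_sort_name(artist_name, known_people):
    """Rule 5: split on separators, sort each part, keep order and separators."""
    pieces = SEPARATOR.split(artist_name)
    # Odd indices are the captured separators; they are kept verbatim.
    return "".join(p if i % 2 else sort_part(p, known_people)
                   for i, p in enumerate(pieces))


known_people = {"Jimi Hendrix", "Dave Matthews", "Alex Harvey", "Roger Clyne"}
print(derive_sort_name("The Jimi Hendrix Experience", known_people))
print(derive_sort_name("Roger Clyne and The Peacemakers", known_people))
```

The sketch depends entirely on the lookup table: a band called "George Washington" is only left alone because no such person is listed, which is the situation rule 6a describes.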

  • To the anonymous contributor: Well, some of us aren't happy with any of the rules from number 1 on. Personally, I think we could save a lot of time and effort if we rewrote them thus:
    1. For a single artist, ArtistSortName must be the same as the artist's name as listed in the U.S. Library of Congress catalog.
    2. For multiple artists working together (e.g. "Tony Sheridan and the Beatles"), the ArtistSortName should be broken down into separate artists' names, each name replaced with the name as listed in the U.S. Library of Congress catalog, then the names re-assembled (e.g. "Sheridan, Tony and Beatles").
    3. There is no rule 3.
    But I doubt that's going to happen. --LarryGilbert

hb- Sorry for the AP. It was unintentional.

Thanks for the support inasmuch as it's one more voice seeking a meaningful ArtistSortName.

I certainly have no problem with Sort names appearing as you propose. I especially like the concept of having a strongly enforced database of standard values. Hey! That's what MusicBrainz is supposed to be.

Seriously, I'd considered the LoC approach before. (I've been down this road quite a bit.) I'd so love to see a new tag for LoC LC number. However, for the life of me I can't find a way to access the LoC DB programmatically. If that's not just me, then I think that's a deal killer.

Maybe if someone can find a link explaining how-to, I'd get behind that idea. Until then, as I see it, the only thing we have to go from is the value in ArtistName, and I stand by my proposal.

Still waiting to hear from someone who supports the current rules. --HnryBrdsly



I have a question about artists like "The Ghost Who Walks": should we edit their name in the style of "Ghost Who Walks, The" or just leave it as "The Ghost Who Walks"? I think it should be the latter because it seems more like a sentence/statement than a name. -- Mackattack

  • I think that the sortname should be "Ghost Who Walks, The", on the off chance that somewhere there is a release that left out the "The", so that they get sorted right next to each other. -- MartinRudat 13:34, 17 June 2006 (UTC)


In Flanders (Dutch-speaking part of Belgium), family names that start with "Van", "De" or similar are generally sorted under "Van ...", "De ...", etc. E.g. "Boudewijn de Groot" would be sorted under "de Groot, Boudewijn". --JanC



In the Netherlands they are definitely sorted without the article or preposition, though. And that answers the question as well: they are definitely not adjectives. 'de' and 'het' and their variations are articles; 'van' (and some other less common ones like 'in' or 'op') are prepositions. -- thisfred
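The difference between the two conventions described above can be sketched in Python. The function name `surname_sort_key` and the convention labels are hypothetical, and the name is assumed to be already split into given name, particle and family name.

```python
def surname_sort_key(given, particle, family, convention):
    """Build a sort key for a name whose particle is already separated.

    'flanders'    : the particle counts as part of the surname.
    'netherlands' : the particle is ignored for ordering.
    """
    if convention == "flanders":
        primary = f"{particle} {family}".strip()
    elif convention == "netherlands":
        primary = family
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    return (primary.casefold(), given.casefold())


people = [("Bob", "", "Dylan"), ("Boudewijn", "de", "Groot")]

flemish = sorted(people, key=lambda p: surname_sort_key(*p, convention="flanders"))
dutch = sorted(people, key=lambda p: surname_sort_key(*p, convention="netherlands"))

print(flemish)  # de Groot is filed under D, before Dylan
print(dutch)    # de Groot is filed under G, after Dylan
```

Either ordering is easy to compute once the particle is known; the hard part, as the thread notes, is knowing which convention applies to a given artist.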



The value of preserving case is not immediately clear. It is not unusual for music libraries to "smash" case for sort order by storing only one. My first artist entries to MusicBrainz were fouled because the JavaScript in my browser did not work for the guess feature and it was not clear to me that case mattered. This means I and others had to go back and edit all the entries I made. Additional verifications and some substitute for the Javascript magic might be nice, but it seems at the very least that if preserving upper and lower case is going to be a big deal that should be mentioned up front in the sortnamestyle guideline. -- m0llusk
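To illustrate what "smashing" case means for ordering, here is a small Python sketch. The list entries are deliberately mis-cased examples, not real database values.

```python
# Sort names entered with inconsistent capitalisation.
sort_names = ["Hendrix, Jimi", "clapton, eric", "Beatles, The"]

# Plain code-point ordering puts every uppercase letter before lowercase ones.
case_sensitive = sorted(sort_names)

# Comparing on a case-folded key ignores capitalisation but keeps the
# stored strings untouched.
case_smashed = sorted(sort_names, key=str.casefold)

print(case_sensitive)
print(case_smashed)
```

With a case-folded key the stored capitalisation affects only display, not position in the list.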



Some clarification is needed for what to do when there are three or more artist names to a sortname, and whether or not it's acceptable to use semicolons, as traditionally dictated in English grammar. There was some disagreement recently with The Hacker, Millimetric & David Caretta: whether the sortname should be "Hacker, The, Millimetric & Caretta, David" or "Hacker, The; Millimetric & Caretta, David". My taste runs to the latter, but the style guidelines here indicate the former is correct. Would it be all right to add something like "Do not use semicolons even if there are three or more names" to clarify the intent? --LarryGilbert

  • I've added the comma to rule 4 above, to make it clear the same separator has to be used. I think that will do --Zout


Why is (for example) 10,000 Maniacs sorted as "10,000 Maniacs" and not "Ten Thousand Maniacs" as libraries do? --LarryGilbert



Since abbreviations such as St. are expanded in sort names, shouldn't Jr. and Sr. be too? --Creap

  • Note that according to the current rules, it's only the first word, after sorting, that gets expanded. There are only a few that I can think of that would remain there after the title move, mostly just Saint and its equivalent in various other languages. --SailorLeo


The sort names for Chinese releases are kind of a mess and very unhelpful. The de facto rule seems to be using their transliterated family name and their English name, e.g. "Chou, Jay" "Leung, Tony" "Chang, Jeff". This is helpful to non-Chinese users, but kind of breaks what sort names are. I sometimes change these to the transliterated names, like "Chang, Shin Che".

Furthermore, the sort names are very unhelpful for Chinese releases and make it harder to find what you are looking for even when the sort names are proper transliterations. This is because different transliterations are used in different areas, and some artists seem to make up their own transliterations that "sound good". For example, the family name 周 is written as "Zhou", "Chou" and "Chow". It would probably be easier to find what you're looking for if simple Unicode sorting was used for Chinese names.
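A small Python sketch of the point about 周. The artist entries are made-up placeholders (only the family-name spellings come from the comment above), and plain code-point order is used as a stand-in for "simple Unicode sorting"; it groups identical characters but is not a linguistically meaningful collation.

```python
# (name in Han script, Latin sort name); all entries are hypothetical.
artists = [
    ("周甲", "Zhou, Jia"),
    ("周乙", "Chou, Yi"),
    ("周丙", "Chow, Bing"),
    ("張丁", "Chang, Ding"),
    ("梁戊", "Leung, Wu"),
]

# Ordering by the Latin sort name scatters the three 周 artists.
by_latin = sorted(artists, key=lambda a: a[1])

# Ordering by the original script keeps the same family name together.
by_han = sorted(artists, key=lambda a: a[0])

print([han[0] for han, _ in by_latin])
print([han[0] for han, _ in by_han])
```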

Ideas? Should sort names like "Chou, Jay" be tolerated?

--foolip

  • I believe that there are moves afoot to have proper support for translation/transliteration, rather than overloading sort name for that. Until that time, people are going to keep putting Latin text in there, rather than simply copying the artist's name, which I think will be the correct thing for most Asian names. Perhaps in addition to, say, pinyin, romaji, etc, there should also be an 'official transliteration', based on however the artist/publisher/label seems to write it most often. Until we have that though, I'd say that it's quite likely going to be a pain to try and keep a lid on things like that. (Personally, I'd browse for stuff using 'official transliteration', 'cause I can't read any of those squiggles. =) -- MartinRudat 07:34, 19 August 2007 (UTC)
    • Do you have a reference/link to that discussion where I could give my input? -- foolip
      • No, sorry. This was just my impression from the irc channel. After a bit of poking around, apparently translation and transliteration is supported between albums, but there isn't a relationship between two artists saying x is a transliteration of y. I don't really know if this is the state at which translation is going to be supported, or if there is going to be more. The last time I heard anything was sometime last year. -- MartinRudat 10:49, 20 August 2007 (UTC)

Suggested amendment to the sort name style guide:

  1. All ArtistSortNames should be in Latin script. Other scripts such as Greek, Hebrew and Han (Chinese/Japanese) should use a sort name as per below.
    • An official transliteration/translation, e.g. as it appears on album covers.
    • A widely known transliteration/translation, e.g. as known in the press or by fans.
    • A transliteration using the standard transliteration system used in the region where the artist is active.

-- foolip

This amendment would have undesirable effects for Japanese names: For example, the family name 伊藤(いとう) would be transliterated as Itō according to the Revised Hepburn romanization; however, some artists use non-standard forms like Itoh, Itou or Ito. If one used the artist's preferred form in preference to the standard transliteration in the SortName, artists with the same family name - 伊藤 - would end up not being sorted together. For this reason, at least from the perspective of Japanese, it would make more sense to always use the standard form in the SortName. The artist's preferred form could be put in an alias or in the ArtistAnnotation.
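The effect described here can be sketched in Python. The records below are hypothetical (the given names and the extra family name Itoi are invented for illustration); only the spellings Itō, Itoh, Itou and Ito come from the comment.

```python
# Each record keeps the artist's preferred romanisation next to the standard one.
artists = [
    {"family": "伊藤", "preferred": "Itoh, Aiko", "standard": "Itō, Aiko"},
    {"family": "伊藤", "preferred": "Itou, Ken",  "standard": "Itō, Ken"},
    {"family": "伊藤", "preferred": "Ito, Mari",  "standard": "Itō, Mari"},
    {"family": "糸井", "preferred": "Itoi, Shin", "standard": "Itoi, Shin"},
]

# Sorting on the preferred form lets another family name slip in between.
by_preferred = [a["family"] for a in sorted(artists, key=lambda a: a["preferred"])]

# Sorting on the standard form keeps all 伊藤 entries adjacent.
by_standard = [a["family"] for a in sorted(artists, key=lambda a: a["standard"])]

print(by_preferred)
print(by_standard)
```

Keeping the preferred form in a separate field, as suggested above for aliases, means nothing is lost by using the standard form for ordering.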

-- mrouge



I don't understand why French particules aren't treated the same as Dutch tussenvoegsels. It seems to me all the arguments applied to tussenvoegsels also apply to particules. Truthfully, I feel "Groot, de, Boudewijn" just looks silly and doesn't improve sortability. Suppose there was another artist named "Martijn de Groot" - these two artists would sort to the same order whether the "de" was in the middle or at the end. And what exactly are the rules for Germanic names? Beethoven has sort name "Beethoven, Ludwig van". --dkg


At least in the Dutch speaking part of Belgium, when you're going to look for a name, words like "Van" and "De" are part of the surname, and are used to sort on. 

There might be an exception for people from nobility where those words are not part of the name, and they would be spelled with a lowercase letter. I'm not sure how those get sorted, but it seems they're included just like for other names.

Someone pointed me to the discussion at http://lists.musicbrainz.org/pipermail/musicbrainz-users/2008-April/thread.html

What I've learned from this is that in Belgium the tussenvoegsels (usually) are part of the name, so it gets sorted including it. Someone said that 99% of the Belgians would sort like that. On the other hand, people in the Netherlands don't see it as part of the name, and 70% don't sort using it. Whether it's part of the surname seems to depend on being spelled with an upper or lowercase letter, and the lowercase versions would not be part of the name.

It also said that the international standard was to include it, and that it's common with people that migrated to end up with the tussenvoegsel being capitalized.

I'm currently of the opinion that if it's not considered to be part of the name, it should not be mentioned in the sort order in the first place, which would mean just "Beethoven, Ludwig". I would never look under the G to find "de Groot", and I think that is because I use the word "de" when using his surname, while I never talk about "van Beethoven". So I consider the "de" in "de Groot" to be part of the surname, since nobody will have any idea who I'm talking about when I just say "Groot". --kroeckx