History talk:Capitalization Proposal

From MusicBrainz Wiki

Comments from Tom Hull:

For HTML, I suggest that the example words appear in bold.

Rationale: I've tried to come up with a simplified rule set that does not generally require in-depth understanding of English grammar, but produces reasonably correct results in almost all cases. The 3/4 letter preposition size limit is used by (I think) most U.S. publishers.

The trickiest part is (2c). Cutting the preposition size down limits the number of exceptional cases. The 3/4 letter split is a rough guideline. I have omitted prepositions like "up" and "out" because they are so infrequently used as prepositions that it's much simpler (and not terribly wrong) to always capitalize them; on the other hand, such 4-letter words as "from", "into", "onto", and "with" are common and almost always used as prepositions, so there is a rather good case for including them. I personally prefer to lowercase these four, but feel it would be easier (and not terribly wrong) to always uppercase them.

I've also omitted "so" from the list: while it is sometimes used as a conjunction, it is overwhelmingly used as an adverb, so the same rationale applies as with "up" and "out".

I am hard pressed to explain "as" and "by" except by grammar rules: both are used as conjunctions (lc), prepositions (lc), and adverbs (ulc); although lc uses predominate, they are not overwhelming.

Not capitalizing "to" in infinitives, which is common but not universal practice, puts it overwhelmingly in the lc camp.

The bottom line here is that we have a list of 15 words (I may have missed a couple more, what are they?) that are not capitalized in most or all cases. We could probably illustrate that list in a second file, as well as build up a deeper list of exceptions and special cases. If we have more examples, we may be able to better formulate the rules.
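A rough sketch of how such a word list could drive the rule, in Python. The word list below is a guess for illustration only and is not the proposal's actual list of 15 words; always capitalizing the first and last word is likewise an assumption (a common publishing convention), and the function name is made up.

```python
# Hypothetical list of always-lowercase words. An assumption for illustration,
# not the actual list from the proposal.
LOWERCASE_WORDS = {
    "a", "an", "the",                                  # articles
    "and", "but", "or", "nor",                         # conjunctions
    "as", "at", "by", "for", "in", "of", "on", "to",   # short prepositions, "to"
}

def title_case(title: str) -> str:
    """Capitalize every word except listed short words in mid-title."""
    words = title.split()
    result = []
    for i, word in enumerate(words):
        # Assumption: the first and last word are always capitalized.
        is_edge = i == 0 or i == len(words) - 1
        if word.lower() in LOWERCASE_WORDS and not is_edge:
            result.append(word.lower())
        else:
            # Upper-case the first letter only; leave the rest untouched
            # so that acronyms and odd spellings survive.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)

print(title_case("the dark side of the moon"))
print(title_case("up from the bottom"))
```

Words such as "up", "out", "so", "from", and "with" are not in the list, so they come out capitalized, which matches the simpler option described above. The sketch cannot tell an adverbial "as" or "by" from a prepositional one; that still needs a human.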

For non-English titles, one can either force the titles into English-language rules (with or without identifying and special-casing the foreign language articles, conjunctions, and prepositions), or one can let each language set its own rules. In my own work I do the former, but I can't defend that as a general principle, so my recommendation is that we tag titles by language and apply the appropriate language-specific rules. Still, this leaves us with further problems: how to determine the language of ambiguous titles or titles with foreign words, and how to handle bilingual titles.
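Tagging by language could look roughly like the following Python sketch. The language codes, function names, and the two stand-in rules are all assumptions made for illustration; real rules per language would be more involved. Titles with an unknown or missing language tag are left untouched, which sidesteps (but does not solve) the ambiguous and bilingual cases.

```python
from typing import Callable, Dict, Optional

def every_word_rule(title: str) -> str:
    # Stand-in for the English rules: capitalize the first letter of each word.
    return " ".join(w[:1].upper() + w[1:] for w in title.split())

def first_word_rule(title: str) -> str:
    # Stand-in for languages that capitalize only the first word.
    # Note: this also lowercases proper nouns, which a real rule must not do.
    lower = title.lower()
    return lower[:1].upper() + lower[1:]

# Hypothetical registry mapping a language tag to its rule.
RULES: Dict[str, Callable[[str], str]] = {
    "en": every_word_rule,
    "sv": first_word_rule,
    "no": first_word_rule,
}

def apply_rules(title: str, language: Optional[str]) -> str:
    """Apply the rule registered for the language, or leave the title as-is."""
    rule = RULES.get(language) if language else None
    return rule(title) if rule else title

print(apply_rules("the long road", "en"))
print(apply_rules("En Natt I Juni", "sv"))   # made-up title
print(apply_rules("c'est la vie", None))     # untagged: unchanged
```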

And, of course, this doesn't address the real problem with album titles, which is often just what the hell is the title? Lots of albums say one thing on the spine, another on the cover; have subtitles or series titles that can be combined with dashes or colons, possibly in more than one way. --TomHull



I would just like to express my opinion - opinion only - that using the capitalization on the album itself is never wrong. This is assuming, of course, that the album title isn't in all-caps (and not an acronym or similar). Unfortunately, it is then impossible for people who don't have the album in front of them to judge whether the capitalization is "right" or "wrong". Alternative argument: If we had a perfect algorithm for determining the caps of the titles, no user effort would be required for it, as they could be set automatically by the database/display code. Hence the present user-level effort put into getting the capitalization right is somewhat superfluous.

As noted, however, there are a handful of album and song titles where the capitalization is nonstandard, and in those cases the only way to make sure is to check the album cover / website for the correct capitalization. One suggestion I would make to avoid effort being wasted on this would be to have such an automatic capitalization algorithm, with a checkbox for "Non-standard capitalization" in the input form. Some usability tests might be worth conducting, too; it might make sense to confront submitters with an extra page asking "Is this the capitalization you intended?", and in most cases the non-standard capitalization can be guessed or inferred. --Donwulff
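A minimal Python sketch of the checkbox-plus-confirmation flow, under stated assumptions: the names `Submission` and `check_submission` are invented here, and the default normalizer is only a placeholder for whatever automatic capitalization algorithm would actually be used.

```python
from typing import Callable, NamedTuple

class Submission(NamedTuple):
    title: str           # title to store, or to show on the confirmation page
    needs_confirm: bool  # True if the submitter should be asked to confirm

def capitalize_words(title: str) -> str:
    # Placeholder normalizer: upper-case the first letter of every word.
    return " ".join(w[:1].upper() + w[1:] for w in title.split())

def check_submission(
    title: str,
    nonstandard: bool = False,
    normalize: Callable[[str], str] = capitalize_words,
) -> Submission:
    if nonstandard:
        # "Non-standard capitalization" box ticked: keep the input untouched.
        return Submission(title, False)
    suggested = normalize(title)
    # Show the extra page only when the automatic result differs from the input.
    return Submission(suggested, suggested != title)

print(check_submission("hello world"))
print(check_submission("k.d. lang", nonstandard=True))
```

With this shape the extra page appears only when the algorithm would change something, so submitters who already follow the rules never see it.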



Eric: I like the extra page idea. I think forcing all titles to English language specifications is a bad idea, specifically for the reason that it limits poetic license. But, some titles are actually just descriptions, such as for symphonies, and need to be standardized. I suggest that when you edit data in the database, you can add a comment that moderators can read when they vote. That way, if there is some change that doesn't meet the standard, the change can be explained such as "Capitalization on album cover", etc.

Here's a real easy capitalization rule: Capitalize Every Word. It's not always the prettiest solution, but it removes ambiguity. --Pitboss and Seighin
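For what it's worth, this rule is about a one-liner in Python. The function name below is made up. One caveat: Python's built-in `str.title()` looks tempting but treats apostrophes as word boundaries, so the sketch upper-cases only the first character of each whitespace-separated word instead.

```python
def capitalize_every_word(title: str) -> str:
    # Upper-case the first character of each whitespace-separated word and
    # leave the rest of the word exactly as typed.
    return " ".join(word[:1].upper() + word[1:] for word in title.split())

print(capitalize_every_word("don't look back in anger"))
print("don't".title())  # built-in gives "Don'T", hence the manual approach
```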



I don't like capitalization. Just capitalize words that should be capitalized, like the first words, names of people, cities, countries etc. --MJAX

I am not against capitalization but if I can only apply the rules with a thorough understanding of English grammar, then IMHO they are too complicated. --DonRedman


Seeing English capitalization rules used for titles in a language that has different ones looks rather strange when you know the language ... And applying them in one's own mods would feel even stranger - so I would appreciate language-specific capitalization (still) to be accepted. I don't find it hard to understand, though, why I've had questions about capitalization in some of my (non-English) mods lately - most of which have been in Norwegian. Tenebrous suggested in one of them to put the main rules here (to make voting on the mods easier) - so here they come ... I think the main rules are the same for quite a few languages - in short, using the same rules in titles as in normal text, i.e. to capitalize:

  1. first letters, and
  2. names (with a few exceptions for prepositions etc. in names like Stratford-upon-Avon, Frankfurt am Main, Frankfurt an der Oder, Ludwig van Beethoven)

In German, nouns are in general capitalized as well.

Composite names (e.g. The English Chamber Orchestra) probably have somewhat more varied rules than the names of persons and places. In Norwegian, the main rule is to capitalize the first letter only (a rule often violated by Norwegians, too). In general, everybody should be particularly critical of titles in the language(s) they know best ... mede

I think that the capitalization rules for English titles are very clear and concise, but I do think we should try to gather information on how to capitalize titles in other languages. Perhaps it will be too much information to add to the Capitalization Guide, but we could at least create a wiki page containing such guidelines and link to that page from the Capitalization Guide. Let me also add that for Swedish titles, the rules are only to capitalize the first word and all proper nouns (i.e. names of persons, places, objects, etc.). -- BigNick
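The "first word plus proper nouns" rule described for Swedish and Norwegian can be sketched in Python as below. This is only an illustration: the function name is invented, the example title is made up, and the list of proper nouns has to be supplied by the caller, which is the genuinely hard part. German noun capitalization is not handled.

```python
from typing import Iterable

def sentence_case(title: str, proper_nouns: Iterable[str] = ()) -> str:
    """Capitalize the first word and any word found in proper_nouns."""
    known = {name.lower() for name in proper_nouns}
    result = []
    for i, word in enumerate(title.split()):
        lower = word.lower()
        if i == 0 or lower in known:
            result.append(lower[:1].upper() + lower[1:])
        else:
            result.append(lower)
    return " ".join(result)

# Made-up title; "stockholm" is passed in as a known proper noun.
print(sentence_case("Sommar I Stockholm", ["stockholm"]))
```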