Proposal:Disentangle Interfaces From Schema

From MusicBrainz Wiki
Revision as of 09:59, 19 March 2009 by Zout (removed author(s))


Whether to Change the Core Schema or not to Change the Core Schema

Status: This page should be updated, because it creates many misunderstandings. The philosophy described here is applied in the ObjectModel.

This question sounds pretty philosophical, and in a way I think it really is. I believe that the recent debate (11/2005) about GettingRidOfFeaturingArtistStyle on the StyleMailingList was so heated because people have fundamentally different perspectives on how MusicBrainz should deal with the problem. Depending on the perspective, you either need a CoreSchema change like NextGenerationSchema, or you do not and can just use the PowerOfAR.

The Question: At Which Point to Limit Complexity

The question, as I see it, is this: "How does MusicBrainz adapt to the complexity of real life?".

The phenomenon that MusicBrainz deals with is musical releases in any form, together with all properties and relationships of these releases and related elements. The complexity of this phenomenon is endless. That means that there will always be cases in which the complexity of the phenomenon is too great for MusicBrainz to deal with. It is important to realize this. There is a limit to the complexity MusicBrainz can deal with; this is a fact. The question can only be: "Where do we set this boundary?".

In the recent discussion I see two answers to this question. I will call them the schematical and the procedural.

A Schematical Structuring of Complexity

The common answer to the question seems to be that complexity should be structured at the level of storing data. This means that the database schema has to be structured. The limit is then that anything this structure cannot reflect is too complex and cannot be dealt with by MusicBrainz. The boundary is set by the complexity of the data structure, and since the structure is inherent to the database schema, I call this perspective schematical.

Two things should be noted in this context.

  • First, the schema will be very complex by nature. Our standards regarding what MusicBrainz has to be able to deal with are pretty high. Therefore the schema must encompass many aspects of the complex reality and will be complex as well.
  • Second, the design of interfaces is a pretty simple and straightforward issue. Interfaces just follow the logic which is inherent to the structure; they do not add any new structure to the data. This is because the data is already structured. This entails the inconvenience that interfaces are tied to the schema. If you change the schema, you have to adapt all interfaces to this change.

A Procedural Structuring of Complexity

The arguments that RjMunro, WolfSong and I (DonRedman) put forward can be summarized in the other answer to the question of where to set the boundary of complexity.

Here the idea is to structure the complexity at the point of dealing with the data, and not at the point of storing it. This means that the primary schema can be unstructured. AdvancedRelationships is such an unstructured schema. The logical extensions of AdvancedRelationships are AdvancedEntities and perhaps AdvancedProperties. This gives you a modular and extensible schema which can store any data object with any set of properties and any (dyadic) relationship between any two objects.

  • Note that there is some limit of complexity in this schema, but -- and this is the important point -- this limit is not a matter of structuring the data, it is only about which kind of objects, properties and relationships can be stored.
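To make the idea of an unstructured schema more concrete, here is a minimal sketch in Python. It is only an illustration, not the actual AdvancedRelationships implementation: the class names (Entity, Relationship, Store), the entity kinds and the relationship type "performed" are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A generic data object: a kind plus an arbitrary set of properties."""
    id: int
    kind: str  # hypothetical kinds, e.g. "artist" or "track"
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A dyadic, typed link between exactly two entities."""
    source: int
    type: str
    target: int


class Store:
    """Stores entities and relationships without imposing any further structure."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add_entity(self, kind, **properties):
        entity = Entity(len(self.entities) + 1, kind, dict(properties))
        self.entities[entity.id] = entity
        return entity

    def relate(self, source, rel_type, target):
        rel = Relationship(source.id, rel_type, target.id)
        self.relationships.append(rel)
        return rel


# Hypothetical example data
store = Store()
artist = store.add_entity("artist", name="Example Artist")
track = store.add_entity("track", title="Example Track")
store.relate(artist, "performed", track)
```

The only limits in such a store are which kinds of objects, properties and relationships it accepts. It says nothing about how the data should be read or presented.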

In this model the structuring of data is done by the interfaces. An interface like a tab in Keschte's ArtistPageRedesign applies a specific perspective to the data. The tracks-tab asks "which tracks are attributed to this artist?". By applying this perspective it sets a very substantial boundary to complexity. The interface will look for tracks in the artist's albums, in various artist albums and in the PerformanceRelationshipClass. But it will ignore all aspects which are too complex for its chosen perspective, like what groups this artist was a member of and which tracks are attributed to those groups.

That means that the interface applies a specific structure to the data. This is an active process, and since the structuring is inherent to this process -- to the way of dealing with data -- I call this perspective procedural.
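The tracks-tab described above could be sketched roughly as follows. This is a simplified illustration, not the real ArtistPageRedesign code: the triples, the identifiers and the type names ("released", "contains", "performed", "member of") are hypothetical, and various artist albums are left out for brevity.

```python
# Hypothetical relationship triples: (source, type, target)
relationships = [
    ("artist:1", "released", "album:1"),
    ("album:1", "contains", "track:1"),
    ("artist:1", "performed", "track:2"),
    ("artist:1", "member of", "group:1"),
    ("group:1", "performed", "track:3"),
]


def tracks_tab(artist, rels):
    """Answer the question 'which tracks are attributed to this artist?'."""
    # Tracks found via the artist's own albums
    albums = {t for s, ty, t in rels if s == artist and ty == "released"}
    tracks = {t for s, ty, t in rels if s in albums and ty == "contains"}
    # Tracks found via direct performance relationships
    tracks |= {t for s, ty, t in rels if s == artist and ty == "performed"}
    # Group membership is deliberately ignored: it lies outside this perspective
    return sorted(tracks)


print(tracks_tab("artist:1", relationships))  # ['track:1', 'track:2']
```

The data about group:1 is stored all the same. This interface simply does not ask about it, and another interface with a different question could.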

Again there are a few details that I want to point out.

  • First, you will note that in this second approach, multiple perspectives on data are possible. There can be a multitude of interfaces that each apply their own perspective to the unstructured dataset and structure it in a way of their own. This approach is very modular: interfaces can easily be added one by one. It can also be implemented very slowly and step by step.
  • During this implementation you will, however, have situations in which you can already store data that you cannot deal with yet. The current situation of AR is such a case, in which we have an interface that can enter AR data, but lack interfaces that can represent and use the data. This is completely normal for the procedural approach! Using RodBegbie's metaphor, while such a situation will be considered a severe pneumonia under the schematical perspective, it is a mere nuisance and an incentive to development under the procedural perspective.
  • This means -- and this is an important point of its own -- that interfaces for storing data and interfaces for using data follow a different logic. The major questions of limiting complexity are not posed at the point of entering and storing data, therefore the interfaces for entering data do not impose questions of structure on the user. Questions of structure are posed by interfaces that use data. In another post I have said that the structuring of data is done based on questions, not on the facts we have. According to this perspective, design of interfaces is a very complex and creative process. The logic of interfaces cannot be derived from the data structure, but must be invented, based on the questions you have.

Google: A Real Life Example and Metaphor

This is all very abstract. To show you that this is not a totally utopian concept, let me point out a real-life example of the procedural perspective: Google.

Google deals with internet pages. Let us focus on the aspect of linking (or intertwingling) of the WWW. The complexity of this phenomenon is already limited by HTML syntax (although not all pages comply with it). The important aspect is that Google takes this data and poses a question to it that is not inherent to the structure of HTML. It asks "Who links here?". Google crawls the web with this question in mind and then creates a backward index which can provide an answer to its question. Google is so extremely useful because the question it provides an answer to is so common. Google is, in a way, a very cleverly designed interface to the World Wide Web.
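The "Who links here?" question can be illustrated with a small sketch of a backward index. This is only the general idea of inverting a link graph and makes no claim about how Google actually works; the page names and the function build_backward_index are invented for this example.

```python
from collections import defaultdict

# Hypothetical crawl result: each page mapped to the pages it links to
forward_links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
}


def build_backward_index(forward):
    """Invert the link data so that it answers 'who links here?'."""
    backward = defaultdict(set)
    for page, targets in forward.items():
        for target in targets:
            backward[target].add(page)
    return backward


index = build_backward_index(forward_links)
print(sorted(index["c.example"]))  # ['a.example', 'b.example']
```

The stored data only records outgoing links. The incoming-link view is constructed afterwards, driven by the question and not by the structure of the data.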


I firmly believe that MusicBrainz should follow the metaphor of Google, with its crawlers and constructed indices. The metaphors for the schematical perspective are data diagrams and (I believe) the SQL syntax (which pulls information from a structured set of data using the schema inherent to the available information).

The problem that I see with the schematical perspective is that extending the boundary of complexity is extremely costly. It always means changing the core database schema and adapting all interfaces (since they are derived from the schema). I fear that in the end this will lead to a core schema that will be nearly as complex as AR, but the interfaces will still be tied to the logic of the schema. In the end this leads to an unmaintainable monster.

With the procedural perspective, however, new AdvancedRelationshipTypes can be added without affecting the existing interfaces. It is even possible to carefully change some types, properties or entities without breaking existing interfaces. This process would be analogous to a DTD change of an XML/RDF schema. Existing interfaces can ignore new XML elements and still apply their old and limited perspective.
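As a rough sketch of this point, consider an interface that only looks at the relationship types it knows about. The type names ("performed", "remixed") and the function performed_tracks are hypothetical and serve only to illustrate the argument.

```python
# The only relationship type this (hypothetical) interface understands
KNOWN_TYPES = {"performed"}


def performed_tracks(artist, rels):
    """Old interface: list the tracks an artist performed, ignoring unknown types."""
    return sorted(t for s, ty, t in rels if s == artist and ty in KNOWN_TYPES)


# Data before and after a new relationship type is introduced
before = [("artist:1", "performed", "track:1")]
after = before + [("artist:1", "remixed", "track:2")]

# The old interface returns the same answer in both cases
print(performed_tracks("artist:1", before) == performed_tracks("artist:1", after))
```

The new "remixed" data is stored but stays invisible to the old interface until some interface is written that asks a question about it.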

Since this is a pretty coherent piece of text, please discuss it on the UsersMailingList or on ToSchemaOrNotToSchema/Discussion. Of course, if you do not understand something, add this to the text. Or you could add a counterargument to the text like this: