LinkedBrainz: Difference between revisions

Revision as of 23:26, 28 July 2010

The LinkedBrainz project is intended to help MusicBrainz publish its database as Linked Data. Linked Data is simply a method for publishing structured data on the web based on semantic web technologies. LinkedBrainz will provide:

a mapping of the Next Generation Schema (NGS) and Advanced Relationships (ARs) to RDF
integration with MusicBrainz server code to provide dereferenceable URIs
a SPARQL endpoint for querying MusicBrainz data

Mapping

Entity concepts in MusicBrainz will be mapped to concepts in the Music Ontology and other appropriate ontologies.
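As a rough illustration of what such a mapping might produce, the sketch below turns one artist into N-Triples using the Music Ontology class mo:MusicArtist and foaf:name. The function names and the choice of these two terms are assumptions made for the example, not the project's actual mapping.

```python
# Minimal sketch (assumed mapping): one MusicBrainz artist -> two RDF triples.
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def artist_to_ntriples(mbid, name):
    """Return N-Triples lines describing an artist (hypothetical helper)."""
    subject = f"<http://musicbrainz.org/artist/{mbid}>"
    return [
        f"{subject} <{RDF_TYPE}> <{MO}MusicArtist> .",
        f'{subject} <{FOAF}name> "{escape_literal(name)}" .',
    ]


# Placeholder MBID, not a real artist.
for line in artist_to_ntriples("00000000-0000-0000-0000-000000000000", "Example Artist"):
    print(line)
```

A real mapping would cover many more entity types and relationships; the point here is only the shape of the output.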

There's been some activity related to this lately on the Music Ontology Specification list.

Old RDF Mappings

At least 4 RDF mappings of the MusicBrainz database exist:

the original RDF service used by MusicBrainz back in the day
the Zitgist mappings
the DBTune mappings
the new Talis dataincubator mappings (work in progress)

None of these tackles NGS, but they should serve as a good starting point.

Dereferenceable URIs

There are essentially two approaches to providing dereferenceable URIs of the form http://musicbrainz.org/<type>/<mbid>

These approaches are RDFa and Content Negotiation.

RDFa

RDFa is a syntax for embedding RDF into HTML documents. The RDF modeling of a particular MusicBrainz entity could be embedded alongside the normal HTML. Web browsers and RDF consumers would use the exact same content.
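To make the idea concrete, here is a small sketch of an HTML fragment carrying RDFa attributes (about, typeof, property). The function name, the markup layout, and the vocabulary terms are assumptions for illustration; the real templates would look different.

```python
from html import escape


def artist_rdfa(mbid, name):
    """Render a hypothetical artist heading with embedded RDFa."""
    uri = f"http://musicbrainz.org/artist/{mbid}"
    return (
        '<div xmlns:mo="http://purl.org/ontology/mo/" '
        'xmlns:foaf="http://xmlns.com/foaf/0.1/" '
        f'about="{escape(uri)}" typeof="mo:MusicArtist">'
        # The visible heading doubles as the foaf:name literal.
        f'<h1 property="foaf:name">{escape(name)}</h1>'
        "</div>"
    )


# Placeholder MBID, not a real artist.
print(artist_rdfa("00000000-0000-0000-0000-000000000000", "Example Artist"))
```

The same markup is served to browsers and RDF consumers, which is why only the templates would need to change.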

pros

only small changes to the code base required
most parsers read RDFa these days

cons

HTML pages would get bigger (by how much is unclear), which might slow everything down

Content Negotiation

With the content negotiation approach, the HTTP Accept header of each request is examined. If it contains something like "Accept: application/rdf+xml", an RDF/XML document is returned; otherwise, a normal HTML page is returned.
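The decision described above might be sketched as follows. This version also honours q-values, which goes slightly beyond a plain substring check; the function names are hypothetical and wildcards such as */* are deliberately ignored so that browsers keep getting HTML.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def choose_representation(accept_header):
    """Pick RDF/XML only when the client ranks it above HTML."""
    q_rdf = q_html = 0.0
    for media_type, q in parse_accept(accept_header or ""):
        if media_type == "application/rdf+xml":
            q_rdf = max(q_rdf, q)
        elif media_type in ("text/html", "application/xhtml+xml"):
            q_html = max(q_html, q)
    return "application/rdf+xml" if q_rdf > q_html else "text/html"
```

Falling back to HTML on ties or missing headers keeps the behaviour for ordinary browsers unchanged.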

pros

the "classic" linked data approach
most widely supported by RDF consumers
no HTML bloating

cons

must modify code base and muck around with the request cycle a bit

SPARQL Endpoint

The SPARQL endpoint will allow users to query the MusicBrainz RDF graph using the full expressiveness of the SPARQL Query Language (http://www.w3.org/TR/rdf-sparql-query/).

The DBTune SPARQL endpoint (http://dbtune.org/musicbrainz/snorql/) provides an interface to pre-NGS data.
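For a sense of how a client would use such an endpoint, the sketch below builds a GET request URL with the query passed in the `query` parameter, as the SPARQL protocol specifies. The endpoint address is a placeholder, and the query assumes artists are typed as mo:MusicArtist with a foaf:name, which may not match the final mapping.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; no such service is implied to exist.
ENDPOINT = "http://example.org/sparql"

# Assumed vocabulary: mo:MusicArtist and foaf:name.
QUERY = """
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?artist ?name WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name .
} LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET URL with a URL-encoded query."""
    return endpoint + "?" + urlencode({"query": query})


print(sparql_get_url(ENDPOINT, QUERY))
```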

There are two technical approaches for implementing a SPARQL endpoint.

Replicate and swallow

A script runs on the MusicBrainz DB and creates an RDF dump using our mapping. We then load this dump into a purpose-built triple store such as Virtuoso or 4Store (http://4store.org).

Obviously this is a hack and not what we want to do in the long run. However, it's the fastest way to get a SPARQL endpoint up and running. This might be done "on the side" separate from the MusicBrainz code base on C4DM's server for testing out mappings, even though it is not our final solution.
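The dump step could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the MusicBrainz Postgres DB so that it runs on its own; the `artist` table with `gid` and `name` columns is a simplification assumed for the example, not the actual schema.

```python
import io
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def dump_artists(conn, out):
    """Write one foaf:name triple per artist row and return the row count."""
    count = 0
    for gid, name in conn.execute("SELECT gid, name FROM artist ORDER BY gid"):
        # Only backslashes and quotes are escaped in this sketch.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        out.write(f'<http://musicbrainz.org/artist/{gid}> <{FOAF_NAME}> "{literal}" .\n')
        count += 1
    return count


# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (gid TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO artist VALUES (?, ?)",
    [("mbid-1", "Artist One"), ("mbid-2", "Artist Two")],
)

buffer = io.StringIO()
written = dump_artists(conn, buffer)
print(buffer.getvalue(), end="")
```

The resulting N-Triples file is what would then be bulk-loaded into the triple store.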

Wrapping

Some software wraps the Postgres DB and translates SPARQL queries into SQL. The DBTune endpoint does this using the D2R Server software (http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/) and a declarative mapping file. However, the DBTune endpoint runs on a remote server against a MusicBrainz DB dump, and D2R Server is Java-based. We might implement something similar that is lighter weight and based on Python (or Perl, if that is what the MusicBrainz code base uses).
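A toy version of the wrapping idea is sketched below: a declarative mapping from predicate URIs to table columns, and a function that rewrites a single triple pattern of the form `?s <predicate> "literal"` into parameterised SQL. The mapping format, table layout, and function names are all assumptions; this is not how D2R Server is configured, and a real translator must handle full SPARQL.

```python
import sqlite3

# Hypothetical declarative mapping: predicate URI -> (table, id column, value column).
MAPPING = {
    "http://xmlns.com/foaf/0.1/name": ("artist", "gid", "name"),
}


def pattern_to_sql(predicate, literal):
    """Translate the pattern `?s <predicate> "literal"` into SQL plus parameters."""
    table, id_col, value_col = MAPPING[predicate]
    # Identifiers come from the trusted mapping; the literal is bound as a parameter.
    sql = f"SELECT {id_col} FROM {table} WHERE {value_col} = ?"
    return sql, (literal,)


def match_subjects(conn, predicate, literal):
    """Run the translated query and return matching subject URIs."""
    table = MAPPING[predicate][0]
    sql, params = pattern_to_sql(predicate, literal)
    return [f"http://musicbrainz.org/{table}/{gid}" for (gid,) in conn.execute(sql, params)]
```

Because queries run directly against the live database, no separate dump or triple store needs to be kept in sync.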

Bureaucratic Details

employees

kurtjx is employed on a consulting basis for 6 months+. Another individual may be employed later.

funding

Funding for LinkedBrainz comes from JISC.
