Proposal:Distributed MusicBrainz

From MusicBrainz Wiki
Status: This page describes a failed proposal. It is not official, and should only be used, if at all, as the basis for a new proposal.



Proposal number: RFC-Unassigned
Champion: None
Status: Failed (officially closed as Abandoned on March 24, 2010)
This proposal was not tracked in Trac.



The general idea of a distributed MusicBrainz is to spread the whole database across several servers, all of which are considered equal. Full distribution is hard to achieve, and there are several possible ways to approach it.

Several issues still need to be resolved before distribution can happen. For example, if everybody can run their own server, how can we trust updates to the distributed database? Will the voting system still work for small servers with small communities, or should we implement cross-server voting? One possible solution is to vote periodically on a server that is in charge of assigning new ArtistIDs, TrackIDs, and VoteIDs.
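The periodic-vote idea above could look roughly like the following minimal Python sketch. It assumes each small server only tallies yes/no votes from its own community and periodically ships those tallies to one central server, which pools them and mints a new ID only for edits whose pooled margin passes. The class names, the `min_margin` rule and the `artist-N` ID format are hypothetical; MusicBrainz's real voting rules are not modeled.

```python
from collections import defaultdict
import itertools


class LocalServer:
    """Hypothetical small server that only collects votes from its own community."""

    def __init__(self, name: str):
        self.name = name
        self.tallies = defaultdict(lambda: [0, 0])   # edit_id -> [yes, no]

    def vote(self, edit_id: str, yes: bool) -> None:
        self.tallies[edit_id][0 if yes else 1] += 1

    def flush(self) -> dict:
        """Hand over this period's tallies and start counting from zero again."""
        batch = {edit_id: tuple(counts) for edit_id, counts in self.tallies.items()}
        self.tallies.clear()
        return batch


class IdAuthority:
    """Hypothetical central server that pools votes and owns the creation of new IDs."""

    def __init__(self, min_margin: int = 1):
        self.min_margin = min_margin                  # yes votes must beat no votes by this much
        self.pending = defaultdict(lambda: [0, 0])   # edit_id -> pooled [yes, no]
        self.assigned = {}                            # edit_id -> newly minted ID
        self._next = itertools.count(1)

    def receive(self, batch: dict) -> None:
        for edit_id, (yes, no) in batch.items():
            self.pending[edit_id][0] += yes
            self.pending[edit_id][1] += no

    def close_period(self) -> dict:
        """Mint IDs for edits whose pooled margin passes; the others stay pending."""
        accepted = {}
        for edit_id, (yes, no) in list(self.pending.items()):
            if yes - no >= self.min_margin:
                accepted[edit_id] = f"artist-{next(self._next)}"
                del self.pending[edit_id]
        self.assigned.update(accepted)
        return accepted


a, b = LocalServer("a"), LocalServer("b")
a.vote("edit-1", True)
b.vote("edit-1", True)
b.vote("edit-1", False)
a.vote("edit-2", False)
hub = IdAuthority()
hub.receive(a.flush())
hub.receive(b.flush())
print(hub.close_period())  # {'edit-1': 'artist-1'}
```

Pooling the tallies means a server with only a handful of voters still contributes to a decision made over the combined community, which is one answer to the small-server question.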

Options:

- Use Postgres (PostgreSQL) exclusively

- Use several servers that mirror a single MASTER

- Use equal servers that replicate changes among themselves, plus a single MASTER that manages the artist/album/track IDs.

- Use web services to register listeners among other servers and coordinate propagation among servers (ScottMcClure)

- Any others?
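The third option in the list above (equal servers plus a single MASTER that only manages IDs) can be sketched as a minimal in-memory Python model. It assumes the peers are fully connected and that every change is simply flooded to all neighbours; `IdMaster`, `Peer` and the `kind-N` ID format are hypothetical, and a real system would also need networking, persistence and conflict handling for edits to existing records.

```python
import itertools


class IdMaster:
    """Hypothetical MASTER whose only job is handing out unique artist/album/track IDs."""

    def __init__(self):
        self._counters = {kind: itertools.count(1) for kind in ("artist", "album", "track")}

    def new_id(self, kind: str) -> str:
        return f"{kind}-{next(self._counters[kind])}"


class Peer:
    """An equal server: accepts local additions and replicates them to its peers."""

    def __init__(self, name: str, master: IdMaster):
        self.name = name
        self.master = master
        self.peers = []
        self.data = {}      # id -> record
        self.seen = set()   # IDs already applied, so re-broadcasts stop here

    def add(self, kind: str, record: dict) -> str:
        new_id = self.master.new_id(kind)   # only the MASTER may mint IDs
        self._apply(new_id, record)
        return new_id

    def _apply(self, key: str, record: dict) -> None:
        # This sketch only models additions, so the new ID doubles as the change ID.
        if key in self.seen:
            return
        self.seen.add(key)
        self.data[key] = record
        for peer in self.peers:             # flood the change to every neighbour
            peer._apply(key, record)


master = IdMaster()
a, b, c = Peer("a", master), Peer("b", master), Peer("c", master)
for p in (a, b, c):
    p.peers = [q for q in (a, b, c) if q is not p]
a.add("artist", {"name": "Portishead"})
c.add("track", {"title": "Roads"})
print(b.data)  # {'artist-1': {'name': 'Portishead'}, 'track-1': {'title': 'Roads'}}
```

Because only ID allocation is centralized, the MASTER handles one small request per new object, while all ordinary data traffic stays between the equal servers.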

Another fundamental choice is the group process, i.e. how servers decide which other servers to trust:

- A single list of known and trusted MB servers.

- A single list of known MB root servers that may propagate their changes to others in a hierarchy.

- Total anarchy, where each server logs the reputation of other MB servers.
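For the "total anarchy" option in the list above, each server needs some way to log the reputation of its peers. A minimal Python sketch, assuming reputation is simply the share of a peer's changes that later survived local review, smoothed with one pseudo-count each way so that an unknown server starts at 0.5; `ReputationLog`, `threshold` and `prior` are hypothetical names.

```python
from collections import defaultdict


class ReputationLog:
    """Hypothetical per-server log of how often each peer's changes survived review."""

    def __init__(self, threshold: float = 0.5, prior: int = 1):
        self.threshold = threshold    # minimum score needed to accept a peer's changes
        self.prior = prior            # pseudo-counts so unknown peers start neutral
        self.good = defaultdict(int)
        self.bad = defaultdict(int)

    def record(self, peer: str, accepted: bool) -> None:
        """Call after local review decides whether a change from `peer` was acceptable."""
        if accepted:
            self.good[peer] += 1
        else:
            self.bad[peer] += 1

    def score(self, peer: str) -> float:
        g = self.good[peer] + self.prior
        b = self.bad[peer] + self.prior
        return g / (g + b)

    def trusts(self, peer: str) -> bool:
        return self.score(peer) >= self.threshold


log = ReputationLog()
for _ in range(3):
    log.record("spam.example.org", False)
print(log.trusts("spam.example.org"))  # False
print(log.trusts("new.example.org"))   # True
```

With the default threshold of 0.5 a brand-new server is trusted until it misbehaves; raising the threshold makes newcomers earn trust first, at the cost of ignoring their early, possibly good, changes.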

The MySQL solution: MySQL replication. This feature assigns a single central server (the master) to coordinate database updates. Other servers connect as read-only slaves that sync regularly with the master to pick up its changes. Does anybody currently have this setup working?

Johan
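The master/slave setup described above can be illustrated with a toy Python model. In MySQL itself the master records changes in its binary log and each slave remembers how far into that log it has read; the classes below (`Master`, `ReadOnlySlave`) only mimic that idea in memory, assume a simple key/value store, and are not MySQL's API or log format.

```python
class Master:
    """Single central server: the only place writes are accepted; every write is logged."""

    def __init__(self):
        self.data = {}
        self.log = []            # ordered list of (key, value) changes

    def write(self, key, value) -> None:
        self.data[key] = value
        self.log.append((key, value))


class ReadOnlySlave:
    """Copy that periodically pulls the master's log from where it last stopped."""

    def __init__(self, master: Master):
        self.master = master
        self.data = {}
        self.position = 0        # index of the next log entry to apply

    def write(self, key, value) -> None:
        raise PermissionError("slaves are read-only; send writes to the master")

    def sync(self) -> int:
        """Apply all log entries written since the last sync; return how many."""
        new_entries = self.master.log[self.position:]
        for key, value in new_entries:
            self.data[key] = value
        self.position += len(new_entries)
        return len(new_entries)
```

Between syncs a slave can serve slightly stale data, which is the usual trade-off of this design: reads scale across many servers, but every write still has to go through the one master.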


It might be worthwhile to look at what CPAN has done for its replication. http://www.cpan.org/misc/cpan-faq.html#What_is_CPAN 

tom



Erlang's Mnesia database system is designed to be distributed (and fault-tolerant).

Huge tables could be split among several nodes, and important tables could be replicated across several nodes.
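Splitting a huge table and replicating important ones, as described above, comes down to deciding which nodes hold which rows. The following generic Python sketch hashes each key to a fragment and stores every fragment on `copies` consecutive nodes. It only illustrates the idea and is not Mnesia's API (Mnesia has its own table fragmentation support); the function names are hypothetical.

```python
import zlib


def fragment_for(key: str, n_fragments: int) -> int:
    """Pick a fragment by hashing the key (crc32 is stable across runs, unlike hash() on str)."""
    return zlib.crc32(key.encode("utf-8")) % n_fragments


def placement(n_fragments: int, nodes: list, copies: int) -> dict:
    """Place each fragment on `copies` consecutive nodes, wrapping around the node list."""
    if copies > len(nodes):
        raise ValueError("cannot keep more copies than there are nodes")
    return {
        frag: [nodes[(frag + i) % len(nodes)] for i in range(copies)]
        for frag in range(n_fragments)
    }


def nodes_for(key: str, layout: dict) -> list:
    """All nodes that hold a copy of the row identified by `key`."""
    return layout[fragment_for(key, len(layout))]
```

A huge table would use many fragments with a small `copies` value, while `placement(1, nodes, copies=len(nodes))` describes a small but important table kept whole and replicated on every node.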

A transaction on a table that is replicated or split among nodes either executes successfully on all nodes or is rolled back.
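The all-or-nothing behaviour described above is what a distributed commit protocol provides. The minimal Python sketch below uses generic two-phase commit, one standard way to get this guarantee; it is not a description of Mnesia's internal protocol, and `Replica`, `transaction` and the `healthy` flag are hypothetical.

```python
class Replica:
    """A node holding a copy of the table; healthy=False simulates a node that cannot commit."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.staged = None

    def prepare(self, changes: dict) -> bool:
        # Phase 1: stage the changes and vote yes only if they can be applied.
        if not self.healthy:
            return False
        self.staged = dict(changes)
        return True

    def commit(self) -> None:
        # Phase 2, success path: make the staged changes visible.
        self.data.update(self.staged)
        self.staged = None

    def abort(self) -> None:
        # Phase 2, failure path: discard staged changes; data is untouched.
        self.staged = None


def transaction(replicas: list, changes: dict) -> bool:
    """Apply `changes` on every replica or on none of them."""
    prepared = []
    for r in replicas:
        if not r.prepare(changes):
            for p in prepared:       # one "no" vote rolls back everyone already staged
                p.abort()
            return False
        prepared.append(r)
    for r in replicas:
        r.commit()
    return True
```

The sketch ignores coordinator crashes and network timeouts, which a real commit protocol has to handle.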

The problem of data hooliganism still exists, so one probably still needs either trust or a review system.

Marc