Proposal:Distributed MusicBrainz

From MusicBrainz Wiki
Revision as of 14:30, 20 February 2008 by Dmppanda (talk) (Looks abandonned to me... (Imported from MoinMoin))
Status: This Page is Glorious History!

The content of this page either is bit-rotted, or has lost its reason to exist due to some new features having been implemented in MusicBrainz, or maybe just described something that never made it in (or made it in a different way), or possibly is meant to store information and memories about our Glorious Past. We still keep this page to honor the brave editors who, during the prehistoric times (prehistoric for you, newcomer!), struggled hard to build a better present and dreamed of an even better future. We also keep it for archival purposes because it may still contain crazy thoughts and ideas that could be reused someday. If you're not into looking at either the past or the future, you should just disregard this page's content entirely and look for an up-to-date documentation page elsewhere.

The general idea is to distribute the whole MusicBrainz database across several servers, all of which are considered equal. Full distribution is hard to do, and there are several possible ways to approach it.

Several issues still need to be resolved to make distribution happen. For example, if everybody can run their own server, how can we trust updates to the distributed database? Will the voting system still work for small servers with small communities, or do we implement cross-server voting? One way to solve this would be to vote periodically on which server is in charge of issuing new ArtistIDs, TrackIDs, or VoteIDs.

Options:

- Use PostgreSQL exclusively

- Use several servers that mirror a single MASTER

- Use equal servers that replicate changes, plus a single MASTER that manages the artist/album/track IDs.

- Use web services to register listeners among other servers and coordinate propagation among servers (ScottMcClure)

- Any others?
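
The option of equal servers plus a single MASTER for IDs could be sketched roughly as follows. This is only a toy in-memory model to make the idea concrete: the names `IdMaster`, `PeerServer`, and `add_entity` are made up for illustration, the ID format is arbitrary, and nothing here reflects how MusicBrainz actually stores or assigns IDs.

```python
import itertools


class IdMaster:
    """Hypothetical central authority that hands out unique IDs per entity kind."""

    def __init__(self):
        self._counters = {}

    def next_id(self, kind):
        # One independent counter per kind ("artist", "album", "track", ...).
        counter = self._counters.setdefault(kind, itertools.count(1))
        return f"{kind}-{next(counter)}"


class PeerServer:
    """Hypothetical equal server: accepts writes and pushes them to its peers."""

    def __init__(self, name, master):
        self.name = name
        self.master = master
        self.peers = []
        self.rows = {}

    def add_entity(self, kind, data):
        # Only the ID comes from the MASTER; the data itself is replicated peer to peer.
        entity_id = self.master.next_id(kind)
        self._apply(entity_id, data)
        for peer in self.peers:
            peer._apply(entity_id, data)
        return entity_id

    def _apply(self, entity_id, data):
        self.rows[entity_id] = data


def demo_id_master():
    master = IdMaster()
    a = PeerServer("a", master)
    b = PeerServer("b", master)
    a.peers.append(b)
    b.peers.append(a)
    first = a.add_entity("artist", {"name": "Example Artist"})
    second = b.add_entity("artist", {"name": "Another Artist"})
    return first, second, a.rows == b.rows
```

Because every ID comes from one place, two servers can never hand out the same ID, but the MASTER becomes a single point of failure, and the sketch says nothing about whether a replicated change should be trusted.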

Another fundamental choice is the group process, that is, how servers decide which other servers to trust:

- A single list of known and trusted MB servers.

- A single list of known MB root servers that may propagate their changes to others in a hierarchy.

- Total anarchy, where each server logs the reputation of other MB servers.
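
For the "anarchy" variant, each server would need some local record of how other servers have behaved. A minimal sketch, assuming reputation is simply the smoothed fraction of a peer's updates that were accepted locally; the class name `ReputationLog`, the scoring formula, and the 0.5 threshold are all illustrative assumptions, not part of any proposal on this page.

```python
from collections import defaultdict


class ReputationLog:
    """Hypothetical per-server log of how often updates from each peer were accepted."""

    def __init__(self):
        self._accepted = defaultdict(int)
        self._rejected = defaultdict(int)

    def record(self, server, accepted):
        # Call once per update received from `server`, after local review or voting.
        if accepted:
            self._accepted[server] += 1
        else:
            self._rejected[server] += 1

    def score(self, server):
        # Smoothed acceptance ratio: an unknown server starts at a neutral 0.5.
        good = self._accepted[server]
        bad = self._rejected[server]
        return (good + 1) / (good + bad + 2)

    def trusts(self, server, threshold=0.5):
        # A server is trusted only once its score rises above the threshold.
        return self.score(server) > threshold
```

With this choice an unknown server is neither trusted nor distrusted until it has some history, which is one possible policy among many.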

The MySQL solution: MySQL replication. This MySQL feature assigns a single central server to coordinate the database updates. Other servers are connected as read-only slaves that sync regularly with the main server to pick up updates. Does anybody have this setup working currently?

Johan
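
The single-master arrangement described above can be illustrated with a toy model: one server accepts writes and keeps an ordered change log, and read-only copies replay the part of the log they have not yet seen. This is a conceptual sketch only; the `Primary` and `ReadOnlyReplica` classes are hypothetical and do not use or mirror MySQL's actual replication interface.

```python
class Primary:
    """Hypothetical main server: the only place where writes are accepted."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class ReadOnlyReplica:
    """Hypothetical read-only copy that periodically catches up with the primary."""

    def __init__(self, primary):
        self._primary = primary
        self._position = 0  # index of the first log entry not yet applied
        self.data = {}

    def sync(self):
        # Replay only the changes made since the last sync, in order.
        new_changes = self._primary.log[self._position:]
        for key, value in new_changes:
            self.data[key] = value
        self._position += len(new_changes)
        return len(new_changes)

    def write(self, key, value):
        raise PermissionError("replica is read-only; write to the primary")
```

Between two calls to `sync()` a replica may serve stale data, which is the usual trade-off of this kind of setup.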


It might be worthwhile to take a look at what CPAN has done for their replication. http://www.cpan.org/misc/cpan-faq.html#What_is_CPAN

tom



Erlang's Mnesia database system is designed to be distributed (and fault tolerant).

Huge tables could be split among several nodes. Important tables could be replicated among several nodes.

A transaction on a table that is replicated or shared among nodes is either successfully executed on all nodes or rolled back.
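
The all-or-nothing behaviour can be illustrated with a small sketch. It is written in Python rather than Erlang and does not use or imitate Mnesia's API; `Node`, `replicated_write`, and the `fail_on` switch are hypothetical names introduced only to simulate a node that cannot complete a write.

```python
class Node:
    """Hypothetical node holding one copy of a replicated table."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.table = {}
        self._fail_on = fail_on  # illustrative switch: writing this key fails

    def apply(self, key, value):
        if key == self._fail_on:
            raise RuntimeError(f"{self.name} could not write {key!r}")
        self.table[key] = value


def replicated_write(nodes, key, value):
    """Write key/value on every node, or restore all nodes to their previous state."""
    missing = object()  # marks "key did not exist before"
    applied = []
    try:
        for node in nodes:
            previous = node.table.get(key, missing)
            node.apply(key, value)
            applied.append((node, previous))
    except RuntimeError:
        # Undo the writes that already succeeded, newest first.
        for node, previous in reversed(applied):
            if previous is missing:
                node.table.pop(key, None)
            else:
                node.table[key] = previous
        return False
    return True
```

A real distributed database also has to cope with nodes crashing in the middle of the rollback and with concurrent transactions, which this sketch ignores.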

The problem of data hooliganism still exists, so one probably still needs either trust or a review system.

Marc