Development/Summer of Code/2023/BookBrainz

From MusicBrainz Wiki

This page lists the current suggested ideas for students developing proposals for BookBrainz in Google's Summer of Code. If you're a student, feel free to base your proposal on one of these ideas, or pick an entirely new idea that you think might be useful to us.

Getting Started

(see also: Getting started with GSoC)

The first thing to do to get started with BookBrainz is to get familiar with the website and start editing (the help page and user guide are good starting points).

We also have a testing website with its own separate database. You can create an account there and use it to get familiar with the website and database without fear of adding inadequate data.

The next step is to clone the bookbrainz-site GitHub repository, and follow our developer documentation to get the site up and running on your computer.

When you feel ready to try your hand at some bugs, we have a “good first bug” category on our ticket tracker.

Come and speak to us in the BookBrainz IRC (Libera.Chat/#bookbrainz) if you finish all of that, or get stuck at any point!


Here are some suggested projects that would improve BookBrainz. These are only suggestions: you can propose your own idea if, after using BookBrainz, you find a substantial area you can improve.

Import other open databases

Proposed Mentors: Monkey
Languages/skills: SQL, Node.js, knowledge of BookBrainz schema
Estimated Project Length: 350 hours
Difficulty: hard
Expected outcomes: A back-end system to parse and import large-scale databases, and the import of one corpus

Forum for discussion

We need a way to import large collections of library records such as MARC records into the database.

The schema already has import entities (author_import, work_import, etc.) set up for that purpose: imported records await user confirmation there before being fully added (or merged) as proper entities in the database.

This will require processing very large MARC records or JSON files in a robust manner, creating "adapters" to transform entities from one database schema to the BookBrainz schema, and allowing for repeating the process without duplicating entries.
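The adapter and repeatable-import ideas above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the BookBrainz implementation: `SourceRecord`, `PendingImport`, `exampleAdapter`, `ImportStagingArea` and the `"example-source"` name are all hypothetical, and an in-memory map stands in for the real import tables.

```typescript
// Hypothetical shape of one record in a source dump (not a real MARC or Bookogs format).
interface SourceRecord {
  id: string;
  title: string;
  authors?: string[];
}

// Hypothetical shape of an import that awaits editor confirmation.
interface PendingImport {
  source: string; // name of the originating database
  originId: string; // identifier of the record in that database
  entityType: "work" | "author";
  name: string;
}

// An adapter maps one source record to zero or more pending imports.
type Adapter<T> = (record: T) => PendingImport[];

const exampleAdapter: Adapter<SourceRecord> = (record) => {
  const name = record.title.trim();
  if (name === "") {
    return []; // skip records that cannot be mapped
  }
  const result: PendingImport[] = [
    { source: "example-source", originId: record.id, entityType: "work", name },
  ];
  (record.authors || []).forEach((author, index) => {
    result.push({
      source: "example-source",
      originId: `${record.id}/author/${index}`,
      entityType: "author",
      name: author.trim(),
    });
  });
  return result;
};

// In-memory stand-in for the import tables. Keying on source, entity type and
// origin id means that re-running an import does not create duplicates.
class ImportStagingArea {
  private readonly pending = new Map<string, PendingImport>();

  stage(items: PendingImport[]): { added: number; skipped: number } {
    let added = 0;
    let skipped = 0;
    for (const item of items) {
      const key = `${item.source}:${item.entityType}:${item.originId}`;
      if (this.pending.has(key)) {
        skipped += 1;
      } else {
        this.pending.set(key, item);
        added += 1;
      }
    }
    return { added, skipped };
  }

  get size(): number {
    return this.pending.size;
  }
}

// Run every record through the adapter and stage the results.
function runImport<T>(
  records: T[],
  adapter: Adapter<T>,
  staging: ImportStagingArea
): { added: number; skipped: number } {
  const totals = { added: 0, skipped: 0 };
  for (const record of records) {
    const outcome = staging.stage(adapter(record));
    totals.added += outcome.added;
    totals.skipped += outcome.skipped;
  }
  return totals;
}

const staging = new ImportStagingArea();
const records: SourceRecord[] = [
  { id: "r1", title: " Dune ", authors: ["Frank Herbert"] },
  { id: "r2", title: "" },
];
const firstRun = runImport(records, exampleAdapter, staging);
const secondRun = runImport(records, exampleAdapter, staging);
```

In this sketch the second run adds nothing, because every staged item is already known by its origin key. A real importer would likely enforce the same idea with a uniqueness constraint in the database rather than an in-memory map.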

You will put together a detailed plan of action ahead of time for how to achieve these goals.

As a second part of this project, you will supervise importing a large corpus. The two main candidates are the Library of Congress MARC records, identified as a large and clean collection of book metadata, and the Bookogs database.

About Bookogs:

The sites Bookogs and Comicogs, sister projects of Discogs, were closed in 2020; some of their editors chose BookBrainz as the place to continue contributing open data.

The Bookogs database dumps were made publicly available for download in JSON format right after the project closed.

To prevent the loss of Bookogs contributions, we want to import all the entries from the database dumps, as discussed in this thread.

Discussions are in progress for matching roles, formats and genres to the BookBrainz schema.

Administration system

Proposed Mentors: Monkey
Languages/skills: Node.js, SQL, ExpressJS
Estimated Project Length: 175 hours
Difficulty: easy
Expected outcomes: A usable administration system with arbitrary levels of privileges

Forum for discussion

BookBrainz currently has no administration system, or any good way of giving users special privileges. This certainly needs to change!

For this project, you will be devising and implementing a basic admin system allowing for a flexible privilege hierarchy.

This will require at minimum:

  • Modifying the database schema, adding at least:
    • a table to define roles
    • a table to attach users to roles
  • Implementing a simple admin panel webpage to allow admins to search for users, give users privileges and take other actions
  • Middleware for securing specific routes according to a user's roles:
    • admins can view the admin panel
    • admins can block or delete abusive users
    • privileged editors can edit relationships and identifiers
    • privileged editors can trigger a reindex of the search server
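
The route-securing middleware in the list above could look roughly like the sketch below. It is an assumption-laden illustration rather than a proposed design: the role names `"admin"` and `"privileged_editor"`, the `requireRole` helper and the `user.roles` field are hypothetical, and `Req`, `Res` and `Next` are minimal stand-ins for the Express types so that the example is self-contained. Letting admins also pass the reindex guard is likewise an assumption.

```typescript
// Minimal stand-ins for the Express request, response and next types.
interface Req {
  user?: { id: number; roles: string[] };
}
interface Res {
  statusCode: number;
  status(code: number): Res;
  send(body: string): void;
}
type Next = () => void;
type Middleware = (req: Req, res: Res, next: Next) => void;

// Build a middleware that only lets through users holding one of the allowed roles.
function requireRole(...allowed: string[]): Middleware {
  return (req, res, next) => {
    const user = req.user;
    if (!user) {
      res.status(401).send("Not logged in");
      return;
    }
    const permitted = user.roles.some((role) => allowed.indexOf(role) !== -1);
    if (!permitted) {
      res.status(403).send("Insufficient privileges");
      return;
    }
    next();
  };
}

// Hypothetical guards matching the requirements listed above.
const adminPanelGuard = requireRole("admin");
const reindexGuard = requireRole("admin", "privileged_editor");

// Tiny fake response object so the guards can be exercised without a server.
interface FakeRes extends Res {
  body: string;
}
function fakeResponse(): FakeRes {
  const res: FakeRes = {
    statusCode: 200,
    body: "",
    status(code) {
      res.statusCode = code;
      return res;
    },
    send(body) {
      res.body = body;
    },
  };
  return res;
}

// Run a guard against an optional user and report what happened.
function tryRoute(
  guard: Middleware,
  user?: { id: number; roles: string[] }
): { status: number; passed: boolean } {
  const req: Req = user ? { user } : {};
  const res = fakeResponse();
  let passed = false;
  guard(req, res, () => {
    passed = true;
  });
  return { status: res.statusCode, passed };
}

const adminResult = tryRoute(adminPanelGuard, { id: 1, roles: ["admin"] });
const editorOnPanel = tryRoute(adminPanelGuard, { id: 2, roles: ["privileged_editor"] });
const editorReindex = tryRoute(reindexGuard, { id: 2, roles: ["privileged_editor"] });
const anonymousResult = tryRoute(adminPanelGuard);
```

With an actual Express app, a guard of this kind would be passed as an extra handler before the route handler; the roles themselves would be loaded from the new role tables when the user's session is established.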

Extended goals:

  • a web interface to allow privileged users to edit and add relationship types, identifier types and other types that currently require direct database access
  • a public log of administration actions (see for example the CritiqueBrainz admin log)