Development/Search Architecture
Revision as of 18:31, 18 January 2021
Data flow
Indexing
- When MusicBrainz updates its database,
- PostgreSQL triggers queue reindex messages in RabbitMQ;
- these are pulled from RabbitMQ by SIR,
- which then gathers data to be indexed from the database,
- and finally builds searchable documents and sends these to the Solr search server.
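The indexing steps above can be sketched in a few lines of Python. This is only an in-memory illustration, not SIR's actual code: the `database`, `reindex_queue` and `solr_index` objects are hypothetical stand-ins for PostgreSQL, RabbitMQ and Solr, and every function name is made up for this example.

```python
from collections import deque

# Hypothetical stand-ins for the real services (assumptions, not SIR code).
database = {("artist", 1): {"name": "Björk"}}  # plays the role of PostgreSQL
reindex_queue = deque()                         # plays the role of RabbitMQ
solr_index = {}                                 # plays the role of the Solr server


def on_database_update(entity_type, row_id, new_values):
    """Update a row, then queue a reindex message, as a trigger would."""
    database[(entity_type, row_id)] = new_values
    reindex_queue.append({"entity_type": entity_type, "id": row_id})


def build_document(entity_type, row_id, row):
    """Turn the gathered data into a flat searchable document."""
    return {"id": f"{entity_type}-{row_id}", "type": entity_type, **row}


def process_reindex_queue():
    """Pull messages, gather data, build documents and send them to the index."""
    processed = 0
    while reindex_queue:
        message = reindex_queue.popleft()
        key = (message["entity_type"], message["id"])
        row = database[key]                      # gather data from the database
        document = build_document(*key, row)     # build the searchable document
        solr_index[document["id"]] = document    # "send" it to the search server
        processed += 1
    return processed


on_database_update("artist", 2, {"name": "Nina Simone"})
process_reindex_queue()
```

After the last call, `solr_index` holds a document for the updated artist and the queue is empty, which mirrors the order of the steps listed above.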
Searching
Search can be accessed by website visitors, by editors, or by users of MusicBrainz API clients such as MusicBrainz Picard:
- Search webpage (GET at https://musicbrainz.org/search):
  - Search form (POST to musicbrainz.org/search?query=…):
    This form is usually accessed from the search field in the website's top navigation bar.
  - Tag lookup form (POST to musicbrainz.org/taglookup?…):
    This form is available from the above Search webpage but makes more specific queries. It is probably a legacy from before advanced indexed search was available.
  - Other lookup forms (POST to musicbrainz.org/otherlookup/…?…):
    The same comments apply as for the tag lookup form.
- Field completion (POST to musicbrainz.org/ws/js/…?query=…):
  Fields that match MusicBrainz entities (for example, the area of an artist) have an autocompletion feature that makes search queries behind the scenes.
- API search query (POST to musicbrainz.org/ws/2/…?query=…):
  This kind of query is made by API clients such as MusicBrainz Picard. See “MusicBrainz_API/Search” for client developer documentation.
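As an illustration of the API search query entry above, here is a minimal sketch that only builds such a URL, without sending it. The `artist` entity and the `limit` and `fmt` parameters are assumptions taken from the public MusicBrainz API documentation, not from this page, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(entity, query, limit=25, fmt="json"):
    """Build a /ws/2/ search URL for the given entity type (hypothetical helper)."""
    base = f"https://musicbrainz.org/ws/2/{entity}"
    # urlencode percent-encodes the query text so it is safe in a URL.
    params = {"query": query, "limit": limit, "fmt": fmt}
    return f"{base}?{urlencode(params)}"


url = build_search_url("artist", "fred", limit=5)
print(url)
```

A real client would also be expected to identify itself and to respect rate limits, as described in the client developer documentation mentioned above.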
There are three search modes:
- Direct database search: This is the legacy method, which searches the database directly using PostgreSQL. It is currently kept as a fallback for when indexed search is not working. It has limited capabilities (no searchable fields, name search only, etc.).
- Indexed search: This is the simplest plain search mode, using Solr. It searches through both accented and unaccented names, aliases, and more. See request-params.xml files in mbsssss repository.
- Advanced indexed search: This is the most versatile search mode, using Solr. It allows searching through specific fields using the Lucene query syntax. See schema.xml files in mbsssss repository. See also “Indexed_Search_Syntax” for user documentation.
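To give an idea of what a fielded query for advanced indexed search looks like, here is a minimal sketch that assembles one in the Lucene query syntax. The field names `artist` and `country` are assumptions used for illustration (the authoritative list is in the schema.xml files), and both helper functions are hypothetical.

```python
def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def fielded_query(**fields):
    """Join field:"value" clauses with AND (all clauses must match)."""
    clauses = [f"{name}:{quote_phrase(value)}" for name, value in fields.items()]
    return " AND ".join(clauses)


query = fielded_query(artist="Nina Simone", country="US")
print(query)
```

The resulting string is what would go in the query parameter of an advanced search; simple indexed search takes plain text instead.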
These modes are used or are made available as follows:
Search access | Direct database search | (Simple) Indexed search | Advanced indexed search |
---|---|---|---|
Search webpage | Yes | Yes (default) | Yes |
Tag lookup webpage | Yes (limited) | | |
Other lookup webpage | Yes (limited) | | |
Field autocompletion | Yes | Yes (default) | No |
API search query | No | Yes | Yes (default) |
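The table above can be read as a small lookup, sketched here in Python. The automatic fallback to direct database search when Solr is unavailable is an assumption made for this sketch (the page only says direct search is kept as a fallback), and the lookup webpages are left out because the table leaves their indexed cells blank. All names are hypothetical.

```python
# Which modes each kind of access offers, copied from the table above.
AVAILABLE_MODES = {
    "search webpage": {"direct": True, "indexed": True, "advanced": True},
    "field autocompletion": {"direct": True, "indexed": True, "advanced": False},
    "api search query": {"direct": False, "indexed": True, "advanced": True},
}

# The mode marked "(default)" in the table for each kind of access.
DEFAULT_MODE = {
    "search webpage": "indexed",
    "field autocompletion": "indexed",
    "api search query": "advanced",
}


def pick_mode(access, requested=None, solr_available=True):
    """Return the search mode to use for this access (illustrative only)."""
    modes = AVAILABLE_MODES[access]
    mode = requested or DEFAULT_MODE[access]
    if not modes[mode]:
        raise ValueError(f"{mode} search is not offered for {access}")
    if mode != "direct" and not solr_available:
        # Assumed behaviour: fall back to direct database search if offered.
        if not modes["direct"]:
            raise RuntimeError(f"no fallback available for {access}")
        return "direct"
    return mode
```

For example, `pick_mode("field autocompletion", solr_available=False)` returns `"direct"`, while the same call for `"api search query"` raises an error because the table offers no direct database search there.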
Searching from MusicBrainz mirror without local Solr server
MusicBrainz mirrors can be set up with or without a local Solr server, depending on the resources available.
When they run their own local Solr server, search works as described in the above section.
When they do not run their own local Solr server, they rely instead on the remote search.musicbrainz.org.
Components
Services explained in the above section about data flows are composed of many components maintained across different repositories.
Here is a complete list of the components, with their repositories, that make indexed search work:
- MusicBrainz database schema: See admin/sql/ directory in musicbrainz-server repository.
  It defines how data is stored in PostgreSQL by MusicBrainz Server.
  - Python bindings for the above MB DB schema: See SQLAlchemy Models in lalinsky/mbdata repository.
- MusicBrainz XML metadata schema: See schema/ directory in mmd-schema repository.
  It defines data returned by MusicBrainz API which is handled by MusicBrainz Server for lookup and browse queries, and by Solr search server for search queries.
  - Java bindings for the above MB RELAX NG schema: See brainz-mmd2-jaxb/ directory in the same mmd-schema repository.
  - Python bindings for the above MB RELAX NG schema: See Python package in distinct mb-rngpy repository.
- MusicBrainz Solr search schema: See cores defined in mbsssss repository.
  It mainly defines how searchable documents are structured and searched, that is mostly everything about searchable fields.
  - MusicBrainz Solr query response writer: See mb-solr directory in mb-solr repository.
    It defines how search results are formatted, using the Java bindings for the above MB RELAX NG schema.
  - MusicBrainz Solr standalone server: See Dockerfile file in the same mb-solr repository.
  - MusicBrainz Solr cloud deployment: See deployment scripts in private mb-solr-cloud repository.
- Search index rebuilder (SIR): See sir repository and sir documentation.
  It uses both Python bindings above: the one for the above MB DB schema and the other for the above MB RELAX NG schema.
  It also uses pysolr to communicate with Solr server and must comply with MusicBrainz Solr search schema.
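The dependencies stated in the list above can be written down as a small graph, which makes it easy to see what has to be in place before SIR can run. This sketch only encodes the relationships the list spells out explicitly; the labels are shortened for readability and the variable names are hypothetical. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each component maps to the components it is stated to build on or use.
DEPENDS_ON = {
    "mbdata (Python bindings)": {"MusicBrainz database schema"},
    "brainz-mmd2-jaxb (Java bindings)": {"MusicBrainz XML metadata schema"},
    "mb-rngpy (Python bindings)": {"MusicBrainz XML metadata schema"},
    "Solr query response writer": {"brainz-mmd2-jaxb (Java bindings)"},
    "SIR": {
        "mbdata (Python bindings)",
        "mb-rngpy (Python bindings)",
        "pysolr",
        "MusicBrainz Solr search schema",
    },
}

# static_order() yields every component after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for component in order:
    print(component)
```

The exact order printed may vary between runs, but a schema always comes before its bindings, and SIR always comes after both sets of Python bindings.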