Development/Summer of Code/2020/ListenBrainz: Difference between revisions
Older revision: RobertKaye (edit summary: →Ideas)
Newer revision: RobertKaye (no edit summary)
Older revision only (Getting started list): Install the server on your computer or use the Vagrant setup scripts to build a virtual machine
Older revision only (Ideas): Design more statistics queries for user/community behaviour
Proposed mentors: mayhem, alastairp
Languages/skills: Python, Spark, visualization, data architecture
The ListenBrainz statistics infrastructure mentioned above currently supports only minimal graphs (top artists). We would like to be able to calculate comprehensive statistics about a user's or the community's listening habits in general. We'd like to know not only top artists, but also more relevant things such as the top album of last week, the top R&B album of the first half of 2019, the most active listener in our community, or the best new artist of last year. Overall, we would like to collect and chart interesting statistics that we can calculate from the listening behaviour of our users, and to create end-of-year reports that give ListenBrainz users something similar to the end-of-year report Spotify creates for its own users.
Revision as of 15:51, 4 February 2020
ListenBrainz is one of the newest MetaBrainz projects. Read more about it on its homepage.
Getting started
(see also: Getting started with GSoC)
If you want to work on ListenBrainz, you should show that you are able to set up the server software and understand how some of its infrastructure works. Here are some things that we might ask you about:
- Show that you understand the goals that ListenBrainz wants to achieve, as described on its homepage
- Create an OAuth application on the MusicBrainz website and add its configuration information to your ListenBrainz server. Use this to log in to your server with your MusicBrainz account
- Use the import script that is part of the ListenBrainz server to load scrobbles from Last.fm into your ListenBrainz server, or into the main ListenBrainz server
- Use your preferred programming language to write a submission tool that can send listen data to ListenBrainz. You can make up fake song names and artists; the data doesn't have to be real.
- Try to delete the ListenBrainz database on your local server to remove the fake data that you added.
- Look at the list of tickets that we have open for ListenBrainz and see if you understand what tasks the tickets involve
- If you want to, see if you can contribute to fixing a ticket. Either add a comment to the ticket or ask in IRC for clarification if you don't understand what the ticket means
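For the submission-tool exercise above, here is a minimal sketch in Python using only the standard library. It assumes the ListenBrainz v1 submission endpoint `/1/submit-listens`, a JSON body with `listen_type` and a `payload` list, and an `Authorization: Token <user token>` header, as we understand them from the ListenBrainz API documentation; check the current docs before relying on them. The helper names `build_listen_payload` and `submit_listen` are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the public API host; pass your local server's URL as base_url instead.
BASE_URL = "https://api.listenbrainz.org"
# Assumption: path of the v1 "submit listens" endpoint.
SUBMIT_PATH = "/1/submit-listens"


def build_listen_payload(artist_name, track_name, listened_at=None):
    """Build the JSON body for one listen; defaults the timestamp to now."""
    if listened_at is None:
        listened_at = int(time.time())  # UNIX timestamp in seconds
    return {
        "listen_type": "single",
        "payload": [
            {
                "listened_at": listened_at,
                "track_metadata": {
                    "artist_name": artist_name,
                    "track_name": track_name,
                },
            }
        ],
    }


def submit_listen(user_token, payload, base_url=BASE_URL):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + SUBMIT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Token " + user_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `submit_listen("YOUR-USER-TOKEN", build_listen_payload("Fake Artist", "Made-up Song"), base_url="http://localhost")` would send one fake listen to a local server; the URL of a local install depends on how you set it up.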
Ideas
Add more statistics and graphs for users and our community
Proposed mentors: mayhem, alastairp, iliekcomputers
Languages/skills: Python, JavaScript, D3, Apache Spark, data science, graphing, visualization
ListenBrainz now has a statistics infrastructure that collects and computes statistics from the listen data stored in our database (and in an Apache Spark cluster). So far we've implemented only a top-artists-per-user query, which shows that the statistics infrastructure works. We're interested in adding many more statistics and graphs to this setup:
- top album for a user
- top album for a user by genre
- top tracks for a user or everyone
- users with similar music tastes to mine
- when did I start listening to this artist/album?
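The aggregations behind several of these example statistics can be prototyped locally before writing any Spark queries. The following is a plain-Python sketch, not the ListenBrainz Spark code: the `Listen` record and all function names are hypothetical, genres are left out because they need extra metadata, and "similar taste" is approximated with cosine similarity over per-artist listen counts, which is only one of many possible measures.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Listen:
    """Hypothetical, simplified listen record (real listens carry more metadata)."""
    user: str
    artist: str
    album: str
    track: str
    listened_at: int  # UNIX timestamp in seconds


def top_albums(listens, user, n=10):
    """Most-listened (artist, album) pairs for one user, with listen counts."""
    counts = Counter((x.artist, x.album) for x in listens if x.user == user)
    return counts.most_common(n)


def top_tracks(listens, user=None, n=10):
    """Most-listened (artist, track) pairs for one user, or for everyone if user is None."""
    counts = Counter(
        (x.artist, x.track) for x in listens if user is None or x.user == user
    )
    return counts.most_common(n)


def first_listen(listens, user, artist):
    """Earliest timestamp at which `user` listened to `artist`, or None if never."""
    times = [x.listened_at for x in listens if x.user == user and x.artist == artist]
    return min(times) if times else None


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_users(listens, user, n=5):
    """Rank other users by cosine similarity of their per-artist listen counts."""
    vectors = {}
    for x in listens:
        vectors.setdefault(x.user, Counter())[x.artist] += 1
    mine = vectors.get(user, Counter())
    scores = [
        (other, cosine_similarity(mine, vec))
        for other, vec in vectors.items()
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]
```

In a Spark version, each function would roughly become a group-by and count (or a minimum over timestamps) on the listens table; computing all-pairs similarity at scale needs more care than this local loop.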
There are many more interesting charts, graphs and statistics that we wish to show but haven't thought of yet. If you are interested in this project, we will ask you to think about possible user statistics and to come up with other examples of statistics that we might want to capture or produce. One part of the project involves writing queries in Apache Spark, plus Python glue code that takes the results and ships them from our Apache Spark cluster to our production servers. The other part involves serving these statistics from our servers using Python and rendering the results as good-looking charts created in JavaScript with the D3 toolkit.
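The glue step between the Spark cluster and the production servers might look like the following sketch. It assumes the Spark query returns flat `(user, artist, album, count)` rows; the function `rows_to_messages` and the message layout (`type`, `user_name`, `data`) are hypothetical and not an existing ListenBrainz format.

```python
import json
from collections import defaultdict


def rows_to_messages(rows, stat_type, max_entries=100):
    """Group flat (user, artist, album, count) rows into one JSON message per user.

    Hypothetical layout: a stat type, the user name, and a list of entries
    sorted by listen count (highest first), capped at max_entries.
    """
    per_user = defaultdict(list)
    for user, artist, album, count in rows:
        per_user[user].append(
            {"artist_name": artist, "release_name": album, "listen_count": count}
        )
    messages = []
    for user in sorted(per_user):  # deterministic order of messages
        entries = sorted(
            per_user[user], key=lambda entry: entry["listen_count"], reverse=True
        )[:max_entries]
        messages.append(
            json.dumps({"type": stat_type, "user_name": user, "data": entries})
        )
    return messages
```

One self-contained JSON message per user would let the receiving server store each message as-is and hand it to the D3 front end without re-querying Spark; whether ListenBrainz should work this way is a design decision for the project.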