This project can download songs online, transform them into similarity matrices, and compute statistics on the resulting images.
The code present here was used in the article Similarity of structures in popular music.
The library requirements for this code can be found in requirements.txt. To install them, run the following command in the terminal:
pip install -r requirements.txt
Once this is done, or if the required libraries are already installed, run the following:
python main.py
This command runs the music pattern algorithm on the tablatures available in data/tablatures/ (it should take about 5 minutes).
The code of this project is able to execute several tasks:
- It can scroll through Ultimate Guitar and download the specified songs (when available). The code for this task can be found in scroller.py.
- It can transform songs from GuitarPro format to pattern matrices and images. The code for this task uses song.py, which transforms a song into a pattern matrix, and patterns.py, which saves these pattern matrices and transforms them into images.
- It can compute the distance between songs using their pattern matrices. The code for this task can be found in measures.py.
- It can use the distance between songs to group them into either clusters or neighbourhoods. The code for this task can be found in grouping.py.
- It can find relations between pattern structures and features. The code for this task can be found in statistics.py.
On top of these files, utils.py contains a few useful functions and main.py wraps all the functions in one file.
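As a rough illustration of the two middle tasks above (building a pattern matrix and measuring a distance between songs), here is a minimal sketch. It does not reproduce the code in song.py or measures.py: the function names pattern_matrix, resample and matrix_distance are hypothetical, a song is assumed to be a plain sequence of symbols (for instance one label per section), and the distance is assumed to be the fraction of differing cells once both matrices are rescaled to a common size.

```python
def pattern_matrix(sequence):
    """Return a binary self-similarity matrix: cell (i, j) is 1 when
    elements i and j of the sequence are equal (hypothetical helper)."""
    return [[1 if a == b else 0 for b in sequence] for a in sequence]


def resample(matrix, size):
    """Rescale a square matrix to size x size by nearest-neighbour sampling."""
    n = len(matrix)
    index = [min(n - 1, (k * n) // size) for k in range(size)]
    return [[matrix[i][j] for j in index] for i in index]


def matrix_distance(first, second, size=8):
    """Fraction of cells that differ once both matrices share the same size
    (an assumed distance, not necessarily the one used in measures.py)."""
    a, b = resample(first, size), resample(second, size)
    differing = sum(a[i][j] != b[i][j] for i in range(size) for j in range(size))
    return differing / (size * size)


# Toy songs written as sequences of section labels (made-up data).
verse_chorus = pattern_matrix("AABAAB")
through_composed = pattern_matrix("ABCDEF")

print(matrix_distance(verse_chorus, verse_chorus))      # same structure
print(matrix_distance(verse_chorus, through_composed))  # different structure
```

Rescaling to a common size is only one way to compare songs of different lengths; it is chosen here because it keeps the example short.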
In addition to main.py, which combines all algorithms together, each file ending with .py (apart from utils.py and song.py) can be run independently. For example, if you are only interested in transforming your favorite songs into their corresponding images, you can place them in a new folder my_tablatures/ and run the following command:
python patterns.py --tab_dir my_tablatures --im_dir my_images --colour green
This will transform the songs in my_tablatures/ into images and save these images in my_images/. The parameter --colour can be used to choose the colour of the images; the set of choices can be found in patterns.py.
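For readers curious how a command line of this shape is typically parsed, the sketch below uses Python's standard argparse module. It is an illustration only and not the contents of patterns.py: the function name build_parser, the default values and the list of colours are assumptions, and the real set of choices is the one defined in patterns.py.

```python
import argparse


def build_parser():
    """Build a parser for the three options used in the command above."""
    parser = argparse.ArgumentParser(description="Turn tablatures into images (sketch).")
    # Defaults and colour choices below are placeholders, not the project's values.
    parser.add_argument("--tab_dir", default="my_tablatures", help="folder with the tablatures")
    parser.add_argument("--im_dir", default="my_images", help="folder where images are saved")
    parser.add_argument("--colour", default="green", choices=["green", "red", "blue"],
                        help="colour of the images")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--tab_dir", "my_tablatures", "--colour", "green"])
print(args.tab_dir, args.im_dir, args.colour)
```

Restricting --colour with choices makes argparse reject an unsupported colour with a clear error message instead of failing later during image creation.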
The code in scroller.py is used to scroll through Ultimate Guitar and to download the corresponding files. This code requires you to set up a web driver for Chrome. To download the Chrome driver, go to this webpage and follow the instructions. Once you have downloaded the right Chrome driver, you can either put it in the main directory or specify its path with the --chromedriver argument of scroller.py.
Disclaimer: the code in scroller.py depends heavily on the structure of the website it scrolls through. It might not be up to date with the current organization of the website and might need to be slightly modified. If it does not work, you can always manually download the tablatures from the website and put them into the tablature folder.
This project contains a newly created dataset of 4166 songs, available in dataset/. These songs were downloaded using this dataset, which contains the list of all songs that reached the Billboard hot chart. Each song is characterized by a GuitarPro file and a set of 6 features: artist, title, year, decade, genre, and types. This dataset is useful for studying the musical properties of a large set of popular songs.
The file dataset/run.py contains all the parameters used for the experiments in the article. To reproduce these experiments, simply run:
python dataset/run.py
Be careful: the whole algorithm takes about 5 hours to run (1 hour for the images and 4 hours for the distance matrix).
If you are not interested in running the whole algorithm, the distance matrix and the images are already precomputed in precomputed/. Since the distance file was too big to be included as a single file, it was split into 9 files that you can find in precomputed/dists/. To combine them back together, simply run:
python precomputed/process_dists.py
When this is done, you can experiment with statistical properties of this dataset using the file precomputed/playing.py, by running:
python precomputed/playing.py
This file also accepts extra arguments. I invite the reader to experiment with the different clustering algorithms to appreciate the properties of this dataset and the relations between features and structures.
This project produces two main types of results: image representations of songs, and statistics on a set of songs.
This project transforms songs into corresponding pattern similarity matrices. These matrices offer an interesting representation of songs, where the structure of the song can usually be read from the image. Typical images look like the following.
By comparing these images with each other, it becomes possible to define a distance on songs based on pattern similarity. By combining this distance with information on the songs (such as artist, year, or genre), it is possible to obtain figures such as the following.
All figures created by this algorithm look like the one above: the horizontal axis corresponds to groups of songs (here the neighbourhoods of songs) and the vertical axis corresponds to some metric on these groups (here the year of release). The blue dots and blue bars represent the distribution of the main subject of interest (here the year of release of the songs). The red bars usually give extra information on these groups (here the average distance between songs). Finally, the yellow stars represent a special element of this group of songs (here the center of the neighbourhood). More information on these figures and the different types of results can be found in the article.
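To make the vocabulary of these figures concrete, the following sketch builds one neighbourhood from a distance matrix and computes the quantities described above: the years of the songs in the group, their average pairwise distance, and the year of the center. It is a toy example under stated assumptions: a neighbourhood is taken to be a center song together with its k closest songs, the helper names neighbourhood and average_distance are hypothetical, and the distances and years are made-up numbers rather than values from the dataset. The definition used in grouping.py may differ.

```python
from itertools import combinations
from statistics import mean


def neighbourhood(dists, center, k):
    """Indices of the center song and its k closest songs (assumed definition)."""
    others = sorted((j for j in range(len(dists)) if j != center),
                    key=lambda j: dists[center][j])
    return [center] + others[:k]


def average_distance(dists, group):
    """Mean pairwise distance between the songs of a group."""
    return mean(dists[i][j] for i, j in combinations(group, 2))


# Made-up symmetric distance matrix and release years for four songs.
dists = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
years = [1965, 1972, 1980, 2010]

group = neighbourhood(dists, center=0, k=2)
print(group)                           # songs in the neighbourhood
print([years[i] for i in group])       # values shown by the dots
print(average_distance(dists, group))  # extra information on the group
print(years[group[0]])                 # value at the center
```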
If you have any questions regarding the code, feel free to contact me at [email protected].
If you found this code useful or used it for your own study, please cite the following paper:
@article{corsini2021similarity,
title={Similarity of structures in popular music},
author={Corsini, Beno{\^\i}t},
journal={preprint},
year={2021},
}