This project implements a pitch recognizer, usable both for voice and instruments, built around a perceptron-based neural network.
The main services the application delivers are:
- building and training the network using different audio samples (synthesized and/or wav)
- capturing audio from an input device and submitting it to the neural network
- outputting results on the terminal
The project is deployed under a Linux operating system and uses ALSA for audio device control. For wav reading we use libsndfile. The neural network is self-implemented.
The audio sampling rate we currently use is 44100 Hz. This matters because it determines the number of frames effectively read in each period of the capturing thread (50 ms). Both values can be changed where they are defined, before compilation.
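As a quick sanity check of that relation (a minimal sketch; `SAMPLE_RATE` and `PERIOD_MS` are illustrative macro names, not necessarily the ones used in the sources):

```c
#include <stdio.h>

/* Illustrative macro names; the real definitions live in the project sources. */
#define SAMPLE_RATE 44100   /* Hz */
#define PERIOD_MS   50      /* period of the capturing thread */

int main(void)
{
    /* 44100 Hz * 50 ms / 1000 = 2205 frames per period */
    long frames_per_period = (long)SAMPLE_RATE * PERIOD_MS / 1000;
    printf("frames per period: %ld\n", frames_per_period);
    return 0;
}
```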
The application also provides an additional implementation that compares the Fast Fourier Transform of the audio sample against a reference list in order to produce the correct answer.
- `autil`: utility mini-library to easily manage the ALSA interfaces
- `mutils`: additional geometric functions
- `pnet`: library with all functions needed to build a multi-layer perceptron network, train it and save its weights
- `pnetlib`: contains macro-functions for: building a training set of elements called `example`s, formed by an array of real numbers for the samples and another for the label (this generalization allows the user to create any kind of input, as long as the network is built accordingly); the training algorithm (Stochastic Gradient Descent on minibatches); error evaluation and printing; set shuffling and normalization; plus other utility functions needed during training
- `ptask_time`: library of thread timing utilities
- `wav`: library for reading a wav file by chunks or entirely, with playback testing if needed
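For illustration, an `example` as described above could look roughly like this (field names and the allocation helper are assumptions, not the actual `pnetlib` definitions):

```c
#include <stdlib.h>

/* Hypothetical layout of a training element; the real definition is in pnetlib. */
typedef struct {
    double *samples;   /* input vector, e.g. the FFT of a normalized chunk */
    double *label;     /* target vector, e.g. one-hot over the 12 pitches  */
    int     n_samples; /* length of the input vector  */
    int     n_labels;  /* length of the target vector */
} example;

/* Allocate an example of the given sizes (caller frees both arrays). */
example example_alloc(int n_samples, int n_labels)
{
    example e;
    e.n_samples = n_samples;
    e.n_labels  = n_labels;
    e.samples   = calloc((size_t)n_samples, sizeof(double));
    e.label     = calloc((size_t)n_labels,  sizeof(double));
    return e;
}
```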
`ftrain` and `fpitcher` are programs that train the network and use it to predict real-time audio, respectively. Each audio chunk is submitted as the FFT of the normalized signal. We usually build a 3-layer perceptron network with 12 output neurons (12 pitches to recognize) and save the obtained weights. The number of neurons in the hidden layer can be decided at run time.
Variables to be set at run time:
- the number of input and hidden neurons
- common bias (trainable)
- max number of epochs
- batch size
- learning rate
- momentum (used, together with the learning rate, in the weight update sketched after this list)
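To make the role of the learning rate and momentum concrete, a minibatch weight update in the style of Stochastic Gradient Descent with momentum can be sketched as follows (names are illustrative, not the actual `pnet` code):

```c
/* One SGD-with-momentum step for a single weight array (illustrative only).
 * grad[] holds the gradient averaged over the current minibatch.           */
void sgd_momentum_step(double *w, double *velocity, const double *grad,
                       int n, double learning_rate, double momentum)
{
    for (int i = 0; i < n; i++) {
        velocity[i] = momentum * velocity[i] - learning_rate * grad[i];
        w[i] += velocity[i];
    }
}
```

With the momentum set to 0 this reduces to plain SGD on the minibatch gradient.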
In both pitch recognizers, the network saved after training is used by a thread to recognize chunks of data read by an ALSA capturer thread.
The additional application is presented in `fcross.c`, in which a reference set of FFT samples obtained from (normalized) synthetic samples is compared with the FFT of the normalized chunk read through ALSA. The minimum average squared error indicates the nearest recognized pitch.
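A minimal sketch of that comparison (function and parameter names are assumptions; `fcross.c` is the actual implementation): compute the average squared error of the captured FFT against every reference FFT and take the minimum.

```c
#include <float.h>

/* Return the index of the reference spectrum closest (in average squared error)
 * to the captured spectrum. n_bins is the FFT length, n_refs the number of
 * reference pitches. Illustrative only; fcross.c is the actual implementation. */
int nearest_pitch(const double *capture, const double *refs,
                  int n_refs, int n_bins)
{
    int best = -1;
    double best_err = DBL_MAX;

    for (int r = 0; r < n_refs; r++) {
        const double *ref = refs + (long)r * n_bins;
        double err = 0.0;
        for (int k = 0; k < n_bins; k++) {
            double d = capture[k] - ref[k];
            err += d * d;
        }
        err /= n_bins;                      /* average squared error */
        if (err < best_err) {
            best_err = err;
            best = r;
        }
    }
    return best;
}
```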
Some python3 programs are dedicated to plotting results and signals, in order to give graphical feedback to the user. They are:
- `plot_err.py`: minimal program that plots the global error (per epoch) and the local error (per batch) saved after a training. The files are usually located at `logs/<executable_name>_glberfile.txt` and `logs/<executable_name>locerfile.txt`. This utility is used both in the main folder and for the test programs
- `plot_fft.py`: tied specifically to `hello_fft`, it is used to plot the spectrum of a test signal
- `plot_ftrain.py`: does the same thing, but is designed to check the prepared training set used by `ftrain`
- `plot_fcross.py`: plots sines and their corresponding FFTs, checking the signals autogenerated by `fcross`
- `plot_capturer.py`: with debugging mode active, does the same thing as `plot_fcross.py`
The folder `test` contains a bunch of tests that, using special network and input configurations, are meant to check performance and find errors. We start by testing whether the network learns simple problems, like AND, OR, XOR and character identification. All programs with the `hello_` prefix are test programs: descriptions can be found in the files. The program `pitcher3` is a test too, but it is kept in the main directory: it is generally useful for tuning the microphone boost of your PC (try `alsamixer`).
The folder `wav_32f` contains `.wav` files recording some pitches of a digital piano. They can be used to extract samples for training. You can also add more samples downloaded from the internet: it helps make the training stronger. NOTE: the provided wavs are 32-bit float, sampled at 44100 Hz. Other formats are not supported.
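For reference, reading such a file with libsndfile could look like the following sketch (the chunk size and error handling are assumptions; the project's own `wav` library is the actual interface):

```c
#include <sndfile.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a 32-bit float wav in chunks of 2205 frames (50 ms at 44100 Hz).
 * Sketch only; compile with -lsndfile.                                  */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.wav\n", argv[0]); return 1; }

    SF_INFO info = {0};
    SNDFILE *f = sf_open(argv[1], SFM_READ, &info);
    if (!f) { fprintf(stderr, "cannot open %s\n", argv[1]); return 1; }

    if ((info.format & SF_FORMAT_SUBMASK) != SF_FORMAT_FLOAT ||
        info.samplerate != 44100) {
        fprintf(stderr, "expected a 32-bit float wav sampled at 44100 Hz\n");
        sf_close(f);
        return 1;
    }

    sf_count_t frames = 2205;   /* one 50 ms chunk */
    float *buf = malloc(sizeof(float) * (size_t)frames * info.channels);
    sf_count_t got;
    while ((got = sf_readf_float(f, buf, frames)) > 0) {
        /* hand `got` frames (times info.channels samples) to the consumer here */
    }

    free(buf);
    sf_close(f);
    return 0;
}
```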
The folder `logs` is meant to contain the relevant files about the last training that was launched (`<executable_name>.txt`), as well as the saved weights and error values, which can be plotted using the python utilities. All log files are generated automatically and overwritten by the programs. Use `plot_err.py` with `hello_<test_name>_[glb/loc]err.txt` to plot the error during training under certain conditions.
If you want to create a new network, train it and use it in the pitcher, do:
$ make clean
You can compile programs independently, or else run `$ make all` (prepares `ftrain`/`fpitcher`) or `$ make universe` (prepares everything). There are also two possible compilation flags, depending on whether you want verbose debugging or not.
NOTE: most log files, especially the error logs, are written in append mode. Remember to remove/rename the previous ones before launching your programs, otherwise you'll corrupt your results. Run `$ make ct` to clear `logs`: use it with care.
The program `ftrain` trains a perceptron-based network whose tunable parameters are the following:
- number of neurons in hidden layer
- initial common bias for all neurons
- objective function
- batch size
- max number of epochs to reach while training
- learning rate and momentum
- min error to reach for early stopping (see the sketch after this list for how it interacts with the max number of epochs)
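The sketch below outlines how the max number of epochs and the minimum error interact for early stopping (all names and the stub are hypothetical; `ftrain` is the real implementation):

```c
#include <stdio.h>

/* Hypothetical helper: runs SGD over every minibatch once and returns the
 * global error on the training set. Stubbed here for illustration only.   */
static double run_one_epoch(void)
{
    static double err = 1.0;
    return err *= 0.9;          /* pretend the error shrinks at every epoch */
}

/* Epoch loop: stop at max_epochs or as soon as the error reaches min_error. */
static double train(int max_epochs, double min_error)
{
    double global_error = 1e9;
    for (int epoch = 0; epoch < max_epochs; epoch++) {
        global_error = run_one_epoch();
        if (global_error <= min_error)      /* early stopping */
            break;
    }
    return global_error;
}

int main(void)
{
    printf("final error: %f\n", train(100, 1e-3));
    return 0;
}
```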
In this file we list the values that led to the best results, along with the failure rate obtained over the training set itself, which is used as a test set for now. In a further development we are also going to use a proper test set.
The training set is created by synthesizing sinusoidal signals with known frequency and volume: a stronger training set should also include samples with some noise added to the base frequencies. In order to create the test set, we should try to mix some clean examples and some noisy ones.
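A minimal sketch of how such a synthetic example could be generated, with the noise level as an extra knob (names and the exact noise model are assumptions, not the project's generator):

```c
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SAMPLE_RATE 44100.0   /* Hz, as in the rest of the project */

/* Fill buf with a sine of the given frequency and volume plus uniform noise.
 * Illustrative only; the real training-set generator belongs to the project. */
void synth_sine(float *buf, int n, double freq, double volume,
                double noise_level)
{
    for (int i = 0; i < n; i++) {
        double t = i / SAMPLE_RATE;
        double noise = noise_level * (2.0 * rand() / RAND_MAX - 1.0);
        buf[i] = (float)(volume * sin(2.0 * M_PI * freq * t) + noise);
    }
}
```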