PITCHER - A neural-PITCH-recognizER

This project implements a pitch recognizer, usable for both voice and instruments, realized with a perceptron-based neural network.

Brief description

The main services the application delivers are:

  • building and training the network using different audio samples (synthesized and/or WAV)
  • capturing audio from an input device and submitting it to the neural network
  • printing results on the terminal

The project is deployed under a Linux operating system and exploits ALSA for audio device control. For WAV reading, we used libsndfile. The neural network is implemented from scratch.

The audio sampling rate we use for now is 44100 Hz. This is relevant because it determines the number of frames effectively read during each period of the capturing thread (50 ms). Both values can be changed where they are defined, before compilation.

The application also provides an alternative implementation that compares the Fast Fourier Transform of the audio sample with a reference list in order to produce the correct answer.

Libraries

  • autil : utility mini-library for easily managing the ALSA interfaces
  • mutils : additional geometric functions
  • pnet : library with all functions needed to build a multi-layer perceptron network, train it, and save its weights
  • pnetlib : macro-functions covering: the building of a training set of elements called examples, each formed by an array of real numbers for the samples and another for the label (this generalization allows the user to create any kind of input, as long as the network is built accordingly); the training algorithm (Stochastic Gradient Descent on minibatches); error evaluation and printing; set shuffling and normalization; plus other utility functions needed during training
  • ptask_time : library of timing utilities for threads
  • wav : library for reading a WAV file by chunks or entirely, with optional playback testing

Main programs

ftrain and fpitcher are the programs that train the network and use it to predict real-time audio, respectively. Each audio sample is submitted as the FFT of the normalized signal. We usually build a 3-layer perceptron network with 12 output neurons (12 pitches to recognize) and save the obtained weights. The number of neurons in the hidden layer can be decided at run time. Variables to be set at run time:

  • the number of input and hidden neurons
  • common bias (trainable)
  • max number of epochs
  • batch size
  • learning rate
  • momentum

In both pitch recognizers, the network saved after training is used by a thread to recognize chunks of data read by an ALSA capturer thread.

The additional application is presented in fcross.c, in which a reference set of FFT samples obtained from (normalized) synthetic signals is compared with the FFT of the normalized chunk read through ALSA. The minimum mean squared error indicates the nearest recognized pitch.

Plotting utilities

Some Python 3 programs are dedicated to plotting results and signals, giving the user graphical feedback. They are:

  • plot_err.py : minimal program that plots the global error (per epoch) and the local error (per batch) saved after a training run. Files are usually located at logs/<executable_name>_glberfile.txt and logs/<executable_name>_locerfile.txt. This utility is used both in the main folder and for test programs
  • plot_fft.py : tied specifically to hello_fft; plots the spectrum of a test signal
  • plot_ftrain.py : does the same thing, but is designed to check the training set prepared for ftrain
  • plot_fcross.py : plots a sine and its corresponding FFT, checking fcross's autogenerated signals
  • plot_capturer.py : with debugging mode active, does the same thing as plot_fcross.py

Test programs

The folder test contains a set of tests that, using special network and input configurations, are meant to check performance and find errors. We start by testing whether the network learns simple problems, such as AND, OR, XOR, and character identification. All programs prefixed with hello_ are test programs: descriptions can be found in the files. The program pitcher3 is a test too, but it is kept in the main directory: it is generally useful for tuning your PC's microphone boost (try alsamixer).

Data folders

The folder wav_32f contains .wav files recording some pitches of a digital piano. They can be used to extract samples for training. You can also add more samples downloaded from the internet: it helps make the training stronger. NOTE: the provided WAVs are 32-bit float, sampled at 44100 Hz. Other formats are not supported.

The folder logs is meant to contain the relevant files of the last training run (<executable_name>.txt), plus saved weights and error values, which can be plotted using the Python utilities. All log files are generated automatically and overwritten by the programs. Use plot_err.py with hello_<test_name>_[glb/loc]err.txt to plot the error during training under certain conditions.

Compile

If you want to create a new network, train it, and use it in pitcher, run:

  • $ make clean

You can compile programs independently; otherwise run $ make all (which builds ftrain/fpitcher) or $ make universe (which builds everything). There are also two possible compilation flags to toggle verbose debugging on or off.

NOTE: most log files, especially the error logs, are written in append mode. Remember to remove or rename the previous ones before launching your programs, otherwise you'll corrupt your results. Run $ make ct to clear the logs: use with care.

A closer look to ftrain

The program ftrain trains a perceptron-based network whose tunable parameters are the following:

  • number of neurons in hidden layer
  • initial common bias for all neurons
  • objective function
  • batch size
  • max number of epochs to reach while training
  • learning rate and momentum
  • min error to reach for early stopping

Here we list the values that led to the best results, along with the failure rate obtained over the training set itself, which for now doubles as the test set. In a further development, we are going to use a separate test set.

Input data

The training set is created by synthesizing sinusoidal signals with known frequency and volume: a stronger training set should also include samples with some noise added to the base frequencies. To create the test set, we should mix some clean examples and some noisy ones.
