This repo contains the implementation of an encoder-decoder for learned compression of MNIST images. At the end, I also explore the possibility of generative modelling using the trained decoder.
This document provides an overview of the implementation and outcomes of a data compression project using the MNIST dataset. It covers the encoder and decoder models, the handling of the dataset, and the resulting trade-off between bit rate and distortion. I also explored the possibility of generative modelling once the decoder was trained.
- TensorFlow Compression (TFC)
- Python
The dataset I used is the MNIST handwritten digits, from 0 to 9.
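For reference, a minimal way to load and normalize the MNIST images (a sketch assuming `tf.keras` is used, as in this project's dependencies):

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1] before feeding the encoder.
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0  # shape (60000, 28, 28)
x_test = x_test.astype("float32") / 255.0    # shape (10000, 28, 28)
```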
Before training, when passing an image through the encoder into the latent space and reconstructing it with the decoder, the result I obtained was:
As we can see, the output is just noise, since the encoder/decoder has not been trained yet.
The latent space I chose to represent the images is 50-dimensional, with each latent component following a Gaussian distribution.
Purpose: incorporate noise into the data representation to enable robust quantization and to minimize quantization errors. This kind of quantization is called dithered quantization. The probability distribution followed by the latent representations is shown in the following figure:
We thus minimize the Kullback-Leibler divergence to fit the encoder's output distribution to the chosen one (Gaussian plus uniform noise), so that the quantizer can be chosen intelligently to reduce both distortion and bit rate.
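As a hedged sketch of how dithered quantization is typically implemented (the exact code in this repo may differ): additive uniform noise stands in for rounding during training so that gradients can flow, while at inference the latent is actually rounded.

```python
import tensorflow as tf

def quantize(y, training):
    """Dithered quantization: uniform noise during training, rounding at inference."""
    if training:
        # Additive noise in [-0.5, 0.5] is a differentiable proxy for rounding.
        return y + tf.random.uniform(tf.shape(y), -0.5, 0.5)
    return tf.round(y)
```

Minimizing the expected code length of these noisy latents under the chosen prior is what drives the fit: the cross-entropy between the latent distribution and the prior equals the latent entropy plus their KL divergence, so lowering the rate term pushes the KL divergence toward zero.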
After training, here are the reconstructed images (output of the decoder):
- Convolutional Layers: Utilizes 2D convolutions with Leaky ReLU activation, configured to reduce dimensions while retaining critical features.
- Dense Layers: Transforms the convolutional output into a flat, dense format for further processing.
- Reconstruction Layers: Employs dense layers and 2D transposed convolutions to reconstruct the original data dimensions from the encoded format.
- Activation: Leaky ReLU is used throughout to maintain non-linearity.
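A minimal Keras sketch consistent with the layers described above (the layer sizes and strides are illustrative assumptions, not necessarily the exact values used in this repo):

```python
import tensorflow as tf

LATENT_DIM = 50  # 50-dimensional latent space, as used in this project

# Encoder: strided 2D convolutions with Leaky ReLU, then a dense projection to the latent space.
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
    tf.keras.layers.Conv2D(64, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(LATENT_DIM),
])

# Decoder: dense expansion, then 2D transposed convolutions back to 28x28x1.
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(LATENT_DIM,)),
    tf.keras.layers.Dense(7 * 7 * 64, activation=tf.nn.leaky_relu),
    tf.keras.layers.Reshape((7, 7, 64)),
    tf.keras.layers.Conv2DTranspose(32, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
    tf.keras.layers.Conv2DTranspose(1, 5, strides=2, padding="same"),
])
```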
Training was run for 15 epochs, with a 50-dimensional latent space, on the entire training set (60,000 samples).
Bit Rate and Distortion: metrics used to evaluate the efficiency and fidelity of the compression model.
In the loss function, `lmbda` is the parameter that tells the model how much distortion should be prioritized over bit rate. I studied the effect of `lmbda` on both bit rate and distortion; the results are presented in the following figure:
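A hedged sketch of the rate-distortion loss this describes, assuming the usual formulation from the TensorFlow Compression examples (the repo's exact prior and distortion measure may differ): `lmbda` weights the distortion term against the bit rate.

```python
import tensorflow as tf
import tensorflow_compression as tfc

LATENT_DIM = 50
lmbda = 300.0  # illustrative value: a larger lmbda prioritizes low distortion over low bit rate

# Prior over the latents: a Gaussian convolved with uniform noise, matching dithered
# quantization. In practice the scale parameters would typically be learned.
prior = tfc.NoisyNormal(loc=0.0, scale=tf.ones((LATENT_DIM,)))
entropy_model = tfc.ContinuousBatchedEntropyModel(prior, coding_rank=1, compression=False)

def rate_distortion_loss(x, encoder, decoder, training=True):
    """x is assumed to be a batch of images with shape (batch, 28, 28, 1) in [0, 1]."""
    y = encoder(x, training=training)                    # latent vectors, shape (batch, 50)
    y_tilde, bits = entropy_model(y, training=training)  # noisy/rounded latents + code length
    x_hat = decoder(y_tilde, training=training)          # reconstructions
    rate = tf.reduce_mean(bits)                          # average bits per image
    distortion = tf.reduce_mean(tf.square(x - x_hat))    # mean squared error
    return rate + lmbda * distortion
```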
As expected, allowing more distortion lets us reach lower bit rates; conversely, demanding lower distortion requires a higher bit rate.
Once the decoder has been trained, we can generate an artificial handwritten digit by feeding it a random latent vector: this is a classic architecture for generative AI. We now have a model capable of generating handwritten digits.
Of course, the generated images are not perfect, but the generation could be improved by decoding vectors from the canonical basis instead of completely random vectors (see the sketch below).
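A sketch of this generation step, assuming a trained `decoder` like the one sketched above and a 50-dimensional latent space (variable names are illustrative):

```python
import tensorflow as tf

LATENT_DIM = 50

# Decode a random latent vector sampled from a standard Gaussian into an image.
z_random = tf.random.normal((1, LATENT_DIM))
generated = decoder(z_random, training=False)  # shape (1, 28, 28, 1)

# Alternative mentioned above: decode a canonical basis vector instead of pure noise.
z_basis = tf.one_hot([3], depth=LATENT_DIM)    # e_3, one canonical basis vector
generated_basis = decoder(z_basis, training=False)
```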
Further details on this work can be found in the report (written in French), available as a PDF in this repository.