
initial commit
Erwei Wang authored and Erwei Wang committed Sep 21, 2019
0 parents commit f3c1396
Showing 132 changed files with 76,722 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
*.pyc
*.pkl
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
BSD 2-Clause License

Copyright (c) 2019, Erwei Wang
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# LUTNet: Learning FPGA Configurations for Efficient Neural Network Inference

## Repo organization

The repo contains two versions of LUTNet.

* __Fully unrolled LUTNet__: Operators in a convolution layer are mapped to the FPGA with one-to-one LUT binding. No BRAM is consumed in dot products, as the weights are all hardened into the LUTs' binary masks. Details can be found in our paper _LUTNet: Rethinking Inference in FPGA Soft Logic_.
* __Tiled LUTNet__: Operators are tiled and reused, trading off area efficiency for resource savings; BRAMs are consumed in dot products. Details can be found in our paper _LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference_. A toy sketch of the LUT primitive that both versions share appears below.
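
For intuition, here is a toy sketch (not the repository's training code) of the core primitive both versions share: a K-input LUT evaluated as a truth table over binary activations. The random table below stands in for a learned one.

```
import numpy as np

K = 5                                        # LUT fan-in, as in (5,1)-LUTNet
table = np.random.randint(0, 2, size=2**K)   # stand-in for a learned truth table

def lut_eval(bits):
    """Evaluate a K-input LUT on binary {0,1} activations."""
    index = 0
    for b in bits:                           # pack the K input bits into a table index
        index = (index << 1) | b
    return table[index]

print(lut_eval([1, 0, 1, 1, 0]))             # one LUT output in {0,1}
```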

## Prerequisites

For training LUTNet, you should have the following packages installed:
* Keras (v2)
* TensorFlow

For hardware synthesis, we developed and tested the project with Vivado (+ HLS) 2016.3.
Newer versions of Vivado HLS do not work with our project: they cap the loop unrolling factor at a small value, which limits the area-efficiency advantage of LUTNet.
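
As a quick sanity check of the training prerequisites, a minimal version probe (a sketch; only Keras 2.x is a stated requirement) could look like this:

```
# Minimal probe of the training prerequisites.
import keras
import tensorflow as tf

print("Keras:", keras.__version__)        # expect a 2.x release
print("TensorFlow:", tf.__version__)
```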

## Citation
If you find LUTNet useful, please cite our [FCCM'19 conference paper](https://arxiv.org/abs/1904.00938).

```
@inproceedings{lutnet,
  author    = {Wang, Erwei and Davis, James J. and Cheung, Peter Y. K. and Constantinides, George A.},
  title     = {{LUTNet}: Rethinking Inference in {FPGA} Soft Logic},
  booktitle = {2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
  pages     = {26--34},
  doi       = {10.1109/FCCM.2019.00014},
  month     = {April},
  year      = {2019}
}
```

## References

### 1. ReBNet

If you find ReBNet useful, please cite the <a href="https://arxiv.org/abs/1711.01243" target="_blank">ReBNet paper</a>:

```
@inproceedings{rebnet,
  author    = {Ghasemzadeh, Mohammad and Samragh, Mohammad and Koushanfar, Farinaz},
  title     = {{ReBNet}: Residual Binarized Neural Network},
  booktitle = {Proceedings of the 26th IEEE International Symposium on Field-Programmable Custom Computing Machines},
  series    = {FCCM '18},
  year      = {2018}
}
```

### 2. PYNQ-Classification

If you make use of this code, please acknowledge us by citing [our article](https://spiral.imperial.ac.uk/handle/10044/1/57937):

```
@inproceedings{Wang_FCCM18,
  author    = {E. Wang and J. J. Davis and P. Y. K. Cheung},
  booktitle = {IEEE Symposium on Field-programmable Custom Computing Machines (FCCM)},
  title     = {{A PYNQ-based Framework for Rapid CNN Prototyping}},
  year      = {2018}
}
```

### 3. FINN

If you find BNN-PYNQ useful, please cite the <a href="https://arxiv.org/abs/1612.07119" target="_blank">FINN paper</a>:

```
@inproceedings{finn,
  author    = {Umuroglu, Yaman and Fraser, Nicholas J. and Gambardella, Giulio and Blott, Michaela and Leong, Philip and Jahre, Magnus and Vissers, Kees},
  title     = {{FINN}: A Framework for Fast, Scalable Binarized Neural Network Inference},
  booktitle = {Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
  series    = {FPGA '17},
  pages     = {65--74},
  publisher = {ACM},
  year      = {2017}
}
```

1 change: 1 addition & 0 deletions tiled-lutnet/.gitignore
@@ -0,0 +1 @@
codegen_output/*
52 changes: 52 additions & 0 deletions tiled-lutnet/ADVANCED.md
@@ -0,0 +1,52 @@
# Advice on Modifying the Project

Generally, if you make architectural modifications to the training software, please remember to change the HLS parameters accordingly.

## For Each New HLS Run

For reasons unknown, the HLS tool randomises the order of CONV/FC layers in each run, meaning that LUTARRAY_1.v is almost never the first layer.
Therefore, after each HLS run, we always need to inspect the synthesised Verilog and keep track of the order of the layers.
This is done by looking at the layer id inside the generated sources `tiled-lutnet/lutnet/src/network/LUTNET_c6/sol/syn/verilog/LUTARRAY*.v`.
For each file, look for a line like the following.
```
parameter ap_const_lv36_1 = 36'b1;
```
In this example, the source file LUTARRAY*.v corresponds to the first LUTNet layer.
After you have recorded the order of the layers, update it at (around) line 624 in `tiled-lutnet/lutnet/h5py-2-hls/${dataset}/h52header_51lut_tm_mnist_spase.py` before proceeding to generate the LUT array Verilog files.
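
This bookkeeping can be scripted. The following hypothetical helper (not part of the repository) parses each generated file for the `ap_const_lv36_*` parameter; the assumption that the numeric suffix encodes the layer id in HLS's hexadecimal notation should be verified against your own output.

```
# Hypothetical helper: recover the layer order HLS assigned to the LUTARRAY
# modules by parsing the ap_const_lv36_* parameter in each generated file.
# Assumption: the constant's suffix is the layer id in HLS's hex notation.
import glob
import re

pattern = re.compile(r"parameter\s+ap_const_lv36_([0-9A-Fa-f]+)\s*=\s*36'b")
for path in sorted(glob.glob(
        "tiled-lutnet/lutnet/src/network/LUTNET_c6/sol/syn/verilog/LUTARRAY*.v")):
    with open(path) as f:
        match = pattern.search(f.read())
    if match:
        print(f"{path} -> layer {int(match.group(1), 16)}")
```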

## Change Tiling Factors

The following source files should be changed accordingly.
```
tiled-lutnet/training-software/model_architectures.py
tiled-lutnet/training-software/MNIST-CIFAR-SVHN/Binary.py
tiled-lutnet/training-software/MNIST-CIFAR-SVHN/models/${dataset}/scripts/bnn_pruning.py
tiled-lutnet/lutnet/h5py-2-hls/${dataset}/h52header_51lut_tm_mnist_spase.py
tiled-lutnet/lutnet/src/network/MNIST/hw/config.h
```
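
When changing the factors, it is easy to let one file drift out of sync with the others. A hypothetical helper like the one below can list the relevant lines for a manual cross-check; the `tile` token is an assumption about the identifier naming, and MNIST is substituted for `${dataset}`.

```
# Hypothetical helper (not part of the repository): print every line that
# mentions a tiling identifier in the files above, so the factors can be
# checked for consistency across files.
files = [
    "tiled-lutnet/training-software/model_architectures.py",
    "tiled-lutnet/training-software/MNIST-CIFAR-SVHN/Binary.py",
    "tiled-lutnet/training-software/MNIST-CIFAR-SVHN/models/MNIST/scripts/bnn_pruning.py",
    "tiled-lutnet/lutnet/h5py-2-hls/MNIST/h52header_51lut_tm_mnist_spase.py",
    "tiled-lutnet/lutnet/src/network/MNIST/hw/config.h",
]
for path in files:
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if "tile" in line.lower():
                print(f"{path}:{lineno}: {line.rstrip()}")
```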

IMPORTANT: A tiling factor of "1" (i.e. full unrolling) doesn't work with this project.
Due to the coding style of the project's HLS C sources, HLS will instead generate a completely sequential implementation (the complete opposite of full unrolling).
For fully unrolled LUTNet/ReBNet layers, please go to `LUTNet/unrolled-lutnet`.
Mixing tiled and fully unrolled LUTNet layers is possible.

## Reproduce ReBNet Results

This repository also includes the training and implementation source code of ReBNet.
Implementing ReBNet follows similar steps to LUTNet, except that LUT array replacement is not needed -- all BNN weights are stored in the generated C header file `weight.h`.

## Change Microarchitecture

The default microarchitecture is (5,1)-LUTNet.
The source code for other microarchitectures is also included in this repository.

## LUTNet-ReBNet Hybrids

Mixing and matching LUTNet and ReBNet layers is supported; a hypothetical illustration follows the file list below.
The following source files should be modified.
```
tiled-lutnet/training-software/model_architectures.py (LUT=True for LUTNet and LUT=False for ReBNet)
tiled-lutnet/lutnet/h5py-2-hls/MNIST/h52header_51lut_tm_mnist_spase.py
tiled-lutnet/lutnet/src/network/MNIST/hw/top.cpp
```
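
As a purely hypothetical illustration (the actual constructor signatures in `model_architectures.py` may differ), the per-layer switch amounts to something like this:

```
# Hypothetical sketch of the per-layer switch; the real code in
# model_architectures.py may look different. LUT=True selects a LUTNet
# layer, LUT=False a plain ReBNet (BNN) layer.
LAYER_FLAGS = {"conv1": False, "conv2": True, "fc1": True}

for name, use_lut in LAYER_FLAGS.items():
    print(f"{name}: {'LUTNet' if use_lut else 'ReBNet'}")
```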
82 changes: 82 additions & 0 deletions tiled-lutnet/README.md
@@ -0,0 +1,82 @@
# LUTNet: Learning FPGA Configurations for Efficient Neural Network Inference

## Training LUTNet

The training software of LUTNet uses a train-prune-retrain workflow.
Here, we use the LFC model with the MNIST dataset as an example.

### Step 1: Train BNN (ReBNet) From Scratch

```
cd training-software/MNIST-CIFAR-SVHN/
bash bnn_regularised_training.sh
bash dummy_generation.sh
```

Select the dataset (MNIST/CIFAR-10/SVHN) when prompted.

### Step 2: BNN Fine-grained Pruning + Logic Expansion (Retraining with LUTNet Architecture)

Open `models/${dataset}/scripts/bnn_pruning.py` and edit each layer's pruning threshold. Below is an example for the LFC model classifying MNIST (-1 means no pruning); a higher threshold corresponds to more aggressive pruning.

```
p_d1=-1
p_d2=0.78
p_d3=0.78
p_d4=0.78
p_d5=-1
```
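
For intuition, here is a minimal sketch of threshold-based fine-grained pruning (an illustration only, not the repository's own `bnn_pruning.py` logic): weights whose magnitude falls below the layer's threshold are masked out, and a threshold of -1 keeps the layer dense.

```
# Minimal sketch of fine-grained magnitude pruning; illustration only,
# not the repository's bnn_pruning.py.
import numpy as np

def pruning_mask(weights, threshold):
    if threshold < 0:                    # -1 means "do not prune this layer"
        return np.ones_like(weights, dtype=bool)
    return np.abs(weights) > threshold   # higher threshold -> sparser layer

w = np.random.randn(256, 256)
print("kept fraction:", pruning_mask(w, 0.78).mean())
```
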
Then, execute the LUTNet retraining script.

```
bash lutnet_training_script.sh
```

Select the test id (an identifier that distinguishes among multiple test outputs) and the dataset when prompted. After training finishes, the pretrained network and accuracy results for the intermediate BNN and final LUTNet can be found in `models/${dataset}/pruned_bnn` and `models/${dataset}/pruned_lutnet`, respectively.

## Mapping a Trained LUTNet on an FPGA

The pretrained LUTNet (in .h5 format) is converted into RTL (Verilog) and then synthesised into an FPGA bitstream.

### Step 1: Convert a Pretrained LUTNet into C Headers and Verilog Source Codes

```
cd lutnet/h5py-2-hls/MNIST
bash 51lutnet_mnist_generate_lutarray.sh
```
Enter the test id of the pretrained network (that you'd like to implement) when prompted.
The script generates two sets of sources: LUT arrays in Verilog format and other parameters (batch normalisation thresholds, scaling factors etc.) in C header format.
Copying the LUT array Verilog files may fail because the destination folder `/lutnet/src/network/LUTNET_c6/` does not exist yet.
This is normal: we will run this script again after HLS, by which point the folder will exist.
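
If you'd like to sanity-check the checkpoint before conversion, an illustrative peek with h5py (not part of the flow; the file name is the conversion script's working copy) might look like this:

```
# Illustrative only: list the groups and datasets inside the pretrained
# checkpoint before running the conversion.
import h5py

with h5py.File("pretrained_network_51lut_tm.h5", "r") as f:
    f.visit(print)   # prints every group and dataset path in the file
```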

### Step 2: HLS

```
cd ../../lutnet/src/network/
bash lutnet_synthesis_script_part1.sh
```
Wait for HLS to finish; for CNV it can take up to a day.
The HLS output directory is `LUTNET_c6/`.
Inside, the LUT arrays are synthesised as placeholder modules that contain no meaningful logic.
We now replace those placeholders with the LUT array Verilog files generated in Step 1.

IMPORTANT: After HLS finishes, open `LUTNET_c6/sol/syn/verilog/DoCompute.v` and scroll down to around line 530 (the exact location varies) where the FIFO modules are instantiated.
For reasons unknown to me (I suspect an HLS bug), only some of these modules are created -- again, seemingly at random -- while others do not exist.
My workaround is to check which module definitions exist and use them in place of the missing ones -- they are functionally identical.
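
A hypothetical checker along these lines can list the instantiated-but-undefined modules; the FIFO naming pattern is an assumption about Vivado HLS's generated Verilog and may need adjusting.

```
# Hypothetical helper: flag FIFO modules instantiated in DoCompute.v that
# have no definition of their own in the HLS Verilog output. The naming
# pattern below is an assumption about Vivado HLS's generated FIFO names.
import glob
import os
import re

verilog_dir = "LUTNET_c6/sol/syn/verilog"
with open(os.path.join(verilog_dir, "DoCompute.v")) as f:
    top = f.read()

instantiated = set(re.findall(r"\b(\w*fifo_w\d+_d\d+\w*)\s+\w+\s*\(", top))
defined = {os.path.splitext(os.path.basename(p))[0]
           for p in glob.glob(os.path.join(verilog_dir, "*.v"))}

for module in sorted(instantiated - defined):
    print("missing FIFO module:", module)
```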

Then, go back to `lutnet/h5py-2-hls/MNIST` and run `bash 51lutnet_mnist_generate_lutarray.sh` again.
This time the file copies should succeed, and the LUT array placeholders are replaced.

### Step 3: Vivado Synthesis

```
bash lutnet_synthesis_script_part2.sh
```

The final step, bitstream generation, will fail because the pin assignment is incomplete, but you can still obtain the post-placement utilisation and power consumption reports under `src/network/vivado_output`.

## Custom Models

For advice on how to make changes to the models, please see [ADVANCED.md](ADVANCED.md).
1 change: 1 addition & 0 deletions tiled-lutnet/lutnet/h5py-2-hls/CIFAR_10/.gitignore
@@ -0,0 +1 @@
*.h5
11 changes: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/bin/bash

# Retrieve a pretrained tiled-LUTNet checkpoint and regenerate its HLS/RTL sources.
echo -e "Please enter the test id of the pretrained LUTNet that you'd like to retrieve: "
read id

# Copy the chosen checkpoint into the conversion working directory.
cp ../../../training-software/MNIST-CIFAR-SVHN/models/CIFAR-10/pruned_lutnet/pruned_lutnet_${id}_BIN.h5 pretrained_network_51lut_tm.h5
# Generate the LUT array Verilog files and C headers from the checkpoint.
python h52header_51lut_tm_spase.py
mkdir -p ../codegen_output
# Copy the generated LUT arrays into the HLS output tree (fails harmlessly before the first HLS run).
cp ../codegen_output/LUTARRAY.v ../codegen_output/LUTARRAY_1.v ../codegen_output/LUTARRAY_2.v ../codegen_output/LUTARRAY_3.v ../codegen_output/LUTARRAY_4.v ../codegen_output/LUTARRAY_5.v ../codegen_output/LUTARRAY_6.v ../codegen_output/LUTARRAY_7.v ../../src/network/LUTNET_c6/sol1/syn/verilog/
cp ../codegen_output/weights.h ../../src/network/CIFAR10/hw/weights.h
