
initial commit
Erwei Wang authored and Erwei Wang committed Sep 21, 2019
0 parents commit f3c1396
Showing 132 changed files with 76,722 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
*.pyc
*.pkl
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
BSD 2-Clause License

Copyright (c) 2019, Erwei Wang
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
71 changes: 71 additions & 0 deletions README.md
@@ -0,0 +1,71 @@
# LUTNet: Learning FPGA Configurations for Efficient Neural Network Inference

## Repo organization

The repo contains two versions of LUTNet.

* __Fully unrolled LUTNet__: Operators in a convolution layer are mapped to the FPGA with one-to-one LUT binding. No BRAM is consumed in dot products, as the weights are all hardened into the LUTs' binary masks. Details can be found in our paper _LUTNet: Rethinking Inference in FPGA Soft Logic_.
* __Tiled LUTNet__: Operators are tiled and reused, trading off area efficiency for resource savings; BRAMs are consumed in dot products. Details can be found in our paper _LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference_. A toy sketch of the LUT primitive that both versions share appears below.
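
For intuition, here is a toy sketch (not the repository's training code) of the core primitive both versions share: a K-input LUT evaluated as a truth table over binary activations. The random table below stands in for a learned one.

```
import numpy as np

K = 5                                        # LUT fan-in, as in (5,1)-LUTNet
table = np.random.randint(0, 2, size=2**K)   # stand-in for a learned truth table

def lut_eval(bits):
    """Evaluate a K-input LUT on binary {0,1} activations."""
    index = 0
    for b in bits:                           # pack the K input bits into a table index
        index = (index << 1) | b
    return table[index]

print(lut_eval([1, 0, 1, 1, 0]))             # one LUT output in {0,1}
```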

## Prerequisites

For training LUTNet, you should have the following packages installed:
* Keras (v2)
* TensorFlow

For hardware synthesis, we developed and tested the project with Vivado (+ HLS) 2016.3.
Newer versions of Vivado HLS do not work with our project: they cap the loop unrolling factor at a small value, which limits the area-efficiency advantage of LUTNet.
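
As a quick sanity check of the training prerequisites, a minimal version probe (a sketch; only Keras 2.x is a stated requirement) could look like this:

```
# Minimal probe of the training prerequisites.
import keras
import tensorflow as tf

print("Keras:", keras.__version__)        # expect a 2.x release
print("TensorFlow:", tf.__version__)
```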

## Citation
If you find LUTNet useful, please cite our [FCCM'19 conference paper](https://arxiv.org/abs/1904.00938).

```
@inproceedings{lutnet,
  author    = {Wang, Erwei and Davis, James J. and Cheung, Peter Y. K. and Constantinides, George A.},
  title     = {{LUTNet}: Rethinking Inference in {FPGA} Soft Logic},
  booktitle = {2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
  pages     = {26--34},
  doi       = {10.1109/FCCM.2019.00014},
  month     = {April},
  year      = {2019}
}
```

## References

### 1. ReBNet

If you find ReBNet useful, please cite the <a href="https://arxiv.org/abs/1711.01243" target="_blank">ReBNet paper</a>:

```
@inproceedings{rebnet,
  author    = {Ghasemzadeh, Mohammad and Samragh, Mohammad and Koushanfar, Farinaz},
  title     = {{ReBNet}: Residual Binarized Neural Network},
  booktitle = {Proceedings of the 26th IEEE International Symposium on Field-Programmable Custom Computing Machines},
  series    = {FCCM '18},
  year      = {2018}
}
```

### 2. PYNQ-Classification

If you make use of this code, please acknowledge us by citing [our article](https://spiral.imperial.ac.uk/handle/10044/1/57937):

```
@inproceedings{Wang_FCCM18,
  author    = {E. Wang and J. J. Davis and P. Y. K. Cheung},
  booktitle = {IEEE Symposium on Field-programmable Custom Computing Machines (FCCM)},
  title     = {{A PYNQ-based Framework for Rapid CNN Prototyping}},
  year      = {2018}
}
```

### 3. FINN

If you find BNN-PYNQ useful, please cite the <a href="https://arxiv.org/abs/1612.07119" target="_blank">FINN paper</a>:

```
@inproceedings{finn,
  author    = {Umuroglu, Yaman and Fraser, Nicholas J. and Gambardella, Giulio and Blott, Michaela and Leong, Philip and Jahre, Magnus and Vissers, Kees},
  title     = {{FINN}: A Framework for Fast, Scalable Binarized Neural Network Inference},
  booktitle = {Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
  series    = {FPGA '17},
  pages     = {65--74},
  publisher = {ACM},
  year      = {2017}
}
```

1 change: 1 addition & 0 deletions tiled-lutnet/.gitignore
@@ -0,0 +1 @@
codegen_output/*
52 changes: 52 additions & 0 deletions tiled-lutnet/ADVANCED.md
@@ -0,0 +1,52 @@
# Advice on Modifying the Project

Generally, if you make architectural modifications to the training software, please remember to change the HLS parameters accordingly.

## For Each New HLS Run

For reasons unknown, the HLS tool randomises the order of CONV/FC layers in each run, meaning that LUTARRAY_1.v is almost never the first layer.
Therefore, after each HLS run, we always need to inspect the synthesised Verilog and keep track of the order of the layers.
This is done by looking at the layer id inside the generated sources `tiled-lutnet/lutnet/src/network/LUTNET_c6/sol/syn/verilog/LUTARRAY*.v`.
For each file, look for a line like the following.
```
parameter ap_const_lv36_1 = 36'b1;
```
In this example, the source file LUTARRAY*.v corresponds to the first LUTNet layer.
After you have recorded the order of the layers, update it at (around) line 624 in `tiled-lutnet/lutnet/h5py-2-hls/${dataset}/h52header_51lut_tm_mnist_spase.py` before proceeding to generate the LUT array Verilog files.
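
This bookkeeping can be scripted. The following hypothetical helper (not part of the repository) parses each generated file for the `ap_const_lv36_*` parameter; the assumption that the numeric suffix encodes the layer id in HLS's hexadecimal notation should be verified against your own output.

```
# Hypothetical helper: recover the layer order HLS assigned to the LUTARRAY
# modules by parsing the ap_const_lv36_* parameter in each generated file.
# Assumption: the constant's suffix is the layer id in HLS's hex notation.
import glob
import re

pattern = re.compile(r"parameter\s+ap_const_lv36_([0-9A-Fa-f]+)\s*=\s*36'b")
for path in sorted(glob.glob(
        "tiled-lutnet/lutnet/src/network/LUTNET_c6/sol/syn/verilog/LUTARRAY*.v")):
    with open(path) as f:
        match = pattern.search(f.read())
    if match:
        print(f"{path} -> layer {int(match.group(1), 16)}")
```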

## Change Tiling Factors

The following source files should be changed accordingly.
```
tiled-lutnet/training-software/model_architectures.py
tiled-lutnet/training-software/MNIST-CIFAR-SVHN/Binary.py
tiled-lutnet/training-software/MNIST-CIFAR-SVHN/models/${dataset}/scripts/bnn_pruning.py
tiled-lutnet/lutnet/h5py-2-hls/${dataset}/h52header_51lut_tm_mnist_spase.py
tiled-lutnet/lutnet/src/network/MNIST/hw/config.h
```
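
When changing the factors, it is easy to let one file drift out of sync with the others. A hypothetical helper like the one below can list the relevant lines for a manual cross-check; the `tile` token is an assumption about the identifier naming, and MNIST is substituted for `${dataset}`.

```
# Hypothetical helper (not part of the repository): print every line that
# mentions a tiling identifier in the files above, so the factors can be
# checked for consistency across files.
files = [
    "tiled-lutnet/training-software/model_architectures.py",
    "tiled-lutnet/training-software/MNIST-CIFAR-SVHN/Binary.py",
    "tiled-lutnet/training-software/MNIST-CIFAR-SVHN/models/MNIST/scripts/bnn_pruning.py",
    "tiled-lutnet/lutnet/h5py-2-hls/MNIST/h52header_51lut_tm_mnist_spase.py",
    "tiled-lutnet/lutnet/src/network/MNIST/hw/config.h",
]
for path in files:
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if "tile" in line.lower():
                print(f"{path}:{lineno}: {line.rstrip()}")
```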

IMPORTANT: A tiling factor of "1" (i.e. full unrolling) doesn't work with this project.
Due to the coding style of the project's HLS C sources, HLS will instead generate a completely sequential implementation (the complete opposite of full unrolling).
For fully unrolled LUTNet/ReBNet layers, please go to `LUTNet/unrolled-lutnet`.
Mixing tiled and fully unrolled LUTNet layers is possible.

## Reproduce ReBNet Results

This repository also includes the training and implementation source code of ReBNet.
Implementing ReBNet follows similar steps to LUTNet, except that LUT array replacement is not needed -- all BNN weights are stored in the generated C header file `weight.h`.

## Change Microarchitecture

The default microarchitecture is (5,1)-LUTNet.
The source code for other microarchitectures is also included in this repository.

## LUTNet-ReBNet Hybrids

Mixing and matching LUTNet and ReBNet layers is supported; a hypothetical illustration follows the file list below.
The following source files should be modified.
```
tiled-lutnet/training-software/model_architectures.py (LUT=True for LUTNet and LUT=False for ReBNet)
tiled-lutnet/lutnet/h5py-2-hls/MNIST/h52header_51lut_tm_mnist_spase.py
tiled-lutnet/lutnet/src/network/MNIST/hw/top.cpp
```
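
As a purely hypothetical illustration (the actual constructor signatures in `model_architectures.py` may differ), the per-layer switch amounts to something like this:

```
# Hypothetical sketch of the per-layer switch; the real code in
# model_architectures.py may look different. LUT=True selects a LUTNet
# layer, LUT=False a plain ReBNet (BNN) layer.
LAYER_FLAGS = {"conv1": False, "conv2": True, "fc1": True}

for name, use_lut in LAYER_FLAGS.items():
    print(f"{name}: {'LUTNet' if use_lut else 'ReBNet'}")
```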
82 changes: 82 additions & 0 deletions tiled-lutnet/README.md
@@ -0,0 +1,82 @@
# LUTNet: Learning FPGA Configurations for Efficient Neural Network Inference

## Training LUTNet

The training software of LUTNet uses a train-prune-retrain workflow.
Here, we use the LFC model with the MNIST dataset as an example.

### Step 1: Train BNN (ReBNet) From Scratch

```
cd training-software/MNIST-CIFAR-SVHN/
bash bnn_regularised_training.sh
bash dummy_generation.sh
```

Select the dataset (MNIST/CIFAR-10/SVHN) when prompted.

### Step 2: BNN Fine-grained Pruning + Logic Expansion (Retraining with LUTNet Architecture)

Open `models/${dataset}/scripts/bnn_pruning.py` and edit each layer's pruning threshold. Below is an example for the LFC model classifying MNIST (-1 means no pruning); a higher threshold corresponds to more aggressive pruning.

```
p_d1=-1
p_d2=0.78
p_d3=0.78
p_d4=0.78
p_d5=-1
```
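
For intuition, here is a minimal sketch of threshold-based fine-grained pruning (an illustration only, not the repository's own `bnn_pruning.py` logic): weights whose magnitude falls below the layer's threshold are masked out, and a threshold of -1 keeps the layer dense.

```
# Minimal sketch of fine-grained magnitude pruning; illustration only,
# not the repository's bnn_pruning.py.
import numpy as np

def pruning_mask(weights, threshold):
    if threshold < 0:                    # -1 means "do not prune this layer"
        return np.ones_like(weights, dtype=bool)
    return np.abs(weights) > threshold   # higher threshold -> sparser layer

w = np.random.randn(256, 256)
print("kept fraction:", pruning_mask(w, 0.78).mean())
```
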
Then, execute the LUTNet retraining script.

```
bash lutnet_training_script.sh
```

Select the test id (an identifier that distinguishes among multiple test outputs) and the dataset when prompted. After training finishes, the pretrained network and accuracy results for the intermediate BNN and final LUTNet can be found in `models/${dataset}/pruned_bnn` and `models/${dataset}/pruned_lutnet`, respectively.

## Mapping a Trained LUTNet on an FPGA

The pretrained LUTNet (in .h5 format) is converted into RTL (Verilog) and then synthesised into an FPGA bitstream.

### Step 1: Convert a Pretrained LUTNet into C Headers and Verilog Source Codes

```
cd lutnet/h5py-2-hls/MNIST
bash 51lutnet_mnist_generate_lutarray.sh
```
Enter the test id of the pretrained network (that you'd like to implement) when prompted.
The script generates two sets of sources: LUT arrays in Verilog format and other parameters (batch normalisation thresholds, scaling factors etc.) in C header format.
Copying the LUT array Verilog files may fail because the destination folder `/lutnet/src/network/LUTNET_c6/` does not exist yet.
This is normal: we will run this script again after HLS, by which point the folder will exist.
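
If you'd like to sanity-check the checkpoint before conversion, an illustrative peek with h5py (not part of the flow; the file name is the conversion script's working copy) might look like this:

```
# Illustrative only: list the groups and datasets inside the pretrained
# checkpoint before running the conversion.
import h5py

with h5py.File("pretrained_network_51lut_tm.h5", "r") as f:
    f.visit(print)   # prints every group and dataset path in the file
```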

### Step 2: HLS

```
cd ../../lutnet/src/network/
bash lutnet_synthesis_script_part1.sh
```
Wait for HLS to finish; for CNV it can take up to a day.
The HLS output directory is `LUTNET_c6/`.
Inside, the LUT arrays are synthesised as placeholder modules that contain no meaningful logic.
We now replace those placeholders with the LUT array Verilog files generated in Step 1.

IMPORTANT: After HLS finishes, open `LUTNET_c6/sol/syn/verilog/DoCompute.v` and scroll down to around line 530 (the exact location varies) where the FIFO modules are instantiated.
For reasons unknown to me (I suspect an HLS bug), only some of these modules are created -- again, seemingly at random -- while others do not exist.
My workaround is to check which module definitions exist and use them in place of the missing ones -- they are functionally identical.
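
A hypothetical checker along these lines can list the instantiated-but-undefined modules; the FIFO naming pattern is an assumption about Vivado HLS's generated Verilog and may need adjusting.

```
# Hypothetical helper: flag FIFO modules instantiated in DoCompute.v that
# have no definition of their own in the HLS Verilog output. The naming
# pattern below is an assumption about Vivado HLS's generated FIFO names.
import glob
import os
import re

verilog_dir = "LUTNET_c6/sol/syn/verilog"
with open(os.path.join(verilog_dir, "DoCompute.v")) as f:
    top = f.read()

instantiated = set(re.findall(r"\b(\w*fifo_w\d+_d\d+\w*)\s+\w+\s*\(", top))
defined = {os.path.splitext(os.path.basename(p))[0]
           for p in glob.glob(os.path.join(verilog_dir, "*.v"))}

for module in sorted(instantiated - defined):
    print("missing FIFO module:", module)
```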

Then, go back to `lutnet/h5py-2-hls/MNIST` and run `bash 51lutnet_mnist_generate_lutarray.sh` again.
This time the file copies should succeed, and the LUT array placeholders are replaced.

### Step 3: Vivado Synthesis

```
bash lutnet_synthesis_script_part2.sh
```

The final step, bitstream generation, will fail because the pin assignment is incomplete, but you can still obtain the post-placement utilisation and power consumption reports under `src/network/vivado_output`.

## Custom Models

For advice on how to make changes to the models, please see [ADVANCED.md](ADVANCED.md).
1 change: 1 addition & 0 deletions tiled-lutnet/lutnet/h5py-2-hls/CIFAR_10/.gitignore
@@ -0,0 +1 @@
*.h5
11 changes: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#!/bin/bash

# Retrieve a pretrained tiled-LUTNet checkpoint and regenerate its HLS/RTL sources.
echo -e "Please enter the test id of the pretrained LUTNet that you'd like to retrieve: "
read id

# Copy the chosen checkpoint into the conversion working directory.
cp ../../../training-software/MNIST-CIFAR-SVHN/models/CIFAR-10/pruned_lutnet/pruned_lutnet_${id}_BIN.h5 pretrained_network_51lut_tm.h5
# Generate the LUT array Verilog files and C headers from the checkpoint.
python h52header_51lut_tm_spase.py
mkdir -p ../codegen_output
# Copy the generated LUT arrays into the HLS output tree (fails harmlessly before the first HLS run).
cp ../codegen_output/LUTARRAY.v ../codegen_output/LUTARRAY_1.v ../codegen_output/LUTARRAY_2.v ../codegen_output/LUTARRAY_3.v ../codegen_output/LUTARRAY_4.v ../codegen_output/LUTARRAY_5.v ../codegen_output/LUTARRAY_6.v ../codegen_output/LUTARRAY_7.v ../../src/network/LUTNET_c6/sol1/syn/verilog/
cp ../codegen_output/weights.h ../../src/network/CIFAR10/hw/weights.h
