This directory provides examples of how to deploy Deep CNNs on FPGAs using Xilinx Python APIs.
All examples provided within this directory exercise precompiled models whose components are stored in the local ./data directory.
A "compiled model" consists of low level HW instructions, and quantization parameters.
Compiler Outputs: JSON FILE (HW Instructions *_56.json or _28.json), model_data directory with preprocessed floating point weights.
Quantizer Outputs: JSON FILE containing scaling factors for each layer in the corresponding network. (_8b.json or *_16b.json)
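To see what the quantizer produced for a given model, the JSON can be inspected with standard Python. This is a minimal sketch with an illustrative file name; the exact per-layer schema of the scaling parameters varies by release:

```python
import json

# Illustrative path; the files in ./data follow the *_8b.json / *_16b.json pattern
with open("data/resnet50_8b.json") as f:
    quant_cfg = json.load(f)

# Dump the contents; the exact per-layer schema is release-specific
print(json.dumps(quant_cfg, indent=2))
```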
Important Notes:
- The final layers of the network (fully connected, softmax) are run on the CPU, as those layers are not supported by the FPGA (see the CPU post-processing sketch after this list)
- The streaming_classify example requires you to download the ImageNet validation set and place the images here
- Amazon AWS EC2 F1 requires root privileges to load the FPGA; use the documented workaround
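Because the FPGA stops before the fully connected and softmax layers, the host finishes the network on the CPU. Here is a minimal numpy sketch of that post-processing, assuming `fpga_out` is the flattened FPGA activation (of size `--fpgaoutsz`) and that the final FC weights/bias were loaded from the model data; these names are illustrative and not taken from the actual scripts:

```python
import numpy as np

def cpu_postprocess(fpga_out, fc_weights, fc_bias, labels, top_k=5):
    """Finish the network on the CPU: fully connected layer + softmax.

    fpga_out   : 1-D array of size --fpgaoutsz (the flattened FPGA activation)
    fc_weights : (num_classes, fpgaoutsz) weight matrix from the model data
    fc_bias    : (num_classes,) bias vector
    """
    logits = fc_weights @ fpga_out + fc_bias
    # Numerically stable softmax
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Return the top_k (probability, label) pairs, highest first
    top = np.argsort(probs)[::-1][:top_k]
    return [(probs[i], labels[i]) for i in top]
```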
The following three examples of applications using the Python xfDNN API are provided:
- A Test Classification example that demonstrates how to run inference on a single image "dog.jpg"
- A Streaming Classification example that streams images from disk through the FPGA for classification.
- A Multi-Network example that shows different DNNs running independently on multiple processing elements on the FPGA.
The provided examples can target a few different hardware overlays. For more detail on the various configurations, see here.

As of the ml-suite 1.3 release you can target the latest generation of the accelerator, "XDNNv3", by passing `-k v3`. Note that XDNNv3 only supports 8-bit precision at this time. "XDNNv2" is invoked with the `-k med` and `-k large` flags.
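The `-k` and `-b` values together pick an overlay binary out of `overlaybins/<platform>/`. The sketch below shows one plausible way that selection could look in Python; the file-name pattern is inferred from the `xdnn_28_16b.xclbin` file visible in the sample run at the bottom of this page, and the XDNNv3 name is purely hypothetical — this is not the actual `run.sh` logic:

```python
import os

# med -> 28 matches the sample run below (xdnn_28_16b.xclbin); large -> 56 is
# inferred from the *_56.json compiler output names, not from run.sh itself.
DSP_WIDTH = {"med": 28, "large": 56}

def overlay_path(root, platform, kernel, bits):
    """Build the expected .xclbin path, e.g. overlaybins/1525/xdnn_28_16b.xclbin."""
    if kernel == "v3":
        name = "xdnn_v3_8b.xclbin"  # hypothetical name; XDNNv3 is 8-bit only
    else:
        name = "xdnn_{}_{}b.xclbin".format(DSP_WIDTH[kernel], bits)
    return os.path.join(root, "overlaybins", platform, name)

print(overlay_path("ml-suite", "1525", "med", 16))
# -> ml-suite/overlaybins/1525/xdnn_28_16b.xclbin
```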
To run any of the three examples, use the provided bash script run.sh:

- Start Anaconda:
  $ conda activate ml-suite
- Navigate to the ml-suite/examples/classification dir:
  $ cd ml-suite/examples/classification
- Familiarize yourself with the script usage:
  $ ./run.sh -h
The key parameters are:
- `-p <platform>` - Valid values are `alveo-u200`, `alveo-u250`, `aws`, `nimbix`, `1525`, `1525-ml`
  - Note: If the platform flag is omitted, the software will try to auto-detect the platform, but the `-ml` shells always need to be specified
- `-t <test>` - Valid values are `test_classify`, `streaming_classify`, or `multinet`
- `-k <kernel config>` - Valid values are `med`, `large`, or `v3`; used to select overlay binaries
- `-b <quantization precision>` - Valid values are `16` or `8`, corresponding to INT16 or INT8
  - Note: XDNNv3 only supports `8`, which is also the default precision (INT8)
- `-c <compiler optimization>` - Runs the network with a compiler optimization for max throughput (e.g. `-c throughput`)
  - Note: XDNNv3 only
- `-g` (check golden) - Enables accuracy checking given a golden result text file
Examples:
- Single Image Classification on alveo-u200, ResNet50 v1, with XDNNv3:
$ ./run.sh -p alveo-u200 -t test_classify -k v3 -b 8 -m resnet50
- Single Image Classification on AWS, with medium kernels:
$ ./run.sh -p aws -t test_classify -k med -b 16
- Streaming Image Classification on alveo-u200 with large kernels:
$ ./run.sh -p alveo-u200 -t streaming_classify -k large -b 8
- Streaming Image Classification on alveo-u200 with XDNNv3:
$ ./run.sh -p alveo-u200 -t streaming_classify -k v3 -b 8
- Streaming Image Classification on alveo-u200 with XDNNv3, throughput optimized, and reporting accuracy for Imagenet:
$ ./run.sh -p alveo-u200 -t streaming_classify -k v3 -b 8 -g -c throughput
- Multinet Image Classification on Nimbix (Currently, for Multinet only the med size kernel and 16b precision are supported)
$ ./run.sh -p nimbix -t multinet -k med -b 16
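If you want to script several of these invocations, for example to compare kernel configurations, a plain Python subprocess loop over `run.sh` works. This sketch assumes you are already inside `ml-suite/examples/classification` with the conda environment active:

```python
import subprocess

# Sweep a few configurations of the single-image test on alveo-u200.
# (-k v3 supports only -b 8, so only valid pairs are listed.)
configs = [
    ("med", "16"),
    ("large", "8"),
    ("v3", "8"),
]

for kernel, bits in configs:
    cmd = ["./run.sh", "-p", "alveo-u200", "-t", "test_classify",
           "-k", kernel, "-b", bits]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```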
Take a look at the following scripts to understand the examples:
- run.sh
- test_classify.py
- mp_classify.py
- test_classify_async_multinet.py
The Python scripts use the argument parser defined in xdnn_io.py. The key arguments are listed below (a simplified argparse sketch follows the list):
- `--xclbin` - Defines which FPGA binary overlay to use. The available binaries are stored in overlaybins.
- `--netcfg` - FPGA instructions generated by the compiler for the network being run.
- `--quantizecfg` - Path to the JSON file to use for quantization (the JSON file contains scaling parameters).
- `--fpgaoutsz` - Flattened size of the final activation computed by the FPGA (the FPGA will not do FC layers or Softmax).
- `--datadir` - Path to the data files (weights) to run for the network.
- `--labels` - Path to a text file containing line-separated labels.
- `--golden` - Path to a text file containing line-separated correct labels.
- `--images` - Directory with image files to classify (only applicable to streaming_classify).
- `--jsoncfg` - Path to the JSON file used to define separate networks (only applicable to multinet).
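For reference, the flags above could be declared with a standard argparse setup along these lines. This is a simplified stand-in for the idea, not the actual parser in `xdnn_io.py`, and the paths passed at the bottom are placeholders:

```python
import argparse

def make_parser():
    """Simplified stand-in for the parser defined in xdnn_io.py (not the real one)."""
    p = argparse.ArgumentParser(description="xfDNN classification example")
    p.add_argument("--xclbin", required=True,
                   help="FPGA binary overlay to use (stored in overlaybins)")
    p.add_argument("--netcfg", required=True,
                   help="compiler-generated FPGA instructions for the network")
    p.add_argument("--quantizecfg",
                   help="JSON file with per-layer scaling parameters")
    p.add_argument("--fpgaoutsz", type=int,
                   help="flattened size of the final activation computed by the FPGA")
    p.add_argument("--datadir", help="path to the network's weight files")
    p.add_argument("--labels", help="text file of line-separated labels")
    p.add_argument("--golden", help="text file of line-separated correct labels")
    p.add_argument("--images", help="directory of images (streaming_classify only)")
    p.add_argument("--jsoncfg", help="JSON defining separate networks (multinet only)")
    return p

# Illustrative invocation; the paths are placeholders
args = make_parser().parse_args([
    "--xclbin", "overlaybins/1525/xdnn_28_16b.xclbin",
    "--netcfg", "data/resnet50_28.json",
])
print(args)
```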
For multinet deployments, the different models/networks are set in the `--jsoncfg` file. For the multinet example given above, see how the arguments are set in [multinet.json][].
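A minimal sketch of consuming such a file, assuming it holds one entry of arguments per network; consult multinet.json for the actual schema, as nothing here is taken from the real scripts:

```python
import json

with open("multinet.json") as f:
    networks = json.load(f)

# Assumed: a list (or dict) of per-network argument sets; the real layout
# is whatever multinet.json defines.
print(json.dumps(networks, indent=2))
```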
For reference, here is sample output from a single-image classification run:

$ ./run.sh -p 1525 -t test_classify -k med -b 16
=============== pyXDNN =============================
[XBLAS] # kernels: 1
Linux:4.4.0-121-generic:#145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018:x86_64
Distribution: Ubuntu 16.04.2 LTS
GLIBC: 2.23
---
---
CL_PLATFORM_VENDOR Xilinx
CL_PLATFORM_NAME Xilinx
CL_DEVICE_0: 0x175a1d0
CL_DEVICES_FOUND 1, using 0
loading /home/bryanloz/DEEPXILINX/MLsuite/overlaybins/1525/xdnn_28_16b.xclbin
[XBLAS] kernel0: kernelSxdnn_0
[XDNN] loading xclbin settings from /home/bryanloz/DEEPXILINX/MLsuite/overlaybins/1525/xdnn_28_16b.xclbin.json
[XDNN] using custom DDR banks 2,2,1,1
[XDNN] kernel configuration
[XDNN] num cores : 2
[XDNN] dsp array width : 28
[XDNN] img mem size : 4 MB
[XDNN] version : 2.1
[XDNN] 8-bit mode : 0
[XDNN] Max Image W/H : 0
[XDNN] Max Image Depth : 0
Loading weights/bias/quant_params to FPGA...
---------- Prediction 0 for dog.jpg ----------
0.7376 - "n02112018 Pomeranian"
0.0785 - "n02123394 Persian cat"
0.0365 - "n02085620 Chihuahua"
0.0183 - "n02492035 capuchin, ringtail, Cebus capucinus"
0.0150 - "n02094433 Yorkshire terrier"