
Image Classification with Python APIs

Introduction

This directory provides examples of how to deploy Deep CNNs on FPGAs using Xilinx Python APIs.

All examples provided within this directory exercise precompiled models whose components are stored in the local ./data directory.

A "compiled model" consists of low-level hardware instructions and quantization parameters.

Compiler outputs: a JSON file with the hardware instructions (*_56.json or *_28.json) and a model_data directory with the preprocessed floating-point weights.
Quantizer outputs: a JSON file containing the scaling factors for each layer of the corresponding network (*_8b.json or *_16b.json).
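As a sketch of how these artifacts pair up, the helper below locates the compiler and quantizer JSON files for one network by their filename suffixes. The function name `find_model_files`, the `prefix` argument, and the assumption that the `_28`/`_56` suffix matches the DSP array width reported in the kernel configuration are illustrative assumptions, not part of the ml-suite API.

```python
import os


def find_model_files(data_dir, prefix, dsp_width, bits):
    """Hypothetical helper: return (netcfg, quantizecfg) paths for one network.

    Assumes names like <prefix>_<dsp_width>.json for the compiler output and
    <prefix>_<bits>b.json for the quantizer output; adjust to your files.
    """
    netcfg = os.path.join(data_dir, f"{prefix}_{dsp_width}.json")
    quantizecfg = os.path.join(data_dir, f"{prefix}_{bits}b.json")
    # Fail early with a clear message instead of letting the FPGA load fail later.
    missing = [p for p in (netcfg, quantizecfg) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"missing model files: {missing}")
    return netcfg, quantizecfg
```

For example, `find_model_files("./data", "resnet50", 28, 16)` would look for `resnet50_28.json` and `resnet50_16b.json`; the prefix here is hypothetical.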

Important Notes:

  • The final layers of the network (fully connected and Softmax) run on the CPU, because the FPGA does not support those layers.
  • The streaming_classify example requires you to download the ImageNet validation set and place the images here.
  • Amazon AWS EC2 F1 requires root privileges to load the FPGA; use the documented workaround.
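Because the FPGA stops before the fully connected and Softmax layers, the host finishes the computation. The sketch below shows that CPU step in plain Python on a flattened activation vector. The weight layout (one weight row per output class) and the toy values are assumptions; the actual examples take their FC parameters from the model data rather than from hand-written lists.

```python
import math


def fully_connected(activation, weights, bias):
    """Compute y = W x + b, with one row of `weights` per output class."""
    return [
        sum(w * x for w, x in zip(row, activation)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: a 3-element "FPGA output" feeding a 2-class FC layer.
fpga_out = [1.0, 0.5, -0.5]
W = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.4]]
b = [0.0, 0.1]
probs = softmax(fully_connected(fpga_out, W, b))
```

Subtracting the largest logit does not change the Softmax result but keeps `math.exp` from overflowing on large activations.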

The following three examples of applications using the Python xfDNN API are provided:

  1. A Test Classification example that demonstrates how to run inference on a single image, dog.jpg.
  2. A Streaming Classification example that streams images from disk through the FPGA for classification.
  3. A Multi-Network example that shows different DNNs running independently on multiple processing elements on the FPGA.

Kernel Configurations

The provided examples can target a few different hardware overlays. For more detail on the various configurations, see here.

As of the ml-suite 1.3 release, you can target the latest generation of the accelerator, XDNNv3, by using -k v3.
Note that XDNNv3 supports only 8-bit precision at this time.

XDNNv2 is selected with the -k med or -k large flags.
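The relationship between the -k values, the accelerator generations, and the supported precisions can be captured in a small lookup table. This is an illustrative sketch of the rules stated in this README, not code taken from run.sh; `KERNEL_CONFIGS` and `check_kernel` are hypothetical names.

```python
# Kernel flag -> (accelerator generation, supported -b precisions),
# transcribed from the rules described in this README.
KERNEL_CONFIGS = {
    "med": ("XDNNv2", {8, 16}),
    "large": ("XDNNv2", {8, 16}),
    "v3": ("XDNNv3", {8}),
}


def check_kernel(kernel, bits):
    """Return the accelerator generation for a -k/-b pair, or raise ValueError."""
    if kernel not in KERNEL_CONFIGS:
        raise ValueError(f"unknown kernel config {kernel!r}")
    generation, precisions = KERNEL_CONFIGS[kernel]
    if bits not in precisions:
        raise ValueError(f"{generation} does not support {bits}-bit precision")
    return generation
```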

Running the Examples

To run any of the three examples, use the provided bash script, run.sh.

  1. Activate the ml-suite Anaconda environment.

    $ conda activate ml-suite
  2. Navigate to the ml-suite/examples/classification directory.

    $ cd ml-suite/examples/classification
  3. Familiarize yourself with the script usage:

    $ ./run.sh -h
    The key parameters are:

    • -p platform - Valid values are alveo-u200, alveo-u250, aws, nimbix, 1525, 1525-ml
      • Note: If the platform flag is omitted, the software tries to auto-detect the platform, but the -ml shells must always be specified explicitly.
    • -t test - Valid values are test_classify, streaming_classify, or multinet
    • -k kernel config - Valid values are med, large, or v3; used to select the overlay from overlaybins
    • -b quantization precision - Valid values are 16 or 8, corresponding to INT16 or INT8
      • Note: XDNNv3 supports only 8, which is also the default precision (INT8).
    • -c compiler optimization - Runs the network with a compiler optimization; -c throughput optimizes for maximum throughput
      • Note: XDNNv3 only
    • -g check golden - Enables accuracy checking against a golden result text file.
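If you drive run.sh from another Python program, it helps to validate the flags before launching. The sketch below assembles an argument list for `subprocess.run` from the documented parameters. `build_run_args` is a hypothetical helper: it checks only the value sets and restrictions listed above, and any other flags are passed through unchanged via `extra`.

```python
PLATFORMS = {"alveo-u200", "alveo-u250", "aws", "nimbix", "1525", "1525-ml"}
TESTS = {"test_classify", "streaming_classify", "multinet"}
KERNELS = {"med", "large", "v3"}


def build_run_args(test, kernel, bits=8, platform=None, compiler_opt=None,
                   check_golden=False, extra=()):
    """Hypothetical helper: build an argv list for ./run.sh."""
    if test not in TESTS:
        raise ValueError(f"invalid -t value: {test!r}")
    if kernel not in KERNELS:
        raise ValueError(f"invalid -k value: {kernel!r}")
    if bits not in (8, 16):
        raise ValueError("-b must be 8 or 16")
    if kernel == "v3" and bits != 8:
        raise ValueError("XDNNv3 supports only 8-bit precision")
    if compiler_opt is not None and kernel != "v3":
        raise ValueError("-c is supported on XDNNv3 only")

    args = ["./run.sh"]
    # Omitting -p lets run.sh auto-detect; -ml shells must be passed explicitly.
    if platform is not None:
        if platform not in PLATFORMS:
            raise ValueError(f"invalid -p value: {platform!r}")
        args += ["-p", platform]
    args += ["-t", test, "-k", kernel, "-b", str(bits)]
    if check_golden:
        args.append("-g")
    if compiler_opt is not None:
        args += ["-c", compiler_opt]
    args += list(extra)
    return args
```

For example, `subprocess.run(build_run_args("test_classify", "v3", 8, "alveo-u200", extra=["-m", "resnet50"]), check=True)`, launched from ml-suite/examples/classification, corresponds to `./run.sh -p alveo-u200 -t test_classify -k v3 -b 8 -m resnet50`.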

Example Invocations

  1. Single Image Classification on alveo-u200, ResNet50 v1, with XDNNv3:
    $ ./run.sh -p alveo-u200 -t test_classify -k v3 -b 8 -m resnet50
  2. Single Image Classification on AWS, with medium kernels:
    $ ./run.sh -p aws -t test_classify -k med -b 16
  3. Streaming Image Classification on alveo-u200 with large kernels:
    $ ./run.sh -p alveo-u200 -t streaming_classify -k large -b 8
  4. Streaming Image Classification on alveo-u200 with XDNNv3:
    $ ./run.sh -p alveo-u200 -t streaming_classify -k v3 -b 8
  5. Streaming Image Classification on alveo-u200 with XDNNv3, throughput optimized, and reporting accuracy for ImageNet:
    $ ./run.sh -p alveo-u200 -t streaming_classify -k v3 -b 8 -g -c throughput
  6. Multinet Image Classification on Nimbix (currently, Multinet supports only the med kernel and 16-bit precision):
    $ ./run.sh -p nimbix -t multinet -k med -b 16

Example Script Switches

Take a look at the following scripts to understand the examples:

  • run.sh
  • test_classify.py
  • mp_classify.py
  • test_classify_async_multinet.py

The Python scripts use the argument parser defined in xdnn_io.py:

  • --xclbin - Defines which FPGA binary overlay to use. The available binaries are stored in overlaybins.
  • --netcfg - FPGA instructions generated by the compiler for the network being run
  • --quantizecfg - Path to the JSON file to use for quantization (the JSON file contains the scaling parameters)
  • --fpgaoutsz - Flattened size of the final activation computed by the FPGA (the FPGA does not compute the FC or Softmax layers)
  • --datadir - Path to the network's data files (weights)
  • --labels - Path to a text file containing line-separated labels
  • --golden - Path to a text file containing line-separated correct labels
  • --images - Directory with image files to classify (only applicable to streaming_classify)
  • --jsoncfg - Path to the JSON file used to define separate networks (only applicable to multinet)
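These switches map naturally onto Python's standard `argparse` module. The parser below is an illustrative stand-in that mirrors the documented switch names; it is not the parser in xdnn_io.py, and the types, help strings, and the choice to leave every switch optional are assumptions.

```python
import argparse


def make_parser():
    """Illustrative parser mirroring the xdnn_io.py switches described above."""
    p = argparse.ArgumentParser(description="xfDNN classification example")
    p.add_argument("--xclbin", help="FPGA binary overlay (from overlaybins)")
    p.add_argument("--netcfg", help="compiler-generated FPGA instructions (JSON)")
    p.add_argument("--quantizecfg", help="quantizer scaling parameters (JSON)")
    p.add_argument("--fpgaoutsz", type=int,
                   help="flattened size of the final FPGA activation")
    p.add_argument("--datadir", help="directory with the network weights")
    p.add_argument("--labels", help="text file with one label per line")
    p.add_argument("--golden", help="text file with the correct labels")
    p.add_argument("--images", help="image directory (streaming_classify only)")
    p.add_argument("--jsoncfg", help="multinet network definitions (JSON)")
    return p
```

Declaring `--fpgaoutsz` with `type=int` means argparse rejects a non-numeric size at startup instead of failing later when the output buffer is allocated.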

For Multinet deployments, the different models/networks are defined in the --jsoncfg file. For the Multinet example given above, see multinet.json for how to set the arguments.

Example Output From Single Image Classification

$ ./run.sh -p 1525 -t test_classify -k med -b 16
=============== pyXDNN =============================
[XBLAS] # kernels: 1
Linux:4.4.0-121-generic:#145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018:x86_64
Distribution: Ubuntu 16.04.2 LTS
GLIBC: 2.23
---

---
CL_PLATFORM_VENDOR Xilinx
CL_PLATFORM_NAME Xilinx
CL_DEVICE_0: 0x175a1d0
CL_DEVICES_FOUND 1, using 0
loading /home/bryanloz/DEEPXILINX/MLsuite/overlaybins/1525/xdnn_28_16b.xclbin
[XBLAS] kernel0: kernelSxdnn_0
[XDNN] loading xclbin settings from /home/bryanloz/DEEPXILINX/MLsuite/overlaybins/1525/xdnn_28_16b.xclbin.json
[XDNN] using custom DDR banks 2,2,1,1

[XDNN] kernel configuration
[XDNN]   num cores       : 2
[XDNN]   dsp array width : 28
[XDNN]   img mem size    : 4 MB
[XDNN]   version         : 2.1
[XDNN]   8-bit mode      : 0
[XDNN]   Max Image W/H   : 0
[XDNN]   Max Image Depth : 0

Loading weights/bias/quant_params to FPGA...

---------- Prediction 0 for dog.jpg ----------
0.7376 - "n02112018 Pomeranian"
0.0785 - "n02123394 Persian cat"
0.0365 - "n02085620 Chihuahua"
0.0183 - "n02492035 capuchin, ringtail, Cebus capucinus"
0.0150 - "n02094433 Yorkshire terrier"
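The prediction block above lists the five most probable classes with their Softmax scores. A minimal sketch of producing that format on the host is shown below. It assumes a list of probabilities and a labels file with one label per line, as described for --labels. The function names are hypothetical, and the golden format assumed by `top_k_accuracy` (one correct label per image, in image order) may differ from the one the examples actually use.

```python
def load_labels(path):
    """Read a line-separated labels file; line i is the label for class i."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def top_k(probs, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(probs[i], labels[i]) for i in ranked[:k]]


def format_prediction(index, image_name, probs, labels, k=5):
    """Format results like the 'Prediction 0 for dog.jpg' block above."""
    lines = [f"---------- Prediction {index} for {image_name} ----------"]
    lines += [f'{p:.4f} - "{label}"' for p, label in top_k(probs, labels, k)]
    return "\n".join(lines)


def top_k_accuracy(predicted_topk, golden, k=5):
    """Fraction of images whose golden label appears in their top-k labels.

    Assumes one golden label per image, aligned with predicted_topk.
    """
    hits = sum(g in [label for _, label in preds[:k]]
               for preds, g in zip(predicted_topk, golden))
    return hits / len(golden)
```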