# ONNX -> TensorRT INT8
These scripts were last tested using the 
[NGC TensorRT Container Version 20.06-py3](https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt).
You can see the corresponding framework versions for this container [here](https://docs.nvidia.com/deeplearning/sdk/tensorrt-container-release-notes/rel_20.06.html#rel_20.06).

## Quickstart

> **NOTE**: This INT8 example is only valid for **fixed-shape** ONNX models at the moment.
>
> INT8 calibration on **dynamic-shape** models is now supported, but this example has not yet been
> updated to reflect that. For more details on INT8 calibration for **dynamic-shape** models, see
> the [documentation](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#int8-calib-dynamic-shapes).
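
In the meantime, for reference, dynamic-shape calibration boils down to giving the builder config a
dedicated calibration profile. Below is a minimal sketch using the TensorRT 7.1+ Python API (the
version shipped in the 20.06 container); the input name `"input"` and all shapes are placeholders,
and the calibrator is assumed to be an `IInt8EntropyCalibrator2` instance like the one sketched in
the [INT8 Calibration](#int8-calibration) section:

```python
import tensorrt as trt

def make_int8_config(builder: trt.Builder, calibrator: trt.IInt8Calibrator) -> trt.IBuilderConfig:
    """Builder config for INT8 calibration of a dynamic-shape network (sketch)."""
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator

    # The profile used for inference may span a range of shapes (min, opt, max)...
    profile = builder.create_optimization_profile()
    profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
    config.add_optimization_profile(profile)

    # ...but calibration runs at one fixed shape, set via a dedicated calibration profile.
    calib_profile = builder.create_optimization_profile()
    calib_profile.set_shape("input", (8, 3, 224, 224), (8, 3, 224, 224), (8, 3, 224, 224))
    config.set_calibration_profile(calib_profile)
    return config
```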

### 1. Convert ONNX model to TensorRT INT8

See `./onnx_to_tensorrt.py -h` for a full list of command-line arguments.

```bash
./onnx_to_tensorrt.py --explicit-batch \
                      --onnx model/yolov6.onnx \
                      --fp16 \
                      --int8 \
                      --calibration-cache="caches/yolov6.cache" \
                      -o yolov6.int8.engine
```

See the [INT8 Calibration](#int8-calibration) section below for details on calibrating with your
own model or data, in case you don't have an existing calibration cache or want to create a new one.

## INT8 Calibration

See [ImagenetCalibrator.py](ImagenetCalibrator.py) for a reference implementation
of TensorRT's [IInt8EntropyCalibrator2](https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Int8/EntropyCalibrator2.html).

This class can be tweaked to work for other kinds of models, inputs, etc.
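
If you would rather write your own calibrator than adapt ImagenetCalibrator.py, the core of an
`IInt8EntropyCalibrator2` subclass is small. Below is a minimal, generic sketch; the class name,
constructor arguments, and batching scheme are illustrative and do not mirror the actual
ImagenetCalibrator.py implementation:

```python
import os

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import tensorrt as trt

class SimpleCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds pre-processed batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, batch_size, cache_file):
        super().__init__()
        self.batches = iter(batches)  # iterable of contiguous float32 (N, C, H, W) arrays
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = None      # allocated lazily once the first batch's size is known

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None               # no more data -> TensorRT finishes calibration
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]  # one device pointer per network input

    def read_calibration_cache(self):
        # Returning a cached result here skips calibration entirely on later runs.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The builder picks the calibrator up through the builder config's `int8_calibrator` attribute when
the INT8 flag is set.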

In the [Quickstart](#quickstart) section above, we made use of a pre-existing cache,
[caches/yolov6.cache](caches/yolov6.cache), to save time for the sake of an example.

However, you can calibrate using different data or a different model by pointing the `--calibration-data` argument at your own dataset.

* This requires that you've mounted a dataset, such as ImageNet, to use for calibration.
    * Add something like `-v /imagenet:/imagenet` to your Docker command in Step (1)
      to mount a dataset found locally at `/imagenet`.
* You can specify your own `preprocess_func` by defining it inside of [ImagenetCalibrator.py](ImagenetCalibrator.py); see the [Pre-processing](#pre-processing) section below.

```bash
# Path to dataset to use for calibration.
#   **Not necessary if you already have a calibration cache from a previous run.
CALIBRATION_DATA="/imagenet"

# Truncate calibration images to a random sample of this amount if more are found.
#   **Not necessary if you already have a calibration cache from a previous run.
MAX_CALIBRATION_SIZE=512

# Calibration cache to be used instead of calibration data if it already exists,
# or the cache will be created from the calibration data if it doesn't exist.
CACHE_FILENAME="caches/yolov6.cache"

# Pre-processing function to apply to each calibration image, defined inside
# ImagenetCalibrator.py (adjust this to a function that matches your model).
PREPROCESS_FUNC="preprocess_imagenet"

# Path to ONNX model
ONNX_MODEL="model/yolov6.onnx"

# Path to write TensorRT engine to
OUTPUT="yolov6.int8.engine"

# Creates an INT8 engine from your ONNX model, creating ${CACHE_FILENAME} based
# on your ${CALIBRATION_DATA}, unless ${CACHE_FILENAME} already exists, in which
# case it will simply use that instead.
python3 onnx_to_tensorrt.py --fp16 --int8 -v \
        --max_calibration_size=${MAX_CALIBRATION_SIZE} \
        --calibration-data=${CALIBRATION_DATA} \
        --calibration-cache=${CACHE_FILENAME} \
        --preprocess_func=${PREPROCESS_FUNC} \
        --explicit-batch \
        --onnx ${ONNX_MODEL} -o ${OUTPUT}
```

### Pre-processing

To calibrate your model correctly, you should pre-process your calibration data the same way
you pre-process your inputs during inference.
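
For illustration, here is a minimal sketch of such a function. The name `preprocess_imagenet`, the
224x224 size, and the normalization constants below assume a typical ImageNet-trained classifier;
replace them to match your model's actual inference pipeline:

```python
import numpy as np
from PIL import Image

def preprocess_imagenet(image_path):
    """Resize, scale, normalize, and transpose one image into CHW float32."""
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0             # [0, 255] -> [0, 1]
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet channel means
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # ImageNet channel stddevs
    x = (x - mean) / std                                      # per-channel normalization
    return x.transpose(2, 0, 1)                               # HWC -> CHW
```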