
PyTorch to TensorRT: accelerating inference


一、What ONNX and TensorRT are

ONNX

You can train your model in any framework of your choice and then convert it to the ONNX format.
The huge benefit of having a common format is that the software or hardware that loads your model at run time only needs to be compatible with ONNX.
Models from different frameworks (PyTorch, TensorFlow, MXNet, etc.) can all be converted to the same format (ONNX), making them easy to load across different software and hardware platforms.
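A minimal sketch of the PyTorch-to-ONNX conversion described above; the model choice (resnet18) and output file name are illustrative assumptions, not from the article:

```python
# Sketch: export a pretrained PyTorch model to ONNX.
# resnet18 and the file name "resnet18.onnx" are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
model.eval()

# torch.onnx.export traces the model with a dummy input of the expected shape.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",        # output file
    input_names=["input"],
    output_names=["output"],
    export_params=True,     # store the trained weights inside the model file
)
```

The resulting `.onnx` file is what any ONNX-compatible runtime, including TensorRT, can load.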

TensorRT

NVIDIA’s TensorRT is an SDK for high performance deep learning inference.
It provides APIs to do inference for pre-trained models and generates optimized runtime engines for your platform.
TensorRT accelerates model inference by optimizing precision, GPU memory usage, and hardware utilization.

二、Environment

Install PyTorch, ONNX, and OpenCV
Install TensorRT
Download and install NVIDIA CUDA 10.0 or later, following the official instructions: link
Download and extract the cuDNN library for your CUDA version (login required): link
Download and extract the NVIDIA TensorRT library for your CUDA version (login required): link. The minimum required version is 6.0.1.5. Please follow the Installation Guide for your system, and don't forget to install the Python bindings
Add the absolute paths of the CUDA, TensorRT, and cuDNN libraries to the PATH or LD_LIBRARY_PATH environment variable
Install PyCUDA
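Once everything is installed, a quick sanity check saves debugging time later. A small sketch (the helper name and package list are my own, not from the article) that reports which Python-side dependencies are still missing:

```python
# Sketch: verify that the Python-side dependencies are importable.
# The package list uses the usual module names; adjust for your setup.
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    required = ["torch", "onnx", "cv2", "tensorrt", "pycuda"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```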

三、Convert

1. Load and launch a pre-trained model using PyTorch
2. Convert the PyTorch model to ONNX format
3. Visualize the ONNX model
4. Initialize the model in TensorRT
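For the visualization step, one common approach (my suggestion, not mandated by the article) is to validate the graph with `onnx.checker` and open it in the `netron` viewer; the file name is an illustrative assumption:

```python
# Sketch: validate and inspect an exported ONNX model.
# "resnet18.onnx" is an illustrative file name.
import onnx
import netron

model = onnx.load("resnet18.onnx")
onnx.checker.check_model(model)  # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))

netron.start("resnet18.onnx")    # serves an interactive graph view in the browser
```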

Now it's time to parse the ONNX model and initialize the TensorRT Context and Engine. To do this we need to create an instance of Builder. The builder can create a Network and generate an Engine (optimized for your platform and hardware) from that network. When we create the Network we can define its structure with flags, but in our case it's enough to use the default flag, which means all tensors will have an implicit batch dimension. With the Network definition we can create an instance of Parser and, finally, parse our ONNX file.
Tip: initialization can take a long time, because TensorRT tries to find the best and fastest way to run your network on your platform. To do this only once and then reuse the already created engine, you can serialize it. Note that serialized engines are not portable across different GPU models, platforms, or TensorRT versions: engines are specific to the exact hardware and software they were built on.
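The Builder/Network/Parser flow and the serialization tip above can be sketched as follows. This uses the TensorRT 6/7-style Python API matching the article's minimum version (e.g. `build_cuda_engine` and implicit batch by default, both deprecated in TensorRT 8); function and file names are illustrative:

```python
# Sketch: parse an ONNX file and build + serialize a TensorRT engine.
# TensorRT 6/7-style API, consistent with the article's minimum version 6.0.1.5.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path):
    builder = trt.Builder(TRT_LOGGER)
    # Default network flags -> all tensors have an implicit batch dimension.
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX file")

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30  # 1 GiB of scratch space for tactics

    engine = builder.build_cuda_engine(network)

    # Serialize once so later runs can skip the slow optimization step.
    # Remember: the serialized engine only works on this GPU/TensorRT version.
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine
```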

5. Main pipeline
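A sketch of the main inference pipeline under the same implicit-batch assumption: deserialize the engine, copy the input to the GPU with PyCUDA, execute, and copy the result back. Binding indices and names are assumptions for a single-input, single-output model:

```python
# Sketch: run inference with a serialized TensorRT engine (TRT 6/7-style API).
# Assumes one input binding (index 0) and one output binding (index 1).
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_path):
    # Deserialize the engine saved in step 4, skipping the optimization step.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def infer(engine, host_input):
    with engine.create_execution_context() as context:
        h_in = np.ascontiguousarray(host_input, dtype=np.float32)
        h_out = np.empty(trt.volume(engine.get_binding_shape(1)),
                         dtype=np.float32)
        # One device buffer per binding.
        d_in = cuda.mem_alloc(h_in.nbytes)
        d_out = cuda.mem_alloc(h_out.nbytes)

        cuda.memcpy_htod(d_in, h_in)                       # host -> device
        context.execute(batch_size=1,
                        bindings=[int(d_in), int(d_out)])  # run the engine
        cuda.memcpy_dtoh(h_out, d_out)                     # device -> host
        return h_out
```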

References (worth working through carefully):

https://learnopencv.com/how-to-convert-a-model-from-pytorch-to-tensorrt-and-speed-up-inference/
https://www.cnblogs.com/mrlonely2018/p/14842107.html
https://learnopencv.com/how-to-run-inference-using-tensorrt-c-api/
https://blog.csdn.net/yanggg1997/article/details/111587687

Reposted article; please credit www.mshxw.com when reposting.
Original URL: https://www.mshxw.com/it/740542.html