
PyTorch to TensorRT: accelerating inference


一、What ONNX and TensorRT are

ONNX

You can train your model in any framework of your choice and then convert it to the ONNX format.
The huge benefit of having a common format is that the software or hardware that loads your model at run time only needs to be compatible with ONNX.
Models from different frameworks (PyTorch, TensorFlow, MXNet, etc.) can all be converted to the same format (ONNX), making them easy to load across different software and hardware platforms.
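A minimal sketch of the PyTorch-to-ONNX conversion described above; the model choice (resnet18) and output file name are illustrative assumptions, not from the article:

```python
# Sketch: export a pretrained PyTorch model to ONNX.
# resnet18 and the file name "resnet18.onnx" are illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
model.eval()

# torch.onnx.export traces the model with a dummy input of the expected shape.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",        # output file
    input_names=["input"],
    output_names=["output"],
    export_params=True,     # store the trained weights inside the model file
)
```

The resulting `.onnx` file is what any ONNX-compatible runtime, including TensorRT, can load.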

TensorRT

NVIDIA’s TensorRT is an SDK for high performance deep learning inference.
It provides APIs to do inference for pre-trained models and generates optimized runtime engines for your platform.
TensorRT accelerates model inference by optimizing precision, GPU memory usage, and hardware utilization.

二、Environment

Install PyTorch, ONNX, and OpenCV
Install TensorRT
Download and install NVIDIA CUDA 10.0 or later, following the official instructions: link
Download and extract the cuDNN library for your CUDA version (login required): link
Download and extract the NVIDIA TensorRT library for your CUDA version (login required): link. The minimum required version is 6.0.1.5. Please follow the Installation Guide for your system, and don't forget to install the Python bindings
Add the absolute paths of the CUDA, TensorRT, and cuDNN libraries to the PATH or LD_LIBRARY_PATH environment variable
Install PyCUDA
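Once everything is installed, a quick sanity check saves debugging time later. A small sketch (the helper name and package list are my own, not from the article) that reports which Python-side dependencies are still missing:

```python
# Sketch: verify that the Python-side dependencies are importable.
# The package list uses the usual module names; adjust for your setup.
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    required = ["torch", "onnx", "cv2", "tensorrt", "pycuda"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```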

三、Convert

1. Load and launch a pre-trained model using PyTorch
2. Convert the PyTorch model to ONNX format
3. Visualize the ONNX model
4. Initialize the model in TensorRT
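For the visualization step, one common approach (my suggestion, not mandated by the article) is to validate the graph with `onnx.checker` and open it in the `netron` viewer; the file name is an illustrative assumption:

```python
# Sketch: validate and inspect an exported ONNX model.
# "resnet18.onnx" is an illustrative file name.
import onnx
import netron

model = onnx.load("resnet18.onnx")
onnx.checker.check_model(model)  # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))

netron.start("resnet18.onnx")    # serves an interactive graph view in the browser
```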

Now it's time to parse the ONNX model and initialize the TensorRT Context and Engine. To do this we need to create an instance of Builder. The builder can create a Network and generate an Engine (optimized for your platform and hardware) from that network. When we create the Network we can define its structure with flags, but in our case it's enough to use the default flag, which means all tensors will have an implicit batch dimension. With the Network definition we can create an instance of Parser and, finally, parse our ONNX file.
Tip: initialization can take a long time, because TensorRT tries to find the best and fastest way to run your network on your platform. To do this only once and then reuse the already created engine, you can serialize it. Note that serialized engines are not portable across different GPU models, platforms, or TensorRT versions: engines are specific to the exact hardware and software they were built on.
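The Builder/Network/Parser flow and the serialization tip above can be sketched as follows. This uses the TensorRT 6/7-style Python API matching the article's minimum version (e.g. `build_cuda_engine` and implicit batch by default, both deprecated in TensorRT 8); function and file names are illustrative:

```python
# Sketch: parse an ONNX file and build + serialize a TensorRT engine.
# TensorRT 6/7-style API, consistent with the article's minimum version 6.0.1.5.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path):
    builder = trt.Builder(TRT_LOGGER)
    # Default network flags -> all tensors have an implicit batch dimension.
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX file")

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30  # 1 GiB of scratch space for tactics

    engine = builder.build_cuda_engine(network)

    # Serialize once so later runs can skip the slow optimization step.
    # Remember: the serialized engine only works on this GPU/TensorRT version.
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine
```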

5. Main pipeline
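A sketch of the main inference pipeline under the same implicit-batch assumption: deserialize the engine, copy the input to the GPU with PyCUDA, execute, and copy the result back. Binding indices and names are assumptions for a single-input, single-output model:

```python
# Sketch: run inference with a serialized TensorRT engine (TRT 6/7-style API).
# Assumes one input binding (index 0) and one output binding (index 1).
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_path):
    # Deserialize the engine saved in step 4, skipping the optimization step.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def infer(engine, host_input):
    with engine.create_execution_context() as context:
        h_in = np.ascontiguousarray(host_input, dtype=np.float32)
        h_out = np.empty(trt.volume(engine.get_binding_shape(1)),
                         dtype=np.float32)
        # One device buffer per binding.
        d_in = cuda.mem_alloc(h_in.nbytes)
        d_out = cuda.mem_alloc(h_out.nbytes)

        cuda.memcpy_htod(d_in, h_in)                       # host -> device
        context.execute(batch_size=1,
                        bindings=[int(d_in), int(d_out)])  # run the engine
        cuda.memcpy_dtoh(h_out, d_out)                     # device -> host
        return h_out
```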

References (worth working through carefully):

https://learnopencv.com/how-to-convert-a-model-from-pytorch-to-tensorrt-and-speed-up-inference/
https://www.cnblogs.com/mrlonely2018/p/14842107.html
https://learnopencv.com/how-to-run-inference-using-tensorrt-c-api/
https://blog.csdn.net/yanggg1997/article/details/111587687

Reposted article; please credit www.mshxw.com when reposting.
Original URL: https://www.mshxw.com/it/740542.html