https://github.com/umitkacar/onnx-tensorrt-optimization
40x faster AI inference: ONNX to TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment
- Host: GitHub
- URL: https://github.com/umitkacar/onnx-tensorrt-optimization
- Owner: umitkacar
- License: MIT
- Created: 2023-12-06T17:35:52.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-11-14T11:53:08.000Z (4 months ago)
- Last Synced: 2026-02-06T18:34:26.475Z (about 1 month ago)
- Topics: cuda, deep-learning, edge-computing, fp16, gpu-acceleration, inference-acceleration, int8, latency-optimization, mlops, model-deployment, model-optimization, nvidia-gpu, onnx, onnxruntime, production-ai, pytorch-to-onnx, quantization, real-time-inference, tensorflow-to-onnx, tensorrt
- Language: Python
- Size: 126 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
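
The core workflow named in the description, converting an ONNX model into a TensorRT engine with reduced precision, can be sketched as follows. This is a minimal illustration using the standard TensorRT Python API (8.x-style explicit-batch network creation), not code taken from this repository; the paths `model.onnx` and `model_fp16.plan` are placeholders.

```python
# Minimal ONNX -> TensorRT FP16 engine build (TensorRT 8.x-style API).
# Illustrative sketch only -- not code from this repository.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path: str) -> bytes:
    builder = trt.Builder(TRT_LOGGER)
    # ONNX models require an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
    # Enable FP16 kernels only where the GPU supports them natively.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    return bytes(serialized)

if __name__ == "__main__":
    with open("model_fp16.plan", "wb") as f:  # placeholder output path
        f.write(build_fp16_engine("model.onnx"))
```

An INT8 build follows the same pattern but additionally sets `trt.BuilderFlag.INT8` and supplies a calibrator (or starts from an already-quantized ONNX model). Note that the serialized `.plan` file is specific to the GPU and TensorRT version it was built on.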