Projects in Awesome Lists tagged with tensorrt-llm

https://github.com/deftruth/awesome-llm-inference

📖A curated list of Awesome LLM/VLM Inference Papers with Code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

awesome-llm deepseek deepseek-r1 deepseek-v3 flash-attention flash-attention-3 flash-mla llm-inference minimax-01 mla paged-attention tensorrt-llm vllm

Last synced: 04 Apr 2025

https://github.com/janhq/nitro

Local AI API Platform

gguf llamacpp onnx onnxruntime tensorrt-llm

Last synced: 08 Mar 2025

https://github.com/janhq/cortex.cpp

Local AI API Platform

gguf llamacpp onnx onnxruntime tensorrt-llm

Last synced: 13 Mar 2025

https://github.com/collabora/whisperlive

A nearly-live implementation of OpenAI's Whisper.

dictation obs openai tensorrt tensorrt-llm text-to-speech translation voice-recognition whisper whisper-tensorrt

Last synced: 09 Apr 2025

https://github.com/collabora/WhisperLive

A nearly-live implementation of OpenAI's Whisper.

dictation obs openai tensorrt tensorrt-llm text-to-speech translation voice-recognition whisper whisper-tensorrt

Last synced: 07 Apr 2025

https://github.com/shashikg/whispers2t

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper

Last synced: 12 Apr 2025

https://github.com/shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper

Last synced: 08 May 2025

https://github.com/huggingface/optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

benchmark neural-compressor onnxruntime openvino pytorch tensorrt-llm text-generation-inference

Last synced: 04 Dec 2024

https://github.com/netease-media/grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks, as well as dynamic batching and streaming modes. It works with both Python and C++ and offers scalability, extensibility, and high performance. It helps users quickly deploy models and serve them through HTTP/RPC interfaces.

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 05 Apr 2025

https://github.com/NetEase-Media/grps

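[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks; supports dynamic batching and streaming modes; supports both Python and C++; scalable, extensible, and high-performance. Helps users quickly deploy models to production and serve them through HTTP/RPC interfaces.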

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 18 Feb 2025

https://github.com/netease-media/grps_trtllm

An OpenAI-compatible LLM service with higher performance than vLLM serve: a pure C++ high-performance service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat, function calling, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

ai-agent chatglm deepseek-r1 function-call internvideo internvl2 janus-pro llama-index llama3 llm minicpm-v multi-modal olmocr openai phi qwen2 qwen2-vl qwq tensorrt-llm