Projects in Awesome Lists tagged with tensorrt-llm
A curated list of projects in awesome lists tagged with tensorrt-llm.
https://github.com/deftruth/awesome-llm-inference
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
awesome-llm deepseek deepseek-r1 deepseek-v3 flash-attention flash-attention-3 flash-mla llm-inference minimax-01 mla paged-attention tensorrt-llm vllm
Last synced: 04 Apr 2025
https://github.com/janhq/nitro
Local AI API Platform
gguf llamacpp onnx onnxruntime tensorrt-llm
Last synced: 08 Mar 2025
https://github.com/janhq/cortex.cpp
Local AI API Platform
gguf llamacpp onnx onnxruntime tensorrt-llm
Last synced: 13 Mar 2025
https://github.com/collabora/whisperlive
A nearly-live implementation of OpenAI's Whisper.
dictation obs openai tensorrt tensorrt-llm text-to-speech translation voice-recognition whisper whisper-tensorrt
Last synced: 09 Apr 2025
https://github.com/collabora/WhisperLive
A nearly-live implementation of OpenAI's Whisper.
dictation obs openai tensorrt tensorrt-llm text-to-speech translation voice-recognition whisper whisper-tensorrt
Last synced: 07 Apr 2025
https://github.com/shashikg/whispers2t
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper
Last synced: 12 Apr 2025
https://github.com/shashikg/WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper
Last synced: 08 May 2025
https://github.com/huggingface/optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
benchmark neural-compressor onnxruntime openvino pytorch tensorrt-llm text-generation-inference
Last synced: 04 Dec 2024
https://github.com/netease-media/grps
Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.
dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm
Last synced: 05 Apr 2025
https://github.com/NetEase-Media/grps
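[Deep Learning Model Deployment Framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, and supports both Python and C++. Scalable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.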
dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm
Last synced: 18 Feb 2025
https://github.com/netease-media/grps_trtllm
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
ai-agent chatglm deepseek-r1 function-call internvideo internvl2 janus-pro llama-index llama3 llm minicpm-v multi-modal olmocr openai phi qwen2 qwen2-vl qwq tensorrt-llm
Last synced: 06 Apr 2025
https://github.com/NetEase-Media/grps_trtllm
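[High-Performance OpenAI LLM Service] A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat interface.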
ai-agent chatglm deepseek-r1 function-call internvl2 llama-index llama3 llm multi-modal openai qwen-vl qwen2 qwen2-vl tensorrt-llm
Last synced: 18 Feb 2025
https://github.com/guidance-ai/llgtrt
TensorRT-LLM server with Structured Outputs (JSON) built with Rust
cfg guidance json openai-api regex structured-generation tensorrt-llm
Last synced: 05 May 2025
https://github.com/argonne-lcf/llm-inference-bench
LLM-Inference-Bench
benchmark deepspeed inference llamacpp llm tensorrt-llm vllm
Last synced: 10 Apr 2025
https://github.com/zrzrzrzrzrzrzr/lm-fly
Accelerating large-model inference frameworks to make LLMs fly.
llm llm-inference mlx openvino tensorrt-llm tgi vllm
Last synced: 28 Dec 2024
https://github.com/j3soon/llm-tutorial
LLM tutorial materials, including but not limited to NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.
llm nemo nemo-guardrails nvidia-nemo tensorrt-llm
Last synced: 10 Apr 2025