Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with tensorrt-llm
A curated list of projects in awesome lists tagged with tensorrt-llm.
https://github.com/janhq/cortex.cpp
Local AI API Platform
gguf llamacpp onnx onnxruntime tensorrt-llm
Last synced: 18 Dec 2024
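Cortex serves local models (GGUF, ONNX, TensorRT-LLM) behind an OpenAI-compatible HTTP API. A minimal sketch of querying such an endpoint is below; the port, path, and model identifier are assumptions and should be checked against the cortex.cpp documentation.

    import requests

    # Assumed local OpenAI-compatible endpoint; cortex.cpp's actual port/path may differ.
    URL = "http://127.0.0.1:39281/v1/chat/completions"

    payload = {
        "model": "llama3.2",  # assumed identifier; use whatever model is loaded locally
        "messages": [{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    }

    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])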
https://github.com/collabora/WhisperLive
A nearly-live implementation of OpenAI's Whisper.
dictation obs openai tensorrt tensorrt-llm text-to-speech translation voice-recognition whisper whisper-tensorrt
Last synced: 06 Nov 2024
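WhisperLive follows a client/server model: a server runs the Whisper (optionally TensorRT) backend and clients stream audio to it. A minimal client sketch, assuming the whisper_live package exposes a TranscriptionClient with roughly these parameters (verify names against the repository README):

    # Assumed API: whisper_live.client.TranscriptionClient; argument names may differ.
    from whisper_live.client import TranscriptionClient

    client = TranscriptionClient(
        "localhost", 9090,   # assumed default server host/port
        lang="en",
        translate=False,
        model="small",
    )

    # Transcribe a file; calling with no argument typically streams from the microphone.
    client("audio.wav")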
https://github.com/shashikg/WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
asr deep-learning speech-recognition speech-to-text tensorrt tensorrt-llm vad voice-activity-detection whisper
Last synced: 14 Nov 2024
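A minimal batched-transcription sketch for WhisperS2T, assuming the whisper_s2t package exposes load_model and transcribe_with_vad with roughly these arguments (check the repository README for the actual API and output layout):

    # Assumed API surface; argument names are approximate and may differ by version.
    import whisper_s2t

    model = whisper_s2t.load_model(
        model_identifier="large-v2",
        backend="TensorRT-LLM",   # other backends may also be available
    )

    files = ["audio_1.wav", "audio_2.wav"]
    out = model.transcribe_with_vad(
        files,
        lang_codes=["en", "en"],
        tasks=["transcribe", "transcribe"],
        initial_prompts=[None, None],
        batch_size=16,
    )
    print(out[0][0]["text"])  # first segment of the first file (assumed output structure)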
https://github.com/huggingface/optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
benchmark neural-compressor onnxruntime openvino pytorch tensorrt-llm text-generation-inference
Last synced: 04 Dec 2024
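optimum-benchmark is configuration-driven: a launcher, a scenario, and a backend are composed into a benchmark run. A sketch of its Python API following the project's documented pattern is below; class and argument names are assumptions and should be verified against the installed release.

    # Sketch of optimum-benchmark's config-driven Python API; details may differ by version.
    from optimum_benchmark import (
        Benchmark,
        BenchmarkConfig,
        InferenceConfig,
        ProcessConfig,
        PyTorchConfig,
    )

    launcher = ProcessConfig()                             # run the benchmark in a subprocess
    scenario = InferenceConfig(latency=True, memory=True)  # measure latency and memory
    backend = PyTorchConfig(model="gpt2", device="cuda")   # any supported backend works here

    config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=launcher,
        scenario=scenario,
        backend=backend,
    )
    report = Benchmark.launch(config)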
https://github.com/netease-media/grps
[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks; supports dynamic batching and streaming mode; supports both Python and C++; limitable, extensible, and high performance. Helps users quickly deploy models to production and serve them via HTTP/RPC interfaces.
dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm
Last synced: 22 Dec 2024
https://github.com/netease-media/grps_trtllm
[grps meets trtllm] A pure C++, high-performance OpenAI-compatible LLM service built with grps + TensorRT-LLM + Tokenizers.cpp, supporting chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat UI.
ai-agent chatglm function-call internvl2 llama-index llama3 llm multi-modal openai qwen-vl qwen2 tensorrt-llm
Last synced: 17 Dec 2024
https://github.com/argonne-lcf/llm-inference-bench
LLM-Inference-Bench
benchmark deepspeed inference llamacpp llm tensorrt-llm vllm
Last synced: 03 Dec 2024
https://github.com/guidance-ai/llgtrt
TensorRT-LLM server with Structured Outputs (JSON) built with Rust
cfg guidance json openai-api regex structured-generation tensorrt-llm
Last synced: 17 Nov 2024
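llgtrt serves TensorRT-LLM models behind an OpenAI-compatible API and can constrain generation to structured JSON. A hedged sketch of such a request is below; the port, path, and response_format payload follow OpenAI API conventions and are assumptions, not confirmed llgtrt specifics.

    import requests

    # Assumed OpenAI-compatible endpoint exposed by the llgtrt server.
    URL = "http://localhost:3000/v1/chat/completions"

    schema = {
        "type": "object",
        "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
        "required": ["city", "population"],
    }

    payload = {
        "model": "llama-3",   # placeholder model name
        "messages": [{"role": "user", "content": "Give a city and its population as JSON."}],
        # OpenAI-style structured output; exact field support should be checked in the docs.
        "response_format": {"type": "json_schema", "json_schema": {"name": "city", "schema": schema}},
    }

    resp = requests.post(URL, json=payload, timeout=60)
    print(resp.json()["choices"][0]["message"]["content"])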
https://github.com/zrzrzrzrzrzrzr/lm-fly
Accelerating large-model inference frameworks to make LLMs fly.
llm llm-inference mlx openvino tensorrt-llm tgi vllm
Last synced: 29 Oct 2024
https://github.com/j3soon/llm-tutorial
LLM tutorial materials covering, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.
llm nemo nemo-guardrails nvidia-nemo tensorrt-llm
Last synced: 07 Dec 2024