# llm-inference-solutions
A collection of available inference and serving solutions for LLMs

| Name | Org | Description |
| ------------- |:-------------:| ------------- |
| [vLLM](https://github.com/vllm-project/vllm) | UC Berkeley | A high-throughput and memory-efficient inference and serving engine for LLMs |
| [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference) | Hugging Face 🤗 | Large language model text generation inference |
| [llm-engine](https://github.com/scaleapi/llm-engine) | Scale AI | Scale LLM Engine public repository |
| [DeepSpeed](https://github.com/microsoft/DeepSpeed) | Microsoft | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective |
| [OpenLLM](https://github.com/bentoml/OpenLLM) | BentoML | Operating LLMs in production |
| [LMDeploy](https://github.com/InternLM/lmdeploy) | InternLM Team | A toolkit for compressing, deploying, and serving LLMs |
| [FlexFlow](https://github.com/flexflow/FlexFlow) | CMU, Stanford, UCSD | A distributed deep learning framework |
| [CTranslate2](https://github.com/OpenNMT/CTranslate2) | OpenNMT | Fast inference engine for Transformer models |
| [FastChat](https://github.com/lm-sys/FastChat) | lm-sys | An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena |
| [Triton Inference Server](https://github.com/triton-inference-server/server) | NVIDIA | An optimized cloud and edge inferencing solution |
| [Lepton.AI](https://github.com/leptonai/leptonai) | Lepton.AI | A Pythonic framework to simplify AI service building |
| [ScaleLLM](https://github.com/vectorch-ai/ScaleLLM) | Vectorch | A high-performance inference system for large language models, designed for production environments |
| [LoRAX](https://predibase.com/blog/lorax-the-open-source-framework-for-serving-100s-of-fine-tuned-llms-in) | Predibase | Serve 100s of fine-tuned LLMs in production for the cost of one |
| [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) | NVIDIA | An easy-to-use Python API to define large language models and build TensorRT engines |
| [mistral.rs](https://github.com/EricLBuehler/mistral.rs) | mistral.rs | Blazingly fast LLM inference |
| [NanoFlow](https://github.com/efeslab/Nanoflow) | NanoFlow | A throughput-oriented high-performance serving framework for LLMs |
| [LMCache](https://github.com/LMCache/LMCache) | LMCache | Fast and cost-efficient inference |
| [LitServe](https://github.com/Lightning-AI/LitServe) | Lightning AI | Lightning-fast serving engine for AI models; flexible, easy, enterprise-scale |
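
Most of these engines follow a similar usage pattern: load a model once, then run batched, sampled generation over many prompts. As one concrete illustration, here is a minimal offline-generation sketch using vLLM's documented Python API (assumes `pip install vllm`, a CUDA-capable GPU, and a Hugging Face model; the model name below is only an example):

```python
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence.",
    "What is speculative decoding?",
]

# Nucleus sampling; vLLM batches these prompts internally for throughput.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

llm = LLM(model="facebook/opt-125m")  # example model; any HF causal LM works
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```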
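Several of the entries above (vLLM, LMDeploy, ScaleLLM, Text-Generation-Inference, among others) can also run as a server exposing an OpenAI-compatible HTTP API, so the same client code works across engines. A sketch assuming a vLLM server started with `vllm serve Qwen/Qwen2.5-0.5B-Instruct` on the default port 8000 (the model name and port are assumptions, not requirements):

```python
from openai import OpenAI

# Local OpenAI-compatible endpoint; the API key is unused but required
# by the client, so the conventional placeholder "EMPTY" is passed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # must match the served model name
    messages=[{"role": "user", "content": "Summarize continuous batching."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```

Because the wire format is shared, swapping the serving engine typically only changes how the server is launched, not the client code.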