Projects in Awesome Lists tagged with vllm

https://github.com/meta-llama/llama-recipes

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps to showcase Meta Llama for WhatsApp & Messenger.

ai finetuning langchain llama llama2 llm machine-learning python pytorch vllm

Last synced: 30 Dec 2024

https://github.com/xorbitsai/inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

artificial-intelligence chatglm deployment flan-t5 gemma ggml glm4 inference llama llama3 llamacpp llm machine-learning mistral openai-api pytorch qwen vllm whisper wizardlm

Last synced: 30 Dec 2024

https://github.com/katanaml/sparrow

Data processing with ML, LLM and Vision LLM

computer-vision gpt huggingface-transformers llm machinelearning nlp-machine-learning rag vllm

Last synced: 24 Dec 2024

https://github.com/openrlhf/openrlhf

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

deepspeed large-language-models raylib reinforcement-learning reinforcement-learning-from-human-feedback transformers vllm

Last synced: 24 Dec 2024

https://github.com/OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

deepspeed large-language-models raylib reinforcement-learning reinforcement-learning-from-human-feedback transformers vllm

Last synced: 05 Nov 2024

https://github.com/bricks-cloud/bricksllm

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

ai anthropic api artificial-intelligence azure docker generative-ai golang gpt llm open-source openai postgresql privacy rest-api security self-hosted vllm ycombinator

Last synced: 13 Dec 2024

https://github.com/bricks-cloud/BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

ai anthropic api artificial-intelligence azure docker generative-ai golang gpt llm open-source openai postgresql privacy rest-api security self-hosted vllm ycombinator

Last synced: 06 Nov 2024

https://github.com/prometheus-eval/prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

evaluation gpt4 litellm llm llm-as-a-judge llm-as-evaluator llmops python vllm

Last synced: 05 Nov 2024

https://github.com/substratusai/kubeai

Private Open AI on Kubernetes

ai autoscaler faster-whisper inference-operator k8s kubernetes llm ollama ollama-operator openai-api vllm vllm-operator whisper

Last synced: 28 Dec 2024

https://github.com/jakobdylanc/llmcord

Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

bot chat chatbot discord frontend gpt gpt-4 gpt-4o grok groq llama llama3 llava llm mistral ollama oobabooga openai vllm xai

Last synced: 29 Dec 2024

https://github.com/varunshenoy/super-json-mode

Low latency JSON generation using LLMs ⚡️

huggingface-transformers llm openai vllm

Last synced: 27 Dec 2024

https://github.com/containers/ramalama

The goal of RamaLama is to make working with AI boring.

ai containers inference-server llamacpp llms local podman vllm

Last synced: 27 Dec 2024

https://github.com/microsoft/vidur

A large-scale simulation framework for LLM inference

inference llm simulation transformer vllm

Last synced: 27 Dec 2024

https://github.com/harleyszhang/llm_note

LLM notes covering model inference, transformer model structure, and code analysis of LLM frameworks

cuda-programming kv-cache llm llm-inference transformer-models triton-kernels vllm

Last synced: 21 Dec 2024

https://github.com/runpod-workers/worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

language-model llm runpod vllm

Last synced: 05 Nov 2024

https://github.com/netease-media/grps

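[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks; supports dynamic batching and streaming modes; supports both Python and C++; limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.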

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 29 Dec 2024

https://github.com/gotzmann/booster

Booster - open accelerator for LLM models. Better inference and debugging for AI hackers

chatgpt exllama ggml gpt llama llama-cpp llamacpp llm ollama oobabooga openai vllm

Last synced: 29 Dec 2024

https://github.com/dnth/x.infer

Framework agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm, ultralytics, vllm, ollama and your custom model.

computer-vision inference-api ollama pytorch-image-models transformers ultralytics vllm

Last synced: 29 Dec 2024

https://github.com/Trainy-ai/llm-atc

Fine-tuning and serving LLMs on any cloud

finetuning llama2 llms vllm

Last synced: 13 Nov 2024

https://github.com/yoziru/nextjs-vllm-ui

Fully-featured, beautiful web interface for vLLM - built with NextJS.

ai llm-ui llm-webui nextjs openai-api self-hosted tailwindcss typescript ui vllm vllm-ui webui

Last synced: 25 Dec 2024

https://github.com/jasonacox/tinyllm

Setup and run a local LLM and Chatbot using consumer grade hardware.

artificial-intelligence chatbot large-language-models llama-cpp-python llm openai rag retrieval-augmented-generation vllm

Last synced: 28 Oct 2024

https://github.com/opencsgs/llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

deepspeed llama-cpp llm-inference ray transformer vllm

Last synced: 07 Nov 2024

https://github.com/phospho-app/fastassert

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.

docker llm llm-inference outlines vllm

Last synced: 09 Nov 2024

https://github.com/argonne-lcf/llm-inference-bench

LLM-Inference-Bench

benchmark deepspeed inference llamacpp llm tensorrt-llm vllm

Last synced: 22 Dec 2024

https://github.com/apconw/sanic-web

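A lightweight large-model application project that supports the full pipeline and is easy to extend. It is a one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based data-visualization Q&A via ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily connect to third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.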

ai bigdata chat chatgpt dify echarts llm ollama python qwen rag sanic text2sql vllm vue3

Last synced: 03 Dec 2024

https://github.com/france-travail/happy_vllm

A REST API for vLLM, production ready

api-rest llm llm-serving production vllm

Last synced: 26 Dec 2024

https://github.com/zrzrzrzrzrzrzr/lm-fly

Accelerating large-model inference frameworks to make LLMs fly

llm llm-inference mlx openvino tensorrt-llm tgi vllm

Last synced: 28 Dec 2024

https://github.com/kyegomez/simpleunet

A simple implementation of U-Net, because all the implementations I've seen are way too complicated.

artificial-intelligence biomedical biomedical-image-processing computer-vision gpt4 image image-classification image-segmentation texttovide unet vllm

Last synced: 09 Nov 2024

https://github.com/llmariner/llmariner

LLMariner transforms your GPU clusters into a powerhouse for generative AI workloads.

ai gpu kubernetes llm ml operator vllm

Last synced: 14 Nov 2024

https://github.com/neuralmagic/nm-vllm-certs

General Information, model certifications, and benchmarks for nm-vllm enterprise distributions

vllm

Last synced: 08 Nov 2024

https://github.com/ivangabriele/docker-functionary

Ready-to-deploy Docker image for Functionary LLM served as an OpenAI-Compatible API.

ai docker docker-hub docker-image functionary functions large-language-models llama2 llm openai openai-api server vllm

Last synced: 26 Oct 2024

https://github.com/ivangabriele/docker-llm

Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.

api docker docker-image llm llms llong lmsys openai openai-api openorca orca runpod server vast vicuna vllm

Last synced: 27 Oct 2024

https://github.com/france-travail/benchmark_llm_serving

A library to benchmark LLMs via their API exposure

benchmark llm llm-serving vllm

Last synced: 26 Dec 2024

https://github.com/blib-la/ask-poddy

Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)

ai embedding endpoint infinity llm nextjs openai rag runpod serverless vllm worker

Last synced: 06 Dec 2024

https://github.com/yas-sim/openvino_genai_sample_codes

OpenVINO.genai sample codes with a helper class that supports vLLM-like iterator-based streaming output.

chatbot chatgpt edge-ai inference intel llm openvino python vllm

Last synced: 13 Oct 2024

https://github.com/agnostiqhq/tutorials_covalent_pycon_2024

agents ai ai-foundry autonomous-agents chatgpt covalent gpu hpc huggingface large-language-models llama llamacpp llm ml vllm

Last synced: 22 Dec 2024

https://github.com/navinkumarmnk/ai-learning-platform

AI-Learning-Platform, an LLM-RAG pipeline that acts as a guide and can resolve doubts. Deployed on-premises on IBM ppc64le architecture. Uses vLLM for model inference and Qdrant with LangChain for the RAG pipeline. The server is written in Django, with PostgreSQL and Cassandra as the SQL and NoSQL databases.

cassandra django langchain llm postgresql ppc64le qdrant ray-distributed vllm

Last synced: 18 Dec 2024

https://github.com/TimeSurgeLabs/promptproxy

Call many AIs from a single API.

ai docker huggingface llama llama2 llm openai openai-api openai-api-proxy vllm

Last synced: 05 Nov 2024

https://github.com/timesurgelabs/promptproxy

Call many AIs from a single API.

ai docker huggingface llama llama2 llm openai openai-api openai-api-proxy vllm

Last synced: 19 Nov 2024

https://github.com/getflexai/flex_ai

Simplifies fine-tuning and inference for 60+ open-source LLMs through a single API

ai axolotl fine-tuning gemma inference llama llama2 llama3 llm-finetuning llms lora mistral vllm

Last synced: 22 Dec 2024

https://github.com/danitilahun/llm_projects

This repository contains many completed LLM projects. It is the best place to start learning about LLMs.

fine-tuning gemini gpt gpt-3 instruction-tuning langchain large-language-models llama llm retrieval-augmented-generation transformer vllm

Last synced: 27 Dec 2024

https://github.com/dinhanhx/litellm-infinity-vllm-serving-local-models

Docker Compose setup to serve some local language models

cohere docker docker-compose infinity jinaai litellm openai vllm

Last synced: 22 Dec 2024

https://github.com/hcd233/aris-ai-model-server

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker models.

ai awq embedding fastapi gptq llm mlx openai-compatible-api rag reranker sentence-transformers vllm

Last synced: 12 Nov 2024

https://github.com/xlisp/learn-vllm

vllm learning

cuda nvidia pytorch vllm

Last synced: 07 Dec 2024

https://github.com/datvodinh/serve-llm

Serve high throughput and scalable LLM using Ray and vLLM

kubernetes llm ray torch transformers vllm

Last synced: 17 Nov 2024

https://github.com/netease-media/grps_vllm

[grps integration with vLLM] Implements an LLM service via the vLLM LLMEngine API.

vllm

Last synced: 07 Dec 2024

https://github.com/john-knl/llm-project

A Yu-Gi-Oh! card game AI assistant with a simple UI. Mechanisms are rudimentary; accuracy of the model is not guaranteed.

huggingface llama3 llamaindex python3 streamlit-application vllm yugioh-api

Last synced: 07 Dec 2024

https://github.com/evilpsycho/open-llm-benchmark

Evaluate open-source language models on agent, formatted-output, instruction-following, long-text, multilingual, coding, and custom-task capabilities.

evaluation-framework huggingface large-language-models llamacpp llm-agent llms-benchmarking openai vllm

Last synced: 22 Dec 2024

https://github.com/g-hano/chat2mistral

This is a RAG app that can run on multiple GPUs simultaneously. I used the Llama-Index framework for storing and retrieving documents and the vLLM library for multi-GPU support.

aws-ec2 generative-ai huggingface llama-index llm rag vllm

Last synced: 13 Nov 2024

https://github.com/paulpierre/vllm-docker

Quickly test a 4-bit quantized Llama-3.2-11B-Vision-Instruct on an A100 40GB

docker docker-compose llama llama3 llm llm-inference llms vllm

Last synced: 22 Dec 2024

https://github.com/hemanthpai/hass-ai-assistant

A Home Assistant integration to control your smart home using a local, self hosted LLM

ai functionary hacs hacs-integration hass hassio-integration home-assistant vllm

Last synced: 10 Oct 2024

https://github.com/adesoji1/vllm-docker

How vLLM and Docker are Changing the Game for LLM Deployments

docker vllm

Last synced: 03 Dec 2024

https://github.com/seungjaelim/efficient-road-repairs-system.vlm

microsoft-phi-3 vllm vlm

Last synced: 02 Dec 2024

https://github.com/seungjaelim/efficient-road-repairs-system

[KAIST CS632] Detects road damage with FPGA-porting of YOLO, generates repair estimates using LMM RAG, and stores data in a GS1 EPCIS server for centralized management via a React web dashboard

epcis-standard fpga gs1-standard microsoft-phi-3 vllm vlm yolov8

Last synced: 19 Dec 2024

https://github.com/aisingapore/sealion-vllm

Serve the AI Singapore SEA-LION model ⚛ with vLLM

vllm

Last synced: 16 Nov 2024

https://github.com/wtlow003/modal-llm-serving

Examples of serving LLM on Modal.

llm lmdeploy modal model-serving openai openai-api sglang vllm

Last synced: 15 Nov 2024

https://github.com/murtaza-arif/rag-agnostic-guide

A comprehensive guide to building Retrieval-Augmented Generation (RAG) systems using various open-source tools.

ai gpt4all llm lmstudio localai ml mlflow mlops ollama openlit python rag ragflow vllm

Last synced: 13 Dec 2024