Projects in Awesome Lists tagged with inference-server

https://github.com/roboflow/inference

Turn any computer or edge device into a command center for your computer vision projects.

agents classification computer-vision deployment docker inference inference-api inference-server instance-segmentation jetson machine-learning object-detection onnx python tensorrt vit yolo11 yolov12 yolov5 yolov8

Last synced: 10 Apr 2026

https://github.com/containers/ramalama

The goal of RamaLama is to make working with AI boring.

ai containers inference-server llamacpp llm podman vllm

Last synced: 04 Feb 2026

https://github.com/basetenlabs/truss

The simplest way to serve AI/ML models in production

artificial-intelligence easy-to-use falcon inference-api inference-server machine-learning model-serving open-source packaging stable-diffusion whisper wizardlm

Last synced: 08 May 2026

https://github.com/pipeless-ai/pipeless

An open-source computer vision framework to build and deploy apps in minutes

artificial-intelligence cloud computer-vision deep-learning ffmpeg gstreamer inference inference-server machine-learning multimedia multimedia-applications object-detection perception pipeline-framework python stream-processing video video-processing vision-framework yolo

Last synced: 09 Apr 2026

https://github.com/underneathall/pinferencia

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers

Last synced: 08 Oct 2025

https://github.com/michael-a-kuykendall/shimmy

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

api-server command-line-tool developer-tools gguf huggingface huggingface-models huggingface-transformers inference-server llama llamacpp llm-inference local-ai lora machine-learning ollama-api openai-compatible rust rust-crate transformers

Last synced: 13 Sep 2025

https://github.com/bmw-innovationlab/bmw-yolov4-inference-api-gpu

This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.

alexeyab-darknet api bounding-boxes computer-vision deep-learning deeplearning detection-inference-api docker dockerfile gpu inference inference-gui inference-server neural-network no-code rest-api yolo yolo-gui yolov3 yolov4

Last synced: 02 Jul 2025

https://github.com/aiptimizer/TurboOCR

Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

document-ai document-parsing easyocr fastapi fp16 gpu-ocr grpc inference-server nvidia ocr paddleocr pdf-extraction qwen-vl rag tensorrt text-detection text-recognition

Last synced: 26 Jun 2026

https://github.com/containers/podman-desktop-extension-ai-lab

Work with LLMs on a local environment using containers

ai containers inference-server llms local podman

Last synced: 16 May 2025

https://github.com/bmw-innovationlab/bmw-yolov4-inference-api-cpu

This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.

api bounding-boxes computer-vision cpu cpu-inference-api deep-learning deep-neural-networks detection-inference-api docker inference inference-gui inference-server neural-network no-code object-detection opencv rest-api yolov3 yolov4 yolov4-darknet

Last synced: 02 Jul 2025

https://github.com/bmw-innovationlab/bmw-tensorflow-inference-api-cpu

This is a repository for an object detection inference API using the TensorFlow framework.

api bounding-boxes computer-vision computervision cpu deep-learning deeplearning detection-inference-api docker docker-ce docker-container docker-image inference inference-engine inference-server object-detection predictions rest-api tensorflow tensorflow-framework

Last synced: 23 Oct 2025

https://github.com/kibae/onnxruntime-server

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime

Last synced: 05 Apr 2025

https://github.com/vertexclique/orkhon

Orkhon: ML Inference Framework and Server Runtime

async data-parallelism inference-server machine-learning multiprocessing python3 tensorflow

Last synced: 07 Apr 2025

https://github.com/autodeployai/ai-serving

Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

ai-serving inference inference-server onnx onnx-grpc onnx-inference onnx-models onnx-realtime onnx-rest pmml pmml-deployment pmml-grpc pmml-inference pmml-model pmml-realtime pmml-rest

Last synced: 21 Feb 2026

https://github.com/kf5i/k3ai

K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.

artificial-intelligence datascience edge inference-server k3s kubeflow kubeflow-pipelines kubernetes machinelearning

Last synced: 12 Jul 2025

https://github.com/notai-tech/fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket

Last synced: 13 Apr 2025

https://github.com/jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

apple-silicon inference-server llm macos mlx openai-api

Last synced: 27 May 2026

https://github.com/rubixml/server

A standalone inference server for trained Rubix ML estimators.

api http-server inference inference-engine inference-server infrastructure json-api machine-learning microservice ml-infrastructure model-deployment model-server php php-machine-learning php-ml rest-api rubix-ml rubix-server

Last synced: 30 Jul 2025

https://github.com/curtisgray/wingman

Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

ai chatbot chatgpt download downloader gpu gpu-acceleration gpu-monitoring inference inference-engine inference-server linux llama llamacpp llm local macos openai windows

Last synced: 05 Oct 2025

https://github.com/friendliai/friendli-client

Friendli: the fastest serving engine for generative AI

ai generative-ai gpt gpt3 inference inference-engine inference-server llama2 llm llm-inference llm-ops llm-serving llmops llms mistral ml mlops serving stable-diffusion

Last synced: 05 Apr 2025

https://github.com/tensorchord/inference-benchmark

Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)

benchmark inference-server llm stable-diffusion whisper

Last synced: 30 Apr 2025

https://github.com/tommylemon/cvauto

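👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Removes the need for large amounts of manual box drawing and labeling, and quickly runs automated tests of computer vision AI image recognition algorithms with zero code: pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, matting and segmentation, and more. It also supports one-click download of test reports and export of training and test datasets.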

ai ai-testing apijson classification computer-vision cv cv2 detection face-recognition inference-api inference-server ocr pose-estimation rotation segmentation test-automation ultralytics ultralytics-yolo yolo yolo11

Last synced: 15 Sep 2025

https://github.com/roboflow/inference-dashboard-example

An example that uses Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.

inference inference-server object-detection predictions

Last synced: 05 May 2025

https://github.com/nglsg/uniapi

The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API

ai ai-tools api-client api-integration api-wrapper chatbot cpp cross-platform cuda gpu-accelerated high-performance http-server inference-server language-model llm llm-integration openai-compatible rest-api universal-api

Last synced: 17 Jun 2025

https://github.com/hec-ovi/vllm-qwen

vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.

amd docker gfx1151 inference-server llm-serving local-llm multimodal-llm openai-compatible qwen qwen3 rocm ryzen-ai self-hosted strix-halo vllm