Projects in Awesome Lists tagged with inference-server
A curated list of projects in awesome lists tagged with inference-server.
https://github.com/roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
agents classification computer-vision deployment docker inference inference-api inference-server instance-segmentation jetson machine-learning object-detection onnx python tensorrt vit yolo11 yolov12 yolov5 yolov8
Last synced: 10 Apr 2026
https://github.com/containers/ramalama
The goal of RamaLama is to make working with AI boring.
ai containers inference-server llamacpp llm podman vllm
Last synced: 04 Feb 2026
https://github.com/basetenlabs/truss
The simplest way to serve AI/ML models in production
artificial-intelligence easy-to-use falcon inference-api inference-server machine-learning model-serving open-source packaging stable-diffusion whisper wizardlm
Last synced: 08 May 2026
https://github.com/pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
artificial-intelligence cloud computer-vision deep-learning ffmpeg gstreamer inference inference-server machine-learning multimedia multimedia-applications object-detection perception pipeline-framework python stream-processing video video-processing vision-framework yolo
Last synced: 09 Apr 2026
https://github.com/underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers
Last synced: 08 Oct 2025
https://github.com/michael-a-kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
api-server command-line-tool developer-tools gguf huggingface huggingface-models huggingface-transformers inference-server llama llamacpp llm-inference local-ai lora machine-learning ollama-api openai-compatible rust rust-crate transformers
Last synced: 13 Sep 2025
https://github.com/bmw-innovationlab/bmw-yolov4-inference-api-gpu
This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
alexeyab-darknet api bounding-boxes computer-vision deep-learning deeplearning detection-inference-api docker dockerfile gpu inference inference-gui inference-server neural-network no-code rest-api yolo yolo-gui yolov3 yolov4
Last synced: 02 Jul 2025
https://github.com/containers/podman-desktop-extension-ai-lab
Work with LLMs on a local environment using containers
ai containers inference-server llms local podman
Last synced: 16 May 2025
https://github.com/bmw-innovationlab/bmw-yolov4-inference-api-cpu
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
api bounding-boxes computer-vision cpu cpu-inference-api deep-learning deep-neural-networks detection-inference-api docker inference inference-gui inference-server neural-network no-code object-detection opencv rest-api yolov3 yolov4 yolov4-darknet
Last synced: 02 Jul 2025
https://github.com/bmw-innovationlab/bmw-tensorflow-inference-api-cpu
This is a repository for an object detection inference API using the Tensorflow framework.
api bounding-boxes computer-vision computervision cpu deep-learning deeplearning detection-inference-api docker docker-ce docker-container docker-image inference inference-engine inference-server object-detection predictions rest-api tensorflow tensorflow-framework
Last synced: 23 Oct 2025
https://github.com/kibae/onnxruntime-server
ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime
Last synced: 05 Apr 2025
https://github.com/vertexclique/orkhon
Orkhon: ML Inference Framework and Server Runtime
async data-parallelism inference-server machine-learning multiprocessing python3 tensorflow
Last synced: 07 Apr 2025
https://github.com/autodeployai/ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
ai-serving inference inference-server onnx onnx-grpc onnx-inference onnx-models onnx-realtime onnx-rest pmml pmml-deployment pmml-grpc pmml-inference pmml-model pmml-realtime pmml-rest
Last synced: 21 Feb 2026
https://github.com/kf5i/k3ai
K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.
artificial-intelligence datascience edge inference-server k3s kubeflow kubeflow-pipelines kubernetes machinelearning
Last synced: 12 Jul 2025
https://github.com/notai-tech/fastdeploy
Deploy DL/ML inference pipelines with minimal extra code.
deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket
Last synced: 13 Apr 2025
https://github.com/jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
apple-silicon inference-server llm macos mlx openai-api
Last synced: 27 May 2026
https://github.com/rubixml/server
A standalone inference server for trained Rubix ML estimators.
api http-server inference inference-engine inference-server infrastructure json-api machine-learning microservice ml-infrastructure model-deployment model-server php php-machine-learning php-ml rest-api rubix-ml rubix-server
Last synced: 30 Jul 2025
https://github.com/curtisgray/wingman
Wingman is the fastest and easiest way to run Llama models on your PC or Mac.
ai chatbot chatgpt download downloader gpu gpu-acceleration gpu-monitoring inference inference-engine inference-server linux llama llamacpp llm local macos openai windows
Last synced: 05 Oct 2025
https://github.com/friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
ai generative-ai gpt gpt3 inference inference-engine inference-server llama2 llm llm-inference llm-ops llm-serving llmops llms mistral ml mlops serving stable-diffusion
Last synced: 05 Apr 2025
https://github.com/tensorchord/inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
benchmark inference-server llm stable-diffusion whisper
Last synced: 30 Apr 2025
https://github.com/tommylemon/cvauto
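👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Removes the need for large amounts of manual box drawing and labeling, and runs fast, zero-code automated tests directly on computer vision image recognition algorithms: pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, image matting and segmentation, and more. It can also download test reports and export training and test datasets with one click.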
ai ai-testing apijson classification computer-vision cv cv2 detection face-recognition inference-api inference-server ocr pose-estimation rotation segmentation test-automation ultralytics ultralytics-yolo yolo yolo11
Last synced: 15 Sep 2025
https://github.com/roboflow/inference-dashboard-example
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
inference inference-server object-detection predictions
Last synced: 05 May 2025
https://github.com/nglsg/uniapi
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
ai ai-tools api-client api-integration api-wrapper chatbot cpp cross-platform cuda gpu-accelerated high-performance http-server inference-server language-model llm llm-integration openai-compatible rest-api universal-api
Last synced: 17 Jun 2025
https://github.com/hec-ovi/vllm-qwen
vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.
amd docker gfx1151 inference-server llm-serving local-llm multimodal-llm openai-compatible qwen qwen3 rocm ryzen-ai self-hosted strix-halo vllm
Last synced: 01 May 2026
https://github.com/pandruszkow/whisper-inference-server
A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.
flask inference-api inference-server python3 whisper-ai
Last synced: 25 Oct 2025
https://github.com/tensorchord/modelz-docs
Modelz is a developer-first platform for prototyping and deploying machine learning models.
generative-ai inference inference-server llm machine-learning modelz serverless
Last synced: 30 Apr 2025
https://github.com/geniusrise/vision
Vision and vision-multi-modal components for geniusrise framework
huggingface inference inference-server mlops multimodal vision
Last synced: 17 Jan 2026
https://github.com/dlzou/computron
Serving distributed deep learning models with model parallel swapping.
deep-learning inference-server model-parallelism
Last synced: 14 Apr 2025
https://github.com/stefanolusardi/tiny_inference_engine
Client/Server system to perform distributed inference on high load systems.
ai cmake conan cpp deep-neural-networks docker grpc inference-client inference-engine inference-server kserve onnxruntime
Last synced: 23 Apr 2025
https://github.com/keith-cy/melix
Local-first AI runtime for Apple Silicon with CLI and macOS operator workflows for LoRA training, benchmarking, and evaluation.
ai-runtime apple-silicon inference-server local-ai local-llm lora lora-training macos mlx mlx-lora model-ops qlora swift
Last synced: 30 Apr 2026
https://github.com/raspoli/mlx-serve
Local inference server for Apple Silicon — hot-swaps MLX models (LLM, vision, embeddings, TTS, STT) via OpenAI API
apple-silicon embeddings fastapi inference-server llm local-inference local-llm machine-learning macos mlx mlx-lm model-serving openai-api openai-compatible python speech-to-text text-to-speech unified-memory vision-language-model
Last synced: 04 Apr 2026
https://github.com/zhangjun/tensorrt-server
TensorRT Server
inference-engine inference-server onnx tensorrt
Last synced: 14 May 2025
https://github.com/zhangjun/infer_server
epoll inference inference-server rpc-server tcp-server threadpool
Last synced: 14 May 2026
https://github.com/cryptojones/macminim2pro_localmodelconfig
Memory-safe, LAN-accessible OpenAI-compatible server for Gemma 4 12B (4-bit MLX) on a 16 GB M2 Pro Mac mini
anthropic-api apple-silicon claude-code gemma inference-server local-llm mac-mini macos mlx omlx qwen3
Last synced: 12 Jun 2026