An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with inference-server

A curated list of projects in awesome lists tagged with inference-server.

https://github.com/containers/ramalama

The goal of RamaLama is to make working with AI boring.

ai containers inference-server llamacpp llm podman vllm

Last synced: 04 Feb 2026

https://github.com/michael-a-kuykendall/shimmy

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

api-server command-line-tool developer-tools gguf huggingface huggingface-models huggingface-transformers inference-server llama llamacpp llm-inference local-ai lora machine-learning ollama-api openai-compatible rust rust-crate transformers

Last synced: 13 Sep 2025

https://github.com/containers/podman-desktop-extension-ai-lab

Work with LLMs on a local environment using containers

ai containers inference-server llms local podman

Last synced: 16 May 2025

https://github.com/kibae/onnxruntime-server

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime

Last synced: 05 Apr 2025

https://github.com/vertexclique/orkhon

Orkhon: ML Inference Framework and Server Runtime

async data-parallelism inference-server machine-learning multiprocessing python3 tensorflow

Last synced: 07 Apr 2025

https://github.com/autodeployai/ai-serving

Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

ai-serving inference inference-server onnx onnx-grpc onnx-inference onnx-models onnx-realtime onnx-rest pmml pmml-deployment pmml-grpc pmml-inference pmml-model pmml-realtime pmml-rest

Last synced: 21 Feb 2026

https://github.com/kf5i/k3ai

K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.

artificial-intelligence datascience edge inference-server k3s kubeflow kubeflow-pipelines kubernetes machinelearning

Last synced: 12 Jul 2025

https://github.com/jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

apple-silicon inference-server llm macos mlx openai-api

Last synced: 27 May 2026

https://github.com/tensorchord/inference-benchmark

Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)

benchmark inference-server llm stable-diffusion whisper

Last synced: 30 Apr 2025

https://github.com/tommylemon/cvauto

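👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Removes the need for extensive manual box drawing and labeling, so you can quickly run zero-code automated tests of computer vision (CV) AI image recognition algorithms: pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, matting and segmentation, and more. It can also download test reports and export training and test datasets in one click.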

ai ai-testing apijson classification computer-vision cv cv2 detection face-recognition inference-api inference-server ocr pose-estimation rotation segmentation test-automation ultralytics ultralytics-yolo yolo yolo11

Last synced: 15 Sep 2025

https://github.com/roboflow/inference-dashboard-example

Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.

inference inference-server object-detection predictions

Last synced: 05 May 2025

https://github.com/hec-ovi/vllm-qwen

vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.

amd docker gfx1151 inference-server llm-serving local-llm multimodal-llm openai-compatible qwen qwen3 rocm ryzen-ai self-hosted strix-halo vllm

Last synced: 01 May 2026

https://github.com/pandruszkow/whisper-inference-server

A networked inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.

flask inference-api inference-server python3 whisper-ai

Last synced: 25 Oct 2025

https://github.com/tensorchord/modelz-docs

Modelz is a developer-first platform for prototyping and deploying machine learning models.

generative-ai inference inference-server llm machine-learning modelz serverless

Last synced: 30 Apr 2025

https://github.com/geniusrise/vision

Vision and vision-multi-modal components for the geniusrise framework

huggingface inference inference-server mlops multimodal vision

Last synced: 17 Jan 2026

https://github.com/dlzou/computron

Serving distributed deep learning models with model parallel swapping.

deep-learning inference-server model-parallelism

Last synced: 14 Apr 2025

https://github.com/stefanolusardi/tiny_inference_engine

Client/server system to perform distributed inference on high-load systems.

ai cmake conan cpp deep-neural-networks docker grpc inference-client inference-engine inference-server kserve onnxruntime

Last synced: 23 Apr 2025

https://github.com/keith-cy/melix

Local-first AI runtime for Apple Silicon with CLI and macOS operator workflows for LoRA training, benchmarking, and evaluation.

ai-runtime apple-silicon inference-server local-ai local-llm lora lora-training macos mlx mlx-lora model-ops qlora swift

Last synced: 30 Apr 2026

https://github.com/cryptojones/macminim2pro_localmodelconfig

Memory-safe, LAN-accessible OpenAI-compatible server for Gemma 4 12B (4-bit MLX) on a 16 GB M2 Pro Mac mini

anthropic-api apple-silicon claude-code gemma inference-server local-llm mac-mini macos mlx omlx qwen3

Last synced: 12 Jun 2026