An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with serving

A curated list of projects in awesome lists tagged with serving.

https://github.com/tensorflow/serving

A flexible, high-performance serving system for machine learning models

cpp deep-learning deep-neural-networks machine-learning ml neural-network python serving tensorflow

Last synced: 12 May 2025

https://github.com/volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

ai batch-systems bigdata gene golang hpc kubernetes machine-learning serving training

Last synced: 31 Jan 2026

https://github.com/seldonio/seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving

Last synced: 14 May 2025

https://github.com/SeldonIO/seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving

Last synced: 27 Mar 2025

https://github.com/pytorch/serve

Serve, optimize and scale PyTorch models in production

cpu deep-learning docker gpu kubernetes machine-learning metrics mlops optimization pytorch serving

Last synced: 13 May 2025

https://github.com/Lightning-AI/LitServe

The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.

ai api artificial-intelligence deep-learning developer-tools fastapi rest-api serving web

Last synced: 23 Aug 2025

https://github.com/paddlepaddle/fastdeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Covers 20+ mainstream Image, Video, Text and Audio scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8

Last synced: 23 Jan 2026

https://github.com/PaddlePaddle/FastDeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Covers 20+ mainstream Image, Video, Text and Audio scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8

Last synced: 20 Mar 2025

https://github.com/ray-project/llm-applications

A comprehensive guide to building RAG-based LLM applications for production.

anyscale fine-tuning llama2 llms machine-learning openai ray serving

Last synced: 11 Apr 2025

https://github.com/Delta-ML/delta

DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/

asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation

Last synced: 07 Apr 2025

https://github.com/paddlepaddle/serving

A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving

Last synced: 15 May 2025

https://github.com/PaddlePaddle/Serving

A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)

dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving

Last synced: 20 Mar 2025

https://github.com/tobegit3hub/simple_tensorflow_serving

Generic and easy-to-use serving service for machine learning models

client deep-learning http machine-learning savedmodel serving tensorflow tensorflow-models

Last synced: 25 Oct 2025

https://github.com/openvinotoolkit/model_server

A scalable inference server for models optimized with OpenVINO™

ai cloud dag deep-learning edge genai inference kubernetes machine-learning model-serving openvino serving

Last synced: 14 May 2025

https://github.com/vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 09 May 2025

https://github.com/vectorch-ai/scalellm

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 14 Apr 2025

https://github.com/deepjavalibrary/djl-serving

A universal scalable machine learning model deployment solution

deep-learning deployment djl inference pytorch serving

Last synced: 05 Apr 2025

https://github.com/netease-media/grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 05 Apr 2025

https://github.com/NetEase-Media/grps

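Deep learning model deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, and supports both Python and C++. It can be limited, is extensible, and delivers high performance. It helps users quickly deploy models online and serve them through HTTP/RPC interfaces.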

dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm

Last synced: 04 Nov 2025

https://github.com/krystianity/keras-serving

Bring Keras models to production with tensorflow-serving and nodejs + docker 🍕

cpp docker grpc keras network neuronal nodejs production python serving tensorflow

Last synced: 17 Mar 2025

https://github.com/lightning-ai/litserve

Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.

ai api serving

Last synced: 05 Apr 2025

https://github.com/ai-hypercomputer/gpu-recipes

Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

benchmarks distributed-training google-cloud-platform gpu serving

Last synced: 25 Jun 2025

https://github.com/angel-ml/serving

A standalone industrial serving system for Angel.

machine-learning serving serving-recommendation

Last synced: 30 Apr 2025

https://github.com/aws/sagemaker-sparkml-serving-container

This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.

inference inference-pipeline machine-learning mleap mleap-serialized-spark pipeline sagemaker serving spark sparkml

Last synced: 20 Oct 2025

https://github.com/hydrospheredata/spark-ml-serving

Spark MLlib serving library

inference scoring serving spark

Last synced: 15 Apr 2025

https://github.com/toshi0607/build-your-own-platform-with-knative

A workshop for building your own FaaS platform while learning about Knative's components

eventing gcr gke go golang knative knative-lambda-runtimes kubernetes pubsub serverless serving tekton tm watchdog

Last synced: 14 Jan 2026

https://github.com/daekeun-ml/genai-ko-llm

This hands-on lab walks you through a step-by-step approach to efficiently serving and fine-tuning large-scale Korean models on AWS infrastructure.

fine-tuning genai korean-llm peft sagemaker serving

Last synced: 12 Oct 2025

https://github.com/jeongukjae/lightgbm-serving

A lightweight server for LightGBM

lightgbm ml-serving serving

Last synced: 13 Apr 2025

https://github.com/secretflow/serving

SecretFlow-Serving is a serving system for privacy-preserving machine learning models.

federated-learning machine-learning privacy-preserving secure-multiparty-computation serving serving-ml

Last synced: 01 Feb 2026

https://github.com/galileo-galilei/kedro-serving

A kedro-plugin to serve Kedro Pipelines as API

fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving

Last synced: 12 May 2025

https://github.com/ovh/serving-runtime

Exposes a serialized machine learning model through an HTTP API.

hdf5 inference machine-learning onnx serving tensorflow

Last synced: 08 Apr 2025

https://github.com/teachablehub/python-sdk

Python SDK for the TeachableHub's Machine-Learning Deployment Platform

deployment-automation machine-learning machine-learning-library mlops sdk-python serving teachable teachablehub

Last synced: 07 Jul 2025

https://github.com/hydrospheredata/hydro-serving-pytorch

PyTorch ONNX model serving runtime

python pytorch serving

Last synced: 09 Nov 2025

https://github.com/sbcd90/machine-learning-rest-server

This project implements a common REST server that can serve tensorflow-serving and xgboost models.

machine-learning proxygen rest serving tensorflow xgboost

Last synced: 13 Jun 2025

https://github.com/tataganesh/tf-serving-cnn-example

Basic example of TensorFlow Serving

model-deployment serving tensorflow-serving

Last synced: 31 Jul 2025

https://github.com/ahmadalsharef994/flask_mistral_mlops_example

Demonstrates how to deploy a Flask-based API for LLM inference using Mistral models, containerized with Docker for MLOps workflows.

docker flask inference llm mistral mlops nlp serving

Last synced: 14 Jul 2025

https://github.com/hydrospheredata/hydro-serving-cli

CLI for the Hydrosphere.io project.

cli hydrosphere serving yaml

Last synced: 15 Apr 2025

https://github.com/kozistr/catboost-server-rs

CatBoost server in Rust + gRPC

catboost grpc machine-learning rust server serving

Last synced: 28 Oct 2025

https://github.com/jaketae/image-classifier

Image classifier web application based on MobileNet, built using Flask, TensorFlow, and Matplotlib

flask image-classifier matplotlib serving tensorflow

Last synced: 25 Jun 2025

https://github.com/vjgpt/tensorflow-series

Shows how to train TensorFlow ML models with various algorithms and deploy them to production.

cnn-keras deployment devops production serving tensorflow tensorflow-models tensorflow-tutorials

Last synced: 29 Jan 2026

https://github.com/sauldoescode/transplacer

Reads files and holds them in memory for performant serving/access

assetcache cache http2 http2-push memory-database push serving static

Last synced: 11 Jan 2026

https://github.com/siri1404/llm-infrastructure

End-to-end platform for serving Large Language Models with streaming capabilities, privacy-preserving audit trails, drift monitoring, and compliance support for SEC, MiFID II, FINRA, and GDPR standards.

ai audit compliance inference kafka llm mlops python serving

Last synced: 22 Nov 2025

https://github.com/gunh0/ml-dataset-automation-aws

📦 Automated dataset management for ML using Docker containers on AWS

automation docker serving

Last synced: 31 Dec 2025

https://github.com/darkmatter18/keras_model_deployment_flask

A simple way to deploy your tensorflow.keras model using Flask

flask keras serving tensorflow

Last synced: 31 Mar 2025

https://github.com/mahdidjemaci/production-rag

🔍 Enhance retrieval accuracy with a production-ready RAG system that integrates semantic and lexical search for optimal results.

agents anyscale application deep-learning hacktoberfest langchain large-language-models llm-evaluation llmops ollama open-source openai prompt-engineering retrieval-augmented-generation retrieval-systems search serving typescript

Last synced: 05 Dec 2025