Projects in Awesome Lists tagged with serving
A curated list of projects in awesome lists tagged with serving.
https://github.com/ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
data-science deep-learning deployment distributed hyperparameter-optimization hyperparameter-search large-language-models llm llm-inference llm-serving machine-learning optimization parallel python pytorch ray reinforcement-learning rllib serving tensorflow
Last synced: 09 Sep 2025
https://github.com/tensorflow/serving
A flexible, high-performance serving system for machine learning models
cpp deep-learning deep-neural-networks machine-learning ml neural-network python serving tensorflow
Last synced: 12 May 2025
https://github.com/vespa-engine/vespa
AI + Data, online. https://vespa.ai
ai big-data cpp java machine-learning search-engine server serving serving-recommendation tensorflow vector-search vespa
Last synced: 04 Feb 2026
https://github.com/volcano-sh/volcano
A Cloud Native Batch System (Project under CNCF)
ai batch-systems bigdata gene golang hpc kubernetes machine-learning serving training
Last synced: 31 Jan 2026
https://github.com/seldonio/seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving
Last synced: 14 May 2025
https://github.com/SeldonIO/seldon-core
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
aiops deployment kubernetes machine-learning machine-learning-operations mlops production-machine-learning serving
Last synced: 27 Mar 2025
https://github.com/ahkarami/deep-learning-in-production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial
Last synced: 14 May 2025
https://github.com/pytorch/serve
Serve, optimize and scale PyTorch models in production
cpu deep-learning docker gpu kubernetes machine-learning metrics mlops optimization pytorch serving
Last synced: 13 May 2025
https://github.com/ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial
Last synced: 14 Mar 2025
https://github.com/Lightning-AI/LitServe
The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML.
ai api artificial-intelligence deep-learning developer-tools fastapi rest-api serving web
Last synced: 23 Aug 2025
https://github.com/paddlepaddle/fastdeploy
⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8
Last synced: 23 Jan 2026
https://github.com/PaddlePaddle/FastDeploy
⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8
Last synced: 20 Mar 2025
https://github.com/georgia-tech-db/evadb
Database system for AI-powered apps
agent ai auto-gpt chatgpt data-analysis database eva gpt-4 gpt4all hacktoberfest huggingface labeling langchain llm object-detection serving video-analytics
Last synced: 14 May 2025
https://github.com/tobegit3hub/tensorflow_template_application
TensorFlow template application for deep learning
cnn csv deep-learning inference libsvm lstm machine-learning mlp serving tensorboard tensorflow tfrecords wide-and-deep
Last synced: 15 May 2025
https://github.com/ray-project/llm-applications
A comprehensive guide to building RAG-based LLM applications for production.
anyscale fine-tuning llama2 llms machine-learning openai ray serving
Last synced: 11 Apr 2025
https://github.com/Delta-ML/delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation
Last synced: 07 Apr 2025
https://github.com/ray-project/aviary
RayLLM - LLMs on Ray
distributed-systems large-language-models llm llm-inference llm-serving llmops ray serving transformers
Last synced: 12 Jan 2026
https://github.com/ray-project/ray-llm
RayLLM - LLMs on Ray
distributed-systems large-language-models llm llm-inference llm-serving llmops ray serving transformers
Last synced: 25 Feb 2025
https://github.com/paddlepaddle/serving
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving
Last synced: 15 May 2025
https://github.com/PaddlePaddle/Serving
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
dag deep-learning docker gpu micro-service microservice-toolkit online-service paddle paddle-serving pipeline prediction predictor python rpc-service serving
Last synced: 20 Mar 2025
https://github.com/tobegit3hub/simple_tensorflow_serving
Generic and easy-to-use serving service for machine learning models
client deep-learning http machine-learning savedmodel serving tensorflow tensorflow-models
Last synced: 25 Oct 2025
https://github.com/openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
ai cloud dag deep-learning edge genai inference kubernetes machine-learning model-serving openvino serving
Last synced: 14 May 2025
https://github.com/underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers
Last synced: 08 Oct 2025
https://github.com/polyaxon/haupt
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
bokeh data-processing data-profiling data-science data-visualization deep-learning jupyter lineage machine-learning matplotlib mlops models plotly python pytorch serving tensorflow tracking ui visualization
Last synced: 14 May 2025
https://github.com/vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer
Last synced: 09 May 2025
https://github.com/bodywork-ml/bodywork-core
ML pipeline orchestration and model deployments on Kubernetes.
batch cicd continuous-deployment data-science devops framework kubernetes machine-learning mlops orchestration pipeline python serving
Last synced: 19 Apr 2025
https://github.com/vectorch-ai/scalellm
A high-performance inference system for large language models, designed for production environments.
cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer
Last synced: 14 Apr 2025
https://github.com/Hydrospheredata/hydro-serving
MLOps Platform
machine-learning models pipelines realtime scikit-learn scoring serverless serving spark tensorflow
Last synced: 15 Mar 2025
https://github.com/hydrospheredata/hydro-serving
MLOps Platform
machine-learning models pipelines realtime scikit-learn scoring serverless serving spark tensorflow
Last synced: 16 May 2025
https://github.com/deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
deep-learning deployment djl inference pytorch serving
Last synced: 05 Apr 2025
https://github.com/netease-media/grps
Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It works with both Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.
dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm
Last synced: 05 Apr 2025
https://github.com/NetEase-Media/grps
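[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, supports dynamic batching and streaming modes, and supports both Python and C++. Limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.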
dynamic-batching serving tensorflow tensorrt tensorrt-llm torch triton-inference-server vllm
Last synced: 04 Nov 2025
https://github.com/krystianity/keras-serving
Bring keras-models to production with tensorflow-serving and nodejs + docker 🍕
cpp docker grpc keras network neuronal nodejs production python serving tensorflow
Last synced: 17 Mar 2025
https://github.com/clearml/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
ai clearml deep-learning devops kubernetes machine-learning mlops model-serving serving serving-ml serving-pytorch-models tensorflow-serving triton triton-inference-server
Last synced: 17 Jun 2025
https://github.com/torchpipe/torchpipe
Serving Inside Pytorch
deployment inference llm-serving pipeline-parallelism pytorch ray serve serving tensorrt torch2trt triton-inference-server
Last synced: 18 Mar 2025
https://github.com/lightning-ai/litserve
Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.
Last synced: 05 Apr 2025
https://github.com/notai-tech/fastdeploy
Deploy DL/ ML inference pipelines with minimal extra code.
deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket
Last synced: 13 Apr 2025
https://github.com/balavenkatesh3322/model_deployment
A collection of model deployment libraries and techniques.
aws azure caffe data-science deep-learning keras machine-learning model model-deployment model-server model-serving mxnet neural-network pytorch serving serving-pytorch-models serving-recommendation serving-tensors tensorflow
Last synced: 22 Apr 2025
https://github.com/ai-hypercomputer/gpu-recipes
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
benchmarks distributed-training google-cloud-platform gpu serving
Last synced: 25 Jun 2025
https://github.com/angel-ml/serving
A standalone industrial serving system for Angel.
machine-learning serving serving-recommendation
Last synced: 30 Apr 2025
https://github.com/aws/sagemaker-sparkml-serving-container
This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.
inference inference-pipeline machine-learning mleap mleap-serialized-spark pipeline sagemaker serving spark sparkml
Last synced: 20 Oct 2025
https://github.com/hydrospheredata/spark-ml-serving
Spark ML Lib serving library
inference scoring serving spark
Last synced: 15 Apr 2025
https://github.com/toshi0607/build-your-own-platform-with-knative
A workshop for building your own FaaS platform while learning how Knative's components work
eventing gcr gke go golang knative knative-lambda-runtimes kubernetes pubsub serverless serving tekton tm watchdog
Last synced: 14 Jan 2026
https://github.com/friendliai/friendli-client
Friendli: the fastest serving engine for generative AI
ai generative-ai gpt gpt3 inference inference-engine inference-server llama2 llm llm-inference llm-ops llm-serving llmops llms mistral ml mlops serving stable-diffusion
Last synced: 05 Apr 2025
https://github.com/deep-diver/lora-deployment
LoRA fine-tuned Stable Diffusion Deployment
generative-ai huggingface-inference-endpoint serving stable-diffusion
Last synced: 05 May 2025
https://github.com/modzy/sdk-python
Python library for Modzy Machine Learning Operations (MLOps) Platform
ai-security api-client deployment docker drift-detection explainable-ai kuberenetes machine-learning machine-learning-operations microservices mlops model-deployment model-serving production-machine-learning python serving
Last synced: 29 Jun 2025
https://github.com/laactechnology/foxcross
AsyncIO serving for data science models
async data-science dataframe http machine-learning pandas python pytorch rest-api scikit-learn serving
Last synced: 24 Oct 2025
https://github.com/daekeun-ml/genai-ko-llm
This hands-on lab walks you through a step-by-step approach to efficiently serving and fine-tuning large-scale Korean models on AWS infrastructure.
fine-tuning genai korean-llm peft sagemaker serving
Last synced: 12 Oct 2025
https://github.com/jeongukjae/lightgbm-serving
A lightweight server for LightGBM
Last synced: 13 Apr 2025
https://github.com/secretflow/serving
SecretFlow-Serving is a serving system for privacy-preserving machine learning models.
federated-learning machine-learning privacy-preserving secure-multiparty-computation serving serving-ml
Last synced: 01 Feb 2026
https://github.com/galileo-galilei/kedro-serving
A kedro-plugin to serve Kedro Pipelines as API
fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving
Last synced: 12 May 2025
https://github.com/ovh/serving-runtime
Exposes a serialized machine learning model through an HTTP API.
hdf5 inference machine-learning onnx serving tensorflow
Last synced: 08 Apr 2025
https://github.com/feast-dev/feast-java-old
Feast Java Components
featurestore machine-learning metadata serving
Last synced: 12 Apr 2025
https://github.com/teachablehub/python-sdk
Python SDK for the TeachableHub's Machine-Learning Deployment Platform
deployment-automation machine-learning machine-learning-library mlops sdk-python serving teachable teachablehub
Last synced: 07 Jul 2025
https://github.com/yas-sim/openvino-model-server-wrapper
Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.
ai area-intrusion-detection cloud deep-learning edge grpc grpc-client inference intel line-crossing-detection model-serving object-tracking openvino openvino-docker openvino-model-server python serving tensorflow-serving triton-inference-server
Last synced: 01 Aug 2025
https://github.com/hydrospheredata/hydro-serving-pytorch
Pytorch ONNX model serving runtime
Last synced: 09 Nov 2025
https://github.com/sbcd90/machine-learning-rest-server
This project implements a common REST server which can serve tensorflow-serving & xgboost models.
machine-learning proxygen rest serving tensorflow xgboost
Last synced: 13 Jun 2025
https://github.com/zhangjun/tf_serving_client_brpc
tensorflow serving client using brpc
brpc client deep-learning serving tensorflow tensorflow-serving
Last synced: 09 Apr 2025
https://github.com/assassingq/scsyerp-web
Smart warehousing: Vue front end
java microservices-architecture serving
Last synced: 27 Jun 2025
https://github.com/modzy/sdk-go
The Golang library for Modzy Machine Learning Operations (MLOps) Platform
ai-security api-client api-client-go docker drift-detection explainable-ai golang kubernetes machine-learning-operations microservices mlops model-serving production-machine-learning serving
Last synced: 25 Jan 2026
https://github.com/tataganesh/tf-serving-cnn-example
Basic example of Tensorflow Serving
model-deployment serving tensorflow-serving
Last synced: 31 Jul 2025
https://github.com/hydrospheredata/hydro-serving-cli
CLI for the Hydrosphere.io project.
Last synced: 15 Apr 2025
https://github.com/kozistr/catboost-server-rs
CatBoost server in Rust + gRPC
catboost grpc machine-learning rust server serving
Last synced: 28 Oct 2025
https://github.com/jaketae/image-classifier
Image classifier web application based on MobileNet, built using Flask, TensorFlow, and Matplotlib
flask image-classifier matplotlib serving tensorflow
Last synced: 25 Jun 2025
https://github.com/vjgpt/tensorflow-series
Here you can find how to train TensorFlow ML models with various algorithms and deploy these models to production.
cnn-keras deployment devops production serving tensorflow tensorflow-models tensorflow-tutorials
Last synced: 29 Jan 2026
https://github.com/sauldoescode/transplacer
Reads files and holds them in memory for performant serving/access
assetcache cache http2 http2-push memory-database push serving static
Last synced: 11 Jan 2026
https://github.com/siri1404/llm-infrastructure
End-to-end platform for serving Large Language Models with streaming capabilities, privacy-preserving audit trails, drift monitoring, and compliance support for SEC, MiFID II, FINRA, and GDPR standards.
ai audit compliance inference kafka llm mlops python serving
Last synced: 22 Nov 2025
https://github.com/gunh0/ml-dataset-automation-aws
📦 Automated dataset management for ML using Docker containers on AWS
Last synced: 31 Dec 2025
https://github.com/darkmatter18/keras_model_deployment_flask
A simple way to deploy your tensorflow.keras model using Flask
flask keras serving tensorflow
Last synced: 31 Mar 2025
https://github.com/aslisabanci/xgboost_demo
Demonstrating how to build an XGBoost model and deploy it to Algorithmia, from a Jupyter notebook
algorithmia algorithmia-api inference jupyter-notebook model-serving python sentiment-analysis sentiment-classification serving xgboost xgboost-algorithm xgboost-model xgboost-python
Last synced: 28 Feb 2025
https://github.com/ionboleac/serve-torch-deployments
A proof of concept showing how to install and use TorchServe in various modes
docker k8s microk8s minikube model-explanation model-management model-predictions python pytorch serving serving-pytorch-models torchserve
Last synced: 18 Apr 2025
https://github.com/mahdidjemaci/production-rag
🔍 Enhance retrieval accuracy with a production-ready RAG system that integrates semantic and lexical search for optimal results.
agents anyscale application deep-learning hacktoberfest langchain large-language-models llm-evaluation llmops ollama open-source openai prompt-engineering retrieval-augmented-generation retrieval-systems search serving typescript
Last synced: 05 Dec 2025