Projects in Awesome Lists tagged with model-serving
A curated list of projects in awesome lists tagged with model-serving.
https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda deepseek gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch qwen rocm tpu trainium transformer xpu
Last synced: 29 Jan 2026
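vLLM's throughput comes largely from continuous batching: at every decode step, finished sequences leave the batch immediately and queued requests join mid-flight, instead of waiting for the whole batch to drain. The following is a toy stdlib sketch of that scheduling idea only (the function and tuple shapes are illustrative, not vLLM's API):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy scheduler: each request is (request_id, tokens_needed).
    Finished sequences free their batch slot the same step they
    finish, and waiting requests are admitted immediately."""
    waiting = deque(requests)
    running = {}                       # request_id -> tokens remaining
    finished_order, steps = [], 0
    while waiting or running:
        # admit new sequences into any free batch slots
        while waiting and len(running) < max_batch:
            rid, need = waiting.popleft()
            running[rid] = need
        # one decode step advances every running sequence by one token
        steps += 1
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]       # slot freed this very step
                finished_order.append(rid)
    return finished_order, steps

done, steps = continuous_batching([("a", 1), ("b", 3), ("c", 2)])
```

With static batching the same workload takes 5 steps (the first batch runs for max(1, 3) = 3 steps, then "c" runs alone for 2); the continuous scheduler finishes in 3.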
https://github.com/bentoml/bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python
Last synced: 06 Mar 2026
https://github.com/ahkarami/deep-learning-in-production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial
Last synced: 14 May 2025
https://github.com/beclab/olares
Olares: An Open-Source Personal Cloud to Reclaim Your Data
ai-agents ai-privacy edge-ai home-automation home-cloud home-server homelab homeserver kubernetes local-ai mcp model-serving personal-cloud self-hosted
Last synced: 16 Apr 2026
https://github.com/FedML-AI/FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training
Last synced: 04 Apr 2025
https://github.com/kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
artificial-intelligence genai hacktoberfest istio k8s knative kserve kubeflow kubernetes llm-inference machine-learning mlops model-interpretability model-serving pytorch service-mesh sklearn tensorflow xgboost
Last synced: 14 Mar 2026
https://github.com/modeltc/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
deep-learning gpt llama llm model-serving nlp openai-triton
Last synced: 13 May 2025
https://github.com/predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers
Last synced: 12 May 2025
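LoRAX can serve thousands of fine-tunes from one GPU because the base model's weights stay resident and only small per-adapter LoRA deltas are swapped in per request. A minimal stdlib sketch of that routing idea (the adapter names and the scalar "weights" are illustrative stand-ins, not LoRAX's API or real LoRA math):

```python
BASE_WEIGHT = 10.0                       # shared base model "weights"

# small per-tenant LoRA deltas; only these differ between fine-tunes
ADAPTERS = {
    "support-bot": 0.5,
    "legal-bot": -1.25,
}

def predict(x, adapter_id=None):
    """One forward 'pass': the base weight plus the requested adapter's
    delta. The base loads once; routing picks the delta per request."""
    delta = ADAPTERS.get(adapter_id, 0.0) if adapter_id else 0.0
    return x * (BASE_WEIGHT + delta)

# three requests hit three different "models" without reloading the base
results = [predict(2.0), predict(2.0, "support-bot"), predict(2.0, "legal-bot")]
```

The memory win is that each adapter is a tiny delta on top of one resident base, so adding a tenant costs megabytes rather than another full model copy.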
https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
ai-infra genai large-language-models llmsys mlsys model-serving model-training
Last synced: 09 Apr 2025
https://github.com/tensorchord/envd
🏕️ Reproducible development environment
buildkit developer-tools development-environment docker hacktoberfest llmops mlops mlops-workflow model-serving
Last synced: 03 Oct 2025
https://github.com/microsoft/aici
AICI: Prompts as (Wasm) Programs
ai inference language-model llm llm-framework llm-inference llm-serving llmops model-serving rust transformer wasm wasmtime
Last synced: 14 May 2025
https://github.com/mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow
Last synced: 18 Feb 2026
https://github.com/logicalclocks/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
aws azure data-science feature-engineering feature-management feature-store gcp governance hopsworks kserve machine-learning ml mlops model-serving pyspark python serverless
Last synced: 14 May 2025
https://github.com/basetenlabs/truss
The simplest way to serve AI/ML models in production
artificial-intelligence easy-to-use falcon inference-api inference-server machine-learning model-serving open-source packaging stable-diffusion whisper wizardlm
Last synced: 02 Apr 2026
https://github.com/zhihu/zhilight
A highly optimized LLM inference acceleration engine for Llama and its variants.
cuda deepseek-r1 gpt inference-engine llama llm llm-inference llm-serving model-serving pytorch
Last synced: 15 May 2025
https://github.com/alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
gpt inference llama llm llm-serving llmops model-serving
Last synced: 14 Oct 2025
https://github.com/mosecorg/mosec
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts
Last synced: 14 May 2025
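The dynamic batching that mosec advertises groups individual requests into one batch and flushes when either a size cap or a time budget is hit. A deterministic stdlib sketch of that flush rule, using explicit timestamps instead of a real clock (the parameters and tuple shape are illustrative, not mosec's API):

```python
def dynamic_batches(arrivals, max_batch_size=4, max_wait=3):
    """Group requests into batches. `arrivals` is an ordered list of
    (timestamp, request) pairs. A batch is flushed when it reaches
    max_batch_size, or when the next request would arrive more than
    max_wait after the batch was opened."""
    batches, current, opened_at = [], [], None
    for ts, req in arrivals:
        if current and ts - opened_at > max_wait:
            batches.append(current)      # time budget exceeded: flush
            current, opened_at = [], None
        if not current:
            opened_at = ts               # batch opens with first request
        current.append(req)
        if len(current) == max_batch_size:
            batches.append(current)      # size cap reached: flush
            current, opened_at = [], None
    if current:
        batches.append(current)          # drain whatever is left
    return batches

arrivals = [(0, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e"), (10, "f")]
```

Here the first four requests fill a batch and flush on size, while "e" flushes alone once the quiet gap before "f" exceeds the time budget: batching trades a bounded amount of latency for much better accelerator utilization.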
https://github.com/bentoml/yatai
Model Deployment at Scale on Kubernetes 🦄️
bentoml k8s kubernetes machine-learning mlops model-deployment model-serving
Last synced: 16 May 2025
https://github.com/kitops-ml/kitops
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow
Last synced: 15 May 2025
https://github.com/vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
ascend inference llm llm-serving llmops mlops model-serving transformer vllm
Last synced: 27 Feb 2026
https://github.com/efeslab/nanoflow
A throughput-oriented high-performance serving framework for LLMs
cuda inference llama2 llm llm-serving model-serving
Last synced: 16 May 2025
https://github.com/openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
ai cloud dag deep-learning edge genai inference kubernetes machine-learning model-serving openvino serving
Last synced: 14 May 2025
https://github.com/underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers
Last synced: 08 Oct 2025
https://github.com/ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
cuda huggingface-transformers large-language-models model-as-a-service model-serving pytorch serverless-inference
Last synced: 07 May 2025
https://github.com/eightBEC/fastapi-ml-skeleton
FastAPI Skeleton App to serve machine learning models production-ready.
fastapi machine-learning model-serving python python3
Last synced: 15 Mar 2025
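Skeletons like this one wrap a loaded model behind a single HTTP prediction endpoint. To show the shape without pulling in FastAPI, here is a stdlib-only sketch of the same pattern; the `/predict` route and the sum-of-features "model" are stand-ins, not the skeleton's actual code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in 'model': sum of feature values. A real skeleton would
    load a trained model at startup and call it here."""
    return {"prediction": sum(features)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):        # keep the demo quiet
        pass

def serve(port=0):
    """Bind 127.0.0.1 on an ephemeral port by default; caller runs
    server.serve_forever() (typically in a thread)."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

A framework like FastAPI adds request validation, async workers, and OpenAPI docs on top of this same request-in, prediction-out shape.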
https://github.com/sgl-project/ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
deepseek k8s kimi-k2 llama llm llm-inference model-as-a-service model-serving multi-node-kubernetes oracle-cloud pd-disaggregation qwen sglang vllm
Last synced: 17 Mar 2026
https://github.com/bentoml/BentoDiffusion
BentoDiffusion: A collection of diffusion models served with BentoML
ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion
Last synced: 17 Aug 2025
https://github.com/ai-hypercomputer/jetstream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: 23 Oct 2025
https://github.com/aniketmaurya/chitra
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
bounding-boxes deep-learning fastapi gradcam hacktoberfest image-classification image-dataset image-processing machine-learning mlops model-deployment model-interpretation model-serving model-visualization object-detection python pytorch tensorflow visualization
Last synced: 15 May 2025
https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial
Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka
akka kafka-streams model-serving
Last synced: 02 May 2025
https://github.com/clearml/clearml-serving
ClearML - Model-Serving Orchestration and Repository Solution
ai clearml deep-learning devops kubernetes machine-learning mlops model-serving serving serving-ml serving-pytorch-models tensorflow-serving triton triton-inference-server
Last synced: 17 Jun 2025
https://github.com/FederatedAI/FATE-Serving
A scalable, high-performance serving system for federated learning models
federated-learning inference model-serving model-versioning monitor
Last synced: 16 Nov 2025
https://github.com/bentoml/gallery
BentoML Example Projects 🎨
aws-lambda aws-sagemaker azure-machine-learning bentoml data-science gallery gcp-cloud-functions machine-learning machine-learning-library machine-learning-workflow model-deployment model-management model-serving serverless
Last synced: 04 Feb 2026
https://github.com/project-monai/monai-deploy-app-sdk
MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.
ai deep-learning deploy dicom healthcare image-processing machine-learning medical-imaging ml ml-infrastructure ml-platform mlops model-deployment model-serving monai pipeline python pytorch workflow
Last synced: 16 May 2025
https://github.com/alvarobartt/serving-pytorch-models
Serving PyTorch models with TorchServe :fire:
image-classification machine-learning mlops model-deployment model-serving pytorch pytorch-cnn serve-pytorch torchserve
Last synced: 12 Apr 2025
https://github.com/notai-tech/fastdeploy
Deploy DL/ ML inference pipelines with minimal extra code.
deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket
Last synced: 13 Apr 2025
https://github.com/nimbleboxai/nbox
The official python package for NimbleBox. Exposes all APIs as CLIs and contains modules to make ML 🌸
data-science machine-learning ml-infrastructure ml-platform ml-service mlops mlops-automation mlops-pipeline mlops-tool mlops-workflow model-deployment model-management model-monitoring model-serving practical-mlops
Last synced: 14 Dec 2025
https://github.com/kspviswa/pyomlx
A wannabe Ollama equivalent for Apple MLX models
Last synced: 05 Mar 2026
https://github.com/aporia-ai/inferencedb
🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)
kafka machine-learning mlops model-monitoring model-serving s3
Last synced: 30 Apr 2025
https://github.com/balavenkatesh3322/model_deployment
A collection of model deployment library and technique.
aws azure caffe data-science deep-learning keras machine-learning model model-deployment model-server model-serving mxnet neural-network pytorch serving serving-pytorch-models serving-recommendation serving-tensors tensorflow
Last synced: 22 Apr 2025
https://github.com/thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
deepseek gpu llm llm-serving model-serving pytorch
Last synced: 17 Mar 2025
https://github.com/alibaba/servegen
A framework for generating realistic LLM serving workloads
deepseek llm llm-serving model-serving qwen
Last synced: 14 Oct 2025
https://github.com/ai-hypercomputer/jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu
Last synced: 27 Oct 2025
https://github.com/messense/fasttext-serving
fastText model serving service
fasttext model-server model-serving nlp
Last synced: 04 Apr 2025
https://github.com/bentoml/clip-api-service
CLIP as a service - Embed image and sentences, object recognition, visual reasoning, image classification and reverse image search
ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip
Last synced: 04 May 2025
https://github.com/bentoml/BentoOCR
Turn any OCR models into online inference API endpoint 🚀 🌖
ai-applications model-deployment model-serving ocr ocr-python
Last synced: 04 May 2025
https://github.com/bentoml/transformers-nlp-service
Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more
llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer
Last synced: 04 May 2025
https://github.com/alvarobartt/serving-tensorflow-models
Serving TensorFlow models with TensorFlow Serving :orange_book:
image-classification machine-learning mlops model-deployment model-serving serve-tensorflow-models tensorflow tensorflow-serving
Last synced: 12 Apr 2025
https://github.com/instill-ai/console
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
computer-vision console data-connector data-pipeline deep-learning frontend image-classification model-serving no-code object-detection structured-data ui unstructured-data vdp versatile-data-pipeline vision-ai
Last synced: 01 Mar 2026
https://github.com/galileo-galilei/kedro-mlflow-tutorial
A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve Kedro pipelines
kedro kedro-mlflow kedro-tutorial mlflow mlops model-serving
Last synced: 12 May 2025
https://github.com/lightbend/kubeflow-recommender
Kubeflow example of machine learning/model serving
kubeflow machine-learning model-serving
Last synced: 02 May 2025
https://github.com/ml-libs/mlserve
mlserve turns your Python models into a RESTful API and serves a web page with a form generated to match your input data.
machine-learning mlserve model-deployment model-serving scikit-learn
Last synced: 28 Jan 2026
https://github.com/modzy/sdk-python
Python library for Modzy Machine Learning Operations (MLOps) Platform
ai-security api-client deployment docker drift-detection explainable-ai kuberenetes machine-learning machine-learning-operations microservices mlops model-deployment model-serving production-machine-learning python serving
Last synced: 29 Jun 2025
https://github.com/animator/titus2
Titus 2 : Portable Format for Analytics (PFA) implementation for Python 3.4+
analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus
Last synced: 09 Mar 2026
https://github.com/a2i2/surround
Surround is a framework for building AI driven microservices in Python, https://surround.readthedocs.io/en/latest/
data-science machine-learning model-serving pipeline-framework python
Last synced: 14 Jan 2026
https://github.com/bentoml/fraud-detection-model-serving
Online model serving with Fraud Detection model trained with XGBoost on IEEE-CIS dataset
ai-applications fraud-detection model-deployment model-serving
Last synced: 07 Aug 2025
https://github.com/h2oai/mlops-dai-runtimes
Production ready templates for deploying Driverless AI (DAI) scorers. https://h2oai.github.io/dai-deployment-templates/
h2o h2oai machine-learning model-deployment model-server model-serving mojo
Last synced: 07 Apr 2025
https://github.com/modzy/sdk-javascript
The official JavaScript SDK for the Modzy Machine Learning Operations (MLOps) Platform.
ai-security api-client api-rest drift-detection explainable-ai javascript kubernetes machine-learning machine-learning-operations microservices mlops model-serving production-machine-learning
Last synced: 17 Mar 2026
https://github.com/ksm26/efficiently-serving-llms
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
batch-processing deep-learning-techniques inference-optimization large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization serving-infrastructure text-generation
Last synced: 02 Aug 2025
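The KV caching this course covers works because, in autoregressive decoding, the key/value projections of already-processed tokens never change: caching them means each new step projects only one token instead of re-projecting the whole prefix. A toy stdlib sketch counting the work saved (the `project` function is a stand-in for the K/V projection, not real attention):

```python
def project(token):
    """Stand-in for the K/V projection of one token."""
    return (token * 2, token * 3)        # (key, value)

def decode_no_cache(prompt, steps):
    """Recompute K/V for the full prefix at every decode step."""
    work = 0
    seq = list(prompt)
    for _ in range(steps):
        kv = [project(t) for t in seq]   # O(len(seq)) projections per step
        work += len(kv)
        seq.append(seq[-1] + 1)          # pretend we sampled a new token
    return seq, work

def decode_with_cache(prompt, steps):
    """Project each token once; the cache persists across steps."""
    cache = [project(t) for t in prompt]
    work = len(cache)
    seq = list(prompt)
    for _ in range(steps):
        seq.append(seq[-1] + 1)
        cache.append(project(seq[-1]))   # O(1) projections per step
        work += 1
    return seq, work

seq_a, work_a = decode_no_cache([1, 2, 3], steps=4)
seq_b, work_b = decode_with_cache([1, 2, 3], steps=4)
```

Both paths produce the same sequence, but the uncached version does quadratic work in sequence length while the cached one is linear, which is why every serious serving stack keeps a KV cache.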
https://github.com/galileo-galilei/kedro-serving
A kedro-plugin to serve Kedro Pipelines as API
fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving
Last synced: 12 May 2025
https://github.com/bentoml/diffusers-examples
API serving for your diffusers models
bentoml diffusers model-deployment model-serving
Last synced: 22 Jul 2025
https://github.com/rishit-dagli/tfserving-demos
TF Serving demos
jupyter-notebook model-serving python3 tensorflow tensorflow-model-server tensorflow-serving tensorflow2
Last synced: 07 May 2025
https://github.com/kristofferv98/whisper_turboapi
An optimized FastAPI server for OpenAI's whisper-large-v3-turbo model using MLX optimization
ai api asynchronous audio audio-processing fastapi huggingface machine-learning macos mlx model-serving nlp openai optimization python speech-to-text synchronous transcription whisper whisper-turbo
Last synced: 12 May 2025
https://github.com/yas-sim/openvino-model-server-wrapper
Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.
ai area-intrusion-detection cloud deep-learning edge grpc grpc-client inference intel line-crossing-detection model-serving object-tracking openvino openvino-docker openvino-model-server python serving tensorflow-serving triton-inference-server
Last synced: 01 Aug 2025
https://github.com/adrien-legros/rhods-mnist
Data science pipelines and model serving using Red Hat OpenShift Data Science
data-science model-serving openshift-ai pipelines redhat rhoai rhods
Last synced: 17 Jan 2026
https://github.com/logicalclocks/machine-learning-api
Hopsworks Machine Learning Api 🚀 Model management with a model registry and model serving
Last synced: 09 Apr 2025
https://github.com/peva3/smarterrouter
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
ai-cache ai-gateway docker fastapi gpu-monitoring llm llm-proxy llm-router local-llm model-serving ollama ollama-api openai-proxy self-hosted self-hosted-ai semantic-cache
Last synced: 27 Feb 2026
https://github.com/riccorl/ner-serve
Simple NER model using Docker, FastAPI, ONNX and Multilingual Mini-LM.
backend deep-learning fastapi huggingface huggingface-transformers model-serving named-entity-recognition natural-language-processing ner nlp onnx onnxruntime pytorch transformers
Last synced: 05 Aug 2025
https://github.com/alibaba/aiopsserving
Open source code for AIOpsServing
ai-ops alicloud-compatible machine-learning mlflow-compatible model-benchmarking model-serving
Last synced: 14 Oct 2025
https://github.com/unaidedelf8777/faster-outlines
A Lazy, high throughput and blazing fast structured text generation backend.
ai llama llm llm-serving llmops model-serving performance transformer
Last synced: 27 Jun 2025
https://github.com/rapidrabbit76/fastapi-deep-learning-model-micro-batching-serving
FastAPI pytorch model serving with micro batching
Last synced: 24 Jun 2025
https://github.com/zerohertz/yolo-serving-cookbook
📸 YOLO Serving Cookbook based on Triton Inference Server 📸
docker docker-compose fastapi gradio k8s kubernetes mlops model-serving onnx pytorch triton-inference-server yolo yolov5
Last synced: 18 Mar 2025
https://github.com/modzy/sdk-go
The Golang library for Modzy Machine Learning Operations (MLOps) Platform
ai-security api-client api-client-go docker drift-detection explainable-ai golang kubernetes machine-learning-operations microservices mlops model-serving production-machine-learning serving
Last synced: 25 Jan 2026
https://github.com/algorithmiaio/algorithmia-modeldeployment-action
An Algorithmia Github Action that runs Jupyter notebooks to create an ML model, uploads the model, and updates the algorithm on Algorithmia
algorithmia ci-cd githubaction-workflow githubactions machine-learning model-deployment model-serving
Last synced: 06 Oct 2025
https://github.com/saivarunk/krypton
Model Server for ML and DL Models built using FastAPI
deep-learning fastapi machine-learning model-serving rest-api
Last synced: 07 Sep 2025
https://github.com/prassanna-ravishankar/modalkit
A powerful Python framework for deploying ML models on Modal with production-ready features
Last synced: 05 Sep 2025
https://github.com/Aquila-Network/AquilaHub
Load and serve Neural Encoder Models
machine-learning model-serving neural-search personal-search vector-search-engine
Last synced: 12 May 2025
https://github.com/wtlow003/modal-llm-serving
Examples of serving LLM on Modal.
llm lmdeploy modal model-serving openai openai-api sglang vllm
Last synced: 05 Mar 2025
https://github.com/algorithmiaio/githubactions-modeldeployment-template
Template ML repository to get started with Algorithmia Model Deployment Github Action integration
algorithmia cicd github-actions inference machine-learning model-deployment model-serving
Last synced: 12 Apr 2026
https://github.com/dudeperf3ct/8-fastapi-tests-gcp-gke
docker fastapi gke mlops model-serving
Last synced: 04 Mar 2026
https://github.com/md-emon-hasan/bentoml
BentoML is a high-performance model-serving framework; this repository provides scripts and configurations to help streamline the deployment process.
ai bentoml data-science ml-engineering mlops model-deployment model-serving
Last synced: 02 Mar 2025
https://github.com/mpolinowski/ray-serve-model
Using Ray Serve for ML Model Serving
consensus model-serving python ray
Last synced: 17 Apr 2026
https://github.com/algorithmiaio/githubactions-modeldeployment-demo-algorithmiaalgo
Demo ML repository using the Algorithmia Model Deployment Github Action to auto-deploy an algorithm hosted on Algorithmia
algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost
Last synced: 05 Apr 2025
https://github.com/dudeperf3ct/6-ml-fastapi-aws-serverless
aws codepipeline docker elasticbeanstalk fastapi mlops model-serving
Last synced: 14 Apr 2026
https://github.com/ronylpatil/mlflow-pipeline
Built an E2E MLFlow Pipeline & hosted on AWS.
mlflow-tracking model-registry model-serving
Last synced: 13 Apr 2025
https://github.com/amine-akrout/smoker_detection
Smoker Detection deep learning model served via a Web App using TensorFlow, tensorflow-serving, flask and Docker compose
deep-learning docker docker-compose flask inceptionv3 keras model-deployment model-serving tensorflow tesnorflow-serving transfer-learning
Last synced: 15 Apr 2026