Projects in Awesome Lists tagged with model-serving

https://github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm tpu trainium transformer xpu

Last synced: 16 Dec 2024

https://github.com/bentoml/bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!

ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Last synced: 16 Dec 2024

https://github.com/bentoml/BentoML

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Last synced: 24 Oct 2024

https://github.com/ahkarami/deep-learning-in-production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial

Last synced: 17 Dec 2024

https://github.com/fedml-ai/fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training

Last synced: 16 Dec 2024

https://github.com/kserve/kserve

Standardized Serverless ML Inference Platform on Kubernetes

artificial-intelligence genai hacktoberfest istio k8s knative kserve kubeflow kubernetes llm-inference machine-learning mlops model-interpretability model-serving pytorch service-mesh sklearn tensorflow xgboost

Last synced: 16 Dec 2024

https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

ai-infra genai large-language-models llmsys mlsys model-serving model-training

Last synced: 05 Nov 2024

https://github.com/modeltc/lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

deep-learning gpt llama llm model-serving nlp openai-triton

Last synced: 18 Dec 2024

https://github.com/predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers

Last synced: 17 Dec 2024

https://github.com/tensorchord/envd

🏕️ Reproducible development environment

buildkit developer-tools development-environment docker hacktoberfest llmops mlops mlops-workflow model-serving

Last synced: 17 Dec 2024

https://github.com/microsoft/aici

AICI: Prompts as (Wasm) Programs

ai inference language-model llm llm-framework llm-inference llm-serving llmops model-serving rust transformer wasm wasmtime

Last synced: 19 Dec 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 09 Nov 2024

https://github.com/logicalclocks/hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

aws azure data-science feature-engineering feature-management feature-store gcp governance hopsworks kserve machine-learning ml mlops model-serving pyspark python serverless

Last synced: 19 Dec 2024

https://github.com/basetenlabs/truss

The simplest way to serve AI/ML models in production

artificial-intelligence easy-to-use falcon inference-api inference-server machine-learning model-serving open-source packaging stable-diffusion whisper wizardlm

Last synced: 17 Dec 2024

https://github.com/mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts

Last synced: 20 Dec 2024

https://github.com/bentoml/yatai

Model Deployment at Scale on Kubernetes 🦄️

bentoml k8s kubernetes machine-learning mlops model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/efeslab/nanoflow

A throughput-oriented high-performance serving framework for LLMs

cuda inference llama2 llm llm-serving model-serving

Last synced: 20 Dec 2024

https://github.com/underneathall/pinferencia

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers

Last synced: 15 Dec 2024

https://github.com/jozu-ai/kitops

An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.

ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow

Last synced: 21 Dec 2024

https://github.com/eightBEC/fastapi-ml-skeleton

FastAPI skeleton app for serving machine learning models in production.

fastapi machine-learning model-serving python python3

Last synced: 26 Oct 2024

https://github.com/bentoml/bentodiffusion

BentoDiffusion: A collection of diffusion models served with BentoML

ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion

Last synced: 16 Dec 2024

https://github.com/ai-hypercomputer/jetstream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).

gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer

Last synced: 16 Dec 2024

https://github.com/aniketmaurya/chitra

A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.

bounding-boxes deep-learning fastapi gradcam hacktoberfest image-classification image-dataset image-processing machine-learning mlops model-deployment model-interpretation model-serving model-visualization object-detection python pytorch tensorflow visualization

Last synced: 20 Dec 2024

https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial

Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka

akka kafka-streams model-serving

Last synced: 12 Nov 2024

https://github.com/google/jetstream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome).

gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer

Last synced: 29 Sep 2024

https://github.com/alvarobartt/serving-pytorch-models

Serving PyTorch models with TorchServe 🔥

image-classification machine-learning mlops model-deployment model-serving pytorch pytorch-cnn serve-pytorch torchserve

Last synced: 14 Oct 2024

https://github.com/notai-tech/fastdeploy

Deploy DL/ML inference pipelines with minimal extra code.

deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket

Last synced: 18 Dec 2024

https://github.com/project-monai/monai-deploy-app-sdk

MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.

ai deep-learning deploy dicom healthcare image-processing machine-learning medical-imaging ml ml-infrastructure ml-platform mlops model-deployment model-serving monai pipeline python pytorch workflow

Last synced: 15 Dec 2024

https://github.com/balavenkatesh3322/model_deployment

A collection of model deployment libraries and techniques.

aws azure caffe data-science deep-learning keras machine-learning model model-deployment model-server model-serving mxnet neural-network pytorch serving serving-pytorch-models serving-recommendation serving-tensors tensorflow

Last synced: 10 Nov 2024

https://github.com/messense/fasttext-serving

fastText model serving service

fasttext model-server model-serving nlp

Last synced: 16 Dec 2024

https://github.com/bentoml/clip-api-service

CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification and reverse image search

ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip

Last synced: 13 Nov 2024

https://github.com/bentoml/ocr-as-a-service

Turn any OCR model into an online inference API endpoint 🚀 🌖

ai-applications model-deployment model-serving ocr ocr-python

Last synced: 13 Nov 2024

https://github.com/alvarobartt/serving-tensorflow-models

Serving TensorFlow models with TensorFlow Serving 📙

image-classification machine-learning mlops model-deployment model-serving serve-tensorflow-models tensorflow tensorflow-serving

Last synced: 14 Oct 2024

https://github.com/bentoml/transformers-nlp-service

Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more

llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer

Last synced: 13 Nov 2024

https://github.com/kspviswa/pyomlx

A wannabe Ollama equivalent for Apple MLX models

chatbot mlx model-serving

Last synced: 06 Dec 2024

https://github.com/ai-hypercomputer/jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu

Last synced: 18 Dec 2024

https://github.com/lightbend/kubeflow-recommender

Kubeflow example of machine learning/model serving

kubeflow machine-learning model-serving

Last synced: 12 Nov 2024

https://github.com/ml-libs/mlserve

mlserve turns your Python models into a RESTful API and serves a web page with a form generated to match your input data.

machine-learning mlserve model-deployment model-serving scikit-learn

Last synced: 14 Oct 2024

https://github.com/yuruofeifei/mms

MXNet Model Serving

model-serving mxnet

Last synced: 08 Nov 2024

https://github.com/galileo-galilei/kedro-mlflow-tutorial

A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve a Kedro pipeline

kedro kedro-mlflow kedro-tutorial mlflow mlops model-serving

Last synced: 18 Nov 2024

https://github.com/animator/titus2

Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+

analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus

Last synced: 28 Nov 2024

https://github.com/h2oai/mlops-dai-runtimes

Production ready templates for deploying Driverless AI (DAI) scorers. https://h2oai.github.io/dai-deployment-templates/

h2o h2oai machine-learning model-deployment model-server model-serving mojo

Last synced: 06 Nov 2024

https://github.com/bentoml/fraud-detection-model-serving

Online model serving with a fraud detection model trained with XGBoost on the IEEE-CIS dataset

ai-applications fraud-detection model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/galileo-galilei/kedro-serving

A Kedro plugin to serve Kedro pipelines as an API

fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving

Last synced: 18 Nov 2024

https://github.com/rishit-dagli/tfserving-demos

TF Serving demos

jupyter-notebook model-serving python3 tensorflow tensorflow-model-server tensorflow-serving tensorflow2

Last synced: 23 Oct 2024

https://github.com/bentoml/diffusers-examples

API serving for your diffusers models

bentoml diffusers model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/logicalclocks/machine-learning-api

Hopsworks Machine Learning API 🚀 Model management with a model registry and model serving

model-registry model-serving

Last synced: 11 Nov 2024

https://github.com/yas-sim/openvino-model-server-wrapper

Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.

ai area-intrusion-detection cloud deep-learning edge grpc grpc-client inference intel line-crossing-detection model-serving object-tracking openvino openvino-docker openvino-model-server python serving tensorflow-serving triton-inference-server

Last synced: 16 Nov 2024

https://github.com/zerohertz/yolo-serving-cookbook

📸 YOLO Serving Cookbook based on Triton Inference Server 📸

docker docker-compose fastapi gradio k8s kubernetes mlops model-serving onnx pytorch triton-inference-server yolo yolov5

Last synced: 27 Oct 2024

https://github.com/saivarunk/krypton

Model Server for ML and DL Models built using FastAPI

deep-learning fastapi machine-learning model-serving rest-api

Last synced: 10 Nov 2024

https://github.com/riccorl/ner-serve

Simple NER model using Docker, FastAPI, ONNX and Multilingual Mini-LM.

backend deep-learning fastapi huggingface huggingface-transformers model-serving named-entity-recognition natural-language-processing ner nlp onnx onnxruntime pytorch transformers

Last synced: 08 Nov 2024

https://github.com/algorithmiaio/algorithmia-modeldeployment-action

Algorithmia GitHub Action that runs Jupyter notebooks to create the ML model, uploads the model, and updates the algorithm on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions machine-learning model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/rapidrabbit76/fastapi-deep-learning-model-micro-batching-serving

FastAPI PyTorch model serving with micro-batching

fastapi model-serving pytorch

Last synced: 23 Nov 2024

https://github.com/Aquila-Network/AquilaHub

Load and serve Neural Encoder Models

machine-learning model-serving neural-search personal-search vector-search-engine

Last synced: 18 Nov 2024

https://github.com/dudeperf3ct/11-cortex-deploy

aws-lambda cortex docker fastapi mlops model-serving transformers

Last synced: 08 Nov 2024

https://github.com/dudeperf3ct/12-serverless-deploy

aws-lambda docker fastapi mlops model-serving serverless-framework transformers

Last synced: 08 Nov 2024

https://github.com/dudeperf3ct/6-ml-fastapi-aws-serverless

aws codepipeline docker elasticbeanstalk fastapi mlops model-serving

Last synced: 08 Nov 2024

https://github.com/md-emon-hasan/bentoml

BentoML is a high-performance model serving framework. It provides various scripts and configurations to help streamline the deployment process.

ai bentoml data-science ml-engineering mlops model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-demo-algorithmiaalgo

Demo ML repository that uses the Algorithmia Model Deployment GitHub Action to auto-deploy to an algorithm hosted on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost

Last synced: 18 Dec 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-template

Template ML repository to get started with the Algorithmia Model Deployment GitHub Action integration

algorithmia cicd github-actions inference machine-learning model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/dudeperf3ct/8-fastapi-tests-gcp-gke

docker fastapi gke mlops model-serving

Last synced: 08 Nov 2024

https://github.com/mpolinowski/ray-serve-model

Using Ray Serve for ML Model Serving

consensus model-serving python ray

Last synced: 30 Nov 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-demo-githubalgo

Demo ML repository that uses the Algorithmia Model Deployment GitHub Action to auto-deploy to an Algorithmia algorithm backed by GitHub

algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost

Last synced: 18 Dec 2024

https://github.com/kristofferv98/whisper_turboapi

An optimized FastAPI server for OpenAI's Whisper model whisper-large-v3-turbo, using MLX turbo optimization

ai api asynchronous audio audio-processing fastapi huggingface machine-learning macos mlx model-serving nlp openai optimization python speech-to-text synchronous transcription whisper whisper-turbo

Last synced: 14 Dec 2024

https://github.com/wtlow003/modal-llm-serving

Examples of serving LLMs on Modal.

llm lmdeploy modal model-serving openai openai-api sglang vllm

Last synced: 15 Nov 2024

https://github.com/duketemon/python-model-registry

Model Registry is a service that exposes an API to save, fetch, and delete machine learning models

machine-learning model-registry model-serving python

Last synced: 18 Dec 2024

https://github.com/redhat-na-ssa/demo-triton-yolo

Customize NVIDIA Triton to use OpenShift Source-to-Image (S2I) builds

data-science model-serving nvidia openshift triton

Last synced: 04 Dec 2024