Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with model-serving
A curated list of projects in awesome lists tagged with model-serving.
https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm tpu trainium transformer xpu
Last synced: 16 Dec 2024
https://github.com/bentoml/bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python
Last synced: 16 Dec 2024
https://github.com/bentoml/BentoML
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python
Last synced: 24 Oct 2024
https://github.com/ahkarami/deep-learning-in-production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial
Last synced: 17 Dec 2024
https://github.com/ahkarami/Deep-Learning-in-Production
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
angularjs c-plus-plus caffe2 convert-pytorch-models deep-learning deep-neural-networks flask keras model-serving mxnet production python pytorch react rest-api serving serving-pytorch-models tensorflow-models tesnorflow tutorial
Last synced: 26 Oct 2024
https://github.com/fedml-ai/fedml
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training
Last synced: 16 Dec 2024
https://github.com/FedML-AI/FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training
Last synced: 05 Nov 2024
https://github.com/kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
artificial-intelligence genai hacktoberfest istio k8s knative kserve kubeflow kubernetes llm-inference machine-learning mlops model-interpretability model-serving pytorch service-mesh sklearn tensorflow xgboost
Last synced: 16 Dec 2024
https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
ai-infra genai large-language-models llmsys mlsys model-serving model-training
Last synced: 05 Nov 2024
https://github.com/modeltc/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
deep-learning gpt llama llm model-serving nlp openai-triton
Last synced: 18 Dec 2024
https://github.com/predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers
Last synced: 17 Dec 2024
https://github.com/tensorchord/envd
🏕️ Reproducible development environment
buildkit developer-tools development-environment docker hacktoberfest llmops mlops mlops-workflow model-serving
Last synced: 17 Dec 2024
https://github.com/ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
deep-learning gpt llama llm model-serving nlp openai-triton
Last synced: 28 Oct 2024
https://github.com/microsoft/aici
AICI: Prompts as (Wasm) Programs
ai inference language-model llm llm-framework llm-inference llm-serving llmops model-serving rust transformer wasm wasmtime
Last synced: 19 Dec 2024
https://github.com/mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow
Last synced: 09 Nov 2024
https://github.com/logicalclocks/hopsworks
Hopsworks - Data-Intensive AI platform with a Feature Store
aws azure data-science feature-engineering feature-management feature-store gcp governance hopsworks kserve machine-learning ml mlops model-serving pyspark python serverless
Last synced: 19 Dec 2024
https://github.com/basetenlabs/truss
The simplest way to serve AI/ML models in production
artificial-intelligence easy-to-use falcon inference-api inference-server machine-learning model-serving open-source packaging stable-diffusion whisper wizardlm
Last synced: 17 Dec 2024
https://github.com/mosecorg/mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts
Last synced: 20 Dec 2024
https://github.com/bentoml/yatai
Model Deployment at Scale on Kubernetes 🦄️
bentoml k8s kubernetes machine-learning mlops model-deployment model-serving
Last synced: 18 Dec 2024
https://github.com/bentoml/Yatai
Model Deployment at Scale on Kubernetes 🦄️
bentoml k8s kubernetes machine-learning mlops model-deployment model-serving
Last synced: 29 Oct 2024
https://github.com/efeslab/nanoflow
A throughput-oriented high-performance serving framework for LLMs
cuda inference llama2 llm llm-serving model-serving
Last synced: 20 Dec 2024
https://github.com/underneathall/pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
ai artificial-intelligence computer-vision data-science deep-learning huggingface inference inference-server machine-learning model-deployment model-serving modelserver nlp paddlepaddle predict python pytorch serving tensorflow transformers
Last synced: 15 Dec 2024
https://github.com/jozu-ai/kitops
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow
Last synced: 21 Dec 2024
https://github.com/eightBEC/fastapi-ml-skeleton
FastAPI Skeleton App to serve machine learning models production-ready.
fastapi machine-learning model-serving python python3
Last synced: 26 Oct 2024
https://github.com/bentoml/bentodiffusion
BentoDiffusion: A collection of diffusion models served with BentoML
ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion
Last synced: 16 Dec 2024
https://github.com/bentoml/BentoDiffusion
BentoDiffusion: A collection of diffusion models served with BentoML
ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion
Last synced: 17 Dec 2024
https://github.com/ai-hypercomputer/jetstream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: 16 Dec 2024
https://github.com/aniketmaurya/chitra
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
bounding-boxes deep-learning fastapi gradcam hacktoberfest image-classification image-dataset image-processing machine-learning mlops model-deployment model-interpretation model-serving model-visualization object-detection python pytorch tensorflow visualization
Last synced: 20 Dec 2024
https://github.com/AI-Hypercomputer/JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: 01 Nov 2024
https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial
Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka
akka kafka-streams model-serving
Last synced: 12 Nov 2024
https://github.com/google/jetstream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: 29 Sep 2024
https://github.com/alvarobartt/serving-pytorch-models
Serving PyTorch models with TorchServe 🔥
image-classification machine-learning mlops model-deployment model-serving pytorch pytorch-cnn serve-pytorch torchserve
Last synced: 14 Oct 2024
https://github.com/notai-tech/fastdeploy
Deploy DL/ML inference pipelines with minimal extra code.
deep-learning docker falcon gevent gunicorn http-server inference-server model-deployment model-serving python pytorch serving streaming-audio tensorflow-serving tf-serving torchserve triton triton-inference-server triton-server websocket
Last synced: 18 Dec 2024
https://github.com/project-monai/monai-deploy-app-sdk
MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.
ai deep-learning deploy dicom healthcare image-processing machine-learning medical-imaging ml ml-infrastructure ml-platform mlops model-deployment model-serving monai pipeline python pytorch workflow
Last synced: 15 Dec 2024
https://github.com/balavenkatesh3322/model_deployment
A collection of model deployment libraries and techniques.
aws azure caffe data-science deep-learning keras machine-learning model model-deployment model-server model-serving mxnet neural-network pytorch serving serving-pytorch-models serving-recommendation serving-tensors tensorflow
Last synced: 10 Nov 2024
https://github.com/messense/fasttext-serving
fastText model serving service
fasttext model-server model-serving nlp
Last synced: 16 Dec 2024
https://github.com/bentoml/clip-api-service
CLIP as a service - Embed image and sentences, object recognition, visual reasoning, image classification and reverse image search
ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip
Last synced: 13 Nov 2024
https://github.com/bentoml/ocr-as-a-service
Turn any OCR model into an online inference API endpoint 🚀 🌖
ai-applications model-deployment model-serving ocr ocr-python
Last synced: 13 Nov 2024
https://github.com/alvarobartt/serving-tensorflow-models
Serving TensorFlow models with TensorFlow Serving 📙
image-classification machine-learning mlops model-deployment model-serving serve-tensorflow-models tensorflow tensorflow-serving
Last synced: 14 Oct 2024
https://github.com/bentoml/transformers-nlp-service
Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more
llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer
Last synced: 13 Nov 2024
https://github.com/kspviswa/pyomlx
A wannabe Ollama equivalent for Apple MlX models
Last synced: 06 Dec 2024
https://github.com/kspviswa/PyOMlx
A wannabe Ollama equivalent for Apple MlX models
Last synced: 07 Nov 2024
https://github.com/ai-hypercomputer/jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu
Last synced: 18 Dec 2024
https://github.com/lightbend/kubeflow-recommender
Kubeflow example of machine learning/model serving
kubeflow machine-learning model-serving
Last synced: 12 Nov 2024
https://github.com/ml-libs/mlserve
mlserve turns your Python models into a RESTful API and serves a web page with a form generated to match your input data.
machine-learning mlserve model-deployment model-serving scikit-learn
Last synced: 14 Oct 2024
https://github.com/galileo-galilei/kedro-mlflow-tutorial
A tutorial on how to use kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve kedro pipeline
kedro kedro-mlflow kedro-tutorial mlflow mlops model-serving
Last synced: 18 Nov 2024
https://github.com/animator/titus2
Titus 2 : Portable Format for Analytics (PFA) implementation for Python 3.4+
analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus
Last synced: 28 Nov 2024
https://github.com/h2oai/mlops-dai-runtimes
Production ready templates for deploying Driverless AI (DAI) scorers. https://h2oai.github.io/dai-deployment-templates/
h2o h2oai machine-learning model-deployment model-server model-serving mojo
Last synced: 06 Nov 2024
https://github.com/bentoml/fraud-detection-model-serving
Online model serving with Fraud Detection model trained with XGBoost on IEEE-CIS dataset
ai-applications fraud-detection model-deployment model-serving
Last synced: 13 Nov 2024
https://github.com/galileo-galilei/kedro-serving
A kedro-plugin to serve Kedro Pipelines as API
fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving
Last synced: 18 Nov 2024
https://github.com/rishit-dagli/tfserving-demos
TF Serving demos
jupyter-notebook model-serving python3 tensorflow tensorflow-model-server tensorflow-serving tensorflow2
Last synced: 23 Oct 2024
https://github.com/bentoml/diffusers-examples
API serving for your diffusers models
bentoml diffusers model-deployment model-serving
Last synced: 13 Nov 2024
https://github.com/logicalclocks/machine-learning-api
Hopsworks Machine Learning Api 🚀 Model management with a model registry and model serving
Last synced: 11 Nov 2024
https://github.com/yas-sim/openvino-model-server-wrapper
Python wrapper class for OpenVINO Model Server. Users can submit inference requests to OVMS with just a few lines of code.
ai area-intrusion-detection cloud deep-learning edge grpc grpc-client inference intel line-crossing-detection model-serving object-tracking openvino openvino-docker openvino-model-server python serving tensorflow-serving triton-inference-server
Last synced: 16 Nov 2024
https://github.com/zerohertz/yolo-serving-cookbook
📸 YOLO Serving Cookbook based on Triton Inference Server 📸
docker docker-compose fastapi gradio k8s kubernetes mlops model-serving onnx pytorch triton-inference-server yolo yolov5
Last synced: 27 Oct 2024
https://github.com/saivarunk/krypton
Model Server for ML and DL Models built using FastAPI
deep-learning fastapi machine-learning model-serving rest-api
Last synced: 10 Nov 2024
https://github.com/riccorl/ner-serve
Simple NER model using Docker, FastAPI, ONNX and Multilingual Mini-LM.
backend deep-learning fastapi huggingface huggingface-transformers model-serving named-entity-recognition natural-language-processing ner nlp onnx onnxruntime pytorch transformers
Last synced: 08 Nov 2024
https://github.com/algorithmiaio/algorithmia-modeldeployment-action
Algorithmia Github Action capable of running Jupyter notebooks to create the ML model, uploading the model and updating the algorithm at Algorithmia
algorithmia ci-cd githubaction-workflow githubactions machine-learning model-deployment model-serving
Last synced: 18 Dec 2024
https://github.com/rapidrabbit76/fastapi-deep-learning-model-micro-batching-serving
FastAPI pytorch model serving with micro batching
Last synced: 23 Nov 2024
https://github.com/Aquila-Network/AquilaHub
Load and serve Neural Encoder Models
machine-learning model-serving neural-search personal-search vector-search-engine
Last synced: 18 Nov 2024
https://github.com/dudeperf3ct/11-cortex-deploy
aws-lambda cortex docker fastapi mlops model-serving transformers
Last synced: 08 Nov 2024
https://github.com/dudeperf3ct/12-serverless-deploy
aws-lambda docker fastapi mlops model-serving serverless-framework transformers
Last synced: 08 Nov 2024
https://github.com/dudeperf3ct/6-ml-fastapi-aws-serverless
aws codepipeline docker elasticbeanstalk fastapi mlops model-serving
Last synced: 08 Nov 2024
https://github.com/md-emon-hasan/bentoml
BentoML is a high-performance model serving framework. It provides various scripts and configurations to help streamline the deployment process.
ai bentoml data-science ml-engineering mlops model-deployment model-serving
Last synced: 13 Nov 2024
https://github.com/algorithmiaio/githubactions-modeldeployment-demo-algorithmiaalgo
Demo ML repository, using Algorithmia Model Deployment Github Action, to auto deploy on an algorithm hosted on Algorithmia
algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost
Last synced: 18 Dec 2024
https://github.com/algorithmiaio/githubactions-modeldeployment-template
Template ML repository to get started with Algorithmia Model Deployment Github Action integration
algorithmia cicd github-actions inference machine-learning model-deployment model-serving
Last synced: 18 Dec 2024
https://github.com/dudeperf3ct/8-fastapi-tests-gcp-gke
docker fastapi gke mlops model-serving
Last synced: 08 Nov 2024
https://github.com/mpolinowski/ray-serve-model
Using Ray Serve for ML Model Serving
consensus model-serving python ray
Last synced: 30 Nov 2024
https://github.com/algorithmiaio/githubactions-modeldeployment-demo-githubalgo
Demo ML repository, using Algorithmia Model Deployment Github Action, to auto deploy on an Algorithmia algorithm backed by Github
algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost
Last synced: 18 Dec 2024
https://github.com/kristofferv98/whisper_turboapi
An optimized FastAPI server for OpenAI's Whisper whisper-large-v3-turbo model using MLX turbo optimization
ai api asynchronous audio audio-processing fastapi huggingface machine-learning macos mlx model-serving nlp openai optimization python speech-to-text synchronous transcription whisper whisper-turbo
Last synced: 14 Dec 2024
https://github.com/wtlow003/modal-llm-serving
Examples of serving LLM on Modal.
llm lmdeploy modal model-serving openai openai-api sglang vllm
Last synced: 15 Nov 2024
https://github.com/duketemon/python-model-registry
Model Registry is the service that exposes API to save, fetch and delete machine learning models
machine-learning model-registry model-serving python
Last synced: 18 Dec 2024
https://github.com/redhat-na-ssa/demo-triton-yolo
Customize Nvidia Triton to use OpenShift Source to Image building
data-science model-serving nvidia openshift triton
Last synced: 04 Dec 2024