Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-serving

A curated list of projects in awesome lists tagged with model-serving.

https://github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch rocm tpu trainium transformer xpu

Last synced: 16 Dec 2024

https://github.com/bentoml/bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!

ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Last synced: 16 Dec 2024

https://github.com/bentoml/BentoML

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Last synced: 24 Oct 2024

https://github.com/fedml-ai/fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training

Last synced: 16 Dec 2024

https://github.com/HuaizhengZhang/Awesome-System-for-Machine-Learning

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

ai-infra genai large-language-models llmsys mlsys model-serving model-training

Last synced: 05 Nov 2024

https://github.com/modeltc/lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

deep-learning gpt llama llm model-serving nlp openai-triton

Last synced: 18 Dec 2024

https://github.com/predibase/lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers

Last synced: 17 Dec 2024

https://github.com/mlrun/mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

data-engineering data-science experiment-tracking kubernetes machine-learning mlops mlops-workflow model-serving python workflow

Last synced: 09 Nov 2024

https://github.com/mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts

Last synced: 20 Dec 2024

https://github.com/bentoml/yatai

Model Deployment at Scale on Kubernetes 🦄️

bentoml k8s kubernetes machine-learning mlops model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/efeslab/nanoflow

A throughput-oriented high-performance serving framework for LLMs

cuda inference llama2 llm llm-serving model-serving

Last synced: 20 Dec 2024

https://github.com/jozu-ai/kitops

An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.

ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow

Last synced: 21 Dec 2024

https://github.com/eightBEC/fastapi-ml-skeleton

FastAPI skeleton app for serving machine learning models in production.

fastapi machine-learning model-serving python python3

Last synced: 26 Oct 2024

https://github.com/bentoml/bentodiffusion

BentoDiffusion: A collection of diffusion models served with BentoML

ai diffusion-models fine-tuning kubernetes lora model-serving stable-diffusion

Last synced: 16 Dec 2024

https://github.com/ai-hypercomputer/jetstream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer

Last synced: 16 Dec 2024

https://github.com/lightbend/kafka-with-akka-streams-kafka-streams-tutorial

Code samples for the Lightbend tutorial on writing microservices with Akka Streams, Kafka Streams, and Kafka

akka kafka-streams model-serving

Last synced: 12 Nov 2024

https://github.com/google/jetstream

JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).

gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer

Last synced: 29 Sep 2024

https://github.com/project-monai/monai-deploy-app-sdk

MONAI Deploy App SDK offers a framework and associated tools to design, develop and verify AI-driven applications in the healthcare imaging domain.

ai deep-learning deploy dicom healthcare image-processing machine-learning medical-imaging ml ml-infrastructure ml-platform mlops model-deployment model-serving monai pipeline python pytorch workflow

Last synced: 15 Dec 2024

https://github.com/messense/fasttext-serving

fastText model serving service

fasttext model-server model-serving nlp

Last synced: 16 Dec 2024

https://github.com/bentoml/clip-api-service

CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification and reverse image search

ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip

Last synced: 13 Nov 2024

https://github.com/bentoml/ocr-as-a-service

Turn any OCR model into an online inference API endpoint 🚀 🌖

ai-applications model-deployment model-serving ocr ocr-python

Last synced: 13 Nov 2024

https://github.com/bentoml/transformers-nlp-service

Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more

llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer

Last synced: 13 Nov 2024

https://github.com/kspviswa/pyomlx

A wannabe Ollama equivalent for Apple MLX models

chatbot mlx model-serving

Last synced: 06 Dec 2024

https://github.com/ai-hypercomputer/jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu

Last synced: 18 Dec 2024

https://github.com/lightbend/kubeflow-recommender

Kubeflow example of machine learning/model serving

kubeflow machine-learning model-serving

Last synced: 12 Nov 2024

https://github.com/ml-libs/mlserve

mlserve turns your Python models into a RESTful API and serves a web page with a form generated to match your input data.

machine-learning mlserve model-deployment model-serving scikit-learn

Last synced: 14 Oct 2024

https://github.com/yuruofeifei/mms

MXNet Model Serving

model-serving mxnet

Last synced: 08 Nov 2024

https://github.com/galileo-galilei/kedro-mlflow-tutorial

A tutorial on how to use the kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and serve a Kedro pipeline

kedro kedro-mlflow kedro-tutorial mlflow mlops model-serving

Last synced: 18 Nov 2024

https://github.com/animator/titus2

Titus 2: Portable Format for Analytics (PFA) implementation for Python 3.4+

analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus

Last synced: 28 Nov 2024

https://github.com/h2oai/mlops-dai-runtimes

Production-ready templates for deploying Driverless AI (DAI) scorers. https://h2oai.github.io/dai-deployment-templates/

h2o h2oai machine-learning model-deployment model-server model-serving mojo

Last synced: 06 Nov 2024

https://github.com/bentoml/fraud-detection-model-serving

Online model serving with a fraud detection model trained with XGBoost on the IEEE-CIS dataset

ai-applications fraud-detection model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/galileo-galilei/kedro-serving

A Kedro plugin to serve Kedro pipelines as an API

fastapi kedro kedro-plugin mlops model-serving pipeline-serving serving

Last synced: 18 Nov 2024

https://github.com/bentoml/diffusers-examples

API serving for your diffusers models

bentoml diffusers model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/logicalclocks/machine-learning-api

Hopsworks Machine Learning API 🚀 Model management with a model registry and model serving

model-registry model-serving

Last synced: 11 Nov 2024

https://github.com/saivarunk/krypton

Model Server for ML and DL Models built using FastAPI

deep-learning fastapi machine-learning model-serving rest-api

Last synced: 10 Nov 2024

https://github.com/algorithmiaio/algorithmia-modeldeployment-action

Algorithmia GitHub Action capable of running Jupyter notebooks to create the ML model, uploading the model, and updating the algorithm on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions machine-learning model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/rapidrabbit76/fastapi-deep-learning-model-micro-batching-serving

FastAPI PyTorch model serving with micro-batching

fastapi model-serving pytorch

Last synced: 23 Nov 2024

https://github.com/md-emon-hasan/bentoml

BentoML is a high-performance model serving framework; it provides various scripts and configurations to help streamline the deployment process.

ai bentoml data-science ml-engineering mlops model-deployment model-serving

Last synced: 13 Nov 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-demo-algorithmiaalgo

Demo ML repository that uses the Algorithmia Model Deployment GitHub Action to auto-deploy to an algorithm hosted on Algorithmia

algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost

Last synced: 18 Dec 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-template

Template ML repository to get started with the Algorithmia Model Deployment GitHub Action integration

algorithmia cicd github-actions inference machine-learning model-deployment model-serving

Last synced: 18 Dec 2024

https://github.com/mpolinowski/ray-serve-model

Using Ray Serve for ML Model Serving

consensus model-serving python ray

Last synced: 30 Nov 2024

https://github.com/algorithmiaio/githubactions-modeldeployment-demo-githubalgo

Demo ML repository that uses the Algorithmia Model Deployment GitHub Action to auto-deploy to an Algorithmia algorithm backed by GitHub

algorithmia ci-cd githubaction-workflow githubactions jupyter-notebook machine-learning model-deployment model-serving xgboost

Last synced: 18 Dec 2024

https://github.com/kristofferv98/whisper_turboapi

An optimized FastAPI server for OpenAI's Whisper whisper-large-v3-turbo model using MLX turbo optimization

ai api asynchronous audio audio-processing fastapi huggingface machine-learning macos mlx model-serving nlp openai optimization python speech-to-text synchronous transcription whisper whisper-turbo

Last synced: 14 Dec 2024

https://github.com/wtlow003/modal-llm-serving

Examples of serving LLMs on Modal.

llm lmdeploy modal model-serving openai openai-api sglang vllm

Last synced: 15 Nov 2024

https://github.com/duketemon/python-model-registry

Model Registry is a service that exposes an API to save, fetch, and delete machine learning models

machine-learning model-registry model-serving python

Last synced: 18 Dec 2024

https://github.com/redhat-na-ssa/demo-triton-yolo

Customize NVIDIA Triton to use OpenShift Source-to-Image (S2I) builds

data-science model-serving nvidia openshift triton

Last synced: 04 Dec 2024