https://github.com/autonomi-ai/nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
computer-vision generative-ai inference inference-acceleration llm-inference machine-learning


Host: GitHub
URL: https://github.com/autonomi-ai/nos
Owner: autonomi-ai
License: apache-2.0
Created: 2023-04-16T22:20:05.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-06-08T19:22:47.000Z (about 2 years ago)
Last Synced: 2026-06-08T14:04:06.542Z (27 days ago)
Topics: computer-vision, generative-ai, inference, inference-acceleration, llm-inference, machine-learning
Language: Python
Homepage: https://docs.nos.run/
Size: 16.5 MB
Stars: 147
Watchers: 1
Forks: 12
Open Issues: 60
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Support: docs/support.md
- Roadmap: docs/roadmap.md

README

          





**NOS** is a fast and flexible PyTorch inference server that runs on any cloud or AI HW.

## 🛠️ Key Features

- 👩‍💻 **Easy-to-use**: Built for [PyTorch](https://pytorch.org/) and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.

- 🥷 **Multi-modal & Multi-model**: Serve multiple foundational AI models ([LLMs](https://github.com/autonomi-ai/nos/blob/main/nos/models/llm.py), [Diffusion](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [Embeddings](https://github.com/autonomi-ai/nos/blob/main/nos/models/clip.py), [Speech-to-Text](https://github.com/autonomi-ai/nos/blob/main/nos/models/clip.py) and [Object Detection](https://github.com/autonomi-ai/nos/blob/main/nos/models/yolox.py)) simultaneously, in a single server.

- ⚙️ **HW-aware Runtime:** Deploy PyTorch models effortlessly on modern AI accelerators (NVIDIA GPUs, AWS Inferentia2, AMD - coming soon, and even CPUs).

- ☁️ **Cloud-agnostic Containers:** Run on any cloud (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.

## 🔥 What's New

* **[Feb 2024]** ✍️ [blog] [Introducing the NOS Inferentia2 (`inf2`) runtime](https://docs.nos.run/docs/blog/introducing-the-nos-inferentia2-runtime.html).

* **[Jan 2024]** ✍️ [blog] [Serving LLMs on a budget](https://docs.nos.run/docs/blog/serving-llms-on-a-budget.html) with [SkyServe](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html).

* **[Jan 2024]** 📚 [docs] [NOS x SkyPilot Integration](https://docs.nos.run/docs/integrations/skypilot.html) page!

* **[Jan 2024]** ✍️ [blog] [Getting started with NOS tutorials](https://docs.nos.run/docs/blog/-getting-started-with-nos-tutorials.html) is available [here](./examples/tutorials/)!

* **[Dec 2023]** 🛝 [repo] We open-sourced the [NOS playground](https://github.com/autonomi-ai/nos-playground) to help you get started with more examples built on NOS!

## 🚀 Quickstart

We highly recommend starting with our [quickstart guide](https://docs.nos.run/docs/quickstart.html). To install the NOS client, run the following commands:

```bash
conda create -n nos python=3.8 -y
conda activate nos
pip install torch-nos
```

Once the client is installed, you can start the NOS server via the NOS `serve` CLI. This automatically detects your local environment, downloads the Docker runtime image and spins up the NOS server:

```bash
nos serve up --http --logging-level INFO
```

You are now ready to run your [first inference request](#👩‍💻-what-can-nos-do) with NOS! You can run any of the following commands to try things out. You can set the logging level to `DEBUG` if you want more detailed information from the server.

## 👩‍💻 **What can NOS do?**

### 💬 Chat / LLM Agents (ChatGPT-as-a-Service)

---

NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite OpenAI-compatible LLM client to talk to NOS.






 API / Usage




gRPC API ⚡

```python
from nos.client import Client

client = Client()

model = client.Module("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)
```

REST API

```bash
curl \
-X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{
        "role": "user",
        "content": "Tell me a story of 1000 words with emojis"
    }],
    "temperature": 0.7,
    "stream": true
  }'
```
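Because the endpoint is described as OpenAI-compatible, a streamed reply can presumably be consumed the same way as an OpenAI chat-completions stream. The sketch below uses only the Python standard library and assumes the server emits server-sent events of the form `data: {json}`, ends the stream with `data: [DONE]`, and carries text under `choices[0].delta.content`. That chunk layout is the OpenAI convention, not something this README confirms for NOS, and the helper names `extract_delta` and `stream_chat` are hypothetical.

```python
import json
import urllib.request


def extract_delta(line: str):
    """Return the text carried by one SSE line, or None if it carries none.

    Assumes OpenAI-style chunks: 'data: {...}' with choices[0].delta.content.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator lines, comments, other SSE fields
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None  # end-of-stream sentinel (assumed, per OpenAI convention)
    chunk = json.loads(payload)
    choices = chunk.get("choices") or [{}]
    return choices[0].get("delta", {}).get("content")


def stream_chat(prompt: str, url: str = "http://localhost:8000/v1/chat/completions"):
    """Yield text fragments from a running server (URL taken from the curl example)."""
    body = json.dumps({
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the HTTP response object iterates line by line
            text = extract_delta(raw.decode("utf-8"))
            if text:
                yield text
```

With the server from the quickstart running, `for piece in stream_chat("Tell me a story"): print(piece, end="", flush=True)` would print the reply as it arrives, provided the assumptions above hold.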

### 🏞️ Image Generation (Stable-Diffusion-as-a-Service)

---

Build MidJourney-style Discord bots in seconds.






 API / Usage




gRPC API ⚡

```python
from nos.client import Client

client = Client()

sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
              width=1024, height=1024, num_images=1)
```

REST API

```bash
curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024, "height": 1024,
        "num_images": 1
    }
}'
```
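The same `/v1/infer` request can be assembled from Python without extra dependencies. The sketch below only mirrors the fields visible in the curl examples in this README (`model_id`, `inputs`, and the optional `method` used by the embedding example). The helper name `build_infer_request` is hypothetical, and the response schema is not documented here, so the example stops at building the request.

```python
import json
import urllib.request

NOS_INFER_URL = "http://localhost:8000/v1/infer"  # taken from the curl example


def build_infer_request(model_id, inputs, method=None, url=NOS_INFER_URL):
    """Build a POST request whose JSON body mirrors the curl examples."""
    payload = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        payload["method"] = method  # e.g. "encode_text" for the CLIP example
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)

# Sending it requires a running NOS server:
# with urllib.request.urlopen(request) as response:
#     raw = response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.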

### 🧠 Text & Image Embedding (CLIP-as-a-Service)

---

Build [scalable semantic search of images/videos](https://docs.nos.run/docs/demos/video-search.html) in minutes.






 API / Usage




gRPC API ⚡

```python
from nos.client import Client

client = Client()

clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
```

REST API

```bash
curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
        "texts": ["fox jumped over the moon"]
    }
}'
```
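Semantic search over embeddings usually comes down to ranking stored vectors by their similarity to a query vector. The sketch below shows that ranking step in plain Python. It assumes embeddings are available as equal-length sequences of floats (the actual return type of `encode_text` is not specified here) and uses cosine similarity, which is a common choice rather than something NOS prescribes. The toy 3-dimensional vectors, file names, and the `rank_by_similarity` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return corpus keys sorted from most to least similar to the query."""
    return sorted(
        corpus,
        key=lambda name: cosine_similarity(query, corpus[name]),
        reverse=True,
    )


# Toy stand-ins for image embeddings and a text-query embedding.
corpus = {
    "fox.jpg": [0.9, 0.1, 0.0],
    "moon.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, corpus)
print(ranking)
```

For large collections a vector index would typically replace the linear scan, but the scoring idea stays the same.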

### 🎙️ Audio Transcription (Whisper-as-a-Service)

---

Perform [real-time audio transcription](./examples/tutorials/04-serving-multiple-models/) using Whisper.






 API / Usage




gRPC API ⚡

```python
from pathlib import Path
from nos.client import Client

client = Client()

model = client.Module("openai/whisper-small.en")
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
# {"chunks": ...}
```

REST API

```bash
curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=openai/whisper-small.en' \
-F 'file=@audio.wav'
```

### 🧐 Object Detection (YOLOX-as-a-Service)

---

Run classical computer-vision tasks in 2 lines of code.






 API / Usage




gRPC API ⚡

```python
from PIL import Image  # Pillow provides Image.open
from nos.client import Client

client = Client()

model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])
```

REST API

```bash
curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=yolox/medium' \
-F 'file=@image.jpg'
```

### ⚒️ Custom models

---

Want to run models not supported by NOS? You can easily add your own models following the examples in the [NOS Playground](https://github.com/autonomi-ai/nos-playground/tree/main/examples).

## 📄 License

This project is licensed under the [Apache-2.0 License](LICENSE).

## 📡 Telemetry

NOS collects anonymous usage data using [Sentry](https://sentry.io/). This is used to help us understand how the community is using NOS and to help us prioritize features. You can opt-out of telemetry by setting `NOS_TELEMETRY_ENABLED=0`.
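When the server is launched from a Python script, the opt-out variable can be set on the child process environment. This is a minimal sketch: it assumes the `nos` CLI reads `NOS_TELEMETRY_ENABLED` from the environment it is started in, which this README does not spell out, and the helper names `telemetry_disabled_env` and `start_server` are hypothetical.

```python
import os
import subprocess


def telemetry_disabled_env(base_env=None):
    """Return a copy of the environment with NOS telemetry switched off."""
    env = dict(os.environ if base_env is None else base_env)
    env["NOS_TELEMETRY_ENABLED"] = "0"
    return env


def start_server():
    """Run the quickstart command with telemetry disabled (needs the nos CLI installed)."""
    subprocess.run(
        ["nos", "serve", "up", "--http"],
        env=telemetry_disabled_env(),
        check=True,
    )
```

Copying the environment rather than mutating `os.environ` keeps the opt-out scoped to the launched process.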

## 🤝 Contributing

We welcome contributions! Please see our [contributing guide](docs/CONTRIBUTING.md) for more information.

## 🔗  Quick Links

* 💬 Send us an email at [support@autonomi.ai](mailto:support@autonomi.ai) or join our [Discord](https://discord.gg/QAGgvTuvgg) for help.

* 📣 Follow us on [Twitter](https://twitter.com/autonomi_ai) and [LinkedIn](https://www.linkedin.com/company/autonomi-ai) to keep up-to-date on our products.



