https://github.com/autonomi-ai/nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
- Host: GitHub
- URL: https://github.com/autonomi-ai/nos
- Owner: autonomi-ai
- License: apache-2.0
- Created: 2023-04-16T22:20:05.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-08T19:22:47.000Z (about 2 years ago)
- Last Synced: 2026-06-08T14:04:06.542Z (27 days ago)
- Topics: computer-vision, generative-ai, inference, inference-acceleration, llm-inference, machine-learning
- Language: Python
- Homepage: https://docs.nos.run/
- Size: 16.5 MB
- Stars: 147
- Watchers: 1
- Forks: 12
- Open Issues: 60
Metadata Files:
- Readme: README.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Support: docs/support.md
- Roadmap: docs/roadmap.md
README
**NOS** is a fast and flexible PyTorch inference server that runs on any cloud or AI HW.
## 🛠️ Key Features
- 👩💻 **Easy-to-use**: Built for [PyTorch](https://pytorch.org/) and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 **Multi-modal & Multi-model**: Serve multiple foundational AI models ([LLMs](https://github.com/autonomi-ai/nos/blob/main/nos/models/llm.py), [Diffusion](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [Embeddings](https://github.com/autonomi-ai/nos/blob/main/nos/models/clip.py), [Speech-to-Text](https://github.com/autonomi-ai/nos/blob/main/nos/models/clip.py) and [Object Detection](https://github.com/autonomi-ai/nos/blob/main/nos/models/yolox.py)) simultaneously, in a single server.
- ⚙️ **HW-aware Runtime:** Deploy PyTorch models effortlessly on modern AI accelerators (NVIDIA GPUs, AWS Inferentia2, and even CPUs; AMD support is coming soon).
- ☁️ **Cloud-agnostic Containers:** Run on any cloud (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
## 🔥 What's New
* **[Feb 2024]** ✍️ [blog] [Introducing the NOS Inferentia2 (`inf2`) runtime](https://docs.nos.run/docs/blog/introducing-the-nos-inferentia2-runtime.html).
* **[Jan 2024]** ✍️ [blog] [Serving LLMs on a budget](https://docs.nos.run/docs/blog/serving-llms-on-a-budget.html) with [SkyServe](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html).
* **[Jan 2024]** 📚 [docs] [NOS x SkyPilot Integration](https://docs.nos.run/docs/integrations/skypilot.html) page!
* **[Jan 2024]** ✍️ [blog] [Getting started with NOS tutorials](https://docs.nos.run/docs/blog/-getting-started-with-nos-tutorials.html) is available [here](./examples/tutorials/)!
* **[Dec 2023]** 🛝 [repo] We open-sourced the [NOS playground](https://github.com/autonomi-ai/nos-playground) to help you get started with more examples built on NOS!
## 🚀 Quickstart
We highly recommend following our [quickstart guide](https://docs.nos.run/docs/quickstart.html) to get started. To install the NOS client, run the following commands:
```bash
conda create -n nos python=3.8 -y
conda activate nos
pip install torch-nos
```
Once the client is installed, you can start the NOS server via the NOS `serve` CLI. This will automatically detect your local environment, download the docker runtime image and spin up the NOS server:
```bash
nos serve up --http --logging-level INFO
```
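Before sending requests, it can help to confirm that the server is accepting connections. The following is a minimal sketch, not part of the NOS client: `wait_for_port` is a hypothetical helper, and the default of `localhost:8000` is taken from the REST examples in this README. A successful connection only shows that something is listening on the port; it does not confirm that models have finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection; success means a listener exists.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port(timeout=120.0)` right after `nos serve up` blocks until the port opens or the timeout elapses.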
You are now ready to run your [first inference request](#👩💻-what-can-nos-do) with NOS! Try any of the examples below. For more detailed output from the server, set the logging level to `DEBUG`.
## 👩💻 **What can NOS do?**
### 💬 Chat / LLM Agents (ChatGPT-as-a-Service)
---
NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite OpenAI-compatible LLM client to talk to NOS.

API / Usage
gRPC API ⚡
```python
from nos.client import Client
client = Client()
model = client.Module("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)
```
REST API
```bash
curl \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "messages": [{
      "role": "user",
      "content": "Tell me a story of 1000 words with emojis"
    }],
    "temperature": 0.7,
    "stream": true
  }'
```
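The same REST request can be issued from Python with only the standard library. This is a sketch under stated assumptions: `build_chat_request` and `iter_stream_text` are hypothetical helpers, the URL and payload mirror the curl example above, and the stream parser assumes the server emits OpenAI-style `data: {...}` lines ending with `data: [DONE]`. The exact stream format NOS produces is not verified here.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Endpoint copied from the curl example; adjust for your deployment.
CHAT_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                       temperature: float = 0.7,
                       stream: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request used in the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stream": stream,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        delta = choices[0].get("delta", {}) if choices else {}
        if delta.get("content"):
            yield delta["content"]
```

With a running server, `urllib.request.urlopen(build_chat_request("Hello"))` returns a response object that can be passed directly to `iter_stream_text`, since iterating over it yields lines as bytes.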
### 🏞️ Image Generation (Stable-Diffusion-as-a-Service)
---
Build Midjourney-style Discord bots in seconds.

API / Usage
gRPC API ⚡
```python
from nos.client import Client
client = Client()
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(
    prompts=["hippo with glasses in a library, cartoon styling"],
    width=1024, height=1024, num_images=1,
)
```
REST API
```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
      "prompts": ["hippo with glasses in a library, cartoon styling"],
      "width": 1024, "height": 1024,
      "num_images": 1
    }
  }'
```
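The `/v1/infer` request body has a small, regular shape: a `model_id`, an `inputs` object, and (as in the CLIP example in this README) an optional `method`. A minimal sketch of a builder for that body follows; `build_infer_request` is a hypothetical helper, not part of the NOS client, and the response schema is not described in this README, so the sketch stops at constructing the request.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint copied from the curl example; adjust for your deployment.
INFER_URL = "http://localhost:8000/v1/infer"


def build_infer_request(model_id: str,
                        inputs: Dict[str, Any],
                        method: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /v1/infer endpoint."""
    payload: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        # Only include "method" when a non-default model method is requested.
        payload["method"] = method
    return urllib.request.Request(
        INFER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)
```

Sending it with `urllib.request.urlopen(request)` and decoding the body with `json.load` is enough to inspect how the server returns the generated images.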
### 🧠 Text & Image Embedding (CLIP-as-a-Service)
---
Build [scalable semantic search of images/videos](https://docs.nos.run/docs/demos/video-search.html) in minutes.

API / Usage
gRPC API ⚡
```python
from nos.client import Client
client = Client()
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
```
REST API
```bash
curl \
  -X POST http://localhost:8000/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip-vit-base-patch32",
    "method": "encode_text",
    "inputs": {
      "texts": ["fox jumped over the moon"]
    }
  }'
```
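Semantic search over embeddings usually comes down to ranking stored vectors by cosine similarity to a query vector. The sketch below illustrates that ranking step in plain Python. The three-dimensional vectors are made-up stand-ins for CLIP embeddings (real ones are much longer), and `cosine_similarity` and `rank` are hypothetical helpers, not NOS APIs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Sequence[float],
         index: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors for illustration only; in practice they would come from the embedding model.
index = {
    "fox.jpg": [0.9, 0.1, 0.0],
    "moon.jpg": [0.1, 0.9, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}
query = [0.8, 0.3, 0.0]

best_name, best_score = rank(query, index)[0]
print(best_name)  # prints fox.jpg
```

For large collections a vector index would replace the linear scan, but the scoring function stays the same.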
### 🎙️ Audio Transcription (Whisper-as-a-Service)
---
Perform [real-time audio transcription](./examples/tutorials/04-serving-multiple-models/) using Whisper.

API / Usage
gRPC API ⚡
```python
from pathlib import Path
from nos.client import Client
client = Client()
model = client.Module("openai/whisper-small.en")
# Upload the local file; the remote path is only valid inside the `with` block.
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
    # {"chunks": ...}
```
REST API
```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=openai/whisper-small.en' \
  -F 'file=@audio.wav'
```
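The `-F` flags in the curl command produce a `multipart/form-data` body. For reference, here is a minimal sketch of how such a body can be assembled with the Python standard library. `encode_multipart` is a hypothetical helper, and the `application/octet-stream` part type is an assumption; whether the server expects a more specific type is not stated in this README.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                     file_field: str = "file") -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex  # random separator that will not occur in typical payloads
    parts = []
    for name, value in fields.items():
        # Plain form fields, such as model_id in the curl example.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part carries a filename and the raw bytes.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be passed as `data` and the `Content-Type` header of a `urllib.request.Request` targeting `http://localhost:8000/v1/infer/file`.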
### 🧐 Object Detection (YOLOX-as-a-Service)
---
Run classical computer-vision tasks in 2 lines of code.

API / Usage
gRPC API ⚡
```python
from PIL import Image  # Pillow provides Image.open, used below
from nos.client import Client
client = Client()
model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])
```
REST API
```bash
curl \
  -X POST http://localhost:8000/v1/infer/file \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model_id=yolox/medium' \
  -F 'file=@image.jpg'
```
### ⚒️ Custom models
---
Want to run models not supported by NOS? You can easily add your own models following the examples in the [NOS Playground](https://github.com/autonomi-ai/nos-playground/tree/main/examples).
## 📄 License
This project is licensed under the [Apache-2.0 License](LICENSE).
## 📡 Telemetry
NOS collects anonymous usage data using [Sentry](https://sentry.io/). We use it to understand how the community is using NOS and to prioritize features. You can opt out of telemetry by setting `NOS_TELEMETRY_ENABLED=0`.
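If you launch the server from a script, the variable can be set on the child process environment rather than globally. A minimal sketch, assuming the variable is read from the environment of the process that runs the `nos` CLI (this README does not say whether it must also reach the server container); `env_without_telemetry` is a hypothetical helper and the command is the one from the Quickstart.

```python
import os
from typing import Dict, List, Optional

# Command taken from the Quickstart section.
SERVE_COMMAND: List[str] = ["nos", "serve", "up", "--http", "--logging-level", "INFO"]


def env_without_telemetry(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with NOS telemetry disabled."""
    env = dict(os.environ if base is None else base)  # copy, so the caller's mapping is untouched
    env["NOS_TELEMETRY_ENABLED"] = "0"
    return env
```

Passing the result as `env=` to `subprocess.run(SERVE_COMMAND, env=env_without_telemetry(), check=True)` starts the server with telemetry switched off for that process only.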
## 🤝 Contributing
We welcome contributions! Please see our [contributing guide](CONTRIBUTING.md) for more information.
## 🔗 Quick Links
* 💬 Send us an email at [support@autonomi.ai](mailto:support@autonomi.ai) or join our [Discord](https://discord.gg/QAGgvTuvgg) for help.
* 📣 Follow us on [Twitter](https://twitter.com/autonomi_ai) and [LinkedIn](https://www.linkedin.com/company/autonomi-ai) to keep up to date on our products.