Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nikolaydubina/basic-openai-pytorch-server
Minimal HTTP inference server in OpenAI API with PyTorch and CUDA
cuda docker llm openai pytorch server
- Host: GitHub
- URL: https://github.com/nikolaydubina/basic-openai-pytorch-server
- Owner: nikolaydubina
- License: mit
- Created: 2024-10-17T11:28:49.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2024-10-20T09:54:41.000Z (4 months ago)
- Last Synced: 2025-01-28T17:44:05.705Z (11 days ago)
- Topics: cuda, docker, llm, openai, pytorch, server
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Minimal HTTP inference server implementing the OpenAI API[^1].
[![Hits](https://hits.sh/github.com/nikolaydubina/basic-openai-pytorch-server.svg?view=today-total&extraCount=40)](https://hits.sh/github.com/nikolaydubina/basic-openai-pytorch-server/)
> _When you don't want to install countless frameworks, generators, etc. When all you need is a small Dockerfile and a single `main` for an HTTP server._
> [!WARNING]
> Limited OpenAI API compatibility.

- 100 lines of code
- CUDA
- PyTorch
- HuggingFace models (e.g. Llama 3.2 11B Vision)
- OpenTelemetry
- JSON schema output
- 150 tokens/s with Llama 3.2 11B Vision Instruct on NVIDIA L4 (GCP `g2-standard-8`)

[^1]: https://github.com/openai/openai-openapi
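For orientation, below is a minimal sketch of what such a single-`main` server could look like, assuming FastAPI, Pydantic, and HuggingFace `transformers` (all assumptions; the repository's actual dependencies, model loading, and endpoint details may differ, and a small text-only Llama model stands in for the vision model for brevity):

```python
# Hypothetical sketch of a minimal OpenAI-style chat completions server.
# Assumes FastAPI + HuggingFace transformers; not the repository's actual code.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # stand-in model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).to("cuda")

app = FastAPI()


class Message(BaseModel):
    role: str
    content: str


class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    max_tokens: int = 256


@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Render the conversation with the model's chat template and generate on GPU.
    input_ids = tokenizer.apply_chat_template(
        [m.model_dump() for m in req.messages],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")
    output = model.generate(input_ids, max_new_tokens=req.max_tokens)
    text = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # Shape the reply like an OpenAI chat completion object.
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }
        ],
    }
```

Such a `main` would typically be served with something like `uvicorn main:app --host 0.0.0.0 --port 8000` inside the Docker image (command and port are assumptions).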
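Once a server like this is up, requests can be made with the official `openai` Python client pointed at the local endpoint (the port, model name, and exact parameter support below are assumptions; see the compatibility warning above):

```python
# Hypothetical client call against a locally running server.
from openai import OpenAI

# base_url and api_key are placeholders; authentication is assumed to be ignored.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{"role": "user", "content": "Describe CUDA in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```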