Embedding inferencing on TPU with PyTorch XLA
https://github.com/ittia-research/inference
- Host: GitHub
- URL: https://github.com/ittia-research/inference
- Owner: ittia-research
- License: MIT
- Created: 2025-04-10T05:01:11.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-04-10T05:09:17.000Z (6 months ago)
- Last Synced: 2025-04-24T00:12:37.254Z (6 months ago)
- Topics: embedded, inference, tpu
- Language: Python
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: docs/roadmap.md
README
Embedding inferencing on TPU with PyTorch XLA.
VISITOR: Why another inferencing server?

PROJECT: There are plenty of good inference projects out there. This project targets edge cases that existing projects do not yet cover well, chiefly embedding inference on TPU devices. We noticed that vLLM supports both TPU and embeddings, but we could not get it working. If you know of an open-source project that works well, let us know.
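For context, here is a minimal sketch of what embedding inference with PyTorch XLA looks like. This is not the project's actual serving code, just an illustration: it assumes `torch_xla` and `transformers` are installed on a TPU host, and uses last-token pooling as described on the gte-Qwen2 model card.

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
from transformers import AutoModel, AutoTokenizer

MODEL = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"

device = xm.xla_device()  # acquire the TPU as a torch device
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).to(device).eval()

texts = ["What is a TPU?", "PyTorch XLA compiles graphs for TPUs."]
batch = tokenizer(texts, padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# Last-token pooling: take each sequence's final non-padding token
# (the pooling scheme the gte-Qwen2 model card describes).
last = batch["attention_mask"].sum(dim=1) - 1
rows = torch.arange(last.size(0), device=device)
embeddings = F.normalize(hidden[rows, last], p=2, dim=1)

xm.mark_step()  # flush the lazily built XLA graph to the TPU
print(embeddings.shape)
```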
## Features

- Supported devices: Google TPU
- Supported models: Alibaba-NLP/gte-Qwen2-1.5B-instruct
- API: REST, compatible with the OpenAI format (see the example request after the Quick Start section)

## Quick Start
```bash
git clone https://github.com/ittia-research/inference
cd inference
docker compose up -d
docker compose logs -f
```
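Once the containers are up, you can query the OpenAI-compatible embeddings endpoint. The route and port below are assumptions (an OpenAI-style `/v1/embeddings` on localhost port 8000); adjust them to match the ports published in `docker-compose.yml`.

```python
import requests

# Assumed endpoint: OpenAI-compatible /v1/embeddings on localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
        "input": ["What is a TPU?"],
    },
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # dimensionality of the returned embedding
```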
## Other Docs

[Roadmap](./docs/roadmap.md)

## Acknowledgements
- TPU Research Cloud team at Google