Embedding inferencing on TPU with PyTorch XLA
https://github.com/ittia-research/inference
- Host: GitHub
- URL: https://github.com/ittia-research/inference
- Owner: ittia-research
- License: MIT
- Created: 2025-04-10T05:01:11.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-04-10T05:09:17.000Z (6 months ago)
- Last Synced: 2025-04-24T00:12:37.254Z (6 months ago)
- Topics: embedded, inference, tpu
- Language: Python
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: docs/roadmap.md
README
Embedding inferencing on TPU with PyTorch XLA.
VISITOR: Why another inferencing server?

PROJECT: There are plenty of good inference projects out there. This project targets edge cases that existing projects do not yet cover well, chiefly embedding inference on TPU devices. We noticed that vLLM supports both TPU and embeddings, but we could not get it working. If you know of an open-source project that works well, let us know.
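For context, here is a minimal sketch of what embedding inference with PyTorch XLA looks like. This is not the project's actual serving code, just an illustration: it assumes `torch_xla` and `transformers` are installed on a TPU host, and uses last-token pooling as described on the gte-Qwen2 model card.

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
from transformers import AutoModel, AutoTokenizer

MODEL = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"

device = xm.xla_device()  # acquire the TPU as a torch device
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).to(device).eval()

texts = ["What is a TPU?", "PyTorch XLA compiles graphs for TPUs."]
batch = tokenizer(texts, padding=True, return_tensors="pt").to(device)

with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# Last-token pooling: take each sequence's final non-padding token
# (the pooling scheme the gte-Qwen2 model card describes).
last = batch["attention_mask"].sum(dim=1) - 1
rows = torch.arange(last.size(0), device=device)
embeddings = F.normalize(hidden[rows, last], p=2, dim=1)

xm.mark_step()  # flush the lazily built XLA graph to the TPU
print(embeddings.shape)
```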
## Features

- Supported devices: Google TPU
- Supported models: Alibaba-NLP/gte-Qwen2-1.5B-instruct
- API: REST, compatible with the OpenAI format (see the example request after the Quick Start section)

## Quick Start
```bash
git clone https://github.com/ittia-research/inference
cd inference
docker compose up -d
docker compose logs -f
```
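Once the containers are up, you can query the OpenAI-compatible embeddings endpoint. The route and port below are assumptions (an OpenAI-style `/v1/embeddings` on localhost port 8000); adjust them to match the ports published in `docker-compose.yml`.

```python
import requests

# Assumed endpoint: OpenAI-compatible /v1/embeddings on localhost:8000.
resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
        "input": ["What is a TPU?"],
    },
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # dimensionality of the returned embedding
```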
## Other Docs

[Roadmap](./docs/roadmap.md)

## Acknowledgements
- TPU Research Cloud team at Google