https://github.com/winstxnhdw/llm-api
A fast CPU-based API for Llama 3.2 using CTranslate2, hosted on Hugging Face Spaces.
- Host: GitHub
- URL: https://github.com/winstxnhdw/llm-api
- Owner: winstxnhdw
- Created: 2023-12-03T10:13:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-05T00:12:19.000Z (about 2 months ago)
- Last Synced: 2025-03-05T01:20:47.948Z (about 2 months ago)
- Topics: ctranslate2, docker, huggingface, huggingface-spaces, llama, transformers, uv
- Language: Python
- Homepage: https://huggingface.co/spaces/winstxnhdw/llm-api
- Size: 552 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
# llm-api
[![main.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/main.yml/badge.svg)](https://github.com/winstxnhdw/llm-api/actions/workflows/main.yml)
[![deploy.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/deploy.yml/badge.svg)](https://github.com/winstxnhdw/llm-api/actions/workflows/deploy.yml)
[![formatter.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/formatter.yml/badge.svg)](https://github.com/winstxnhdw/llm-api/actions/workflows/formatter.yml)
[![warmer.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/warmer.yml/badge.svg)](https://github.com/winstxnhdw/llm-api/actions/workflows/warmer.yml)
[Hugging Face Space](https://huggingface.co/spaces/winstxnhdw/llm-api)
[Contribute](https://github.com/winstxnhdw/llm-api/compare)

A fast CPU-based API for Llama 3.2, hosted on Hugging Face Spaces. For faster inference, we use [CTranslate2](https://github.com/OpenNMT/CTranslate2) as our inference engine.
## Usage
Simply cURL the endpoint as follows. The `-N` flag disables output buffering so the response is printed as it streams in.
```bash
curl -N 'https://winstxnhdw-llm-api.hf.space/api/v1/chat' \
  -H 'Content-Type: application/json' \
  -d '{
    "instruction": "What is the capital of Japan?"
  }'
```
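
To consume the same endpoint programmatically, here is a minimal Python sketch using the `requests` library. It mirrors the cURL example above; the streaming consumption (`stream=True` plus chunk iteration) is an assumption based on the `-N` flag, not a documented response format.

```python
import requests

# Hypothetical client for the chat endpoint shown above.
response = requests.post(
    "https://winstxnhdw-llm-api.hf.space/api/v1/chat",
    json={"instruction": "What is the capital of Japan?"},
    stream=True,  # assumption: the endpoint streams its response
    timeout=60,
)
response.raise_for_status()

# Print each streamed chunk as it arrives; decode_unicode handles
# incremental decoding across chunk boundaries.
for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)
```

## Development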
There are a few ways to run `llm-api` locally for development.
### Local
If you spin up the server using `uv`, you may access the Swagger UI at [localhost:49494/schema/swagger](http://localhost:49494/schema/swagger).
```bash
uv run llm-api
```
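
You can also sanity-check the local server programmatically. A minimal sketch, assuming the server exposes its OpenAPI document at `/schema/openapi.json` (the usual companion to a `/schema/swagger` route; this path is an assumption, not documented here):

```python
import requests

# Hypothetical smoke test: fetch the OpenAPI schema from the local server.
response = requests.get("http://localhost:49494/schema/openapi.json", timeout=10)
response.raise_for_status()
print(response.json()["info"]["title"])  # title field from the OpenAPI document
```

### Docker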
You can access the Swagger UI at [localhost:7860/schema/swagger](http://localhost:7860/schema/swagger) after spinning the server up with Docker.
```bash
docker build -f Dockerfile.build -t llm-api .
docker run --rm -e APP_PORT=7860 -p 7860:7860 llm-api
```
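
Once the container is running, the same chat request from the Usage section works locally; only the base URL changes. A minimal sketch, assuming the same endpoint and payload:

```python
import requests

# Hypothetical local smoke test against the Dockerized server.
response = requests.post(
    "http://localhost:7860/api/v1/chat",
    json={"instruction": "What is the capital of Japan?"},
    timeout=60,
)
response.raise_for_status()
print(response.text)
```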