https://github.com/winstxnhdw/llm-api
A fast CPU-based API for Llama 3.2 using CTranslate2, hosted on Hugging Face Spaces.
- Host: GitHub
- URL: https://github.com/winstxnhdw/llm-api
- Owner: winstxnhdw
- Created: 2023-12-03T10:13:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-23T00:00:34.000Z (11 days ago)
- Last Synced: 2025-06-23T00:29:12.825Z (11 days ago)
- Topics: ctranslate2, docker, huggingface, huggingface-spaces, llama, transformers, uv
- Language: Python
- Homepage: https://huggingface.co/spaces/winstxnhdw/llm-api
- Size: 851 KB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# llm-api
[main.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/main.yml)
[deploy.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/deploy.yml)
[formatter.yml](https://github.com/winstxnhdw/llm-api/actions/workflows/formatter.yml)
[Hugging Face Space](https://huggingface.co/spaces/winstxnhdw/llm-api)
[compare](https://github.com/winstxnhdw/llm-api/compare)

A fast CPU-based API for Llama 3.2, hosted on Hugging Face Spaces. To achieve faster execution, we use [CTranslate2](https://github.com/OpenNMT/CTranslate2) as our inference engine.
## Usage
Simply `curl` the endpoint as shown in the following example.
```bash
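# -N disables curl's output buffering so streamed tokens are printed as they arrive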
curl -N 'https://winstxnhdw-llm-api.hf.space/api/v1/chat' \
-H 'Content-Type: application/json' \
-d \
'{
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
```
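If you would rather call the API from code, the sketch below shows one way to do the same thing in Python. It is illustrative only: it assumes the third-party `requests` package is installed (any HTTP client works) and makes no assumption about the response format, simply printing whatever the server streams back.

```python
# chat_client.py -- a minimal sketch of a streaming client for llm-api.
# Assumes `requests` is installed (pip install requests).
import requests

# Hosted Space; use http://localhost:49494 (uv) or http://localhost:7860 (Docker)
# when running the server locally (see Development below).
BASE_URL = "https://winstxnhdw-llm-api.hf.space"

payload = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ]
}

# stream=True mirrors curl's -N flag: chunks are printed as the server sends them.
with requests.post(f"{BASE_URL}/api/v1/chat", json=payload, stream=True, timeout=60) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None):
        print(chunk.decode("utf-8", errors="replace"), end="", flush=True)
print()
```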
## Development

There are a few ways to run `llm-api` locally for development.
### Local
If you spin up the server using `uv`, you may access the Swagger UI at [localhost:49494/schema/swagger](http://localhost:49494/schema/swagger).
```bash
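# serves on port 49494 by default (see the Swagger UI link above)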
uv run llm-api
```
### Docker

You can access the Swagger UI at [localhost:7860/schema/swagger](http://localhost:7860/schema/swagger) after spinning the server up with Docker.
```bash
docker build -f Dockerfile.build -t llm-api .
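# SERVER_PORT sets the port the app listens on; keep it in sync with the -p mapping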
docker run --rm -e SERVER_PORT=7860 -p 7860:7860 llm-api
```