https://github.com/dougeeai/llama-cpp-python-wheels
Pre-built wheels for llama-cpp-python across platforms and CUDA versions
# llama-cpp-python-wheels
Pre-built wheels for llama-cpp-python across platforms and CUDA versions.
## Available Wheels
### Consumer & Workstation Blackwell (sm_120)
**Supported GPUs:** RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop
| File | llama_cpp | OS | Python | CUDA | Driver | Size |
|------|-----------|-----|--------|------|--------|------|
| [llama_cpp_python-0.3.20+cuda13.0.sm100.sm120.blackwell-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda13.0-sm100-sm120/llama_cpp_python-0.3.20+cuda13.0.sm100.sm120.blackwell-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 13.0 | 580+ | 184.3 MB |
### Datacenter Blackwell (sm_100)
**Supported GPUs:** B100, B200, B300 (Blackwell Ultra), GB200, GB300
> **Note:** The `sm100.sm120` wheel below covers both datacenter (sm_100) and consumer/workstation (sm_120) Blackwell in a single build. The older sm_100-only wheels remain available for anyone who specifically wants a smaller datacenter-only build.
| File | llama_cpp | OS | Python | CUDA | Driver | Size |
|------|-----------|-----|--------|------|--------|------|
| [llama_cpp_python-0.3.20+cuda13.0.sm100.sm120.blackwell-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda13.0-sm100-sm120/llama_cpp_python-0.3.20+cuda13.0.sm100.sm120.blackwell-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 13.0 | 580+ | 184.3 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm100-py313/llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 13.0 | 580+ | 65.9 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm100-py312/llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 13.0 | 580+ | 65.9 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm100-py311/llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 13.0 | 580+ | 65.9 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm100-py310/llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 13.0 | 580+ | 65.9 MB |
### RTX 40 Series & Ada Professional (Ada Lovelace - sm_89)
**Supported GPUs:** RTX 4060, RTX 4060 Ti, RTX 4070, RTX 4070 Ti, RTX 4070 Ti Super, RTX 4080, RTX 4080 Super, RTX 4090, RTX 6000 Ada, RTX 5000 Ada, RTX 4500 Ada, RTX 4000 Ada, RTX 4000 SFF Ada, L40, L40S, L4
| File | llama_cpp | OS | Python | CUDA | Driver | Size |
|------|-----------|-----|--------|------|--------|------|
| [llama_cpp_python-0.3.20+cuda13.0.sm89.ada-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda13.0-sm89/llama_cpp_python-0.3.20+cuda13.0.sm89.ada-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 13.0 | 580+ | 91.8 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py313/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py312/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py311/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm89-py310/llama_cpp_python-0.3.16+cuda13.0.sm89.ada-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 13.0 | 580+ | 61.3 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py313/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 12.1 | 525.60.13+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py312/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 12.1 | 525.60.13+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py311/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 12.1 | 525.60.13+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm89-py310/llama_cpp_python-0.3.16+cuda12.1.sm89.ada-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 12.1 | 525.60.13+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py313/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 11.8 | 450.80.02+ | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py312/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 11.8 | 450.80.02+ | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py311/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 11.8 | 450.80.02+ | 100.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm89-py310/llama_cpp_python-0.3.16+cuda11.8.sm89.ada-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 11.8 | 450.80.02+ | 100.5 MB |
### RTX 30 Series & Ampere Professional (Ampere - sm_86)
**Supported GPUs:** RTX 3060, RTX 3060 Ti, RTX 3070, RTX 3070 Ti, RTX 3080, RTX 3080 Ti, RTX 3090, RTX 3090 Ti, RTX A2000, RTX A4000, RTX A4500, RTX A5000, RTX A5500, RTX A6000
| File | llama_cpp | OS | Python | CUDA | Driver | Size |
|------|-----------|-----|--------|------|--------|------|
| [llama_cpp_python-0.3.20+cuda13.0.sm86.ampere-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda13.0-sm86/llama_cpp_python-0.3.20+cuda13.0.sm86.ampere-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 13.0 | 580+ | 91.9 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-py313/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py312/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py311/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm86-py310/llama_cpp_python-0.3.16+cuda13.0.sm86.ampere-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 13.0 | 580+ | 61.4 MB |
| [llama_cpp_python-0.3.20+cuda12.1.sm86.ampere-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda12.1-sm86/llama_cpp_python-0.3.20+cuda12.1.sm86.ampere-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 12.1 | 525.60.13+ | 89.4 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm86-py313/llama_cpp_python-0.3.16+cuda12.1.sm86.ampere-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 12.1 | 525.60.13+ | 92.2 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py313/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 11.8 | 450.80.02+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py312/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 11.8 | 450.80.02+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py311/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 11.8 | 450.80.02+ | 100.6 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm86-py310/llama_cpp_python-0.3.16+cuda11.8.sm86.ampere-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 11.8 | 450.80.02+ | 100.6 MB |
### RTX 20 Series & Turing Professional (Turing - sm_75)
**Supported GPUs:** RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, Tesla T4
| File | llama_cpp | OS | Python | CUDA | Driver | Size |
|------|-----------|-----|--------|------|--------|------|
| [llama_cpp_python-0.3.20+cuda13.0.sm75.turing-py3-none-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.20-cuda13.0-sm75/llama_cpp_python-0.3.20+cuda13.0.sm75.turing-py3-none-win_amd64.whl) | 0.3.20 | Windows | 3.10–3.13 | 13.0 | 580+ | 97.7 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm75-py313/llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 13.0 | 580+ | 63.1 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm75-py312/llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 13.0 | 580+ | 63.1 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm75-py311/llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 13.0 | 580+ | 63.1 MB |
| [llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda13.0-sm75-py310/llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 13.0 | 580+ | 63.1 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm75-py313/llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 12.1 | 525.60.13+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm75-py312/llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 12.1 | 525.60.13+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm75-py311/llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 12.1 | 525.60.13+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda12.1-sm75-py310/llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 12.1 | 525.60.13+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp313-cp313-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm75-py313/llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp313-cp313-win_amd64.whl) | 0.3.16 | Windows | 3.13 | 11.8 | 450.80.02+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp312-cp312-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm75-py312/llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp312-cp312-win_amd64.whl) | 0.3.16 | Windows | 3.12 | 11.8 | 450.80.02+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp311-cp311-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm75-py311/llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp311-cp311-win_amd64.whl) | 0.3.16 | Windows | 3.11 | 11.8 | 450.80.02+ | 103.5 MB |
| [llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp310-cp310-win_amd64.whl](https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.16-cuda11.8-sm75-py310/llama_cpp_python-0.3.16+cuda11.8.sm75.turing-cp310-cp310-win_amd64.whl) | 0.3.16 | Windows | 3.10 | 11.8 | 450.80.02+ | 103.5 MB |
## Installation
Download the appropriate wheel from [Releases](../../releases) and install:
```bash
pip install llama_cpp_python-[version]+cuda[cuda_version].sm[arch].[gpu]-[python_tag]-[abi_tag]-win_amd64.whl
```
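The bracketed fields in the command above come straight from the wheel filename, which follows the standard PEP 427 layout (`name-version-pythontag-abitag-platform.whl`), with the CUDA and GPU-architecture info packed into the local version segment after `+`. As a sketch of how those fields break apart (the helper below is hypothetical, not part of this repo):

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a llama-cpp-python wheel filename into its tag fields.

    Wheel names follow PEP 427: name-version-pythontag-abitag-platform.whl;
    the CUDA/arch info lives in the local version segment after '+'.
    """
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform = stem.split("-")
    release, _, build = version.partition("+")
    return {
        "name": name,
        "version": release,        # e.g. 0.3.20
        "build": build,            # e.g. cuda13.0.sm86.ampere
        "python_tag": python_tag,  # py3 = any Python 3; cp313 = CPython 3.13 only
        "abi_tag": abi_tag,
        "platform": platform,
    }

info = parse_wheel_name(
    "llama_cpp_python-0.3.20+cuda13.0.sm86.ampere-py3-none-win_amd64.whl"
)
print(info["version"], info["build"], info["python_tag"])
# → 0.3.20 cuda13.0.sm86.ampere py3
```

In short: match `build` to your GPU architecture and installed CUDA Toolkit, and `python_tag` to your interpreter (`py3` wheels work on any supported Python; `cpXY` wheels only on that exact version).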
**Runtime requirement:** These wheels require the **matching CUDA Toolkit installed on the target machine** (specifically `cublas64_XX.dll`, which ships with the Toolkit — the NVIDIA driver alone does not include it). Install the CUDA Toolkit matching the wheel's CUDA version (11.8 / 12.1 / 13.0).
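A quick pre-flight check is to look for that DLL before installing. The version-to-DLL mapping below is an assumption based on cuBLAS's naming convention (the filename carries the CUDA major version), not something this repo ships:

```python
import ctypes.util

def cublas_dll_name(cuda_version: str) -> str:
    """Map a wheel's CUDA version (e.g. '13.0') to the cuBLAS DLL it loads.

    cuBLAS DLLs are named by CUDA major version:
    cublas64_11.dll, cublas64_12.dll, cublas64_13.dll, ...
    """
    major = cuda_version.split(".")[0]
    return f"cublas64_{major}.dll"

# On the target Windows machine, find_library searches PATH for the DLL;
# None means the matching CUDA Toolkit is missing (or not on PATH).
dll = cublas_dll_name("13.0")
print(dll, "found" if ctypes.util.find_library(dll) else "missing")
```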
## Verification
```python
import llama_cpp

# Importing llama_cpp loads the bundled CUDA build; an ImportError or DLL load
# failure here usually means the matching cublas64_XX.dll was not found.
print(f"llama-cpp-python {llama_cpp.__version__} loaded successfully")
```
## Build Notes
Built with:
- Visual Studio 2019/2022 Build Tools (0.3.16) and Visual Studio 2026 Community (0.3.20)
- CUDA Toolkit 11.8, 12.1, 13.0 (0.3.20 CUDA 13.0 builds use 13.0 Update 3)
- CMAKE_CUDA_ARCHITECTURES=75 (Turing), 86 (Ampere), 89 (Ada), 100 (Datacenter Blackwell), 120 (Consumer/Workstation Blackwell)
- 0.3.20 wheels use the `py3-none-win_amd64` tag (single wheel for Python 3.10–3.13 — llama-cpp-python loads its compiled libraries via ctypes, so no CPython ABI lock-in)
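The ctypes point in the last bullet is what makes the single `py3-none` wheel possible: a CPython extension module (`.pyd`) is compiled against one interpreter ABI (`cp310`, `cp311`, ...), but a plain shared library opened through ctypes has no Python ABI at all. A minimal illustration of the pattern, with the C math library standing in for the bundled llama/ggml libraries:

```python
import ctypes
import ctypes.util

# CDLL loads any shared library regardless of which Python built or runs this
# script -- so one py3-none wheel can serve Python 3.10-3.13 unchanged.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(2.0))  # same call works identically on any Python 3.x
```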
## License
MIT
Wheels are built from [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (MIT License)
## Contributing
**Need a different configuration?**
Open an [issue](https://github.com/dougeeai/llama-cpp-python-wheels/issues) with:
- OS (Windows/Linux/macOS)
- Python version
- CUDA version (if applicable)
- GPU model
I'll try to build it if I have access to similar hardware.
## Contact
Questions or issues? Open a [GitHub issue](https://github.com/dougeeai/llama-cpp-python-wheels/issues).