https://github.com/kressety/ggufaction
An automated tool that uses GitHub Actions to download models from Hugging Face, convert them to GGUF format, perform Q8_0 quantization, and upload them to ModelScope
- Host: GitHub
- URL: https://github.com/kressety/ggufaction
- Owner: kressety
- License: MIT
- Created: 2025-02-23T06:58:57.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-02-23T13:32:53.000Z (2 months ago)
- Last Synced: 2025-03-11T12:43:14.019Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 51.8 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
English | [中文](README_zh.md)
# GGUFAction
**GGUFAction** is an automated tool that uses GitHub Actions to download models from Hugging Face, convert them to GGUF format, perform Q8_0 quantization, and upload them to ModelScope. It caches unsupported models to avoid repeated failures and generates a `README.md` with the original model card and quantization details.
## Features
- **Automated Conversion**: Downloads models from Hugging Face, converts them to GGUF format, and quantizes to Q8_0 using `llama.cpp`.
- **Smart Caching**: Records models that are unsupported for GGUF conversion and terminates early on subsequent attempts (sketched after this list).
- **Model Upload**: Uploads the quantized model and a generated `README.md` to ModelScope.
- **Documentation Generation**: Fetches the original model card (with its YAML front matter wrapped in `---`) from Hugging Face and appends quantization instructions.
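The caching behavior is simple enough to sketch. A minimal illustration in Python, assuming the cache file is the `unsupported_models.txt` mentioned under Known Limitations (the helper names are illustrative, not the project's actual API):
```python
from pathlib import Path

# Cache file name taken from this README; the helpers are illustrative.
CACHE_FILE = Path("unsupported_models.txt")

def is_unsupported(repo_id: str) -> bool:
    """True if this repo_id already failed GGUF conversion on a past run."""
    return CACHE_FILE.exists() and repo_id in CACHE_FILE.read_text().splitlines()

def mark_unsupported(repo_id: str) -> None:
    """Record a failed model so later runs can terminate early."""
    with CACHE_FILE.open("a") as f:
        f.write(repo_id + "\n")
```
On a new run, a check like `is_unsupported` gates the expensive download and conversion steps.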
## Prerequisites
- **GitHub Account**: Required to run Actions and store Secrets.
- **Hugging Face Account**: Obtain an API token (`HF_API_KEY`).
- **ModelScope Account**: Obtain an API token (`MS_API_KEY`) and username (`MS_USERNAME`).
## Installation
1. **Clone the Repository**:
```bash
git clone https://github.com/your-username/GGUFAction.git
cd GGUFAction
```
2. **Set Up Secrets**:
In your GitHub repository's `Settings > Secrets and variables > Actions > Secrets`, add:
- `HF_API_KEY`: Hugging Face API token.
- `MS_API_KEY`: ModelScope API token.
- `MS_USERNAME`: ModelScope username.
3. **Dependencies**:
The project uses Python 3.13 and the following libraries (listed in `requirements.txt`):
```
huggingface_hub
modelscope
torch
sentencepiece
numpy
transformers
requests
--extra-index-url https://download.pytorch.org/whl/cpu
```
Install dependencies:
```bash
pip install -r requirements.txt
```
## Usage
### Via GitHub Actions
1. **Trigger the Workflow**:
- Go to the `Actions` tab in your repository.
- Select the `Model Quantization` workflow.
- Click `Run workflow` and enter the Hugging Face model `repo_id` (e.g., `Classical/Yinka`). The run can also be triggered via GitHub's REST API, as sketched after this list.
2. **Check Results**:
- Review the Actions log (`quantization.log`) to confirm conversion and upload status.
- Check the resulting `{MS_USERNAME}/{model_name}-Q8_0-GGUF` repository on ModelScope.
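A hedged Python sketch of triggering the run via GitHub's workflow-dispatch REST endpoint instead of the web UI. The workflow file name `quantize.yml` and the `GITHUB_TOKEN` variable are assumptions; the `repo_id` input name follows the step above:
```python
import os

import requests

# Assumption: the workflow file is .github/workflows/quantize.yml; check the
# repository for the real name. The token needs "actions: write" permission.
resp = requests.post(
    "https://api.github.com/repos/your-username/GGUFAction"
    "/actions/workflows/quantize.yml/dispatches",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"ref": "main", "inputs": {"repo_id": "Classical/Yinka"}},
)
resp.raise_for_status()  # GitHub returns 204 No Content on success
```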
### Local Execution
1. **Set Environment Variables**:
```bash
export HF_API_KEY="your_hf_token"
export MS_API_KEY="your_ms_token"
export MS_USERNAME="your_ms_username"
export REPO_ID="Classical/Yinka"
```
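Inside the script these variables are presumably read from the environment; a minimal sketch of such a check, with the variable names taken from the export commands above:
```python
import os

# Variable names come from the exports above; the validation itself is an
# illustrative sketch, not the project's actual code.
REQUIRED = ("HF_API_KEY", "MS_API_KEY", "MS_USERNAME", "REPO_ID")
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
repo_id = os.environ["REPO_ID"]
```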
2. **Build llama.cpp**:
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(nproc)
cd ..
```
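The conversion and quantization stage boils down to two llama.cpp invocations. A hedged sketch under the assumptions that the model is fetched with `huggingface_hub` and that llama.cpp was built as above; file names are illustrative, except `model_q8.gguf`, which matches the Output section below:
```python
import os
import subprocess

from huggingface_hub import snapshot_download

# Download the model, convert it to GGUF, then quantize to Q8_0.
local_dir = snapshot_download(os.environ["REPO_ID"], token=os.environ["HF_API_KEY"])

subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", local_dir,
     "--outfile", "model_f16.gguf"],
    check=True,
)
subprocess.run(
    ["llama.cpp/build/bin/llama-quantize", "model_f16.gguf", "model_q8.gguf", "Q8_0"],
    check=True,
)
```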
3. **Run the Script**:
```bash
python script.py
```
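Under the hood, the upload step presumably uses the `modelscope` SDK. A hedged sketch: the repository id follows the `{MS_USERNAME}/{model_name}-Q8_0-GGUF` pattern noted above, and the `output/` staging directory is an assumption:
```python
import os

from modelscope.hub.api import HubApi

# Hedged sketch of the upload; login/push_model are modelscope SDK calls,
# but the "output" staging directory is an assumption.
api = HubApi()
api.login(os.environ["MS_API_KEY"])
api.push_model(
    model_id=f"{os.environ['MS_USERNAME']}/Yinka-Q8_0-GGUF",
    model_dir="output",  # holds model_q8.gguf and the generated README.md
)
```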
## Output
- **Quantized Model**: `model_q8.gguf`, uploaded to ModelScope.
- **README.md**: Contains the original model card (wrapped in `---`) and quantization details, e.g.:
```markdown
---
tags:
- mteb
model-index:
...
---
## Classical/Yinka-Q8_0-GGUF
This model has been quantized to Q8_0 format using `llama.cpp`...
```
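Generating that file can be sketched with `huggingface_hub`'s `ModelCard`, which preserves the YAML front matter; the appended note text here is illustrative:
```python
from huggingface_hub import ModelCard

# Fetch the original card (front matter included) and append a quantization
# note; the note wording is illustrative, not the project's exact template.
card = ModelCard.load("Classical/Yinka")
note = (
    "\n## Classical/Yinka-Q8_0-GGUF\n"
    "This model has been quantized to Q8_0 format using `llama.cpp`.\n"
)
with open("README.md", "w", encoding="utf-8") as f:
    f.write(str(card) + note)
```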
## Known Limitations
- **Model Compatibility**: Not all Hugging Face models support GGUF conversion (e.g., some Mistral variants). Unsupported models are cached in `unsupported_models.txt`.
- **Build Time**: Recompiling `llama.cpp` on each run takes approximately 1-2 minutes.
## Contributing
We welcome Issues and Pull Requests! Potential improvements include:
- Supporting additional quantization formats (e.g., Q4_0).
- Optimizing `llama.cpp` build time.
- Enhancing model card parsing logic.
## License
This project is licensed under the [MIT License](LICENSE). See the `LICENSE` file in the root directory for details.
## Acknowledgments
- [llama.cpp](https://github.com/ggerganov/llama.cpp): Provides GGUF conversion and quantization tools.
- [Hugging Face](https://huggingface.co) and [ModelScope](https://modelscope.cn): Model hosting platforms.