https://github.com/arcxteam/gguf-convert-model
Auto GGUF Converter for HuggingFace Hub Models with Multiple Quantizations (GGUF Format)
ai ai-models bf16 cmake convert-gguf gguf gguf-editor gguf-models gguf-quantization huggingface huggingface-models llama-cpp machine-learning safetensors tensorflow transformers
Last synced: 7 days ago
- Host: GitHub
- URL: https://github.com/arcxteam/gguf-convert-model
- Owner: arcxteam
- License: mit
- Created: 2025-11-09T20:39:09.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-02-11T16:32:32.000Z (4 months ago)
- Last Synced: 2026-06-10T06:45:49.611Z (7 days ago)
- Topics: ai, ai-models, bf16, cmake, convert-gguf, gguf, gguf-editor, gguf-models, gguf-quantization, huggingface, huggingface-models, llama-cpp, machine-learning, safetensors, tensorflow, transformers
- Language: Python
- Homepage:
- Size: 130 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
README
GGUF LLMs Converter for HuggingFace Hub Models with Multiple Quantizations (GGUF Format)
Automated conversion of any HuggingFace model to multiple GGUF quantization formats
Supports continuous monitoring, auto-detection, and universal deployment modes
---
## Overview
**Universal GGUF LLMs Converter** is a production-ready, Docker-based solution for automatically converting HuggingFace models to GGUF format with multiple quantization types. Built with `llama.cpp` integration and intelligent tokenizer detection, this tool streamlines the conversion workflow for both personal and community models.
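The overview does not spell out the conversion steps. A common llama.cpp workflow, which this tool most likely wraps (an assumption, not confirmed by the README), has two steps: `convert_hf_to_gguf.py` turns the HuggingFace checkpoint into an F16 GGUF file, then `llama-quantize` derives each quantized file from it. The sketch below only builds the argument lists; `build_commands` and all paths are hypothetical and not taken from `src/main.py`.

```python
from pathlib import Path

# Formats that llama.cpp's convert_hf_to_gguf.py can write directly via --outtype
DIRECT_OUTTYPES = {"F32", "F16", "BF16"}


def build_commands(model_dir: str, out_dir: str, model_name: str,
                   quant_types: list[str]) -> list[list[str]]:
    """Return argv lists for a convert-then-quantize pipeline (nothing is executed)."""
    out = Path(out_dir)
    # Step 1: HF checkpoint -> F16 GGUF, used as the input for every quantized file
    base = out / f"{model_name}-F16.gguf"
    commands = [["python", "convert_hf_to_gguf.py", model_dir,
                 "--outfile", str(base), "--outtype", "f16"]]
    for quant in quant_types:
        if quant == "F16":
            continue  # already written by step 1
        target = out / f"{model_name}-{quant}.gguf"
        if quant in DIRECT_OUTTYPES:
            # F32 / BF16: convert straight from the HF checkpoint
            commands.append(["python", "convert_hf_to_gguf.py", model_dir,
                             "--outfile", str(target), "--outtype", quant.lower()])
        else:
            # Step 2: llama-quantize <input.gguf> <output.gguf> <type>
            commands.append(["llama-quantize", str(base), str(target), quant])
    return commands


if __name__ == "__main__":
    # Hypothetical local paths; each argv could be run with subprocess.run(argv, check=True)
    for argv in build_commands("./Qwen3-0.6B", "./output", "Qwen3-0.6B",
                               ["F16", "Q4_K_M", "Q6_K"]):
        print(" ".join(argv))
```

Quantizing every K-quant from one F16 base avoids re-reading the original checkpoint for each format, at the cost of keeping the F16 file on disk until all quantizations finish.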
### Key Features
- **Continuous Monitoring**: Automatically detects new commits in a HuggingFace repository and converts the updated model
- **Auto-Detection**: Intelligent tokenizer detection for 50+ popular model architectures (Qwen, Llama, Mistral, Phi, Gemma, etc.)
- **Multiple Quantization**: Supports F16, F32, BF16, Q8_0, and the K-quant formats (Q2_K to Q6_K)
- **Flexible Deploy**: Three upload modes: same repository, new repository, or local-only storage
- **Smart Cleanup**: Automatic temporary file management to keep disk usage in check
- **Docker**: Fully containerized, with optimized build times and resource usage
- **Progress Tracking**: Clean, milestone-based logging with colorized console output
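Continuous monitoring can be approximated by polling the source repository's latest commit SHA and converting only when it changes. This is a hypothetical sketch, not the project's actual loop: `watch_repo`, `fetch_sha`, and `convert` are illustrative names. In a real setup `fetch_sha` could wrap `HfApi().model_info(repo_id).sha` from `huggingface_hub`; here it is injected so the loop runs offline. A `check_interval` of `0` mirrors `CHECK_INTERVAL=0` (one-time conversion).

```python
import time
from typing import Callable, List, Optional


def watch_repo(fetch_sha: Callable[[], str],
               convert: Callable[[str], None],
               check_interval: int,
               sleep: Callable[[float], None] = time.sleep,
               max_checks: Optional[int] = None) -> List[str]:
    """Convert once, then (if check_interval > 0) re-convert whenever the SHA changes.

    Returns the SHAs that were converted. max_checks exists only to bound the loop.
    """
    converted: List[str] = []
    last_sha: Optional[str] = None
    checks = 0
    while True:
        sha = fetch_sha()
        checks += 1
        if sha != last_sha:
            convert(sha)  # first run or a new commit: convert it
            converted.append(sha)
            last_sha = sha
        # check_interval <= 0 means a one-time run: stop after the first check
        if check_interval <= 0 or (max_checks is not None and checks >= max_checks):
            return converted
        sleep(check_interval)
```

Comparing commit SHAs rather than file timestamps means an unchanged repository never triggers a re-download, however short the interval is.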
## Requirements
**System Requirements:**
- Linux-based VPS or local machine
- Docker & Docker Compose installed
- HuggingFace account with **WRITE** access token
- Sufficient disk space for model downloads and conversion (varies by model size)
## Project Structure
```diff
gguf-convert-model/
├── .env
├── .env.example
├── .gitignore
├── .dockerignore
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── README.md
├── scripts/
│   └── start.sh
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   └── utils/
│       ├── __init__.py
│       ├── logger.py
│       └── helpers.py
└── logs/ (auto-created)
```
## **Quick Start**
### 1. Prerequisites
**HuggingFace Access Token:**
- Go to your token settings: https://huggingface.co/settings/tokens
- Create a new token with **Write** permissions
- Copy the token (starts with `hf_`)
**Install Docker & Compose** if not already installed
> Installing Docker this way is optional. If Docker is not installed yet, you can use the script below; review it before piping it to `sudo bash`.
```
curl -sSL https://raw.githubusercontent.com/arcxteam/succinct-prover/refs/heads/main/docker.sh | sudo bash
```
### 2. Clone Repository
```
git clone https://github.com/arcxteam/gguf-convert-model.git
cd gguf-convert-model
```
### 3. Configure Environment
> Create, edit, and save the configuration file
```
cp .env.example .env
nano .env
```
> Example environment configuration
```bash
# HF token with WRITE permission
HUGGINGFACE_TOKEN=hf_xxxxxxxx
# Source model repository to convert
# Example: Qwen/Qwen3-0.6B
REPO_ID=username/model-name
# Check interval in seconds
# 0 = one-time conversion; set e.g. 3600 (1 hour) to keep checking for new commits
CHECK_INTERVAL=0
# Output formats (comma-separated, no spaces)
# Available: F16,BF16,F32,Q2_K,Q2_K_S,Q3_K_S,Q3_K_M,Q3_K_L,Q4_K_S,Q4_K_M,Q4_K_L,Q5_K_S,Q5_K_M,Q5_K_L,Q6_K,Q8_0
# Recommended: F16,Q4_K_M,Q5_K_M,Q6_K
QUANT_TYPES=F16,Q3_K_M,Q4_K_M,Q5_K_M,Q6_K
# ========================================
# UPLOAD MODE - Choose ONE option below
# (keep exactly one UPLOAD_MODE line uncommented)
# ========================================
# OPTION 1: same_repo
# Upload to the same repository as the source model
# Use this only for YOUR OWN models (requires WRITE access)
# UPLOAD_MODE=same_repo
# OPTION 2: new_repo
# TARGET_REPO will be auto-generated as: username/ModelName-GGUF
# Leave TARGET_REPO empty for auto (recommended)
# Or specify it manually: TARGET_REPO=your-username/custom-name-GGUF
UPLOAD_MODE=new_repo
TARGET_REPO=
# OPTION 3: local_only
# Save to a local directory only (no upload to HuggingFace)
# Files are auto-deleted after LOCAL_CLEANUP_HOURS
# UPLOAD_MODE=local_only
OUTPUT_DIR=./output
# Tokenizer source: leave empty for auto-detection (default); set only if auto-detection fails
# Example: Qwen/Qwen3-0.6B
BASE_MODEL_TOKENIZER=
# Output filename pattern (default shown)
# Placeholders: {model_name} = extracted base name, {quant} = format type
# Result example: Qwen3-0.6B-Instruct-Q4_K_M.gguf
OUTPUT_PATTERN={model_name}-{quant}.gguf
# Auto-cleanup delay in hours (default); only used in local_only mode
LOCAL_CLEANUP_HOURS=24
# Timezone
TZ=Asia/Singapore
```
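To make the rules in this file concrete, here is a hypothetical loader (not the project's `src/config.py`) that validates the required keys, parses `QUANT_TYPES`, checks `UPLOAD_MODE`, and renders `OUTPUT_PATTERN`. Two assumptions are labeled in the code: the fallback format list is the "Recommended" set above, and `{model_name}` is simply the part of `REPO_ID` after the slash.

```python
import os
from typing import Mapping

# Formats listed as "Available" in the .env example
AVAILABLE_QUANTS = {
    "F16", "BF16", "F32", "Q2_K", "Q2_K_S", "Q3_K_S", "Q3_K_M", "Q3_K_L",
    "Q4_K_S", "Q4_K_M", "Q4_K_L", "Q5_K_S", "Q5_K_M", "Q5_K_L", "Q6_K", "Q8_0",
}
UPLOAD_MODES = {"same_repo", "new_repo", "local_only"}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read and validate .env settings (hypothetical helper for illustration)."""
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    repo_id = env.get("REPO_ID", "").strip()
    if not token or not repo_id:
        raise ValueError("HUGGINGFACE_TOKEN and REPO_ID are required")

    # Assumed fallback: the "Recommended" list from the example file
    raw = env.get("QUANT_TYPES") or "F16,Q4_K_M,Q5_K_M,Q6_K"
    quants = [q.strip().upper() for q in raw.split(",") if q.strip()]
    unknown = [q for q in quants if q not in AVAILABLE_QUANTS]
    if unknown:
        raise ValueError(f"Unsupported QUANT_TYPES: {unknown}")

    mode = (env.get("UPLOAD_MODE") or "new_repo").strip()
    if mode not in UPLOAD_MODES:
        raise ValueError(f"UPLOAD_MODE must be one of {sorted(UPLOAD_MODES)}")

    # Assumption: {model_name} is the part of REPO_ID after the slash
    model_name = repo_id.split("/")[-1]
    pattern = env.get("OUTPUT_PATTERN") or "{model_name}-{quant}.gguf"
    return {
        "repo_id": repo_id,
        "upload_mode": mode,
        "quant_types": quants,
        "filenames": [pattern.format(model_name=model_name, quant=q) for q in quants],
    }
```

Passing a plain dict instead of `os.environ` keeps the function easy to test; failing fast on unknown formats avoids discovering a typo only after a long download.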
## **Configuration Reference**
| ENV Variable | Required? | When to Change | Default if Empty |
|--------------|-----------|----------------|------------------|
| `HUGGINGFACE_TOKEN` | Yes | Always (your token) | Error |
| `REPO_ID` | Yes | Always (source model) | Error |
| `CHECK_INTERVAL` | Optional | `0` for a one-time run, or an interval in seconds | `3600` (1 hour) |
| `QUANT_TYPES` | Optional | To change the output formats | `F16,Q4_K_M,Q5_K_M,...` |
| `UPLOAD_MODE` | Optional | Depending on your use case | `new_repo` |
| `TARGET_REPO` | Conditional | Only in `new_repo` mode | Derived from `REPO_ID`: `username/ModelName-GGUF` |
| `OUTPUT_DIR` | Conditional | Only in `local_only` mode | `./output` |
| `BASE_MODEL_TOKENIZER` | Optional | Only if auto-detection fails | Empty (auto-detect) |
| `OUTPUT_PATTERN` | Optional | Only for custom file naming | `{model_name}-{quant}.gguf` |
| `LOCAL_CLEANUP_HOURS` | Optional | Only in `local_only` mode | `24` (hours) |
| `TZ` | Optional | To set your timezone | `UTC` |
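`LOCAL_CLEANUP_HOURS` only states that local files are deleted after a number of hours. One plausible way to implement this, sketched under two assumptions (age is measured by file modification time, and only `*.gguf` files in `OUTPUT_DIR` are affected), is shown below; `cleanup_old_outputs` is a hypothetical helper, not a function from this repository.

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_outputs(output_dir: str, max_age_hours: float,
                        now: Optional[float] = None) -> List[Path]:
    """Delete *.gguf files in output_dir last modified more than max_age_hours ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600  # hours -> seconds
    removed = []
    for path in sorted(Path(output_dir).glob("*.gguf")):
        # Assumption: modification time approximates when the file was produced
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `cleanup_old_outputs("./output", 24)` would remove GGUF files in `./output` that have not been modified for more than a day; passing `now` explicitly makes the function easy to test.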
### Checklist: What to Change
**Always Change:**
- `HUGGINGFACE_TOKEN`: your personal token
- `REPO_ID`: the model to convert
**Usually Change:**
- `CHECK_INTERVAL`: check frequency (or `0` for a one-time run)
- `QUANT_TYPES`: the formats you need
- `UPLOAD_MODE`: based on your use case
**Change Only If Needed:**
- `TARGET_REPO`: if using `new_repo` mode
- `OUTPUT_DIR`: if using `local_only` mode
- `BASE_MODEL_TOKENIZER`: if auto-detection fails
- `OUTPUT_PATTERN`: if you want custom file names
- `LOCAL_CLEANUP_HOURS`: if you need a different cleanup time
- `TZ`: your timezone (up to you)
**Never Change (Leave Default):**
- Comments (helpful documentation)
- Commented-out options (kept for reference)
### 4. **Build and Start**
> Build the image and start the container in the background
```
docker compose up --build -d
```
> Follow the logs (run `docker compose down` to stop)
```
docker compose logs -f
# docker compose down
```
## **Supported Quantization Formats**
| Format | Precision | Size Reduction (vs F32) | Use Case |
|--------|-----------|----------------|----------|
| **F32** | Full (32-bit) | None | Maximum precision |
| **F16** | Half (16-bit) | ~50% | High quality general use |
| **BF16** | Brain Float 16 | ~50% | Training-optimized |
| **Q8_0** | 8-bit | ~75% | Near-lossless compression |
| **Q6_K** | 6-bit | ~80% | High quality compression |
| **Q5_K_M** | 5-bit | ~83% | **Recommended** balance |
| **Q4_K_M** | 4-bit | ~87% | **Popular** for production |
| **Q3_K_M** | 3-bit | ~90% | Aggressive compression |
| **Q2_K** | 2-bit | ~93% | Maximum compression |
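The size reductions above follow from the bits stored per weight. For example, llama.cpp's Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights, or 8.5 bits per weight, which is about 73% smaller than F32. The sketch below (a hypothetical helper, not part of this tool) applies the same arithmetic using the nominal bit widths of llama.cpp's base block formats; it ignores metadata and tokenizer data, and the `_S`/`_M`/`_L` mixes keep some tensors at higher precision, so real files come out somewhat larger.

```python
# Nominal bits per weight of llama.cpp block formats (approximate).
# Mixes such as Q4_K_M keep some tensors at higher precision, so real
# files are somewhat larger than this estimate.
BITS_PER_WEIGHT = {
    "F32": 32.0, "F16": 16.0, "BF16": 16.0, "Q8_0": 8.5,
    "Q6_K": 6.5625, "Q5_K": 5.5, "Q4_K": 4.5, "Q3_K": 3.4375, "Q2_K": 2.625,
}


def estimate(n_params: float, fmt: str) -> tuple:
    """Return (approximate weight size in GB, size reduction vs F32 in percent)."""
    bpw = BITS_PER_WEIGHT[fmt]
    size_gb = n_params * bpw / 8 / 1e9  # bits -> bytes -> GB
    reduction_pct = (1 - bpw / 32) * 100
    return size_gb, reduction_pct


if __name__ == "__main__":
    # Hypothetical 1B-parameter model
    for fmt in BITS_PER_WEIGHT:
        size_gb, reduction = estimate(1e9, fmt)
        print(f"{fmt:>5}: ~{size_gb:.2f} GB, ~{reduction:.0f}% smaller than F32")
```

The same estimate is a quick way to check whether the disk has room for every format listed in `QUANT_TYPES` before starting a conversion.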
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.