https://github.com/arcxteam/gguf-convert-model

Auto GGUF Converter for HuggingFace Hub Models with Multiple Quantizations (GGUF Format)
ai ai-models bf16 cmake convert-gguf gguf gguf-editor gguf-models gguf-quantization huggingface huggingface-models llama-cpp machine-learning safetensors tensorflow transformers

GGUF LLMs Converter for Huggingface Hub Models with Multiple Quantizations (GGUF-Format)


Automated conversion of any Huggingface model to multiple GGUF LLMs quantization formats

Supports continuous monitoring, auto-detection, and universal deployment modes


---

## 📖 Overview

**Universal GGUF LLMs Converter** is a production-ready, Docker-based solution for automatically converting HuggingFace models to GGUF format with multiple quantization types. Built with `llama.cpp` integration and intelligent tokenizer detection, this tool streamlines the conversion workflow for both personal and community models.

### Key Features

- 🔄 **Continuous Monitoring**: Automatically detects and converts new model updates from HuggingFace repositories
- 🤖 **Auto-Detection**: Intelligent tokenizer detection for 50+ popular model architectures (Qwen, Llama, Mistral, Phi, Gemma, etc.)
- 📦 **Multiple Quantization**: Supports F16, F32, BF16, the K-quant formats (Q2_K to Q6_K), and Q8_0
- 🎯 **Flexible Deploy**: Three upload modes: same repository, new repository, or local-only storage
- 🧹 **Smart Cleanup**: Automatic temporary file management to keep disk usage from growing
- 🐳 **Docker**: Fully containerized, with optimized build times and resource usage
- 📊 **Progress Tracking**: Clean, milestone-based logging with colorized console output

## 🛠️ Requirements


**System Requirements:**
- Linux-based VPS or local machine
- Docker & Docker Compose installed
- HuggingFace account with **WRITE** access token
- Sufficient disk space for model downloads and conversion (varies by model size)

## 📁 Project Structure

```diff
gguf-convert-model/
├── .env
├── .env.example
├── .gitignore
├── .dockerignore
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── README.md
├── scripts/
│   └── start.sh
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   └── utils/
│       ├── __init__.py
│       ├── logger.py
│       └── helpers.py
└── logs/ (auto-created)
```

## 🚀 **Quick Start**
### 1. Prerequisites

**HuggingFace Access Token:**
- Visit settings → https://huggingface.co/settings/tokens
- Create a new token with **Write** permissions
- Copy the token (starts with `hf_`)

**Install Docker & Compose** if they are not already installed
> This step is optional. Skip it if Docker is already present, and review the script before piping it to `sudo bash`.

```
curl -sSL https://raw.githubusercontent.com/arcxteam/succinct-prover/refs/heads/main/docker.sh | sudo bash
```

### 2. Clone Repository

```
git clone https://github.com/arcxteam/gguf-convert-model.git
cd gguf-convert-model
```

### 3. Configure Environment
> Create, edit, and save the configuration file

```
cp .env.example .env
nano .env
```
> Example environment configuration

```diff
# HF token with WRITE permission
HUGGINGFACE_TOKEN=hf_xxxxxxxx

# Source model repository to convert
+ Example: Qwen/Qwen3-0.6B
REPO_ID=username/model-name

# Check interval in seconds
+ Default 0 = one-time conversion; set a higher value (e.g. 3600 = 1h) to keep checking for new commits
CHECK_INTERVAL=0

# Output formats (comma-separated, no spaces)
# Available: F16,BF16,F32,Q2_K,Q2_K_S,Q3_K_S,Q3_K_M,Q3_K_L,Q4_K_S,Q4_K_M,Q4_K_L,Q5_K_S,Q5_K_M,Q5_K_L,Q6_K,Q8_0
+ Recommended: F16,Q4_K_M,Q5_K_M,Q6_K
QUANT_TYPES=F16,Q3_K_M,Q4_K_M,Q5_K_M,Q6_K

# ========================================
# UPLOAD MODE - Choose ONE option below
# ========================================

# OPTION 1: same_repo
# Upload to the same repository as the source model
+ Use this only for YOUR OWN models with WRITE access
UPLOAD_MODE=same_repo

# OPTION 2: new_repo
# TARGET_REPO will be auto-generated as: username/ModelName-GGUF
+ Leave TARGET_REPO empty for auto (recommended)
+ Or manually specify: TARGET_REPO=your-username/custom-name-GGUF
UPLOAD_MODE=new_repo
TARGET_REPO=

# OPTION 3: local_only
+ Save to a local directory only (no upload to HuggingFace)
+ Files auto-delete after LOCAL_CLEANUP_HOURS
UPLOAD_MODE=local_only
OUTPUT_DIR=./output

# Only set if auto-detection fails (default: empty = auto)
+ Example: Qwen/Qwen3-0.6B
BASE_MODEL_TOKENIZER=

# Output filename pattern (default)
# Placeholders: {model_name} = extracted base name, {quant} = format type
+ Result example: Qwen3-0.6B-Instruct-Q4_K_M.gguf
OUTPUT_PATTERN={model_name}-{quant}.gguf

# Auto-cleanup hours (default: 24)
+ Only needed in local_only mode
LOCAL_CLEANUP_HOURS=24

# Timezone
TZ=Asia/Singapore
```
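
The `QUANT_TYPES` and `OUTPUT_PATTERN` settings above can be illustrated with a short Python sketch. This is not the project's actual `src/config.py`; the helper names `parse_quant_types` and `output_filename` are hypothetical, and the set of accepted formats is copied from the "Available" comment in the example `.env`.

```python
# Hypothetical helpers; the real src/config.py may work differently.
# Format names are taken from the "Available" list in the example .env.
AVAILABLE_QUANTS = {
    "F16", "BF16", "F32", "Q2_K", "Q2_K_S", "Q3_K_S", "Q3_K_M", "Q3_K_L",
    "Q4_K_S", "Q4_K_M", "Q4_K_L", "Q5_K_S", "Q5_K_M", "Q5_K_L", "Q6_K", "Q8_0",
}


def parse_quant_types(raw: str) -> list:
    """Split a comma-separated QUANT_TYPES value and reject unknown formats."""
    quants = [q.strip().upper() for q in raw.split(",") if q.strip()]
    unknown = [q for q in quants if q not in AVAILABLE_QUANTS]
    if unknown:
        raise ValueError(f"Unsupported quantization types: {unknown}")
    return quants


def output_filename(pattern: str, model_name: str, quant: str) -> str:
    """Fill the {model_name} and {quant} placeholders of OUTPUT_PATTERN."""
    return pattern.format(model_name=model_name, quant=quant)


quants = parse_quant_types("F16,Q3_K_M,Q4_K_M,Q5_K_M,Q6_K")
names = [output_filename("{model_name}-{quant}.gguf", "Qwen3-0.6B", q) for q in quants]
print(names)
```

Validating the list up front means a typo in `QUANT_TYPES` fails before any model download starts, which is one reasonable design for a long-running conversion job.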

## 📊 **Configuration Reference**

| ENV Variable | Required? | When to Change | Default if Empty |
|--------------|-----------|----------------|------------------|
| `HUGGINGFACE_TOKEN` | ✅ Yes | Always (your token) | Error (required) |
| `REPO_ID` | ✅ Yes | Always (source model) | Error (required) |
| `CHECK_INTERVAL` | ⚠️ Optional | Set above 0 for continuous monitoring, in seconds (3600 = 1h) | `0` (one-time conversion) |
| `QUANT_TYPES` | ⚠️ Optional | Change to the formats you need | `F16,Q4_K_M,Q5_K_M` and more |
| `UPLOAD_MODE` | ⚠️ Optional | Change based on use case | `new_repo` |
| `TARGET_REPO` | ⚠️ Conditional | Only in `new_repo` mode | Auto-generated as `username/ModelName-GGUF` |
| `OUTPUT_DIR` | ⚠️ Conditional | Only in `local_only` mode | `./output` |
| `BASE_MODEL_TOKENIZER` | ❌ Optional | Only if auto-detect fails | Empty (auto-detect) |
| `OUTPUT_PATTERN` | ❌ Optional | Only for custom naming | `{model_name}-{quant}.gguf` |
| `LOCAL_CLEANUP_HOURS` | ❌ Optional | Only in `local_only` mode | `24` (hours) |
| `TZ` | ❌ Optional | Change to your timezone | `UTC` |
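
How the three `UPLOAD_MODE` values might map to an upload destination can be sketched as follows. This is an assumption-laden illustration, not the project's code: `resolve_target_repo` is a hypothetical name, and the `username` argument stands in for the account that owns the token, which the tool would have to look up itself.

```python
from typing import Optional


def resolve_target_repo(upload_mode: str, repo_id: str, username: str,
                        target_repo: str = "") -> Optional[str]:
    """Return the repository to upload to, or None for local_only.

    Hypothetical helper: `username` is assumed to be the token owner's name.
    """
    if upload_mode == "same_repo":
        # Upload next to the source model (requires WRITE access to it).
        return repo_id
    if upload_mode == "new_repo":
        if target_repo:
            # A manually specified TARGET_REPO wins over auto-generation.
            return target_repo
        model_name = repo_id.split("/")[-1]
        return f"{username}/{model_name}-GGUF"
    if upload_mode == "local_only":
        # Nothing is uploaded; files stay in OUTPUT_DIR.
        return None
    raise ValueError(f"Unknown UPLOAD_MODE: {upload_mode!r}")


print(resolve_target_repo("new_repo", "Qwen/Qwen3-0.6B", "alice"))
```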

### ✅ Checklist - What to Change

**Always Change:**
- ✅ `HUGGINGFACE_TOKEN` → Your personal token
- ✅ `REPO_ID` → Model to convert

**Usually Change:**
- ⚠️ `CHECK_INTERVAL` → Frequency (or 0 for one-time)
- ⚠️ `QUANT_TYPES` → Formats you need
- ⚠️ `UPLOAD_MODE` → Based on use case

**Change Only If Needed:**
- ❌ `TARGET_REPO` → If using `new_repo` mode
- ❌ `OUTPUT_DIR` → If using `local_only` mode
- ❌ `BASE_MODEL_TOKENIZER` → If auto-detect fails
- ❌ `OUTPUT_PATTERN` → If custom naming wanted
- ❌ `LOCAL_CLEANUP_HOURS` → If different cleanup time
- ❌ `TZ` → Your timezone (up to you)

**Never Change (Leave Default):**
- ✅ Comments (helpful documentation)
- ✅ Commented-out options (for reference)

### 4. 🏃 **Build and Start**
> Build the image and start the container in the background

```
docker compose up --build -d
```

> Follow the logs; to stop the container, run `docker compose down`

```
docker compose logs -f
# docker compose down
```
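
Once the container is running, the behavior controlled by `CHECK_INTERVAL` (0 for a single conversion, a positive number of seconds for continuous monitoring) could look roughly like the loop below. This is a minimal sketch and not the project's `src/main.py`: `watch`, `get_latest_commit`, `convert`, and `max_checks` are hypothetical names, and the commit lookup is injected as a callable so the example runs offline. With the `huggingface_hub` package, `HfApi().model_info(repo_id).sha` is one way such a callable could obtain the latest commit hash.

```python
import time
from typing import Callable, Optional


def watch(get_latest_commit: Callable[[], str],
          convert: Callable[[str], None],
          check_interval: int,
          max_checks: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Convert once per new commit and return the number of conversions.

    check_interval <= 0 means a single check (one-time conversion).
    max_checks exists only to let the example terminate.
    """
    last_seen = None
    conversions = 0
    checks = 0
    while True:
        commit = get_latest_commit()
        if commit != last_seen:
            # A commit we have not converted yet: run the conversion.
            convert(commit)
            last_seen = commit
            conversions += 1
        checks += 1
        if check_interval <= 0 or (max_checks is not None and checks >= max_checks):
            return conversions
        sleep(check_interval)


# Simulated repository history: the second check sees no change.
commits = iter(["abc", "abc", "def"])
converted = []
count = watch(lambda: next(commits), converted.append,
              check_interval=1, max_checks=3, sleep=lambda seconds: None)
print(count, converted)
```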

## 📊 **Supported Quantization Formats**

| Format | Precision | Size Reduction (vs. F32) | Use Case |
|--------|-----------|----------------|----------|
| **F32** | Full (32-bit) | None | Maximum precision |
| **F16** | Half (16-bit) | ~50% | High quality general use |
| **BF16** | Brain Float 16 | ~50% | Training-optimized |
| **Q8_0** | 8-bit | ~75% | Near-lossless compression |
| **Q6_K** | 6-bit | ~80% | High quality compression |
| **Q5_K_M** | 5-bit | ~83% | **Recommended** balance |
| **Q4_K_M** | 4-bit | ~87% | **Popular** for production |
| **Q3_K_M** | 3-bit | ~90% | Aggressive compression |
| **Q2_K** | 2-bit | ~93% | Maximum compression |
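
The percentages in the table can be turned into a rough disk-space estimate. The sketch below assumes the reductions are measured against an F32 file at 4 bytes per parameter and ignores metadata overhead; `estimate_size_gb` is a hypothetical helper, and the 0.6 billion parameter count is only a nominal figure taken from the `Qwen3-0.6B` model name.

```python
# Approximate reductions copied from the table above (assumed relative to F32).
REDUCTION_VS_F32 = {
    "F32": 0.00, "F16": 0.50, "BF16": 0.50, "Q8_0": 0.75, "Q6_K": 0.80,
    "Q5_K_M": 0.83, "Q4_K_M": 0.87, "Q3_K_M": 0.90, "Q2_K": 0.93,
}


def estimate_size_gb(num_params: float, quant: str) -> float:
    """Rough output size in GB (10^9 bytes), assuming 4 bytes per F32 weight."""
    f32_bytes = num_params * 4
    return f32_bytes * (1 - REDUCTION_VS_F32[quant]) / 1e9


for quant in ("F16", "Q4_K_M", "Q6_K"):
    print(quant, round(estimate_size_gb(0.6e9, quant), 2), "GB")
```

Summing these estimates over every entry in `QUANT_TYPES`, plus the size of the downloaded source model, gives a lower bound on the free disk space a conversion run needs.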

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.