https://github.com/uqer1244/MLX_z-image
MLX version of z-image model
apple-silicon image-generation mlx
- Host: GitHub
- URL: https://github.com/uqer1244/MLX_z-image
- Owner: uqer1244
- License: apache-2.0
- Created: 2025-12-09T17:13:00.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2026-01-21T17:34:56.000Z (4 months ago)
- Last Synced: 2026-01-22T05:18:00.772Z (4 months ago)
- Topics: apple-silicon, image-generation, mlx
- Language: Python
- Homepage:
- Size: 6.85 MB
- Stars: 31
- Watchers: 4
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project
- awesome-mlx - MLX_z-image - image model (Rising projects)
README
# MLX z-image 🍎
An efficient **MLX implementation** of [Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) optimized for Apple Silicon (M1/M2/M3/M4).
This repository allows you to run high-quality image generation locally on your Mac using **4-bit quantization**, significantly reducing memory usage while maintaining quality.
[Model weights on Hugging Face](https://huggingface.co/uqer1244/MLX-z-image)
## 📂 Project Structure
It is recommended to organize your folders as follows:
```text
MLX_z-image/
├── converting/            # Scripts to convert PyTorch weights to MLX (reference only, not used at runtime)
├── Z-Image-Turbo-MLX/     # MLX weights (downloaded automatically on first run)
├── mlx_text_encoder.py    # Text encoder ported to MLX
├── mlx_z_image.py         # Transformer ported to MLX
├── check_lora.py          # Checks whether a LoRA is compatible with MLX z-image
├── lora_utils.py          # Applies LoRA weights
├── run.py                 # Run script
├── prompt.txt             # Prompt text
└── mlx_pipeline.py        # MLX pipeline
```
## 📊 Performance & Gallery
### Benchmarks
Inference tests were conducted on Apple Silicon devices using **native MLX** with **4-bit quantization**.
- **Resolution**: 1024x1024
- **Steps**: 9 (Turbo)
| Device | RAM | Total Time | Denoise Time | Time per Step |
|:-----------|:-----|:-----------|:-------------|:--------------|
| **M3 Pro** | 18GB | ~ 150 s | 140 s | 15 s/Step |
| **M4** | 16GB | ~ 240 s | 230 s | 25 s/Step |
### Gallery
Uncurated samples generated directly on a Mac using the 4-bit quantized model.
*"anime digital painting She sat poised on a ledge of polished peach quartz, the very image of a classical statue brought to wild, impish life within the sunlit cave dwellings. Her Korean features were framed by a stunning ginger hime cut, its straight, . A wild assembly of leather straps and sheer, iridescent fabric served as her lingerie, barely covering her slim figure while perfectly accentuating her narrow waist, tight ass, and breasts. The natural sunlight filtering through the cave's opening bathed her in a warm, rosy glow, making her pale skin seem to glow from within. One hand rested flat on the quartz beside her, supporting her lean, while the other was raised to her mouth, a single finger resting thoughtfully on her lower lip as she watched the dust motes dance in the light."*
| **MLX** |
|:-----------------------------------:|
| *(sample image not available)* |
## Technical Highlights
I implemented a **Single-Stream Z-Image Transformer** fully based on MLX, optimizing specifically for the **Unified Memory Architecture** of Apple Silicon.
### 1. Linear Projection Fusion (QKV Optimization)
In standard PyTorch implementations, Q, K, and V projections are often performed sequentially:
```python
q, k, v = x @ W_q, x @ W_k, x @ W_v  # 3 matmuls: x is read from memory 3 times
```
I fused these weights into a single tensor to minimize memory reads:
```python
qkv = x @ W_qkv # 1 Memory Access
```
This is crucial for LLMs and Diffusion models on Mac, where **memory bandwidth** often becomes the bottleneck before compute power.
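The fusion described above can be checked on a toy example. The sketch below is only an illustration: it uses plain Python lists in place of MLX arrays so it has no dependencies, and the helper `matmul` and all the sizes and values are hypothetical. It shows that concatenating the three weight matrices along the output axis and slicing the result gives the same Q, K, and V as three separate projections.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy sizes (hypothetical): 2 tokens, model dim 3, projection dim 2.
x = [[1, 2, 3], [4, 5, 6]]
W_q = [[1, 0], [0, 1], [1, 1]]
W_k = [[2, 1], [1, 2], [0, 1]]
W_v = [[0, 1], [1, 0], [2, 2]]

# Unfused: three separate projections.
q, k, v = matmul(x, W_q), matmul(x, W_k), matmul(x, W_v)

# Fused: concatenate the weights column-wise once, ahead of time.
W_qkv = [rq + rk + rv for rq, rk, rv in zip(W_q, W_k, W_v)]

# One projection, then slice the output back into Q, K, V.
qkv = matmul(x, W_qkv)
d = len(W_q[0])
q_f = [row[:d] for row in qkv]
k_f = [row[d:2 * d] for row in qkv]
v_f = [row[2 * d:] for row in qkv]

assert (q_f, k_f, v_f) == (q, k, v)
```

With MLX arrays the same idea would typically be written with `mx.concatenate` for the weights and `mx.split` for the output; how this repository lays out the fused tensor is not shown here.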
### 2. Hardware-Accelerated Flash Attention
I utilize MLX's native kernel `mx.fast.scaled_dot_product_attention`. This operation runs directly on the GPU using optimized Metal kernels, avoiding the creation of large intermediate attention maps. This allows for higher resolution generation without OOM (Out Of Memory) errors.
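To get a feel for why avoiding the intermediate attention map matters, the following back-of-the-envelope sketch estimates the size of a fully materialized `[heads, seq, seq]` score tensor. The function name, head count, sequence lengths, and 2-byte element size are assumptions chosen for illustration; they are not measured from Z-Image-Turbo.

```python
def attention_map_bytes(seq_len, num_heads, bytes_per_element=2):
    """Size in bytes of a fully materialized [heads, seq, seq] score tensor."""
    return num_heads * seq_len * seq_len * bytes_per_element


# Hypothetical shapes for illustration only.
for seq_len in (1024, 4096, 16384):
    gib = attention_map_bytes(seq_len, num_heads=32) / 2**30
    print(f"seq_len={seq_len:>6}: {gib:.2f} GiB per attention layer")
```

The cost grows with the square of the sequence length, so a kernel that never stores the full map scales much better as resolution increases.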
### 3. Loop-Invariant RoPE Caching
The denoising process involves iterative steps, but the **spatial grid (H, W)** of the image latent remains constant. Instead of recalculating rotary embeddings at every step, I do the following:
* **Pre-compute** Cosine/Sine tables for the maximum sequence length before the loop.
* **Cache** them in GPU-accessible unified memory.
* **Pass** the cached tensors to the compiled step function.
This reduces the computational overhead of the `ApplyRoPE` operation to near zero during sampling.
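The caching pattern above can be sketched as follows. This is a simplified illustration, not the repository's code: it uses a 1D position index and plain Python lists instead of MLX arrays, and `build_rope_table`, `apply_rope`, and the sizes are hypothetical. The key point is that the table is built once, outside the sampling loop.

```python
import math


def build_rope_table(seq_len, head_dim, theta=10000.0):
    """Return (cos, sin) tables, each of shape [seq_len][head_dim // 2]."""
    half = head_dim // 2
    freqs = [theta ** (-2 * i / head_dim) for i in range(half)]
    cos = [[math.cos(pos * f) for f in freqs] for pos in range(seq_len)]
    sin = [[math.sin(pos * f) for f in freqs] for pos in range(seq_len)]
    return cos, sin


def apply_rope(x, cos, sin):
    """Rotate consecutive (even, odd) pairs of every token vector."""
    out = []
    for vec, c_row, s_row in zip(x, cos, sin):
        rotated = []
        for i, (c, s) in enumerate(zip(c_row, s_row)):
            a, b = vec[2 * i], vec[2 * i + 1]
            rotated += [a * c - b * s, a * s + b * c]
        out.append(rotated)
    return out


# Built once, before the sampling loop (toy sizes).
cos, sin = build_rope_table(seq_len=4, head_dim=4)

x = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]
for step in range(9):
    # Each step only reuses the cached tables; no trigonometry is recomputed.
    y = apply_rope(x, cos, sin)
```

Because the rotation only depends on token position, the same tables are valid for every denoising step as long as the latent grid does not change.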
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/uqer1244/MLX_z-image.git
cd MLX_z-image
```
### 2. Install dependencies
Ensure you have Python installed (Python 3.10+ recommended).
```bash
pip install -r requirements.txt
```
*(Note: `huggingface_hub` is required for downloading models)*
-----
## Quick Start
We have packaged everything (Transformer, Text Encoder, VAE, Tokenizer, Scheduler) into a single repository for easy usage.
### Option 1: Automatic Download & Run (Recommended)
Simply run the script. If the model is not found locally, it will automatically download the full 4-bit quantized package from Hugging Face.
```bash
python run.py
```
> **Note:** The prompt is always loaded from `prompt.txt` to handle long/complex prompts easily.
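
The download-if-missing behavior described above could look roughly like the sketch below. It is an assumption about the approach, not a copy of `run.py`: the helper name `ensure_model` is hypothetical, the repo id is taken from the Hugging Face link in this README, and the folder name comes from the project structure. `snapshot_download` is the `huggingface_hub` function for fetching a whole repository.

```python
from pathlib import Path

REPO_ID = "uqer1244/MLX-z-image"        # Hugging Face repo linked in this README
LOCAL_DIR = Path("Z-Image-Turbo-MLX")   # folder name from the project structure


def ensure_model(local_dir=LOCAL_DIR, repo_id=REPO_ID):
    """Return the model directory, downloading it only if missing or empty."""
    local_dir = Path(local_dir)
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return local_dir  # already present, nothing to download

    # Imported lazily so the check above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `ensure_model()` before building the pipeline would then make the first run download the weights and later runs start immediately.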
### Options
| Argument | Description | Default |
|:---------------|:----------------|:---------|
| `--width` | Image Width | `1024` |
| `--height` | Image Height | `1024` |
| `--steps` | Inference Steps | `9` |
| `--seed` | Random Seed | `42` |
| `--output` | Output filename | `res.png` |
| `--lora_path`  | LoRA file path  | `None`   |
| `--lora_scale` | LoRA scale      | `1.0`    |

LoRA is applied only when `--lora_path` is provided.
```bash
python run.py \
--width 1024 \
--height 1024 \
--steps 9 \
--seed 42 \
--output "res.png" \
--lora_path "~~.safetensor" \
--lora_scale 1.0
```
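This README does not describe how `lora_utils.py` applies an adapter, so the following is only a generic sketch of the standard LoRA update, `W' = W + scale * (B @ A)`, to show what a scale such as `--lora_scale` usually controls. Plain Python lists stand in for MLX arrays, and the names `matmul` and `merge_lora`, the matrix layout, and the values are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale=1.0):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs.

    Assumed layout: weight is out x in, lora_a is rank x in, lora_b is out x rank.
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example with rank 1 (hypothetical values).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]     # rank x in
lora_b = [[1.0], [3.0]]   # out x rank

merged = merge_lora(weight, lora_a, lora_b, scale=0.5)
unchanged = merge_lora(weight, lora_a, lora_b, scale=0.0)
```

A scale of `0.0` leaves the base weights untouched, and larger values strengthen the adapter's effect. Real adapters often include an additional `alpha / rank` factor, which is omitted here.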
> **Note:** Width and height must be divisible by 8.
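
As an illustration of the options table and the divisibility rule, here is a minimal `argparse` sketch. It mirrors the documented flag names and defaults, but it is not the actual `run.py`: the helper names `multiple_of_8` and `build_parser` are hypothetical, and whether the real script rejects invalid sizes this way is an assumption.

```python
import argparse


def multiple_of_8(value):
    """argparse type: accept only positive integers divisible by 8."""
    n = int(value)
    if n <= 0 or n % 8 != 0:
        raise argparse.ArgumentTypeError(f"{n} is not a positive multiple of 8")
    return n


def build_parser():
    parser = argparse.ArgumentParser(description="Generate an image from prompt.txt")
    parser.add_argument("--width", type=multiple_of_8, default=1024)
    parser.add_argument("--height", type=multiple_of_8, default=1024)
    parser.add_argument("--steps", type=int, default=9)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--output", default="res.png")
    parser.add_argument("--lora_path", default=None)
    parser.add_argument("--lora_scale", type=float, default=1.0)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--width", "768", "--steps", "5"])
print(args.width, args.height, args.steps, args.lora_path)
```

Validating the size at parse time gives an immediate error message for a value such as `--width 1001`, rather than a failure later in the pipeline.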
### [ComfyUI Custom node setup](custom_nodes/readme.md)
-----
## Todo & Roadmap
We are actively working on making this implementation pure MLX and bug-free.
- [x] **Fix Artifacts**: Investigate and resolve visual artifacts (tiling/color issues) currently visible in some generations.
- [ ] **Full MLX Pipeline**: Port the remaining PyTorch components (VAE, Text Encoder, Tokenizer, Scheduler) to native MLX to remove the `torch` and `diffusers` dependencies completely.
  - [x] Text Encoder - 4-bit
  - [ ] Tokenizer - the existing tokenizer is fast enough
  - [x] Scheduler
  - [x] Transformer - 4-bit
  - [ ] VAE - I tried converting the VAE to MLX, but the PyTorch version is more stable
- [x] **LoRA Support**: Add support for loading and applying LoRA adapters for style customization.
- [x] **ComfyUI Node**: Add custom node for ComfyUI GUI.
- [ ] **IP over Thunderbolt (or RDMA on TB5) support**: Add support for multi-Mac clusters.
- [ ] **Z-Image-Edit and Base model support**: Only the Turbo model is supported for now.
-----
## Acknowledgements
- Original Model: [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- MLX Framework: [Apple Machine Learning Research](https://github.com/ml-explore/mlx)
## License
This project is a modification of [Tongyi-MAI/Z-Image](https://github.com/Tongyi-MAI/Z-Image) and is licensed under the **Apache License 2.0**.
- **Original Code**: Copyright (c) Tongyi-MAI
- **Modifications**: Ported to Apple MLX by uqer1244