# FLUX.1-DEV-FP8 Inference App

Inference app for an FP8-quantized flux1-dev model. **This runs on graphics cards with 16 GB of VRAM**.

## Description

This is the inference app for an FP8-quantized version of flux1-dev that can run on graphics cards with 16 GB of VRAM.

This project resembles the [FLUX.1-dev inference app on Hugging Face](https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev), but it is meant to run locally on your machine.

Improvements over the original code:

- Quantization of the model to FP8 at startup (I tried serializing the quantized model and reloading it from disk, but there was no gain in startup speed)
- Automatically save generated images (a WEBP, without metadata, for sharing and a PNG, with metadata, for archiving)
- Automatically insert metadata into images ([tag list](https://exiv2.org/tags.html))
- Automatically insert inference metadata in JSON format into images (this allows you to recreate the same image later)
- Avoid writing a memory dump to disk if Python crashes
- Track startup time
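For reference, here is a minimal sketch of FP8 quantization at startup with optimum.quanto, in the spirit of the original quantization code credited at the end of this README. It assumes a diffusers `FluxPipeline` object (`pipe`), whose `transformer` and `text_encoder_2` attributes match the "Quantizing transformer" and "Quantizing text encoder 2" lines of the startup log. `log` and `quantize_pipeline_fp8` are hypothetical names, and this is not necessarily the exact code in `app.py`.

```python
from datetime import datetime


def log(message: str) -> str:
    """Print '<timestamp> <message>' (the startup log format) and return the line."""
    line = f"{datetime.now()} {message}"
    print(line, flush=True)
    return line


def quantize_pipeline_fp8(pipe) -> None:
    """Quantize the weights of a FluxPipeline's transformer and T5 text encoder to FP8, in place.

    Assumes `pipe` is a diffusers FluxPipeline (it has `.transformer` and `.text_encoder_2`).
    """
    # Lazy import: optimum-quanto is only needed when this function actually runs.
    from optimum.quanto import freeze, qfloat8, quantize

    for attr, label in (("transformer", "transformer"), ("text_encoder_2", "text encoder 2")):
        log(f"Quantizing {label}")
        module = getattr(pipe, attr)
        # Swap supported layers (e.g. torch.nn.Linear) for quantized versions with FP8 weights...
        quantize(module, weights=qfloat8)
        # ...then replace the float weights with their FP8 counterparts, freeing memory.
        freeze(module)
```

A startup script could call `log("Started")`, load the pipeline (for example with `FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)`), run `quantize_pipeline_fp8(pipe)`, and then `log("Loading demo")` before building the UI. The differences between the printed timestamps give the startup timings.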

**Fun fact**: using the same inference parameters, you can compare the images generated by the quantized and the non-quantized model. Sometimes the differences are marginal, sometimes they are more evident.

In my tests, the non-quantized model is generally **better**, and you can see the difference when you compare the results side by side. Still, the images from the FP8 model are usually so good that you won't care :D

## Install

```bash
git clone https://github.com/Neurone/flux.1-dev-fp8.git
cd flux.1-dev-fp8
python3 -m virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Run

```bash
cd flux.1-dev-fp8
source .venv/bin/activate
python app.py
```

## Alternative Run

If you occasionally run into GPU memory errors, especially when generating 2048x2048 images, try starting the app like this:

```bash
cd flux.1-dev-fp8
source .venv/bin/activate
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python app.py
```
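`expandable_segments:True` tells PyTorch's CUDA caching allocator to create allocations that can later be expanded, which may reduce fragmentation when allocation sizes vary (for example, across different image resolutions). If you'd rather not prefix the command every time, one option (an assumption, not how `app.py` is written) is to set the variable at the very top of the script:

```python
import os

# The CUDA caching allocator reads this variable when it is set up, so it must be
# in the environment before torch touches the GPU; before `import torch` is safest.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # import torch / diffusers and build the pipeline only after this line
```

Because `setdefault` is used, a value exported in the shell (as in the command above) still takes precedence.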

## Model Comparison

| FLUX.1-DEV | FLUX.1-DEV-FP8 |
| - | - |
| ![1024x2048; 40 steps; FLUX.1-DEV FULL MODEL](./samples/1723504062.1747687-dev.webp "1024x2048; 40 steps; FLUX.1-DEV FULL MODEL") | ![1024x2048; 40 steps; FLUX.1-DEV-FP8 QUANTIZED MODEL](./samples/1723504062.1747687.webp "1024x2048; 40 steps; FLUX.1-DEV-FP8 QUANTIZED MODEL") |

## Inference Metadata

This is an example of the inference metadata saved into the PNG files.

```json
{
"model": {
"name": "flux",
"id": "flux1-dev-fp8",
"multihash": "1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181",
"license_url": "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md",
"license_multihash": "1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53",
"author": "Black Forest Labs"
},
"input": {
"prompt": "A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky. Whimsical, ethereal, celestial, fantasy art",
"seed": 1914590619,
"cfg_scale": 3.5,
"steps": 40,
"width": 1024,
"height": 2048,
"type": "txt2img"
},
"output": {
"filename": "1723504062.1747687.png",
"format": "image/png",
"image_multihash": "1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258",
"creation_date_time": "2024-08-12 23:07:42.833579+00:00"
}
}
```
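All the multihash values above start with `1220`, the multihash header for SHA2-256: `0x12` is the sha2-256 function code and `0x20` is the digest length (32 bytes). Assuming each value is just that header followed by the SHA-256 digest of the corresponding file (the README doesn't show the hashing code), a stdlib-only sketch to compute one could look like this. The function names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # digest length in bytes (32)
HEADER = bytes([SHA2_256_CODE, SHA2_256_LEN]).hex()  # "1220"


def sha256_multihash_hex(data: bytes) -> str:
    """Hex multihash of in-memory data: header followed by the SHA-256 digest."""
    return HEADER + hashlib.sha256(data).hexdigest()


def file_multihash_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Same as above, but streams a (possibly multi-GB) file from disk in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return HEADER + h.hexdigest()
```

Multihash encodes both the code and the length as unsigned varints, but since `0x12` and `0x20` are below `0x80`, each fits in a single byte, so a fixed two-byte header is enough here.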

## File Metadata

This is an example of all the metadata saved into the PNG files.

```bash
❯ exiftool 1723504062.1747687.png
ExifTool Version Number : 12.92
File Name : 1723504062.1747687.png
Directory : .
File Size : 2.8 MB
File Modification Date/Time : 2024:08:13 01:07:42+02:00
File Access Date/Time : 2024:08:13 01:11:32+02:00
File Inode Change Date/Time : 2024:08:13 01:11:02+02:00
File Permissions : -rw-rw-r--
File Type : PNG
File Type Extension : png
MIME Type : image/png
Image Width : 1024
Image Height : 2048
Bit Depth : 8
Color Type : RGB
Compression : Deflate/Inflate
Filter : Adaptive
Interlace : Noninterlaced
Exif Byte Order : Little-endian (Intel, II)
Image Description : A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky. Whimsical, ethereal, celestial, fantasy art
Make : Black Forest Labs
Camera Model Name : flux1-dev-fp8
Modify Date : 2024-08-12 23:07:42.833579+00:00
Artist : flux1-dev-fp8
Image ID : 1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258
Copyright : https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md;1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53
Date/Time Original : 2024-08-12 23:07:42.833579+00:00
User Comment : {"model": {"name": "flux", "id": "flux1-dev-fp8", "multihash": "1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181", "license_url": "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md", "license_multihash": "1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53", "author": "Black Forest Labs"}, "input": {"prompt": "A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky. Whimsical, ethereal, celestial, fantasy art", "seed": 1914590619, "cfg_scale": 3.5, "steps": 40, "width": 1024, "height": 2048, "type": "txt2img"}, "output": {"filename": "1723504062.1747687.png", "format": "image/png", "image_multihash": "1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258", "creation_date_time": "2024-08-12 23:07:42.833579+00:00"}}
Image Size : 1024x2048
Megapixels : 2.1
```

## Performance

This is my configuration:

- CPU: Intel Core i7-12700K
- GPU: Nvidia GeForce RTX 4080 SUPER 16 GB
- RAM: 64 GB DDR5
- DISK (Operating System): SSD NVMe Crucial P5 Plus 2TB
- DISK (Models): External USB3 NVMe Disk
- OPERATING SYSTEM: Ubuntu 22.04.3 LTS
- PYTHON VERSION: 3.10.12

Excluding the very first run, when all resources have to be downloaded, these are some examples of the performance I get.

Prompt: *A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky. Whimsical, ethereal, celestial, fantasy art*

Seed: 1914590619

CFG: 3.5

| operation | time spent |
| - | - |
| Startup time | ~50 seconds (an HDD can increase startup time significantly: up to 13 minutes on my old setup with an HDD) |
| Inference; 512x512; 28 steps | 19 seconds |
| Inference; 1024x1024; 28 steps | 45 seconds |
| Inference; 1024x1024; 50 steps | 1 minute and 6 seconds |
| Inference; 1024x2024; 28 steps | 1 minute and 15 seconds |
| Inference; 1024x2048; 40 steps | 1 minute and 44 seconds |
| Inference; 2048x2048; 28 steps | 2 minutes and 39 seconds|
| Inference; 2048x2048; 50 steps | 4 minutes and 46 seconds|

After a couple of runs, the startup time decreases, but most of it is still spent **before** the model is quantized.

Quantization adds ~9 seconds to the startup time (~90 seconds in my previous configuration with an external HDD).

## Samples

### 512x512; 28 steps

![512x512; 28 steps](./samples/1723504145.2547626.webp "512x512; 28 steps")

### 1024x1024; 28 steps

![1024x1024; 28 steps](./samples/1723504398.6657534.webp "1024x1024; 28 steps")

### 1024x2024; 28 steps

![1024x2024; 28 steps](./samples/1723501887.0529025.webp "1024x2024; 28 steps")

### 2048x2048; 50 steps

![2048x2048; 50 steps](./samples/1723503836.2297032.webp "2048x2048; 50 steps")

## Example Of Consecutive Startups

```bash
❯ python app.py
2025-03-05 20:19:16.468597 Started
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2694.70it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.20s/it]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1283.45it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:23<00:00, 7.97s/it]
2025-03-05 20:19:57.180855 Quantizing transformer
2025-03-05 20:20:05.793952 Quantizing text encoder 2
2025-03-05 20:20:09.611559 Loading demo
/home/developer/workspace/flux.1-dev-fp8/.venv/lib/python3.12/site-packages/gradio/helpers.py:148: UserWarning: In future versions of Gradio, the `cache_examples` parameter will no longer accept a value of 'lazy'. To enable lazy caching in Gradio, you should set `cache_examples=True`, and `cache_mode='lazy'` instead.
warnings.warn(
Will cache examples in '/home/developer/workspace/flux.1-dev-fp8/.gradio/cached_examples/19' directory at first use.

* Running on local URL: http://127.0.0.1:7860
INFO:httpx:HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
INFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"

### 53 seconds (12:21 mins with HDD)

❯ python app.py
2025-03-05 20:24:33.615529 Started
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 24966.10it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.86it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Fetching 3 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 12761.57it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 103.95it/s]
2025-03-05 20:24:39.492626 Quantizing transformer
2025-03-05 20:24:47.987617 Quantizing text encoder 2
2025-03-05 20:24:51.777597 Loading demo
/home/developer/workspace/flux.1-dev-fp8/.venv/lib/python3.12/site-packages/gradio/helpers.py:148: UserWarning: In future versions of Gradio, the `cache_examples` parameter will no longer accept a value of 'lazy'. To enable lazy caching in Gradio, you should set `cache_examples=True`, and `cache_mode='lazy'` instead.
warnings.warn(
Will cache examples in '/home/developer/workspace/flux.1-dev-fp8/.gradio/cached_examples/19' directory at first use.

* Running on local URL: http://127.0.0.1:7860
INFO:httpx:HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
INFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"

### 18 seconds (6:37 mins with HDD)
```
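To see where the startup time goes, you can diff consecutive timestamps in a log like the ones above. Here is a minimal stdlib sketch, assuming only that timestamped lines look like `YYYY-MM-DD HH:MM:SS.ffffff message` (every other line is skipped). `phase_durations` is a hypothetical helper, not part of the app.

```python
from datetime import datetime

# Timestamped lines from the first startup above (progress bars and warnings omitted).
FIRST_STARTUP = """\
2025-03-05 20:19:16.468597 Started
2025-03-05 20:19:57.180855 Quantizing transformer
2025-03-05 20:20:05.793952 Quantizing text encoder 2
2025-03-05 20:20:09.611559 Loading demo
"""


def phase_durations(log_text: str) -> dict[str, float]:
    """Map each timestamped message to the seconds elapsed until the next timestamped line."""
    events = []
    for line in log_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        try:
            ts = datetime.fromisoformat(f"{parts[0]} {parts[1]}")
        except ValueError:
            continue  # progress bars, warnings, HTTP logs, ...
        events.append((ts, parts[2]))
    return {
        message: (next_ts - ts).total_seconds()
        for (ts, message), (next_ts, _) in zip(events, events[1:])
    }


print(phase_durations(FIRST_STARTUP))
```

For the first startup above, this attributes about 40.7 s to everything before quantization, about 8.6 s to quantizing the transformer, and about 3.8 s to quantizing text encoder 2. You can also paste a whole log, progress bars included.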

## Utils

Show all EXIF metadata

```bash
exiftool 1723504062.1747687.png
```

Show only the inference metadata (pretty-printed with jq)

```bash
exiftool -usercomment -s3 1723504062.1747687.png | jq
```
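Because the inference metadata records the prompt, seed, CFG scale, steps, and size, you can turn it back into generation settings. Here is a minimal sketch, assuming `exiftool` is on your `PATH` and that the values are meant for a diffusers `FluxPipeline` (the `cfg_scale` to `guidance_scale` and `steps` to `num_inference_steps` mapping is an assumption). The function names are hypothetical.

```python
import json
import subprocess


def read_inference_metadata(png_path: str) -> dict:
    """Return the JSON stored in UserComment (same as `exiftool -usercomment -s3 FILE`)."""
    result = subprocess.run(
        ["exiftool", "-usercomment", "-s3", png_path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def generation_settings(metadata: dict) -> tuple[int, dict]:
    """Split the "input" section into (seed, FluxPipeline-style keyword arguments).

    The parameter-name mapping below is an assumption, not taken from app.py.
    """
    inp = metadata["input"]
    if inp.get("type") != "txt2img":
        raise ValueError(f"unsupported inference type: {inp.get('type')!r}")
    kwargs = {
        "prompt": inp["prompt"],
        "width": inp["width"],
        "height": inp["height"],
        "num_inference_steps": inp["steps"],
        "guidance_scale": inp["cfg_scale"],
    }
    return inp["seed"], kwargs
```

For example, `seed, kwargs = generation_settings(read_inference_metadata("1723504062.1747687.png"))`, then `pipe(**kwargs, generator=torch.Generator().manual_seed(seed))`. A bit-identical result may also require the same model weights, library versions, and GPU.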

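Delete all metadata from one file or from all files in the current directory (by default, exiftool keeps a backup of each original as `<name>_original`; add `-overwrite_original` to skip it)

```bash
exiftool -all= 1723504062.1747687.png
exiftool -all= *
```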

## Side notes

1. The model `multihash` (1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181) is derived from the
serialization of the transformer quantized with the optimum.quanto library, saved right after the `freeze` operation.

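```python
from optimum.quanto import quantization_map  # not needed for the hash; save quantization_map(transformer) as JSON to reload the model later
from safetensors.torch import save_file
# `transformer` is the FLUX transformer right after quanto's quantize() and freeze()
save_file(transformer.state_dict(), './flux1-dev-transformer-fp8.safetensors')
```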

2. Pillow's `Image.Exif()` object lets you set EXIF data and write it when the image is saved for the first time.
This works correctly unless you need the UserComment field, which I do: in that case, there's an encoding error. To avoid it,
I use an external library (exiv2), which I could only get working on an actual file on disk, not on an image held in memory.
This is why the image is saved in two steps instead of one.
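The README doesn't say what the encoding error was. One relevant detail (everything else here is an assumption): in EXIF, UserComment is an UNDEFINED-type field whose first 8 bytes name the character code (for example `ASCII\0\0\0` or `UNICODE\0`), followed by the text, so it can't be written as a plain string. Below is a stdlib-only sketch that builds and parses such values. The helper names are hypothetical, this is not the app's code, and whether passing these bytes to Pillow avoids the original error is untested. The little-endian default matches the `Exif Byte Order` shown by exiftool above.

```python
ASCII_PREFIX = b"ASCII\x00\x00\x00"  # 8-byte character code for ASCII text
UNICODE_PREFIX = b"UNICODE\x00"      # 8-byte character code for Unicode (UCS-2/UTF-16) text


def _utf16(byteorder: str) -> str:
    # EXIF Unicode text has no BOM; it follows the byte order of the Exif block.
    return "utf-16-le" if byteorder == "little" else "utf-16-be"


def encode_user_comment(text: str, byteorder: str = "little") -> bytes:
    """Encode text as an EXIF UserComment value: 8-byte character code + payload."""
    try:
        return ASCII_PREFIX + text.encode("ascii")
    except UnicodeEncodeError:
        return UNICODE_PREFIX + text.encode(_utf16(byteorder))


def decode_user_comment(value: bytes, byteorder: str = "little") -> str:
    """Inverse of encode_user_comment; trailing NUL padding is dropped."""
    prefix, payload = value[:8], value[8:]
    if prefix == ASCII_PREFIX:
        text = payload.decode("ascii")
    elif prefix == UNICODE_PREFIX:
        text = payload.decode(_utf16(byteorder))
    else:
        # Undefined (all zeros) or JIS: fall back to lenient UTF-8 (an assumption).
        text = payload.decode("utf-8", errors="replace")
    return text.rstrip("\x00")
```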

## Credits

- Thanks to [Black Forest Labs](https://blackforestlabs.ai/) for releasing the model free to use and the inference code as open source
- Huge thanks to [@AmericanPresidentJimmyCarter](https://gist.github.com/AmericanPresidentJimmyCarter) for developing the [original quantization code](https://gist.github.com/AmericanPresidentJimmyCarter/873985638e1f3541ba8b00137e7dacd9).