{"id":27259791,"url":"https://github.com/neurone/flux.1-dev-fp8","last_synced_at":"2025-10-04T21:36:36.570Z","repository":{"id":252848280,"uuid":"841639537","full_name":"Neurone/flux.1-dev-fp8","owner":"Neurone","description":"Inference app for a FP8-quantized flux1-dev model. This runs on graphic cards with 16 GB of VRAM.","archived":false,"fork":false,"pushed_at":"2025-03-05T19:26:54.000Z","size":13366,"stargazers_count":31,"open_issues_count":1,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T04:07:06.491Z","etag":null,"topics":["ai","flux","models","python","txt-to-image"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Neurone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-12T20:11:11.000Z","updated_at":"2025-04-05T00:53:16.000Z","dependencies_parsed_at":"2025-03-05T20:50:06.427Z","dependency_job_id":null,"html_url":"https://github.com/Neurone/flux.1-dev-fp8","commit_stats":null,"previous_names":["neurone/flux.1-dev"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Neurone/flux.1-dev-fp8","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neurone%2Fflux.1-dev-fp8","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neurone%2Fflux.1-dev-fp8/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neurone%2Fflux.1-dev-fp8/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neurone%2F
flux.1-dev-fp8/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Neurone","download_url":"https://codeload.github.com/Neurone/flux.1-dev-fp8/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neurone%2Fflux.1-dev-fp8/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278380349,"owners_count":25977215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","flux","models","python","txt-to-image"],"created_at":"2025-04-11T04:07:01.030Z","updated_at":"2025-10-04T21:36:36.547Z","avatar_url":"https://github.com/Neurone.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FLUX.1-DEV-FP8 Inference App\n\nInference app for a FP8-quantized flux1-dev model. 
**This runs on graphics cards with 16 GB of VRAM**.\n\n## Description\n\nThis is the inference app for an FP8-quantized version of flux1-dev that can run on graphics cards with 16 GB of VRAM.\n\nThis project resembles the [FLUX.1-dev's Inference App on Hugging Face](https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev), but it is meant to run locally on your machine.\n\nImprovements over the original code:\n\n- Quantization of the model to FP8 at startup (I tried serializing the model and reloading it from disk, but there's no gain in terms of startup speed)\n- Automatically save generated images (a WEBP, without metadata, for sharing and a PNG, with metadata, for archiving)\n- Automatically insert metadata into images ([tag list](https://exiv2.org/tags.html))\n- Automatically insert inference metadata in JSON format into images (this allows you to recreate the same image later)\n- Avoid writing a memory dump to disk if Python crashes\n- Tracking startup time\n\n**Fun fact**. Using the same parameters for inference, you can check the differences between the images generated by the quantized and the non-quantized model. 
Sometimes they are very marginal, sometimes they are more evident.\n\nIn my tests, the non-quantized model is generally **better**, and you can see the difference.\nThis is true when you compare the results directly, but the images are so good even with the FP8 model that usually you don't care :D\n\n## Install\n\n```bash\ngit clone https://github.com/Neurone/flux.1-dev-fp8.git\ncd flux.1-dev-fp8\npython3 -m virtualenv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\n## Run\n\n```bash\ncd flux.1-dev-fp8\nsource .venv/bin/activate\npython app.py\n```\n\n## Alternative Run\n\nIf you experience memory problems from time to time, especially when you try 2048x2048 images, try starting the app with this:\n\n```bash\ncd flux.1-dev-fp8\nsource .venv/bin/activate\nPYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python app.py\n```\n\n### Model Comparison\n\n| FLUX.1-DEV | FLUX.1-DEV-FP8 |\n| - | - |\n| ![1024x2048; 40 steps; FLUX.1-DEV FULL MODEL](./samples/1723504062.1747687-dev.webp \"1024x2048; 40 steps; FLUX.1-DEV FULL MODEL\") | ![1024x2048; 40 steps; FLUX.1-DEV-FP8 QUANTIZED MODEL](./samples/1723504062.1747687.webp \"1024x2048; 40 steps; FLUX.1-DEV-FP8 QUANTIZED MODEL\") |\n\n## Inference Metadata\n\nThis is an example of the inference metadata saved into the PNG files.\n\n```json\n{\n  \"model\": {\n    \"name\": \"flux\",\n    \"id\": \"flux1-dev-fp8\",\n    \"multihash\": \"1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181\",\n    \"license_url\": \"https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md\",\n    \"license_multihash\": \"1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53\",\n    \"author\": \"Black Forest Labs\"\n  },\n  \"input\": {\n    \"prompt\": \"A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky.  
Whimsical, ethereal, celestial, fantasy art\",\n    \"seed\": 1914590619,\n    \"cfg_scale\": 3.5,\n    \"steps\": 40,\n    \"width\": 1024,\n    \"height\": 2048,\n    \"type\": \"txt2img\"\n  },\n  \"output\": {\n    \"filename\": \"1723504062.1747687.png\",\n    \"format\": \"image/png\",\n    \"image_multihash\": \"1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258\",\n    \"creation_date_time\": \"2024-08-12 23:07:42.833579+00:00\"\n  }\n}\n```\n\n## File Metadata\n\nThis is an example of all the metadata saved into the PNG files.\n\n```bash\n❯ exiftool 1723504062.1747687.png\nExifTool Version Number         : 12.92\nFile Name                       : 1723504062.1747687.png\nDirectory                       : .\nFile Size                       : 2.8 MB\nFile Modification Date/Time     : 2024:08:13 01:07:42+02:00\nFile Access Date/Time           : 2024:08:13 01:11:32+02:00\nFile Inode Change Date/Time     : 2024:08:13 01:11:02+02:00\nFile Permissions                : -rw-rw-r--\nFile Type                       : PNG\nFile Type Extension             : png\nMIME Type                       : image/png\nImage Width                     : 1024\nImage Height                    : 2048\nBit Depth                       : 8\nColor Type                      : RGB\nCompression                     : Deflate/Inflate\nFilter                          : Adaptive\nInterlace                       : Noninterlaced\nExif Byte Order                 : Little-endian (Intel, II)\nImage Description               : A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky.  
Whimsical, ethereal, celestial, fantasy art\nMake                            : Black Forest Labs\nCamera Model Name               : flux1-dev-fp8\nModify Date                     : 2024-08-12 23:07:42.833579+00:00\nArtist                          : flux1-dev-fp8\nImage ID                        : 1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258\nCopyright                       : https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md;1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53\nDate/Time Original              : 2024-08-12 23:07:42.833579+00:00\nUser Comment                    : {\"model\": {\"name\": \"flux\", \"id\": \"flux1-dev-fp8\", \"multihash\": \"1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181\", \"license_url\": \"https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md\", \"license_multihash\": \"1220b7a00498845420da83aad42857f69fbfcf731fd1efa6d1bb596a884f2f2cbf53\", \"author\": \"Black Forest Labs\"}, \"input\": {\"prompt\": \"A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky.  
Whimsical, ethereal, celestial, fantasy art\", \"seed\": 1914590619, \"cfg_scale\": 3.5, \"steps\": 40, \"width\": 1024, \"height\": 2048, \"type\": \"txt2img\"}, \"output\": {\"filename\": \"1723504062.1747687.png\", \"format\": \"image/png\", \"image_multihash\": \"1220456650f0ccf07be30cc60e8c01b446039ac6606a7154bc59c30b5eb89e537258\", \"creation_date_time\": \"2024-08-12 23:07:42.833579+00:00\"}}\nImage Size                      : 1024x2048\nMegapixels                      : 2.1\n```\n\n## Performance\n\nThis is my configuration:\n\n- CPU: Intel Core i7-12700K\n- GPU: Nvidia GeForce RTX 4080 SUPER 16 GB\n- RAM: 64 GB DDR5\n- DISK (Operating System): SSD NVMe Crucial P5 Plus 2TB\n- DISK (Models): External USB3 NVMe Disk\n- OPERATING SYSTEM: Ubuntu 22.04.3 LTS\n- PYTHON VERSION: 3.10.12\n\nExcluding the real first time when you need to download all the resources, these are some examples of the performance I get.\n\nPrompt: *A majestic angel with large, dark wings, adorned in flowing blue robes, carrying a sleeping baby and surrounded by cherubs in a moonlit sky.  
Whimsical, ethereal, celestial, fantasy art*\n\nSeed: 1914590619\n\nCFG: 3.5\n\n| operation | time spent |\n| - | - |\n| Startup time | ~50 seconds (using an HDD can increase the startup time significantly, for example up to 13 mins old setup with HDD)|\n| Inference; 512x512; 28 steps | 19 seconds |\n| Inference; 1024x1024; 28 steps | 45 seconds |\n| Inference; 1024x1024; 50 steps | 1 minute and 6 seconds |\n| Inference; 1024x2024; 28 steps | 1 minute and 15 seconds |\n| Inference; 1024x2048; 40 steps | 1 minute and 44 seconds |\n| Inference; 2048x2048; 28 steps | 2 minutes and 39 seconds|\n| Inference; 2048x2048; 50 steps | 4 minutes and 46 seconds|\n\nAfter a couple of runs the startup time decreases, but most of the startup time is spent in the part **before** the quantization of the model.\n\nQuantization adds ~9 seconds at the startup time (~90 in my previous configuration with an external HDD).\n\n## Samples\n\n### 512x512; 28 steps\n\n![512x512; 28 steps](./samples/1723504145.2547626.webp \"512x512; 28 steps\")\n\n### 1024x1024; 28 steps\n\n![1024x1024; 28 steps](./samples/1723504398.6657534.webp \"1024x1024; 28 steps\")\n\n### 1024x2024; 28 steps\n\n![1024x2024; 28 steps](./samples/1723501887.0529025.webp \"1024x2024; 28 steps\")\n\n### 2048x2048; 50 steps\n\n![2048x2048; 50 steps](./samples/1723503836.2297032.webp \"2048x2048; 50 steps\")\n\n## Example Of Consecutive Startups\n\n```bash\n❯ python app.py\n2025-03-05 20:19:16.468597 Started\nDownloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00\u003c00:00, 2694.70it/s]\nLoading checkpoint shards: 
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10\u003c00:00,  5.20s/it]\nYou set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers\nFetching 3 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00\u003c00:00, 1283.45it/s]\nLoading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:23\u003c00:00,  7.97s/it]\n2025-03-05 20:19:57.180855 Quantizing transformer\n2025-03-05 20:20:05.793952 Quantizing text encoder 2\n2025-03-05 20:20:09.611559 Loading demo\n/home/developer/workspace/flux.1-dev-fp8/.venv/lib/python3.12/site-packages/gradio/helpers.py:148: UserWarning: In future versions of Gradio, the `cache_examples` parameter will no longer accept a value of 'lazy'. 
To enable lazy caching in Gradio, you should set `cache_examples=True`, and `cache_mode='lazy'` instead.\n  warnings.warn(\nWill cache examples in '/home/developer/workspace/flux.1-dev-fp8/.gradio/cached_examples/19' directory at first use.\n\n* Running on local URL:  http://127.0.0.1:7860\nINFO:httpx:HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events \"HTTP/1.1 200 OK\"\nINFO:httpx:HTTP Request: HEAD http://127.0.0.1:7860/ \"HTTP/1.1 200 OK\"\n\nTo create a public link, set `share=True` in `launch()`.\nINFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version \"HTTP/1.1 200 OK\"\n\n### 53 seconds (12:21 mins with HDD)\n\n\n❯ python app.py\n2025-03-05 20:24:33.615529 Started\nDownloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00\u003c00:00, 24966.10it/s]\nLoading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00\u003c00:00,  2.86it/s]\nYou set `add_prefix_space`. 
The tokenizer needs to be converted from the slow tokenizers\nFetching 3 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00\u003c00:00, 12761.57it/s]\nLoading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00\u003c00:00, 103.95it/s]\n2025-03-05 20:24:39.492626 Quantizing transformer\n2025-03-05 20:24:47.987617 Quantizing text encoder 2\n2025-03-05 20:24:51.777597 Loading demo\n/home/developer/workspace/flux.1-dev-fp8/.venv/lib/python3.12/site-packages/gradio/helpers.py:148: UserWarning: In future versions of Gradio, the `cache_examples` parameter will no longer accept a value of 'lazy'. 
To enable lazy caching in Gradio, you should set `cache_examples=True`, and `cache_mode='lazy'` instead.\n  warnings.warn(\nWill cache examples in '/home/developer/workspace/flux.1-dev-fp8/.gradio/cached_examples/19' directory at first use.\n\n* Running on local URL:  http://127.0.0.1:7860\nINFO:httpx:HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events \"HTTP/1.1 200 OK\"\nINFO:httpx:HTTP Request: HEAD http://127.0.0.1:7860/ \"HTTP/1.1 200 OK\"\n\nTo create a public link, set `share=True` in `launch()`.\nINFO:httpx:HTTP Request: GET https://api.gradio.app/pkg-version \"HTTP/1.1 200 OK\"\n\n## 18 seconds (6:37 mins with HDD)\n```\n\n## Utils\n\nShow all exif metadata\n\n```bash\nexiftool \u003cfilename\u003e\n```\n\nShow only inference metadata\n\n```bash\nexiftool -usercomment -s3 \u003cfilename\u003e | jq\n```\n\nDelete metadata from one or all files\n\n```bash\nexiftool -all= \u003cfilename\u003e\nexiftool -all= *\n```\n\n## Side notes\n\n1. The model_multihash (1220dc4a58f44c1ba335822aaf041b2d19483bfd12d5dc260f6fac403f7be5f33181) is derived from the\nserialization of the quantized transformer using the optimum.quanto library just after the freeze operation.\n\n    ```python\n    from optimum.quanto import quantization_map\n    from safetensors.torch import save_file\n    # `transformer` is the quantized transformer, right after the freeze operation\n    save_file(transformer.state_dict(), './flux1-dev-transformer-fp8.safetensors')\n    ```\n\n2. The `Image.Exif()` object lets you set the exif data and saves it when the image is saved for the first time.\nThis works correctly unless you need to use the UserComment field, and I want to use it. In that case, there's\nan error in the encoding.  To avoid the encoding error, I use an external library (exiv2), which I was able to make\nwork only when using an actual file and not when reading the image from memory. 
This is why the image is saved in two\nsteps instead of one.\n\n## Credits\n\n- Thanks to [Black Forest Labs](https://blackforestlabs.ai/) for releasing the model free to use and the inference code as open source\n- Huge thanks to [@AmericanPresidentJimmyCarter](https://gist.github.com/AmericanPresidentJimmyCarter) for developing the [original quantization code](https://gist.github.com/AmericanPresidentJimmyCarter/873985638e1f3541ba8b00137e7dacd9).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneurone%2Fflux.1-dev-fp8","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneurone%2Fflux.1-dev-fp8","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneurone%2Fflux.1-dev-fp8/lists"}