{"id":23425487,"url":"https://github.com/fal-ai/stable-diffusion-benchmarks","last_synced_at":"2025-10-26T18:49:09.719Z","repository":{"id":205673740,"uuid":"714320113","full_name":"fal-ai/stable-diffusion-benchmarks","owner":"fal-ai","description":"Comparison of different stable diffusion implementations and optimizations","archived":false,"fork":false,"pushed_at":"2024-01-27T20:35:00.000Z","size":54,"stargazers_count":39,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-12T17:49:40.660Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.fal.ai/models/stable-diffusion-xl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fal-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-04T15:06:18.000Z","updated_at":"2025-03-03T11:04:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"0f490680-70f9-42ba-b8fc-ca7d77892842","html_url":"https://github.com/fal-ai/stable-diffusion-benchmarks","commit_stats":null,"previous_names":["fal-ai/stable-diffusion-benchmarks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fal-ai/stable-diffusion-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fal-ai%2Fstable-diffusion-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fal-ai%2Fstable-diffusion-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fal-ai%2Fstable-diffusion-benchmarks/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fal-ai%2Fstable-diffusion-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fal-ai","download_url":"https://codeload.github.com/fal-ai/stable-diffusion-benchmarks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fal-ai%2Fstable-diffusion-benchmarks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279446661,"owners_count":26171783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-17T02:00:07.504Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-23T05:12:53.471Z","updated_at":"2025-10-18T01:11:34.419Z","avatar_url":"https://github.com/fal-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stable Diffusion Benchmarks\n\nA set of benchmarks targeting different stable diffusion implementations to have a\nbetter understanding of their performance and scalability.\n\n## Benchmarks\n\nRunning on an A100 80G SXM hosted at [fal.ai](https://fal.ai). 
If you want to see how these models perform firsthand,\ncheck out the [Fast SDXL](https://www.fal.ai/models/stable-diffusion-xl) playground, which offers one of the most optimized\nSDXL implementations available (combining the open source techniques from this repo).\n\n\u003e [!NOTE]\n\u003e Most of the implementations here are also based on Diffusers, which is an amazing library\n\u003e that pretty much the whole industry is using. However, when we use the 'Diffusers' name in the\n\u003e benchmarks, it means the experience you might get with out-of-the-box Diffusers (with the\n\u003e necessary settings applied).\n\n\u003c!-- START TABLE --\u003e\n### SD1.5 (End-to-end) Benchmarks\n|                  | mean (s) | median (s) | min (s) | max (s) | speed (it/s) |\n|------------------|----------|------------|---------|---------|--------------|\n| Diffusers (torch 2.1, SDPA) + OpenAI's [consistency decoder](https://github.com/openai/consistencydecoder)\\*\\* |   2.230s |     2.229s |  2.220s |  2.238s |   22.43 it/s |\n| Diffusers (torch 2.1, xformers) |   1.729s |     1.728s |  1.720s |  1.747s |   28.94 it/s |\n| Diffusers (torch 2.1, SDPA) |   1.604s |     1.603s |  1.589s |  1.618s |   31.19 it/s |\n| Diffusers (torch 2.1, SDPA, [tiny VAE](https://github.com/madebyollin/taesd))\\* |   1.567s |     1.562s |  1.547s |  1.602s |   32.02 it/s |\n| Diffusers (torch 2.1, SDPA, compiled) |   1.354s |     1.354s |  1.351s |  1.356s |   36.93 it/s |\n| Diffusers (torch 2.1, SDPA, compiled, NCHW channels last) |   1.058s |     1.057s |  1.056s |  1.060s |   47.29 it/s |\n| OneFlow          |   0.950s |     0.950s |  0.947s |  0.961s |   52.65 it/s |\n| Stable Fast (torch 2.1) |   0.901s |     0.901s |  0.900s |  0.903s |   55.51 it/s |\n| TensorRT 9.0 (cuda graphs, static shapes) |   0.819s |     0.818s |  0.817s |  0.821s |   61.14 it/s |\n\n### SDXL (End-to-end) Benchmarks\n|                  | mean (s) | median (s) | min (s) | max (s) | speed (it/s) 
|\n|------------------|----------|------------|---------|---------|--------------|\n| [minSDXL](https://github.com/cloneofsimo/minSDXL) (torch 2.1) |   8.146s |     8.146s |  8.137s |  8.155s |    6.14 it/s |\n| Diffusers (torch 2.1, SDPA) |   5.932s |     5.932s |  5.924s |  5.940s |    8.43 it/s |\n| [minSDXL+](https://github.com/isidentical/minSDXL) (torch 2.1, SDPA) |   5.887s |     5.887s |  5.872s |  5.897s |    8.49 it/s |\n| Comfy (torch 2.1, xformers) |   5.779s |     5.772s |  5.748s |  5.824s |    8.66 it/s |\n| Diffusers (torch 2.1, SDPA, [tiny VAE](https://github.com/madebyollin/taesd))\\* |   5.739s |     5.738s |  5.722s |  5.767s |    8.71 it/s |\n| Diffusers (torch 2.1, xformers) |   5.719s |     5.717s |  5.710s |  5.732s |    8.75 it/s |\n| [minSDXL+](https://github.com/isidentical/minSDXL) (torch 2.1, flash-attention v2) |   5.323s |     5.322s |  5.313s |  5.340s |    9.39 it/s |\n| Diffusers (torch 2.1, SDPA, compiled) |   5.217s |     5.216s |  5.213s |  5.220s |    9.59 it/s |\n| Diffusers (torch 2.1, SDPA, compiled, NCHW channels last) |   5.136s |     5.137s |  5.125s |  5.147s |    9.73 it/s |\n| OneFlow          |   4.300s |     4.301s |  4.282s |  4.316s |   11.62 it/s |\n| Stable Fast (torch 2.1) |   4.150s |     4.149s |  4.138s |  4.168s |   12.05 it/s |\n| TensorRT 9.0 (cuda graphs, static shapes) |   4.102s |     4.104s |  4.091s |  4.107s |   12.18 it/s |\n\n\u003c!-- END TABLE --\u003e\n\nGeneration options:\n- `prompt=\"A photo of a cat\"`\n- `num_inference_steps=50`\n- For SD1.5, the width/height is 512x512 (the default); for SDXL, the width/height is 1024x1024.\n- For all other options, the defaults from the generation systems are used.\n- Weights are always half-precision (fp16) unless otherwise specified.\n- Benchmarks marked with `*`/`**` use techniques that might lead to quality degradation (or sometimes improvements), but the underlying diffusion model is still the same.\n\n\u003e [!NOTE]\n\u003e All the 
timings here are end to end, and reflect the time it takes to go from a single prompt\n\u003e to a decoded image. We are planning to make the benchmarking more granular and provide details\n\u003e and comparisons between the components (text encoder, VAE, and most importantly UNET) in the\n\u003e future, but for now, some of the results might not linearly scale with the number of inference\n\u003e steps since the cost of certain components is incurred only once.\n\n\nEnvironments (like torch and other library versions) for each benchmark are defined\nunder the [benchmarks/](benchmarks/) folder.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffal-ai%2Fstable-diffusion-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffal-ai%2Fstable-diffusion-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffal-ai%2Fstable-diffusion-benchmarks/lists"}