{"id":13429517,"url":"https://github.com/CompVis/latent-diffusion","last_synced_at":"2025-03-16T03:31:50.048Z","repository":{"id":37419359,"uuid":"440244590","full_name":"CompVis/latent-diffusion","owner":"CompVis","description":"High-Resolution Image Synthesis with Latent Diffusion Models","archived":false,"fork":false,"pushed_at":"2024-02-29T05:29:47.000Z","size":29378,"stargazers_count":12487,"open_issues_count":288,"forks_count":1585,"subscribers_count":98,"default_branch":"main","last_synced_at":"2025-03-12T08:02:13.645Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompVis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-20T16:56:18.000Z","updated_at":"2025-03-12T08:00:18.000Z","dependencies_parsed_at":"2024-09-30T14:31:11.445Z","dependency_job_id":null,"html_url":"https://github.com/CompVis/latent-diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Flatent-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Flatent-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Flatent-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Flatent-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompVis","download_url":"h
ttps://codeload.github.com/CompVis/latent-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243822309,"owners_count":20353496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T02:00:41.083Z","updated_at":"2025-03-16T03:31:50.038Z","avatar_url":"https://github.com/CompVis.png","language":"Jupyter Notebook","readme":"# Latent Diffusion Models\n[arXiv](https://arxiv.org/abs/2112.10752) | [BibTeX](#bibtex)\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=assets/results.gif /\u003e\n\u003c/p\u003e\n\n\n\n[**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)\u003cbr/\u003e\n[Robin Rombach](https://github.com/rromb)\\*,\n[Andreas Blattmann](https://github.com/ablattmann)\\*,\n[Dominik Lorenz](https://github.com/qp-qp)\\,\n[Patrick Esser](https://github.com/pesser),\n[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)\u003cbr/\u003e\n\\* equal contribution\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=assets/modelfigure.png /\u003e\n\u003c/p\u003e\n\n## News\n\n### July 2022\n- Inference code and model weights to run our [retrieval-augmented diffusion models](https://arxiv.org/abs/2204.11824) are now available. See [this section](#retrieval-augmented-diffusion-models).\n### April 2022\n- Thanks to [Katherine Crowson](https://github.com/crowsonkb), classifier-free guidance received a ~2x speedup and the [PLMS sampler](https://arxiv.org/abs/2202.09778) is available. 
See also [this PR](https://github.com/CompVis/latent-diffusion/pull/51).\n\n- Our 1.45B [latent diffusion LAION model](#text-to-image) was integrated into [Huggingface Spaces 🤗](https://huggingface.co/spaces) using [Gradio](https://github.com/gradio-app/gradio). Try out the Web Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/multimodalart/latentdiffusion)\n\n- More pre-trained LDMs are available: \n  - A 1.45B [model](#text-to-image) trained on the [LAION-400M](https://arxiv.org/abs/2111.02114) database.\n  - A class-conditional model on ImageNet, achieving a FID of 3.6 when using [classifier-free guidance](https://openreview.net/pdf?id=qw8AKxfYbI). Available via a [colab notebook](https://colab.research.google.com/github/CompVis/latent-diffusion/blob/main/scripts/latent_imagenet_diffusion.ipynb) [![][colab]][colab-cin].\n  \n## Requirements\nA suitable [conda](https://conda.io/) environment named `ldm` can be created\nand activated with:\n\n```\nconda env create -f environment.yaml\nconda activate ldm\n```\n\n# Pretrained Models\nA general list of all available checkpoints is available via our [model zoo](#model-zoo).\nIf you use any of these models in your work, we are always happy to receive a [citation](#bibtex).\n\n## Retrieval Augmented Diffusion Models\n![rdm-figure](assets/rdm-preview.jpg)\nWe include inference code to run our retrieval-augmented diffusion models (RDMs) as described in [https://arxiv.org/abs/2204.11824](https://arxiv.org/abs/2204.11824).\n\n\nTo get started, install the additionally required Python packages into your `ldm` environment\n```shell script\npip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0\npip install git+https://github.com/arogozhnikov/einops.git\n```\nand download the trained weights (preliminary checkpoints):\n\n```bash\nmkdir -p models/rdm/rdm768x768/\nwget -O models/rdm/rdm768x768/model.ckpt 
https://ommer-lab.com/files/rdm/model.ckpt\n```\nAs these models are conditioned on a set of CLIP image embeddings, our RDMs support different inference modes, \nwhich are described in the following.\n#### RDM with text-prompt only (no explicit retrieval needed)\nSince CLIP offers a shared image/text feature space, and RDMs learn to cover a neighborhood of a given\nexample during training, we can directly take a CLIP text embedding of a given prompt and condition on it.\nRun this mode via\n```\npython scripts/knn2img.py  --prompt \"a happy bear reading a newspaper, oil on canvas\"\n```\n\n#### RDM with text-to-image retrieval\n\nTo run an RDM conditioned on a text prompt and, additionally, on images retrieved from this prompt, you will also need to download the corresponding retrieval database. \nWe provide two distinct databases extracted from the [OpenImages](https://storage.googleapis.com/openimages/web/index.html) and [ArtBench](https://github.com/liaopeiyuan/artbench) datasets. \nInterchanging the databases results in different capabilities of the model as visualized below, although the learned weights are the same in both cases. \n\nDownload the retrieval databases, which contain the retrieval datasets ([OpenImages](https://storage.googleapis.com/openimages/web/index.html) (~11GB) and [ArtBench](https://github.com/liaopeiyuan/artbench) (~82MB)) compressed into CLIP image embeddings:\n```bash\nmkdir -p data/rdm/retrieval_databases\nwget -O data/rdm/retrieval_databases/artbench.zip https://ommer-lab.com/files/rdm/artbench_databases.zip\nwget -O data/rdm/retrieval_databases/openimages.zip https://ommer-lab.com/files/rdm/openimages_database.zip\nunzip data/rdm/retrieval_databases/artbench.zip -d data/rdm/retrieval_databases/\nunzip data/rdm/retrieval_databases/openimages.zip -d data/rdm/retrieval_databases/\n```\nWe also provide trained [ScaNN](https://github.com/google-research/google-research/tree/master/scann) search indices for ArtBench. 
Download and extract via\n```bash\nmkdir -p data/rdm/searchers\nwget -O data/rdm/searchers/artbench.zip https://ommer-lab.com/files/rdm/artbench_searchers.zip\nunzip data/rdm/searchers/artbench.zip -d data/rdm/searchers\n```\n\nSince the index for OpenImages is large (~21 GB), we provide a script to create and save it for usage during sampling. Note, however,\nthat sampling with the OpenImages database will not be possible without this index. Run the script via\n```bash\npython scripts/train_searcher.py\n```\n\nRetrieval-based text-guided sampling with visual nearest neighbors can be started via \n```\npython scripts/knn2img.py  --prompt \"a happy pineapple\" --use_neighbors --knn \u003cnumber_of_neighbors\u003e \n```\nNote that the maximum supported number of neighbors is 20. \nThe database can be changed via the command-line parameter ``--database``, which can be one of `[openimages, artbench-art_nouveau, artbench-baroque, artbench-expressionism, artbench-impressionism, artbench-post_impressionism, artbench-realism, artbench-renaissance, artbench-romanticism, artbench-surrealism, artbench-ukiyo_e]`.\nFor using `--database openimages`, the above script (`scripts/train_searcher.py`) must be executed beforehand.\nDue to their relatively small size, the ArtBench databases are best suited for creating more abstract concepts and do not work well for detailed text control. 
\n\n\n#### Coming Soon\n- better models\n- more resolutions\n- image-to-image retrieval\n\n## Text-to-Image\n![text2img-figure](assets/txt2img-preview.png) \n\n\nDownload the pre-trained weights (5.7GB)\n```\nmkdir -p models/ldm/text2img-large/\nwget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt\n```\nand sample with\n```\npython scripts/txt2img.py --prompt \"a virus monster is playing guitar, oil on canvas\" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50\n```\nThis will save each sample individually as well as a grid of size `n_iter` x `n_samples` at the specified output location (default: `outputs/txt2img-samples`).\nQuality, sampling speed and diversity are best controlled via the `scale`, `ddim_steps` and `ddim_eta` arguments.\nAs a rule of thumb, higher values of `scale` produce better samples at the cost of reduced output diversity.   \nFurthermore, increasing `ddim_steps` generally also gives higher quality samples, but returns are diminishing for values \u003e 250.\nFast sampling (i.e. low values of `ddim_steps`) while retaining good quality can be achieved by using `--ddim_eta 0.0`.  \nFaster sampling (i.e. even lower values of `ddim_steps`) while retaining good quality can be achieved by using `--ddim_eta 0.0` and `--plms` (see [Pseudo Numerical Methods for Diffusion Models on Manifolds](https://arxiv.org/abs/2202.09778)).\n\n#### Beyond 256²\n\nFor certain inputs, simply running the model in a convolutional fashion on larger features than it was trained on\ncan sometimes yield interesting results. To try it out, tune the `H` and `W` arguments (which will be integer-divided\nby 8 in order to calculate the corresponding latent size), e.g. run\n\n```\npython scripts/txt2img.py --prompt \"a sunset behind a mountain range, vector image\" --ddim_eta 1.0 --n_samples 1 --n_iter 1 --H 384 --W 1024 --scale 5.0  \n```\nto create a sample of size 384x1024. 
Note, however, that controllability is reduced compared to the 256x256 setting. \n\nThe example below was generated using the above command. \n![text2img-figure-conv](assets/txt2img-convsample.png)\n\n\n\n## Inpainting\n![inpainting](assets/inpainting.png)\n\nDownload the pre-trained weights\n```\nwget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1\n```\n\nand sample with\n```\npython scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results\n```\n`indir` should contain images `*.png` and masks `\u003cimage_fname\u003e_mask.png` like\nthe examples provided in `data/inpainting_examples`.\n\n## Class-Conditional ImageNet\n\nAvailable via a [notebook](scripts/latent_imagenet_diffusion.ipynb) [![][colab]][colab-cin].\n![class-conditional](assets/birdhouse.png)\n\n[colab]: \u003chttps://colab.research.google.com/assets/colab-badge.svg\u003e\n[colab-cin]: \u003chttps://colab.research.google.com/github/CompVis/latent-diffusion/blob/main/scripts/latent_imagenet_diffusion.ipynb\u003e\n\n\n## Unconditional Models\n\nWe also provide a script for sampling from unconditional LDMs (e.g. LSUN, FFHQ, ...). 
Start it via\n\n```shell script\nCUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e python scripts/sample_diffusion.py -r models/ldm/\u003cmodel_spec\u003e/model.ckpt -l \u003clogdir\u003e -n \u003c\\#samples\u003e --batch_size \u003cbatch_size\u003e -c \u003c\\#ddim steps\u003e -e \u003c\\#eta\u003e \n```\n\n# Train your own LDMs\n\n## Data preparation\n\n### Faces \nFor downloading the CelebA-HQ and FFHQ datasets, proceed as described in the [taming-transformers](https://github.com/CompVis/taming-transformers#celeba-hq) \nrepository.\n\n### LSUN \n\nThe LSUN datasets can be conveniently downloaded via the script available [here](https://github.com/fyu/lsun).\nWe performed a custom split into training and validation images, and provide the corresponding filenames\nat [https://ommer-lab.com/files/lsun.zip](https://ommer-lab.com/files/lsun.zip). \nAfter downloading, extract them to `./data/lsun`. The beds/cats/churches subsets should\nalso be placed/symlinked at `./data/lsun/bedrooms`/`./data/lsun/cats`/`./data/lsun/churches`, respectively.\n\n### ImageNet\nThe code will try to download (through [Academic\nTorrents](http://academictorrents.com/)) and prepare ImageNet the first time it\nis used. However, since ImageNet is quite large, this requires a lot of disk\nspace and time. If you already have ImageNet on your disk, you can speed things\nup by putting the data into\n`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` (which defaults to\n`~/.cache/autoencoders/data/ILSVRC2012_{split}/data/`), where `{split}` is one\nof `train`/`validation`. 
It should have the following structure:\n\n```\n${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/\n├── n01440764\n│   ├── n01440764_10026.JPEG\n│   ├── n01440764_10027.JPEG\n│   ├── ...\n├── n01443537\n│   ├── n01443537_10007.JPEG\n│   ├── n01443537_10014.JPEG\n│   ├── ...\n├── ...\n```\n\nIf you haven't extracted the data, you can also place\n`ILSVRC2012_img_train.tar`/`ILSVRC2012_img_val.tar` (or symlinks to them) into\n`${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/` /\n`${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/`, which will then be\nextracted into the above structure without downloading it again.  Note that this\nwill only happen if neither a folder\n`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/` nor a file\n`${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready` exists. Remove them\nif you want to force running the dataset preparation again.\n\n\n## Model Training\n\nLogs and checkpoints for trained models are saved to `logs/\u003cSTART_DATE_AND_TIME\u003e_\u003cconfig_spec\u003e`.\n\n### Training autoencoder models\n\nConfigs for training a KL-regularized autoencoder on ImageNet are provided at `configs/autoencoder`.\nTraining can be started by running\n```\nCUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e python main.py --base configs/autoencoder/\u003cconfig_spec\u003e.yaml -t --gpus 0,    \n```\nwhere `config_spec` is one of {`autoencoder_kl_8x8x64`(f=32, d=64), `autoencoder_kl_16x16x16`(f=16, d=16), \n`autoencoder_kl_32x32x4`(f=8, d=4), `autoencoder_kl_64x64x3`(f=4, d=3)}.\n\nFor training VQ-regularized models, see the [taming-transformers](https://github.com/CompVis/taming-transformers) \nrepository.\n\n### Training LDMs \n\nIn ``configs/latent-diffusion/`` we provide configs for training LDMs on the LSUN, CelebA-HQ, FFHQ, and ImageNet datasets. 
\nTraining can be started by running\n\n```shell script\nCUDA_VISIBLE_DEVICES=\u003cGPU_ID\u003e python main.py --base configs/latent-diffusion/\u003cconfig_spec\u003e.yaml -t --gpus 0,\n``` \n\nwhere ``\u003cconfig_spec\u003e`` is one of {`celebahq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),`ffhq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),\n`lsun_bedrooms-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),\n`lsun_churches-ldm-vq-4`(f=8, KL-reg. autoencoder, spatial size 32x32x4),`cin-ldm-vq-8`(f=8, VQ-reg. autoencoder, spatial size 32x32x4)}.\n\n# Model Zoo \n\n## Pretrained Autoencoding Models\n![rec2](assets/reconstruction2.png)\n\nAll models were trained until convergence (no further substantial improvement in rFID).\n\n| Model                   | rFID vs val | train steps           |PSNR           | PSIM          | Link                                                                                                                                                  | Comments              \n|-------------------------|------------|----------------|----------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|\n| f=4, VQ (Z=8192, d=3)   | 0.58       | 533066 | 27.43  +/- 4.26 | 0.53 +/- 0.21 |     https://ommer-lab.com/files/latent-diffusion/vq-f4.zip                   |  |\n| f=4, VQ (Z=8192, d=3)   | 1.06       | 658131 | 25.21 +/-  4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1  | no attention          |\n| f=8, VQ (Z=16384, d=4)  | 1.14       | 971043 | 23.07 +/- 3.99 | 1.17 +/- 0.36 |       https://ommer-lab.com/files/latent-diffusion/vq-f8.zip                     |                       |\n| f=8, VQ (Z=256, d=4)    | 1.49       | 1608649 | 22.35 +/- 3.81 | 1.26 +/- 0.37 |   https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip |  \n| 
f=16, VQ (Z=16384, d=8) | 5.15       | 1101166 | 20.83 +/- 3.61 | 1.73 +/- 0.43 |             https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1                        |                       |\n|                         |            |  |                |               |                                                                                                                                                    |                       |\n| f=4, KL                 | 0.27       | 176991 | 27.53 +/- 4.54 | 0.55 +/- 0.24 |     https://ommer-lab.com/files/latent-diffusion/kl-f4.zip                                   |                       |\n| f=8, KL                 | 0.90       | 246803 | 24.19 +/- 4.19 | 1.02 +/- 0.35 |             https://ommer-lab.com/files/latent-diffusion/kl-f8.zip                            |                       |\n| f=16, KL     (d=16)     | 0.87       | 442998 | 24.08 +/- 4.22 | 1.07 +/- 0.36 |      https://ommer-lab.com/files/latent-diffusion/kl-f16.zip                                  |                       |\n| f=32, KL     (d=64)     | 2.04       | 406763 | 22.27 +/- 3.93 | 1.41 +/- 0.40 |             https://ommer-lab.com/files/latent-diffusion/kl-f32.zip                            |                       |\n\n### Get the models\n\nRunning the following script downloads and extracts all available pretrained autoencoding models.   
\n```shell script\nbash scripts/download_first_stages.sh\n```\n\nThe first stage models can then be found in `models/first_stage_models/\u003cmodel_spec\u003e`.\n\n\n\n## Pretrained LDMs\n| Dataset                         |   Task    | Model        | FID           | IS              | Prec | Recall | Link                                                                                                                                                                                   | Comments                                        \n|---------------------------------|------|--------------|---------------|-----------------|------|------|----------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|\n| CelebA-HQ                       | Unconditional Image Synthesis    |  LDM-VQ-4 (200 DDIM steps, eta=0)| 5.11 (5.11)          | 3.29            | 0.72    | 0.49 |    https://ommer-lab.com/files/latent-diffusion/celeba.zip     |                                                 |  \n| FFHQ                            | Unconditional Image Synthesis    |  LDM-VQ-4 (200 DDIM steps, eta=1)| 4.98 (4.98)  | 4.50 (4.50)   | 0.73 | 0.50 |              https://ommer-lab.com/files/latent-diffusion/ffhq.zip                                              |                                                 |\n| LSUN-Churches                   | Unconditional Image Synthesis   |  LDM-KL-8 (400 DDIM steps, eta=0)| 4.02 (4.02) | 2.72 | 0.64 | 0.52 |         https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip        |                                                 |  \n| LSUN-Bedrooms                   | Unconditional Image Synthesis   |  LDM-VQ-4 (200 DDIM steps, eta=1)| 2.95 (3.0)          | 2.22 (2.23)| 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip |                                  
               |  \n| ImageNet                        | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77(7.76)* /15.82** | 201.56(209.52)* /78.82** | 0.84* / 0.65** | 0.35* / 0.63** |   https://ommer-lab.com/files/latent-diffusion/cin.zip                                                                   | *: w/ guiding, classifier_scale 10  **: w/o guiding, scores in brackets calculated with script provided by [ADM](https://github.com/openai/guided-diffusion) |   \n| Conceptual Captions             |  Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79         | 13.89           | N/A | N/A |              https://ommer-lab.com/files/latent-diffusion/text2img.zip                                | finetuned from LAION                            |   \n| OpenImages                      | Super-resolution   | LDM-VQ-4     | N/A            | N/A               | N/A    | N/A    |                                    https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip                                    | BSR image degradation                           |\n| OpenImages                      | Layout-to-Image Synthesis    | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02         | 15.92           | N/A    | N/A    |                  https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip                                           |                                                 | \n| Landscapes      |  Semantic Image Synthesis   | LDM-VQ-4  | N/A             | N/A               | N/A    | N/A    |           https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip                                    |                                                 |\n| Landscapes       |  Semantic Image Synthesis   | LDM-VQ-4  | N/A             | N/A               | N/A    | N/A    |           https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip                                    |             finetuned on 
resolution 512x512                                     |\n\n\n### Get the models\n\nThe LDMs listed above can jointly be downloaded and extracted via\n\n```shell script\nbash scripts/download_models.sh\n```\n\nThe models can then be found in `models/ldm/\u003cmodel_spec\u003e`.\n\n\n\n## Coming Soon...\n\n* More inference scripts for conditional LDMs.\n* In the meantime, you can play with our colab notebook https://colab.research.google.com/drive/1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing\n\n## Comments \n\n- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)\nand [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch). \nThanks for open-sourcing!\n\n- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories). \n\n\n## BibTeX\n\n```\n@misc{rombach2021highresolution,\n      title={High-Resolution Image Synthesis with Latent Diffusion Models}, \n      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n      year={2021},\n      eprint={2112.10752},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@misc{https://doi.org/10.48550/arxiv.2204.11824,\n  doi = {10.48550/ARXIV.2204.11824},\n  url = {https://arxiv.org/abs/2204.11824},\n  author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},\n  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {Retrieval-Augmented Diffusion Models},\n  publisher = {arXiv},\n  year = {2022},  \n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n\n\n```\n\n\n","funding_links":[],"categories":["2 Foundation Models","Implementations","Jupyter Notebook","👑Stable Diffusion","Diffusion 
Models","Uncategorized","Image Generation","其他_机器视觉","Papers"],"sub_categories":["2.2 Vision Foundation Models","Python","Uncategorized","Diffusion Paradigm","网络服务_其他","Text-Image Generation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCompVis%2Flatent-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCompVis%2Flatent-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCompVis%2Flatent-diffusion/lists"}