{"id":18317332,"url":"https://github.com/compvis/imagebart","last_synced_at":"2025-04-05T21:32:20.496Z","repository":{"id":48160416,"uuid":"398005609","full_name":"CompVis/imagebart","owner":"CompVis","description":"ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis","archived":false,"fork":false,"pushed_at":"2022-03-14T20:03:30.000Z","size":55946,"stargazers_count":124,"open_issues_count":4,"forks_count":14,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-03-21T12:07:18.604Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2108.08827","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompVis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-19T16:22:29.000Z","updated_at":"2024-12-22T10:24:41.000Z","dependencies_parsed_at":"2022-07-22T11:17:29.190Z","dependency_job_id":null,"html_url":"https://github.com/CompVis/imagebart","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fimagebart","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fimagebart/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fimagebart/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fimagebart/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompVis","download_url":"https://codeload.github.com/CompVis/imagebart/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406080,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:05:46.528Z","updated_at":"2025-04-05T21:32:16.352Z","avatar_url":"https://github.com/CompVis.png","language":"Python","readme":"# ImageBART\n#### [NeurIPS 2021](https://nips.cc/)\n\n![teaser](assets/modelfigure.png)\n\u003cbr/\u003e\n[Patrick Esser](https://github.com/pesser)\\*,\n[Robin Rombach](https://github.com/rromb)\\*,\n[Andreas Blattmann](https://github.com/ablattmann)\\*,\n[Björn Ommer](https://ommer-lab.com/)\u003cbr/\u003e\n\\* equal contribution\n\n[arXiv](https://arxiv.org/abs/2108.08827) | [BibTeX](#bibtex) | [Poster](assets/imagebart_poster.pdf) \n\n## Requirements\nA suitable [conda](https://conda.io/) environment named `imagebart` can be created\nand activated with:\n\n```\nconda env create -f environment.yaml\nconda activate imagebart\n```\n\n## Get the Models\n\nWe provide pretrained weights and hyperparameters for models trained on the following datasets:\n\n* FFHQ: \n    * [4 scales, geometric noise schedule](https://ommer-lab.com/files/ffhq_4_scales_geometric.zip): `wget -c https://ommer-lab.com/files/ffhq_4_scales_geometric.zip`\n    * [2 scales, custom noise schedule](https://ommer-lab.com/files/ffhq_2_scales_custom.zip): `wget -c https://ommer-lab.com/files/ffhq_2_scales_custom.zip`\n* LSUN, 3 scales, custom noise schedules:\n    * [Churches](https://ommer-lab.com/files/churches_3_scales.zip): `wget -c https://ommer-lab.com/files/churches_3_scales.zip`\n    * [Bedrooms](https://ommer-lab.com/files/bedrooms_3_scales.zip): `wget -c https://ommer-lab.com/files/bedrooms_3_scales.zip`\n    * [Cats](https://ommer-lab.com/files/cats_3_scales.zip): `wget -c https://ommer-lab.com/files/cats_3_scales.zip`\n* Class-conditional ImageNet:\n    * [5 scales, custom noise schedule](https://ommer-lab.com/files/cin_5_scales_custom.zip): `wget -c https://ommer-lab.com/files/cin_5_scales_custom.zip`\n    * [4 scales, geometric noise schedule](https://ommer-lab.com/files/cin_4_scales_geometric.zip): `wget -c https://ommer-lab.com/files/cin_4_scales_geometric.zip`\n\nDownload the respective files and extract their contents to a directory `./models/`.\n\nMoreover, we provide all the required VQGANs as a .zip at [https://ommer-lab.com/files/vqgan.zip](https://ommer-lab.com/files/vqgan.zip), \nwhich contents have to be extracted to `./vqgan/`.    \n\n## Get the Data\nRunning the training configs or the [inpainting script](scripts/inpaint_imagebart.py) requires \na dataset available locally. For ImageNet and FFHQ, see this repo's parent directory [taming-transformers](https://github.com/CompVis/taming-transformers).\nThe LSUN datasets can be conveniently downloaded via the script available [here](https://github.com/fyu/lsun).\nWe performed a custom split into training and validation images, and provide the corresponding filenames\nat [https://ommer-lab.com/files/lsun.zip](https://ommer-lab.com/files/lsun.zip). \nAfter downloading, extract them to `./data/lsun`. The beds/cats/churches subsets should\nalso be placed/symlinked at `./data/lsun/bedrooms`/`./data/lsun/cats`/`./data/lsun/churches`, respectively.\n\n## Inference\n\n### Unconditional Sampling\nWe provide a script for sampling from unconditional models trained on the LSUN-{bedrooms,bedrooms,cats}- and FFHQ-datasets.\n\n#### FFHQ\n\nOn the FFHQ dataset, we provide two distinct pretrained models, one with a chain of length 4 and a geometric noise schedule as proposed by Sohl-Dickstein et al. [[1]](##References) , and another one with a chain of length 2 and a custom schedule.\nThese models can be started with\n```shell script\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e streamlit run scripts/sample_imagebart.py configs/sampling/ffhq/\u003cconfig\u003e\n```\n\n#### LSUN\nFor the models trained on the LSUN-datasets, use \n```shell script\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e streamlit run scripts/sample_imagebart.py configs/sampling/lsun/\u003cconfig\u003e\n```\n\n### Class Conditional Sampling on ImageNet\n\n\nTo sample from class-conditional ImageNet models, use\n```shell script\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e streamlit run scripts/sample_imagebart.py configs/sampling/imagenet/\u003cconfig\u003e\n```\n\n### Image Editing with Unconditional Models\n\nWe also provide a script for image editing with our unconditional models. For our FFHQ-model with geometric schedule this can be started with\n```shell script\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e streamlit run scripts/inpaint_imagebart.py configs/sampling/ffhq/ffhq_4scales_geometric.yaml\n```\nresulting in samples similar to the following.\n![teaser](assets/image_editing.png)\n\n\n## Training\nIn general, there are two options for training the autoregressive transition probabilities of the \nreverse Markov chain: (i) train them jointly, taking into account a weighting of the \nindividual scale contributions, or (ii) train them independently, which means that each \ntraining process optimizes a single transition and the scales must be stacked after training. \nWe conduct most of our experiments using the latter option, but provide configurations for both cases.\n\n### Training Scales Independently\nFor training scales independently, each transition requires a seperate optimization process, which can \nstarted via\n\n```\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e python main.py --base configs/\u003cdata\u003e/\u003cconfig\u003e.yaml -t --gpus 0, \n```\n\nWe provide training configs for a four scale training of FFHQ using a geometric schedule, \na four scale geometric training on ImageNet and various three-scale experiments on LSUN.\nSee also the overview of our [pretrained models](#get-the-models).\n\n\n### Training Scales Jointly\n\nFor completeness, we also provide a config to run a joint training with 4 scales on FFHQ.\nTraining can be started by running\n\n```\nCUDA_VISIBLE_DEVICES=\u003cgpu_id\u003e python main.py --base configs/ffhq/ffhq_4_scales_joint-training.yaml -t --gpus 0, \n```\n\n\n## Shout-Outs\nMany thanks to all who make their work and implementations publicly available. \nFor this work, these were in particular: \n\n- The extremely clear and extensible encoder-decoder transformer implementations by [lucidrains](https://github.com/lucidrains): \nhttps://github.com/lucidrains/x-transformers\n- Emiel Hoogeboom et al's paper on multinomial diffusion and argmax flows: https://arxiv.org/abs/2102.05379 \n\n\n![teaser](assets/foxchain.png)\n\n## References\n\n[1] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. \u0026amp; Ganguli, S.. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. \u003ci\u003eProceedings of the 32nd International Conference on Machine Learning\n\n## Bibtex\n\n```\n@article{DBLP:journals/corr/abs-2108-08827,\n  author    = {Patrick Esser and\n               Robin Rombach and\n               Andreas Blattmann and\n               Bj{\\\"{o}}rn Ommer},\n  title     = {ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive\n               Image Synthesis},\n  journal   = {CoRR},\n  volume    = {abs/2108.08827},\n  year      = {2021}\n}\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Fimagebart","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompvis%2Fimagebart","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Fimagebart/lists"}