{"id":18317334,"url":"https://github.com/compvis/fm-boosting","last_synced_at":"2025-04-13T06:28:36.236Z","repository":{"id":211569619,"uuid":"729486401","full_name":"CompVis/fm-boosting","owner":"CompVis","description":"FMBoost: Boosting Latent Diffusion with Flow Matching (ECCV 2024 Oral)","archived":false,"fork":false,"pushed_at":"2024-12-03T19:23:02.000Z","size":112449,"stargazers_count":226,"open_issues_count":5,"forks_count":5,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-04-04T05:43:12.568Z","etag":null,"topics":["diffusion-models","flow-matching","stable-diffusion","super-resolution"],"latest_commit_sha":null,"homepage":"https://compvis.github.io/fm-boosting/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompVis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-09T11:40:05.000Z","updated_at":"2025-04-04T03:04:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"f4e9d2b7-612f-4067-a2f6-3161f2c4cf3d","html_url":"https://github.com/CompVis/fm-boosting","commit_stats":{"total_commits":32,"total_committers":7,"mean_commits":4.571428571428571,"dds":0.75,"last_synced_commit":"8f38bd316175c4e6ad644faed7e319a06525cd07"},"previous_names":["compvis/fm-boosting"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Ffm-boosting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Ffm-boosting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Ffm-boosting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Ffm-boosting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompVis","download_url":"https://codeload.github.com/CompVis/fm-boosting/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248674121,"owners_count":21143650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","flow-matching","stable-diffusion","super-resolution"],"created_at":"2024-11-05T18:05:46.704Z","updated_at":"2025-04-13T06:28:36.208Z","avatar_url":"https://github.com/CompVis.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n \u003ch2 align=\"center\"\u003e🚀 Boosting Latent Diffusion with Flow Matching\u003c/h2\u003e\n \u003cp align=\"center\"\u003e \n Johannes Schusterbauer\n\u003csup\u003e*\u003c/sup\u003e · Ming Gui\u003csup\u003e*\u003c/sup\u003e · Pingchuan Ma\u003csup\u003e*\u003c/sup\u003e · \n \u003c!-- \u003c/p\u003e\n  \u003cp align=\"center\"\u003e  --\u003e\n Nick Stracke · Stefan A. Baumann ·  Vincent Tao Hu · Björn Ommer\n \u003c/p\u003e\n \u003cp align=\"center\"\u003e \n    \u003cb\u003eCompVis Group @ LMU Munich\u003c/b\u003e\n \u003c/p\u003e\n \u003c/p\u003e\n  \u003cp align=\"center\"\u003e \u003csup\u003e*\u003c/sup\u003e \u003ci\u003eequal contribution\u003c/i\u003e \u003c/p\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n \u003cb\u003eECCV 2024 Oral\u003c/b\u003e\n\u003c/p\u003e\n\n\n[![Website](assets/figs/badge-website.svg)](https://compvis.github.io/fm-boosting/)\n[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://arxiv.org/abs/2312.07360)\n\n\n![cover](/assets/figs/cover-fig.png)\n\n**Samples synthesized in $1024^2$ px.** We elevate DMs and similar architectures to a higher-resolution domain, achieving exceptionally rapid processing speeds. We leverage the [Latent Consistency Models (LCM)](https://arxiv.org/abs/2310.04378), distilled from [SD1.5](https://arxiv.org/abs/2112.10752)  and [SDXL](https://arxiv.org/abs/2307.01952), respectively. To achieve the same resolution as LCM (SDXL), we boost LCM-SD1.5 with our general Coupling Flow Matching (CFM) model. This yields a further speedup in the synthesis process and enables the generation of high-resolution images of high fidelity in an average $`0.347`$ seconds. The LCM-SDXL model fails to produce competitive results within this shortened timeframe, highlighting the effectiveness of our approach in achieving both speed and quality in image synthesis.\n\n\n## 📝 Overview\n\nIn this work, we leverage the complementary strengths of Diffusion Models (DMs), Flow Matching models (FMs), and Variational AutoEncoders (VAEs): the diversity of stochastic DMs, the speed of FMs in training and inference stages, and the efficiency of a convolutional decoder to map latents into pixel space. This synergy results in a small diffusion model that excels in generating diverse samples at a low resolution. Flow Matching then takes a direct path from this lower-resolution representation to a higher-resolution latent, which is subsequently translated into a high-resolution image by a convolutional decoder. We achieve competitive high-resolution image synthesis at $1024^2$ and $2048^2$ pixels with minimal computational cost.\n\n## 🚀 Pipeline\n\nDuring training we feed both a low- and a high-res image through the pre-trained encoder to obtain a low- and a high-res latent code. Our model is trained to regress a vector field which forms a probability path from the low- to the high-res latent within $t \\in [0, 1]$.\n\n![training](assets/figs/pipeline-train.jpg)\n\nAt inference we can take any diffusion model, generate the low-res latent, and then use our Coupling Flow Matching model to synthesize the higher dimensional latent code. Finally, the pre-trained decoder projects the latent code back to pixel space, resulting in $1024^2$ or $2048^2$ images.\n\n![inference](assets/figs/pipeline-inf.jpg)\n\n\n## 📈 Results\n\nWe show zero-shot quantitative comparison of our method against other state-of-the-art methods on the COCO dataset. Our method achieves a good trade-off between performance and computational cost.\n\n![results-coco](assets/figs/coco-comparison.jpg)\n\nWe can cascade our models to increase the resolution of a $128^2$ px LDM 1.5 generation to a $2048^2$ px output.\n\n![cascading](assets/figs/128_to_2k-universe.jpg)\n\nYou can find more qualitative results on our [project page](https://compvis.github.io/fm-boosting/).\n\n## 🔥 Usage\n\n###\nPlease execute the following command to download the first stage autoencoder checkpoint:\n```\nmkdir checkpoints\nwget -O checkpoints/sd_ae.ckpt https://www.dropbox.com/scl/fi/lvfvy7qou05kxfbqz5d42/sd_ae.ckpt?rlkey=fvtu2o48namouu9x3w08olv3o\u0026st=vahu44z5\u0026dl=0\n```\n\n### Data\nFor training the model, you have to provide a config file. An example config can be found in `configs/flow400_64-128/unet-base_psu.yaml`. Please customize the data part to your use case. \n\nIn order to speed up the training process, we pre-computed the latents. Your dataloader should return a batch with the following keys, i.e. `image`, `latent`, and `latent_lowres`. Please notice that we use pixel space upsampling (*PSU* in the paper), therefore the `latent` and `latent_lowres` should have the same spatial resolution (refer to L228 `extract_from_batch()` in `fmboost/trainer.py`). \n\n\n### Training\n\nAfterwards, you can start the training with\n\n```bash\npython3 train.py --config configs/flow400_64-128/unet-base_psu.yaml --name your-name --use_wandb\n```\n\nthe flag `--use_wandb` enables logging to WandB. By default, it only logs metrics to a CSV file and tensorboard. All logs are stored in the `logs` folder. You can also define a folder structure for your experiment name, e.g. `logs/exp_name`.\n\n### Resume checkpoint\n\nIf you want to resume from a checkpoint, just add the additional parameter\n\n```bash\n... --resume_checkpoint path_to_your_checkpoint.ckpt\n```\n\nThis resumes all states from the checkpoint (i.e. optimizer states). If you want to just load weights in a non-strict manner from some checkpoint, use the `--load_weights` argument.\n\n### Inference\n*We will release a pretrained checkpoint and the corresponding inference jupyter notebook soon. Stay tuned!*\n\n\n\n## 🎓 Citation\n\nPlease cite our paper:\n\n```bibtex\n@InProceedings{schusterbauer2024boosting,\n      title={Boosting Latent Diffusion with Flow Matching}, \n      author={Johannes Schusterbauer and Ming Gui and Pingchuan Ma and Nick Stracke and Stefan A. Baumann and Vincent Tao Hu and Björn Ommer},\n      booktitle = {ECCV},\n      year={2024}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Ffm-boosting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompvis%2Ffm-boosting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Ffm-boosting/lists"}