{"id":27373406,"url":"https://github.com/lakonik/gmflow","last_synced_at":"2025-06-23T12:34:10.273Z","repository":{"id":286721213,"uuid":"959528076","full_name":"Lakonik/GMFlow","owner":"Lakonik","description":"[ICML 2025] Gaussian Mixture Flow Matching Models (GMFlow)","archived":false,"fork":false,"pushed_at":"2025-05-28T23:25:44.000Z","size":2023,"stargazers_count":100,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-14T05:04:34.940Z","etag":null,"topics":["diffusion","diffusion-model","flow-matching","generative-ai","generative-model","image-generation","pytorch"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2504.05304","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lakonik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-02T23:45:44.000Z","updated_at":"2025-06-04T07:02:15.000Z","dependencies_parsed_at":"2025-06-14T05:04:37.463Z","dependency_job_id":"134e668e-d7ff-4f39-8d09-2edba09ae192","html_url":"https://github.com/Lakonik/GMFlow","commit_stats":null,"previous_names":["lakonik/gmflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Lakonik/GMFlow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lakonik%2FGMFlow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lakonik%2FGMFlow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lakonik%2FGMFlow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lakonik%2FGMFlow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Lakonik","download_url":"https://codeload.github.com/Lakonik/GMFlow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lakonik%2FGMFlow/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261479611,"owners_count":23164732,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","diffusion-model","flow-matching","generative-ai","generative-model","image-generation","pytorch"],"created_at":"2025-04-13T11:14:37.652Z","updated_at":"2025-06-23T12:34:05.254Z","avatar_url":"https://github.com/Lakonik.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gaussian Mixture Flow Matching Models (GMFlow)\n\nOfficial PyTorch implementation of the paper:\n\n**Gaussian Mixture Flow Matching Models [[arXiv](https://arxiv.org/abs/2504.05304)]**\n\u003cbr\u003e\nIn ICML 2025\n\u003cbr\u003e\n[Hansheng Chen](https://lakonik.github.io/)\u003csup\u003e1\u003c/sup\u003e, \n[Kai Zhang](https://kai-46.github.io/website/)\u003csup\u003e2\u003c/sup\u003e,\n[Hao Tan](https://research.adobe.com/person/hao-tan/)\u003csup\u003e2\u003c/sup\u003e,\n[Zexiang Xu](https://zexiangxu.github.io/)\u003csup\u003e3\u003c/sup\u003e, \n[Fujun Luan](https://research.adobe.com/person/fujun/)\u003csup\u003e2\u003c/sup\u003e,\n[Leonidas Guibas](https://geometry.stanford.edu/?member=guibas)\u003csup\u003e1\u003c/sup\u003e,\n[Gordon Wetzstein](http://web.stanford.edu/~gordonwz/)\u003csup\u003e1\u003c/sup\u003e, \n[Sai Bi](https://sai-bi.github.io/)\u003csup\u003e2\u003c/sup\u003e\u003cbr\u003e\n\u003csup\u003e1\u003c/sup\u003eStanford University, \u003csup\u003e2\u003c/sup\u003eAdobe Research, \u003csup\u003e3\u003c/sup\u003eHillbot\n\u003cbr\u003e\n\n\u003cimg src=\"gmdit.png\" width=\"600\"  alt=\"\"/\u003e\n\n\u003cimg src=\"gmdit_results.png\" width=\"1000\"  alt=\"\"/\u003e\n\n## Highlights\n\nGMFlow is an extension of diffusion/flow matching models.\n\n- **Gaussian Mixture Output**: GMFlow expands the network's output layer to predict a Gaussian mixture (GM) distribution of flow velocity. Standard diffusion/flow matching models are special cases of GMFlow with a single Gaussian component.\n\n- **Precise Few-Step Sampling**: GMFlow introduces novel **GM-SDE** and **GM-ODE** solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling.\n\n- **Improved Classifier-Free Guidance (CFG)**: GMFlow introduces a **probabilistic guidance** scheme that mitigates the over-saturation issues of CFG and improves image generation quality.\n\n- **Efficiency**: GMFlow maintains similar training and inference costs to standard diffusion/flow matching models.\n\n## Installation\n\nThe code has been tested in the environment described as follows:\n\n- Linux (tested on Ubuntu 20 and above)\n- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) 11.8 and above\n- [PyTorch](https://pytorch.org/get-started/previous-versions/) 2.1 and above\n\nOther dependencies can be installed via `pip install -r requirements.txt`. \n\nAn example of installation commands is shown below (assuming you have already installed CUDA Toolkit and configured the environment variables):\n\n```bash\n# Create conda environment\nconda create -y -n gmflow python=3.10 numpy=1.26 ninja\nconda activate gmflow\n\n# Goto https://pytorch.org/ to select the appropriate version\npip install torch torchvision\n\n# Install other dependencies\npip install -r requirements.txt\n```\n\nThis codebase may work on Windows systems, but it has not been tested extensively.\n\n## GM-DiT ImageNet 256x256\n\n### Inference\n\nWe provide a [Diffusers pipeline](lib/pipelines/gmdit_pipeline.py) for easy inference. The following code demonstrates how to sample images from the pretrained GM-DiT model using the GM-ODE 2 solver and the GM-SDE 2 solver.\n\n```python\nimport torch\nfrom huggingface_hub import snapshot_download\nfrom lib.models.diffusions.schedulers import FlowEulerODEScheduler, GMFlowSDEScheduler\nfrom lib.pipelines.gmdit_pipeline import GMDiTPipeline\n\n# Currently the pipeline can only load local checkpoints, so we need to download the checkpoint first\nckpt = snapshot_download(repo_id='Lakonik/gmflow_imagenet_k8_ema')\npipe = GMDiTPipeline.from_pretrained(ckpt, variant='bf16', torch_dtype=torch.bfloat16)\npipe = pipe.to('cuda')\n\n# Pick words that exist in ImageNet\nwords = ['jay', 'magpie']\nclass_ids = pipe.get_label_ids(words)\n\n# Sample using GM-ODE 2 solver\npipe.scheduler = FlowEulerODEScheduler.from_config(pipe.scheduler.config)\ngenerator = torch.manual_seed(42)\noutput = pipe(\n    class_labels=class_ids,\n    guidance_scale=0.45,\n    num_inference_steps=32,\n    num_inference_substeps=4,\n    output_mode='mean',\n    order=2,\n    generator=generator)\nfor i, (word, image) in enumerate(zip(words, output.images)):\n    image.save(f'{i:03d}_{word}_gmode2_step32.png')\n\n# Sample using GM-SDE 2 solver (the first run may be slow due to CUDA compilation)\npipe.scheduler = GMFlowSDEScheduler.from_config(pipe.scheduler.config)\ngenerator = torch.manual_seed(42)\noutput = pipe(\n    class_labels=class_ids,\n    guidance_scale=0.45,\n    num_inference_steps=32,\n    num_inference_substeps=1,\n    output_mode='sample',\n    order=2,\n    generator=generator)\nfor i, (word, image) in enumerate(zip(words, output.images)):\n    image.save(f'{i:03d}_{word}_gmsde2_step32.png')\n```\n\nThe results will be saved under the current directory.\n\n\u003cimg src=\"example_results.png\" width=\"800\"  alt=\"\"/\u003e\n\n### Before Training: Data Preparation\n\nDownload [ILSVRC2012_img_train.tar](https://www.image-net.org/challenges/LSVRC/2012/2012-downloads.php) and the [metadata](http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz). Extract the downloaded archives according to the following folder tree (or use symlinks).\n```\n./\n├── configs/\n├── data/\n│   └── imagenet/\n│       ├── train/\n│       │   ├── n01440764/\n│       │   │   ├── n01440764_10026.JPEG\n│       │   │   ├── n01440764_10027.JPEG\n│       │   │   …\n│       │   ├── n01443537/\n│       │   …\n│       ├── imagenet1000_clsidx_to_labels.txt\n│       ├── train.txt\n|       …\n├── lib/\n├── tools/\n…\n```\n\nRun the following command to prepare the ImageNet dataset using DDP on 1 node with 8 GPUs\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 tools/prepare_imagenet_dit.py\n```\n\n### Training\n\nRun the following command to train the model using DDP on 1 node with 8 GPUs:\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 tools/train.py configs/gmflow_imagenet_k8_8gpus.py --launcher pytorch --diff_seed\n```\n\nAlternatively, you can start single-node DDP training from a Python script:\n\n```bash\npython train.py configs/gmflow_imagenet_k8_8gpus.py --gpu-ids 0 1 2 3 4 5 6 7\n```\n\nThe config in [gmflow_imagenet_k8_8gpus.py](configs/gmflow_imagenet_k8_8gpus.py) specifies a training batch size of 512 images per GPU and an inference batch size of 125 images per GPU. Training requires 32GB of VRAM per GPU, and the validation step requires an additional 8GB of VRAM per GPU. If you are using 32GB GPUs, you can disable the validation step by adding the `--no-validate` flag to the training command. Alternatively, you can also edit the config file to adjust the batch sizes.\n\nBy default, checkpoints will be saved into [checkpoints/](checkpoints/), logs will be saved into [work_dirs/](work_dirs/), and sampled images will be saved into [viz/](viz/).\n\n#### Resuming Training\n\nIf existing checkpoints are found, the training will automatically resume from the latest checkpoint.\n\n#### Tensorboard\n\nThe logs can be plotted using Tensorboard. Run the following command to start Tensorboard:\n\n```bash\ntensorboard --logdir work_dirs/\n```\n\n### Evaluation\n\nAfter training, to conduct a complete evaluation of the model under varying guidance scales, run the following command to start DDP evaluation on 1 node with 8 GPUs:\n\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 tools/test.py configs/gmflow_imagenet_k8_test.py checkpoints/gmflow_imagenet_k8_8gpus/latest.pth --launcher pytorch --diff_seed\n```\n\nAlternatively, you can start single-node DDP evaluation from a Python script:\n```bash\npython test.py configs/gmflow_imagenet_k8_test.py checkpoints/gmflow_imagenet_k8_8gpus/latest.pth --gpu-ids 0 1 2 3 4 5 6 7\n```\n\nThe config in [gmflow_imagenet_k8_test.py](configs/gmflow_imagenet_k8_test.py) specifies an inference batch size of 125 images per GPU, which requires 35GB of VRAM per GPU. You can edit the config file to adjust the batch size.\n\nThe evaluation results will be saved to where the checkpoint is located, and the sampled images will be saved into [viz/](viz/).\n\n## Toy Model on 2D Checkerboard\n\nWe provide a minimal GMFlow trainer in [train_toymodel.py](train_toymodel.py) for the toy model on the 2D checkerboard dataset. Run the following command to train the model:\n\n```bash\npython train_toymodel.py -k 64\n```\n\nThis minimal trainer does not support transition loss and EMA. To reproduce the results in the paper, you can use the following command to start the full trainer:\n\n```bash\npython train.py configs/gmflow_checkerboard_k64.py --gpu-ids 0\n```\n\nThis full trainer is not optimized for the simple 2D checkerboard dataset, so GPU usage may be inefficient.\n\n## Essential Code\n\n- Training\n    - [train_toymodel.py](train_toymodel.py): A simplified training script for the 2D checkerboard experiment.\n    - [gmflow.py](lib/models/diffusions/gmflow.py): The `forward_train` method contains the full training loop.\n- Inference\n    - [gmdit_pipeline.py](lib/pipelines/gmdit_pipeline.py): Full sampling code in the style of Diffusers.\n    - [gmflow.py](lib/models/diffusions/gmflow.py): The `forward_test` method contains the same full sampling loop.\n- Network\n    - [gmflow.py](lib/models/architecture/gmflow.py): GMDiT and SpectrumMLP\n    - [toymodels.py](lib/models/architecture/toymodels.py): MLP toy model for the 2D checkerboard experiment.\n- GM math operations\n    - [gmflow_ops](lib/ops/gmflow_ops/): A complete library of analytic operations for GM and Gaussian distributions.\n\n## Citation\n```\n@inproceedings{gmflow,\n  title={Gaussian Mixture Flow Matching Models},\n  author={Hansheng Chen and Kai Zhang and Hao Tan and Zexiang Xu and Fujun Luan and Leonidas Guibas and Gordon Wetzstein and Sai Bi},\n  booktitle={ICML},\n  year={2025},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flakonik%2Fgmflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flakonik%2Fgmflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flakonik%2Fgmflow/lists"}