{"id":13488310,"url":"https://github.com/mapo-t2i/mapo","last_synced_at":"2025-03-28T00:33:34.027Z","repository":{"id":243748616,"uuid":"804676459","full_name":"mapo-t2i/mapo","owner":"mapo-t2i","description":"Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).","archived":false,"fork":false,"pushed_at":"2024-06-11T15:34:08.000Z","size":5637,"stargazers_count":59,"open_issues_count":1,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-31T00:39:34.037Z","etag":null,"topics":["alignment","diffusers","diffusion-models","human-preference","pytorch","text-to-image-generation"],"latest_commit_sha":null,"homepage":"https://mapo-t2i.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mapo-t2i.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-23T03:48:14.000Z","updated_at":"2024-10-29T08:33:56.000Z","dependencies_parsed_at":"2024-06-15T13:21:04.764Z","dependency_job_id":"93956c08-3217-4ea0-b759-ad5cf066dcea","html_url":"https://github.com/mapo-t2i/mapo","commit_stats":null,"previous_names":["mapo-t2i/mapo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapo-t2i%2Fmapo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapo-t2i%2Fmapo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapo-t2i%2Fmapo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/mapo-t2i%2Fmapo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mapo-t2i","download_url":"https://codeload.github.com/mapo-t2i/mapo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245949256,"owners_count":20698911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","diffusers","diffusion-models","human-preference","pytorch","text-to-image-generation"],"created_at":"2024-07-31T18:01:13.548Z","updated_at":"2025-03-28T00:33:28.995Z","avatar_url":"https://github.com/mapo-t2i.png","language":"Python","readme":"# Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO)\n\nThis repository provides the official PyTorch implementation for MaPO. \n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"assets/mapo_overview.png\" width=750/\u003e\n\u003c/div\u003e\u003cbr\u003e\n\n_By: Jiwoo Hong\u003csup\u003e\\*\u003c/sup\u003e, Sayak Paul\u003csup\u003e\\*\u003c/sup\u003e, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong_\n\u003cbr\u003e_(\u003csmall\u003e\u003csup\u003e*\u003c/sup\u003e indicates equal contribution\u003c/small\u003e)_\n\nFor the paper, models, datasets, etc., please visit the [project website](https://mapo-t2i.github.io/).\n\n**Contents**:\n\n* [Running MaPO training](#running-mapo-training)\n* [Models and Datasets](#models-and-datasets) \n* [Inference](#inference)\n* [Citation](#citation)\n\n## Running MaPO training\n\n### Hardware requirements\n\nWe ran our experiments on a node of 8 H100s (80GB). 
However, `train.py` can also run on a single GPU with at least 40 GB of VRAM. \n\n### Environment\n\nCreate a Python virtual environment with your favorite package manager. \n\nAfter activating the environment, install PyTorch. We recommend following the [official website](https://pytorch.org/) for this. \n\nFinally, install the other requirements from `requirements.txt`. \n\n### Steps to run the code\n\nWe performed our experiments on the [`yuvalkirstain/pickapic_v2`](https://huggingface.co/datasets/yuvalkirstain/pickapic_v2) dataset, which is 335 GB in size. However, a smaller version of the dataset -- [`kashif/pickascore`](https://huggingface.co/datasets/kashif/pickascore) -- can be used for debugging.\n\nWhen using `yuvalkirstain/pickapic_v2`, be sure to specify the `--dataset_split_name` CLI argument as `train`.\n\nBelow is an example training command for a single-GPU run:\n\n```bash\naccelerate launch train.py \\\n  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \\\n  --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \\\n  --output_dir=\"mapo\" \\\n  --mixed_precision=\"fp16\" \\\n  --dataset_name=kashif/pickascore \\\n  --train_batch_size=8 \\\n  --gradient_accumulation_steps=2 \\\n  --gradient_checkpointing \\\n  --use_8bit_adam \\\n  --learning_rate=1e-5 \\\n  --lr_scheduler=\"constant\" \\\n  --lr_warmup_steps=0 \\\n  --max_train_steps=2000 \\\n  --checkpointing_steps=500 \\\n  --seed=\"0\" \n```\n\n\u003e [!NOTE]  \n\u003e In the above command, we use a smaller version of the original Pick-a-Pic dataset -- [`kashif/pickascore`](https://huggingface.co/datasets/kashif/pickascore) -- for debugging and validation purposes.\n\n### Running with LoRA\n\nWe provide a LoRA variant of the `train.py` script in `train_with_lora.py` so one can experiment with MaPO on consumer GPUs. To run `train_with_lora.py`, first install the `peft` library. 
\n\nThen you can use the following command to start a LoRA training run:\n\n```bash\naccelerate launch train_with_lora.py \\\n  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \\\n  --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \\\n  --output_dir=\"mapo\" \\\n  --mixed_precision=\"fp16\" \\\n  --dataset_name=kashif/pickascore \\\n  --train_batch_size=8 \\\n  --gradient_accumulation_steps=2 \\\n  --gradient_checkpointing \\\n  --lora_rank=8 \\\n  --use_8bit_adam \\\n  --learning_rate=1e-5 \\\n  --lr_scheduler=\"constant\" \\\n  --lr_warmup_steps=0 \\\n  --max_train_steps=2000 \\\n  --checkpointing_steps=500 \\\n  --seed=\"0\" \n```\n\n### Misc\n\n\u003cdetails\u003e\n\u003csummary\u003eTo run on multiple GPUs, specify the `--multi_gpu` option:\u003c/summary\u003e\n\n```bash\naccelerate launch --multi_gpu train.py \\\n  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0  \\\n  --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \\\n  --output_dir=\"mapo\" \\\n  --mixed_precision=\"fp16\" \\\n  --dataset_name=kashif/pickascore \\\n  --train_batch_size=8 \\\n  --gradient_accumulation_steps=2 \\\n  --gradient_checkpointing \\\n  --use_8bit_adam \\\n  --learning_rate=1e-5 \\\n  --lr_scheduler=\"constant\" \\\n  --lr_warmup_steps=0 \\\n  --max_train_steps=2000 \\\n  --checkpointing_steps=500 \\\n  --seed=\"0\" \n```\n\u003c/details\u003e\u003cbr\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTo run intermediate validation runs (i.e., generate samples for model assessment), add the following flags:\u003c/summary\u003e\n\n```diff\n+  --run_validation --validation_steps=50 \\\n+  --report_to=\"wandb\"\n```\n\nThis will additionally log the generated results and other metrics to Weights and Biases. This requires you to install the `wandb` Python package. \n\nAnother option for an experiment logger is `tensorboard`. 
\n\u003c/details\u003e\u003cbr\u003e\n\nTo push the intermediate checkpoints and the final checkpoint to the Hugging Face Hub platform, pass the `--push_to_hub` option. Note that you need to be authenticated with your Hugging Face Hub account for this. \n\n**Notes on evaluation**:\n\nFor evaluation with metrics like Aesthetic Scoring, HPS v2.1, and PickScore, we followed the respective official codebases.\n\nFor visual quantitative results, please refer to the [project website](https://mapo-t2i.github.io/).\n\n## Models and Datasets\n\nAll the models and datasets of our work can be found via our Hugging Face Hub organization: https://huggingface.co/mapo-t2i/.\n\n## Inference\n\n```python\nfrom diffusers import DiffusionPipeline, AutoencoderKL, UNet2DConditionModel\nimport torch \n\nsdxl_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\nvae_id = \"madebyollin/sdxl-vae-fp16-fix\"\nunet_id = \"mapo-t2i/mapo-beta\"\n\nvae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)\nunet = UNet2DConditionModel.from_pretrained(unet_id, torch_dtype=torch.float16)\npipeline = DiffusionPipeline.from_pretrained(sdxl_id, vae=vae, unet=unet, torch_dtype=torch.float16).to(\"cuda\")\n\nprompt = \"A lion with eagle wings coming out of the sea , digital Art, Greg rutkowski, Trending artstation, cinematographic, hyperrealistic\"\nimage = pipeline(prompt=prompt, num_inference_steps=30).images[0]\n```\n\n## Citation\n\n```bibtex\n@misc{hong2024marginaware,\n    title={Margin-aware Preference Optimization for Aligning Diffusion Models without Reference}, \n    author={Jiwoo Hong and Sayak Paul and Noah Lee and Kashif Rasul and James Thorne and Jongheon Jeong},\n    year={2024},\n    eprint={2406.06424},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV}\n}\n```\n","funding_links":[],"categories":["T2I Diffusion Model 
augmentation"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmapo-t2i%2Fmapo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmapo-t2i%2Fmapo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmapo-t2i%2Fmapo/lists"}