{"id":13488348,"url":"https://github.com/mlpc-ucsd/TokenCompose","last_synced_at":"2025-03-28T00:33:39.564Z","repository":{"id":210593283,"uuid":"726957287","full_name":"mlpc-ucsd/TokenCompose","owner":"mlpc-ucsd","description":"(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision","archived":false,"fork":false,"pushed_at":"2024-12-21T08:13:31.000Z","size":218835,"stargazers_count":115,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-21T09:19:47.753Z","etag":null,"topics":["artificial-intelligence","computer-vision","diffusion-models","generative-ai","image-generation","latent-diffusion","machine-learning","multimodal","stable-diffusion","text-to-image"],"latest_commit_sha":null,"homepage":"https://mlpc-ucsd.github.io/TokenCompose/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlpc-ucsd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-03T21:43:01.000Z","updated_at":"2024-12-07T09:22:21.000Z","dependencies_parsed_at":"2023-12-13T23:23:14.382Z","dependency_job_id":"f3f5d127-15af-41b2-bf2d-0cbc3ae1d44e","html_url":"https://github.com/mlpc-ucsd/TokenCompose","commit_stats":null,"previous_names":["mlpc-ucsd/tokencompose"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpc-ucsd%2FTokenCompose","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpc-ucsd%2FTokenCompose/tags","releases_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpc-ucsd%2FTokenCompose/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpc-ucsd%2FTokenCompose/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlpc-ucsd","download_url":"https://codeload.github.com/mlpc-ucsd/TokenCompose/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245949274,"owners_count":20698912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","computer-vision","diffusion-models","generative-ai","image-generation","latent-diffusion","machine-learning","multimodal","stable-diffusion","text-to-image"],"created_at":"2024-07-31T18:01:14.182Z","updated_at":"2025-03-28T00:33:34.549Z","avatar_url":"https://github.com/mlpc-ucsd.png","language":"Jupyter Notebook","funding_links":[],"categories":["T2I Diffusion Model augmentation","Jupyter Notebook"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n\n  \u003ch1 align=\"center\"\u003e\u003ca href=\"https://mlpc-ucsd.github.io/TokenCompose/\"\u003e🧩 TokenCompose\u003c/a\u003e: Text-to-Image Diffusion with Token-level Supervision\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://zwcolin.github.io/\"\u003e\u003cstrong\u003eZirui Wang\u003c/strong\u003e\u003c/a\u003e\u003csup\u003e1, 3\u003c/sup\u003e\n    ·\n    \u003ca href=\"https://jamessand.github.io/\"\u003e\u003cstrong\u003eZhizhou Sha\u003c/strong\u003e\u003c/a\u003e\u003csup\u003e2, 3\u003c/sup\u003e\n    ·\n    \u003ca 
href=\"https://github.com/zh-ding\"\u003e\u003cstrong\u003eZheng Ding\u003c/strong\u003e\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e\n    ·\n    \u003ca href=\"https://github.com/modric197\"\u003e\u003cstrong\u003eYilin Wang\u003c/strong\u003e\u003c/a\u003e\u003csup\u003e2, 3\u003c/sup\u003e\n    ·\n    \u003ca href=\"https://pages.ucsd.edu/~ztu/\"\u003e\u003cstrong\u003eZhuowen Tu\u003c/strong\u003e\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e\n  \u003c/p\u003e\n  \u003cp align=\"center\"\u003e\n    \u003csup\u003e1\u003c/sup\u003e\u003cstrong\u003ePrinceton University\u003c/strong\u003e\n    ·\n    \u003csup\u003e2\u003c/sup\u003e\u003cstrong\u003eTsinghua University\u003c/strong\u003e\n    ·\n    \u003csup\u003e3\u003c/sup\u003e\u003cstrong\u003eUniversity of California, San Diego\u003c/strong\u003e\n  \u003c/p\u003e\n  \n  \u003cp align=\"center\" style=\"font-size: 70%;\"\u003e\n    \u003cstrong\u003e\u003ci style=\"color:red;\"\u003eCVPR 2024\u003c/i\u003e\u003c/strong\u003e\n  \u003c/p\u003e\n  \n  \u003cp align=\"center\" style=\"font-size: 70%;\"\u003e\n    \u003ci\u003eProject done while Zirui Wang, Zhizhou Sha and Yilin Wang interned at UC San Diego.\u003c/i\u003e\n  \u003c/p\u003e\n\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\n  \u003ca href=\"https://mlpc-ucsd.github.io/TokenCompose/\"\u003e\u003cstrong\u003eProject Page\u003c/strong\u003e\u003c/a\u003e\n  |\n  \u003ca href=\"https://arxiv.org/abs/2312.03626\"\u003e\u003cstrong\u003earXiv\u003c/strong\u003e\u003c/a\u003e\n  |\n  \u003ca href=\"https://x.com/zwcolin/status/1732578746949837205?s=46\u0026t=_jLYQtkGRBhT0cOPjbEiiQ\"\u003e\u003cstrong\u003eX (Twitter)\u003c/strong\u003e\u003c/a\u003e\n\u003c/h3\u003e\n\n### Updates\n*If you use our method and/or model for your research project, we are happy to 
provide a cross-reference here in the updates.* :)\n\n[04/04/2024] 🔥 Our training methodology is incorporated into [CoMat](https://arxiv.org/abs/2404.03653), which shows enhanced attribute assignment in text-to-image generation.  \n[02/26/2024] 🔥 TokenCompose is accepted to CVPR 2024!  \n[02/20/2024] 🔥 TokenCompose is used as a base model in the [RealCompo](https://arxiv.org/abs/2402.12908) paper for enhanced compositionality.  \n\nhttps://github.com/mlpc-ucsd/TokenCompose/assets/59942464/93feea16-4eac-49c3-b286-ee390a325b17\n\n\u003cp align=\"center\"\u003e\n  A \u003cspan style=\"color: lightblue\"\u003eStable Diffusion\u003c/span\u003e model finetuned with \u003cstrong\u003etoken-level consistency terms\u003c/strong\u003e for enhanced \u003cstrong\u003emulti-category instance composition\u003c/strong\u003e and \u003cstrong\u003ephotorealism\u003c/strong\u003e.\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"asset/teaser.jpg\" alt=\"Logo\" width=\"100%\"\u003e\n\u003c/div\u003e\n\n\n\n\u003ctable\u003e\n\n  \u003ctr\u003e\n    \u003cth rowspan=\"3\" align=\"center\"\u003eMethod\u003c/th\u003e\n    \u003cth colspan=\"9\" align=\"center\"\u003eMulti-category Instance Composition\u003c/th\u003e\n    \u003cth colspan=\"2\" align=\"center\"\u003ePhotorealism\u003c/th\u003e\n    \u003cth colspan=\"1\" align=\"center\"\u003eEfficiency\u003c/th\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003c!-- \u003cth align=\"center\"\u003e\u0026nbsp;\u003c/th\u003e --\u003e\n    \u003cth rowspan=\"2\" align=\"center\"\u003eObject Accuracy\u003c/th\u003e\n    \u003cth colspan=\"4\" align=\"center\"\u003eCOCO\u003c/th\u003e\n    \u003cth colspan=\"4\" align=\"center\"\u003eADE20K\u003c/th\u003e\n    \u003cth rowspan=\"2\" align=\"center\"\u003eFID (COCO)\u003c/th\u003e\n    \u003cth rowspan=\"2\" align=\"center\"\u003eFID (Flickr30K)\u003c/th\u003e\n    \u003cth rowspan=\"2\" align=\"center\"\u003eLatency\u003c/th\u003e\n  \u003c/tr\u003e\n\n  
\u003ctr\u003e\n    \u003c!-- \u003cth align=\"center\"\u003e\u0026nbsp;\u003c/th\u003e --\u003e\n    \u003cth align=\"center\"\u003eMG2\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG3\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG4\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG5\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG2\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG3\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG4\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG5\u003c/th\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://huggingface.co/CompVis/stable-diffusion-v1-4\"\u003eSD 1.4\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e29.86\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e90.72\u003csub\u003e1.33\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e50.74\u003csub\u003e0.89\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e11.68\u003csub\u003e0.45\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.88\u003csub\u003e0.21\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e89.81\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e53.96\u003csub\u003e1.14\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e16.52\u003csub\u003e1.13\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e1.89\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e20.88\u003c/u\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e71.46\u003c/u\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e7.54\u003c/b\u003e\u003csub\u003e0.17\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch\"\u003eComposable\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e27.83\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e63.33\u003csub\u003e0.59\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e21.87\u003csub\u003e1.01\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e3.25\u003csub\u003e0.45\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.23\u003csub\u003e0.18\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e69.61\u003csub\u003e0.99\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e29.96\u003csub\u003e0.84\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e6.89\u003csub\u003e0.38\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.73\u003csub\u003e0.22\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e75.57\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e13.81\u003csub\u003e0.15\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/silent-chen/layout-guidance\"\u003eLayout\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e43.59\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e93.22\u003csub\u003e0.69\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e60.15\u003csub\u003e1.58\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e19.49\u003csub\u003e0.88\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e2.27\u003csub\u003e0.44\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e96.05\u003c/u\u003e\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e67.83\u003c/u\u003e\u003csub\u003e0.90\u003c/sub\u003e\u003c/td\u003e\n 
   \u003ctd align=\"center\"\u003e21.93\u003csub\u003e1.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e2.35\u003csub\u003e0.41\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e74.00\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e18.89\u003csub\u003e0.20\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/weixi-feng/Structured-Diffusion-Guidance\"\u003eStructured\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e29.64\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e90.40\u003csub\u003e1.06\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e48.64\u003csub\u003e1.32\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e10.71\u003csub\u003e0.92\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.68\u003csub\u003e0.25\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e89.25\u003csub\u003e0.72\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e53.05\u003csub\u003e1.20\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e15.76\u003csub\u003e0.86\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e1.74\u003csub\u003e0.49\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e21.13\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e71.68\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e7.74\u003c/u\u003e\u003csub\u003e0.17\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/yuval-alaluf/Attend-and-Excite\"\u003eAttn-Exct\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e45.13\u003c/u\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e93.64\u003c/u\u003e\u003csub\u003e0.76\u003c/sub\u003e\u003c/td\u003e\n    
\u003ctd align=\"center\"\u003e\u003cu\u003e65.10\u003c/u\u003e\u003csub\u003e1.24\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e28.01\u003c/u\u003e\u003csub\u003e0.90\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e6.01\u003c/b\u003e\u003csub\u003e0.61\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e91.74\u003csub\u003e0.49\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e62.51\u003csub\u003e0.94\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e26.12\u003c/u\u003e\u003csub\u003e0.78\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e5.89\u003c/u\u003e\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e-\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e71.68\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e25.43\u003csub\u003e4.89\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/mlpc-ucsd/TokenCompose\"\u003e\u003cstrong\u003eTokenCompose (Ours)\u003c/strong\u003e\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e52.15\u003c/b\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e98.08\u003c/b\u003e\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e76.16\u003c/b\u003e\u003csub\u003e1.04\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e28.81\u003c/b\u003e\u003csub\u003e0.95\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e3.28\u003c/u\u003e\u003csub\u003e0.48\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e97.75\u003c/b\u003e\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e76.93\u003c/b\u003e\u003csub\u003e1.09\u003c/sub\u003e\u003c/td\u003e\n   
 \u003ctd align=\"center\"\u003e\u003cb\u003e33.92\u003c/b\u003e\u003csub\u003e1.47\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e6.21\u003c/b\u003e\u003csub\u003e0.62\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e20.19\u003c/b\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e71.13\u003c/b\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e7.56\u003c/b\u003e\u003csub\u003e0.14\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n\u003c/table\u003e\n\n\n\n## 🆕 Models\n\n| Stable Diffusion Version | Checkpoint 1 | Checkpoint 2 |\n|:------------------------:|:------------:|:------------:|\n| v1.4                     | [TokenCompose_SD14_A](https://huggingface.co/mlpc-lab/TokenCompose_SD14_A)         | [TokenCompose_SD14_B](https://huggingface.co/mlpc-lab/TokenCompose_SD14_B)         |\n| v2.1                     | [TokenCompose_SD21_A](https://huggingface.co/mlpc-lab/TokenCompose_SD21_A)         | [TokenCompose_SD21_B](https://huggingface.co/mlpc-lab/TokenCompose_SD21_B)         |\n\nOur finetuned models contain no extra modules and can be used directly in a standard diffusion model library (e.g., Hugging Face Diffusers): simply replace the pretrained U-Net with our finetuned U-Net in a plug-and-play manner. We provide a [demo Jupyter notebook](notebooks/example_usage.ipynb) that uses our model checkpoint to generate images.
\n\nYou can also use the following code to download our checkpoints and generate images:\n\n```python\nimport torch\nfrom diffusers import StableDiffusionPipeline\n\n# Finetuned TokenCompose checkpoint hosted on the Hugging Face Hub\nmodel_id = \"mlpc-lab/TokenCompose_SD14_A\"\ndevice = \"cuda\"  # requires a CUDA-capable GPU\n\n# Download (on first use) and load the pipeline in float32\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\npipe = pipe.to(device)\n\nprompt = \"A cat and a wine glass\"\n# The pipeline returns a list of PIL images; keep the first one\nimage = pipe(prompt).images[0]\n\nimage.save(\"cat_and_wine_glass.png\")\n```\n\n## 📊 MultiGen\n\nSee [MultiGen](multigen/readme.md) for details.\n\n\n\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth rowspan=\"2\" align=\"center\"\u003eMethod\u003c/th\u003e\n    \u003cth colspan=\"4\" align=\"center\"\u003eCOCO\u003c/th\u003e\n    \u003cth colspan=\"4\" align=\"center\"\u003eADE20K\u003c/th\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003c!-- \u003cth align=\"center\"\u003e\u0026nbsp;\u003c/th\u003e --\u003e\n    \u003cth align=\"center\"\u003eMG2\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG3\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG4\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG5\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG2\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG3\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG4\u003c/th\u003e\n    \u003cth align=\"center\"\u003eMG5\u003c/th\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://huggingface.co/CompVis/stable-diffusion-v1-4\"\u003eSD 1.4\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e90.72\u003csub\u003e1.33\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e50.74\u003csub\u003e0.89\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e11.68\u003csub\u003e0.45\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.88\u003csub\u003e0.21\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e89.81\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n 
   \u003ctd align=\"center\"\u003e53.96\u003csub\u003e1.14\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e16.52\u003csub\u003e1.13\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e1.89\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch\"\u003eComposable\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e63.33\u003csub\u003e0.59\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e21.87\u003csub\u003e1.01\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e3.25\u003csub\u003e0.45\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.23\u003csub\u003e0.18\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e69.61\u003csub\u003e0.99\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e29.96\u003csub\u003e0.84\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e6.89\u003csub\u003e0.38\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.73\u003csub\u003e0.22\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/silent-chen/layout-guidance\"\u003eLayout\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e93.22\u003csub\u003e0.69\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e60.15\u003csub\u003e1.58\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e19.49\u003csub\u003e0.88\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e2.27\u003csub\u003e0.44\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e96.05\u003c/u\u003e\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd 
align=\"center\"\u003e\u003cu\u003e67.83\u003c/u\u003e\u003csub\u003e0.90\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e21.93\u003csub\u003e1.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e2.35\u003csub\u003e0.41\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/weixi-feng/Structured-Diffusion-Guidance\"\u003eStructured\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e90.40\u003csub\u003e1.06\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e48.64\u003csub\u003e1.32\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e10.71\u003csub\u003e0.92\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e0.68\u003csub\u003e0.25\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e89.25\u003csub\u003e0.72\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e53.05\u003csub\u003e1.20\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e15.76\u003csub\u003e0.86\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e1.74\u003csub\u003e0.49\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/yuval-alaluf/Attend-and-Excite\"\u003eAttn-Exct\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e93.64\u003c/u\u003e\u003csub\u003e0.76\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e65.10\u003c/u\u003e\u003csub\u003e1.24\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e28.01\u003c/u\u003e\u003csub\u003e0.90\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e6.01\u003c/b\u003e\u003csub\u003e0.61\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e91.74\u003csub\u003e0.49\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd 
align=\"center\"\u003e62.51\u003csub\u003e0.94\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e26.12\u003c/u\u003e\u003csub\u003e0.78\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e5.89\u003c/u\u003e\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://github.com/mlpc-ucsd/TokenCompose\"\u003eOurs\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e98.08\u003c/b\u003e\u003csub\u003e0.40\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e76.16\u003c/b\u003e\u003csub\u003e1.04\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e28.81\u003c/b\u003e\u003csub\u003e0.95\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cu\u003e3.28\u003c/u\u003e\u003csub\u003e0.48\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e97.75\u003c/b\u003e\u003csub\u003e0.34\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e76.93\u003c/b\u003e\u003csub\u003e1.09\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e33.92\u003c/b\u003e\u003csub\u003e1.47\u003c/sub\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003cb\u003e6.21\u003c/b\u003e\u003csub\u003e0.62\u003c/sub\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n\u003c/table\u003e\n\n## 💻 Environment Setup\n\nIf you want to use our codebase to **train your own diffusion models with token-level objectives**, follow the instructions below:\n\n```bash\nconda create -n TokenCompose python=3.8.5\nconda activate TokenCompose\nconda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r requirements.txt\n```\n\nWe have verified the environment setup with these specific package versions, but we expect it to also work for 
newer versions too!\n\n## 🛠️ Dataset Setup\n\nIf you want to use your own data, please refer to [preprocess_data](preprocess_data/readme.md) for details.\n\nIf you want to use our training data as examples or for research purposes, please follow the instructions below:\n\n### 1. Set Up the COCO Image Data\n\n```bash\n# run from the repository root\ncd train/data\n# download COCO train2017\nwget http://images.cocodataset.org/zips/train2017.zip\nunzip train2017.zip\nrm train2017.zip\nbash coco_data_setup.sh\n```\n\nAfter this step, you should have the following structure under the `train/data` directory:\n\n```\ntrain/data/\n    coco_gsam_img/\n        train/\n            000000000142.jpg\n            000000000370.jpg\n            ...\n```\n\n\n### 2. Set Up Token-wise Grounded Segmentation Maps\n\nDownload the COCO segmentation data from [Google Drive](https://drive.google.com/file/d/16uoQpfZ0O-NW92HuaCaFU8K4cGHHbv4R/view?usp=drive_link) and put it under the `train/data` directory.\n\nAfter this step, you should have the following structure under the `train/data` directory:\n\n```\ntrain/data/\n    coco_gsam_img/\n        train/\n            000000000142.jpg\n            000000000370.jpg\n            ...\n    coco_gsam_seg.tar\n```\n\nThen, run the following commands to extract the segmentation data:\n\n```bash\n# run from the repository root\ncd train/data\ntar -xvf coco_gsam_seg.tar\nrm coco_gsam_seg.tar\n```\n\nAfter the setup, you should have the following structure under the `train/data` directory:\n\n```\ntrain/data/\n    coco_gsam_img/\n        train/\n            000000000142.jpg\n            000000000370.jpg\n            ...\n    coco_gsam_seg/\n        000000000142/\n            mask_000000000142_bananas.png\n            mask_000000000142_bread.png\n            ...\n        000000000370/\n            mask_000000000370_bananas.png\n            mask_000000000370_bread.png\n            ...\n        ...\n```\n\n## 📈 Training\n\nWe use wandb to log training curves and visualizations.\n
Log in to wandb before running the scripts:\n\n```bash\nwandb login\n```\n\nThen, to train TokenCompose, run the following commands:\n\n```bash\ncd train\nbash scripts/train.sh\n```\n\nThe results will be saved under the `train/results` directory.\n\n## 🏷️ License\n\nThis repository is released under the [Apache 2.0](LICENSE) license.\n\n## 🙏 Acknowledgement\n\nOur code is built upon [diffusers](https://github.com/huggingface/diffusers), [prompt-to-prompt](https://github.com/google/prompt-to-prompt), [VISOR](https://github.com/microsoft/VISOR), [Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything), and [CLIP](https://github.com/openai/CLIP). We thank all these authors for open-sourcing their code and for their great contributions to the community.\n\n## 📝 Citation\n\nIf you find our work useful, please consider citing:\n\n```bibtex\n@InProceedings{Wang2024TokenCompose,\n    author    = {Wang, Zirui and Sha, Zhizhou and Ding, Zheng and Wang, Yilin and Tu, Zhuowen},\n    title     = {TokenCompose: Text-to-Image Diffusion with Token-level Supervision},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2024},\n    pages     = {8553-8564}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlpc-ucsd%2FTokenCompose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmlpc-ucsd%2FTokenCompose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmlpc-ucsd%2FTokenCompose/lists"}