{"id":13564998,"url":"https://github.com/fnzhan/Generative-AI","last_synced_at":"2025-04-03T22:30:34.918Z","repository":{"id":38354131,"uuid":"434836372","full_name":"fnzhan/Generative-AI","owner":"fnzhan","description":"[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era","archived":false,"fork":false,"pushed_at":"2023-11-21T09:02:58.000Z","size":127199,"stargazers_count":784,"open_issues_count":1,"forks_count":60,"subscribers_count":47,"default_branch":"main","last_synced_at":"2024-11-04T18:45:56.258Z","etag":null,"topics":["aigc","diffusion-model","gans","multimodality","nerfs"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fnzhan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-12-04T07:28:21.000Z","updated_at":"2024-10-23T12:21:43.000Z","dependencies_parsed_at":"2024-01-16T18:59:46.516Z","dependency_job_id":null,"html_url":"https://github.com/fnzhan/Generative-AI","commit_stats":null,"previous_names":["fnzhan/mise"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fnzhan%2FGenerative-AI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fnzhan%2FGenerative-AI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fnzhan%2FGenerative-AI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fnzhan%2FGenerative-AI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fnzhan","download_url":"https://codeload.github.com/fnzhan/Generative-AI/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247089679,"owners_count":20881818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc","diffusion-model","gans","multimodality","nerfs"],"created_at":"2024-08-01T13:01:39.029Z","updated_at":"2025-04-03T22:30:29.905Z","avatar_url":"https://github.com/fnzhan.png","language":"TeX","readme":"\u003c!-- !# \u003cp align=center\u003e Multimodal Image Synthesis and Editing: \u003cbr\u003e A Survey and Taxonomy\u003c/p\u003e --\u003e\n\n\u003cimg src='title.png' align=\"center\"\u003e\n\u003cbr\u003e\n\n[![arXiv](https://img.shields.io/badge/arXiv-2107.05399-b31b1b.svg)](https://arxiv.org/abs/2112.13592)\n[![Survey](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) \n[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) \n[![PR's 
[![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com)
[![GitHub license](https://badgen.net/github/license/Naereen/Strapdown.js)](https://github.com/Naereen/StrapDown.js/blob/master/LICENSE)

<img src='teaser.gif' align="center">

This project is associated with our survey paper, which comprehensively contextualizes the advances in Multimodal Image Synthesis & Editing (MISE) and visual AIGC by formulating taxonomies according to data modalities and model architectures.

<img src='logo.png' align="center" width=20> **Multimodal Image Synthesis and Editing: The Generative AI Era [[Paper](https://arxiv.org/abs/2112.13592)]  [[Project](https://fnzhan.com/Generative-AI/)]**  <br>
[Fangneng Zhan](https://fnzhan.com/), [Yingchen Yu](https://yingchen001.github.io/), [Rongliang Wu](https://scholar.google.com.sg/citations?user=SZkh3iAAAAAJ&hl=en), [Jiahui Zhang](https://scholar.google.com/citations?user=DXpYbWkAAAAJ&hl=zh-CN), [Shijian Lu](https://scholar.google.com.sg/citations?user=uYmK-A0AAAAJ&hl=en), [Lingjie Liu](https://lingjie0206.github.io/), [Adam Kortylewski](https://generativevision.mpi-inf.mpg.de/), <br> [Christian Theobalt](https://people.mpi-inf.mpg.de/~theobalt/), [Eric Xing](http://www.cs.cmu.edu/~epxing/) <br>
*IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023*

<!---[DeepAI](https://deepai.org/publication/multimodal-image-synthesis-and-editing-a-survey).**-->

<br>

[![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com)
You are welcome to promote papers via pull request. <br>
The process to submit a pull request:
- a. Fork the project into your own repository.
- b. Add the Title, Author, Conference, Paper link, Project link, and Code link in `README.md` using the format below (a filled-in example follows these steps):
```
**Title**<br>
*Author*<br>
Conference
[[Paper](Paper link)]
[[Project](Project link)]
[[Code](Code link)]
```
- c. Submit the pull request to this branch.
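For illustration, here is a hypothetical completed entry, filled in with the DreamFusion paper that already appears further down this list:
```
**DreamFusion: Text-to-3D using 2D Diffusion**<br>
*Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall*<br>
arXiv 2022
[[Paper](https://arxiv.org/abs/2209.14988)]
[[Project](https://dreamfusion3d.github.io/)]
```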
<br>

## Related Surveys & Projects

**Adversarial Text-to-Image Synthesis: A Review**<br>
*Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel*<br>
Neural Networks 2021
[[Paper](https://arxiv.org/abs/2101.09983)]

**GAN Inversion: A Survey**<br>
*Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang*<br>
TPAMI 2022
[[Paper](https://arxiv.org/abs/2101.05278)]
[[Project](https://github.com/weihaox/awesome-gan-inversion)]

**Deep Image Synthesis from Intuitive User Input: A Review and Perspectives**<br>
*Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang*<br>
Computational Visual Media 2022
[[Paper](https://arxiv.org/abs/2107.04240)]

[Awesome-Text-to-Image](https://github.com/Yutong-Zhou-cv/awesome-Text-to-Image)

<br>

## Table of Contents (Work in Progress)

**Methods:**
- [Neural Rendering Methods](#Neural-Rendering-Methods)
- [Diffusion-based Methods](#Diffusion-based-Methods)
- [Autoregressive Methods](#Autoregressive-Methods)
  - [Image Quantizer](#Image-Quantizer)
- [GAN-based Methods](#GAN-based-Methods)
  - [GAN Inversion](#GAN-Inversion-Methods)
- [Other Methods](#Other-Methods)

**Modalities & Datasets:**
- [Text Encoding](#Text-Encoding)
- [Audio Encoding](#Audio-Encoding)
- [Datasets](#Datasets)

## Neural-Rendering-Methods

**ATT3D: Amortized Text-to-3D Object Synthesis**<br>
*Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2306.07349)]

**TADA! Text to Animatable Digital Avatars**<br>
*Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2308.10899)]

**MATLABER: Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR**<br>
*Xudong Xu, Zhaoyang Lyu, Xingang Pan, Bo Dai*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2308.09278)]

**IT3D: Improved Text-to-3D Generation with Explicit View Synthesis**<br>
*Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2308.11473)]

**AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose**<br>
*Huichao Zhang, Bowen Chen, Hao Yang, Liao Qu, Xu Wang, Li Chen, Chao Long, Feida Zhu, Kang Du, Min Zheng*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2308.03610)]
[[Project](https://avatarverse3d.github.io/)]
**Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions**<br>
*Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa*<br>
ICCV 2023
[[Paper](https://arxiv.org/abs/2303.12789)]
[[Project](https://instruct-nerf2nerf.github.io)]
[[Code](https://github.com/ayaanzhaque/instruct-nerf2nerf)]

**FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields**<br>
*Sungwon Hwang, Junha Hyung, Daejin Kim, Min-Jung Kim, Jaegul Choo*<br>
ICCV 2023
[[Paper](https://arxiv.org/abs/2307.11418v3)]

**Local 3D Editing via 3D Distillation of CLIP Knowledge**<br>
*Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo*<br>
CVPR 2023
[[Paper](https://arxiv.org/abs/2306.12570)]

**RePaint-NeRF: NeRF Editting via Semantic Masks and Diffusion Models**<br>
*Xingchen Zhou, Ying He, F. Richard Yu, Jianqiang Li, You Li*<br>
IJCAI 2023
[[Paper](https://arxiv.org/abs/2306.05668)]

**DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation**<br>
*Yukun Huang, Jianan Wang, Yukai Shi, Xianbiao Qi, Zheng-Jun Zha, Lei Zhang*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2306.12422)]
[[Project](https://itsallagi.com/dreamtime-a-new-way-to-create-3d-content-from-text/)]

**AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars**<br>
*Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2306.00547)]
[[Project](https://vcai.mpi-inf.mpg.de/projects/AvatarStudio/)]

**Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields**<br>
*Ori Gordon, Omri Avrahami, Dani Lischinski*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2306.12760)]
[[Project](https://www.vision.huji.ac.il/blended-nerf/)]

**OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields**<br>
*Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2305.10503)]
[[Project](https://ornerf.github.io/)]
[[Code](https://github.com/cuteyyt/or-nerf)]

**HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance**<br>
*Junzhe Zhu, Peiye Zhuang*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2305.18766)]
[[Project](https://hifa-team.github.io/HiFA-site/)]

**ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation**<br>
*Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2305.16213)]
[[Project](https://ml.cs.tsinghua.edu.cn/prolificdreamer/)]

**Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields**<br>
*Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2305.11588)]
[[Project](https://eckertzhang.github.io/Text2NeRF.github.io/)]
**DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models**<br>
*Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2304.00916)]
[[Project](https://yukangcao.github.io/DreamAvatar/)]

**DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model**<br>
*Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2304.02827)]
[[Project](https://janeyeon.github.io/ditto-nerf/)]
[[Code](https://github.com/janeyeon/ditto-nerf-code)]

**CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout**<br>
*Yiqi Lin, Haotian Bai, Sijia Li, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2303.13843)]

**Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes**<br>
*Dana Cohen-Bar, Elad Richardson, Gal Metzer, Raja Giryes, Daniel Cohen-Or*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2303.13450)]
[[Project](https://danacohen95.github.io/Set-the-Scene/)]
[[Code](https://github.com/DanaCohen95/Set-the-Scene)]

**Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation**<br>
*Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Jaehoon Ko, Hyeonsu Kim, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2303.07937)]
[[Project](https://ku-cvlab.github.io/3DFuse/)]
[[Code](https://github.com/KU-CVLAB/3DFuse)]

**Text-To-4D Dynamic Scene Generation**<br>
*Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman*<br>
arXiv 2023
[[Paper](https://arxiv.org/abs/2301.11280)]
[[Project](https://make-a-video3d.github.io/)]

**Magic3D: High-Resolution Text-to-3D Content Creation**<br>
*Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin*<br>
CVPR 2023
[[Paper](https://arxiv.org/abs/2211.10440)]
[[Project](https://deepimagination.cc/Magic3D/)]

**DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model**<br>
*Gwanghyun Kim, Se Young Chun*<br>
CVPR 2023
[[Paper](https://arxiv.org/abs/2211.16374)]
[[Code](https://github.com/gwang-kim/DATID-3D)]
[[Project](https://datid-3d.github.io/)]

**Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models**<br>
*Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao*<br>
arXiv 2022
[[Paper](https://arxiv.org/abs/2211.14108)]
[[Project](https://3ddesigner-diffusion.github.io/)]

**DreamFusion: Text-to-3D using 2D Diffusion**<br>
*Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall*<br>
arXiv 2022
[[Paper](https://arxiv.org/abs/2209.14988)]
[[Project](https://dreamfusion3d.github.io/)]
**Zero-Shot Text-Guided Object Generation with Dream Fields**<br>
*Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole*<br>
CVPR 2022
[[Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Jain_Zero-Shot_Text-Guided_Object_Generation_With_Dream_Fields_CVPR_2022_paper.pdf)]
[[Code](https://github.com/google-research/google-research/tree/master/dreamfields)]
[[Project](https://ajayj.com/dreamfields)]

**IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis**<br>
*Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu*<br>
SIGGRAPH Asia 2022
[[Paper](https://arxiv.org/pdf/2205.15517.pdf)]
[[Code](https://github.com/MrTornado24/IDE-3D)]
[[Project](https://mrtornado24.github.io/IDE-3D/)]

**Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields**<br>
*Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai*<br>
arXiv 2022
[[Paper](https://arxiv.org/abs/2203.10821)]
[[Code](https://github.com/donydchen/sem2nerf)]
[[Project](https://donydchen.github.io/sem2nerf/)]

**CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields**<br>
*Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao*<br>
CVPR 2022
[[Paper](https://arxiv.org/abs/2112.05139)]
[[Code](https://github.com/cassiePython/CLIPNeRF)]
[[Project](https://cassiepython.github.io/clipnerf/)]

**CG-NeRF: Conditional Generative Neural Radiance Fields**<br>
*Kyungmin Jo, Gyumin Shim, Sanghun Jung, Soyoung Yang, Jaegul Choo*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.03517)]

**AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis**<br>
*Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang*<br>
ICCV 2021
[[Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Guo_AD-NeRF_Audio_Driven_Neural_Radiance_Fields_for_Talking_Head_Synthesis_ICCV_2021_paper.pdf)]
[[Code](https://github.com/YudongGuo/AD-NeRF)]
[[Project](https://yudongguo.github.io/ADNeRF/)]
[[Video](https://www.youtube.com/watch?v=TQO2EBYXLyU)]

<br>
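Many of the text-to-3D entries above (e.g., DreamFusion, Magic3D) optimize a 3D representation by distilling a pretrained 2D text-to-image diffusion model through Score Distillation Sampling (SDS). Below is a minimal sketch of the SDS gradient; `eps_model`, `render`, and `nerf_params` are hypothetical placeholders, not the API of any repository linked here.

```python
# Toy sketch of Score Distillation Sampling (SDS), assuming a frozen
# noise-prediction diffusion model `eps_model(x_t, t, text_emb)`.
import torch

def sds_grad(eps_model, x, text_emb, alphas_bar, w=lambda t: 1.0):
    """Gradient w.r.t. the rendered image x for one random timestep."""
    t = torch.randint(0, len(alphas_bar), (1,))          # sample a timestep
    eps = torch.randn_like(x)                            # injected noise
    a_bar = alphas_bar[t]
    x_t = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * eps  # forward diffusion
    with torch.no_grad():                                # score model is frozen
        eps_pred = eps_model(x_t, t, text_emb)           # predicted noise
    return w(t) * (eps_pred - eps)                       # SDS gradient on x

# Usage: render from the current 3D representation, then push the gradient
# back into its parameters:
#   x = render(nerf_params, camera)
#   x.backward(gradient=sds_grad(eps_model, x, text_emb, alphas_bar))
```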
## Diffusion-based-Methods

**BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing**<br>
*Dongxu Li, Junnan Li, Steven C.H. Hoi*<br>
arXiv 2023
[[Paper](https://arxiv.org/pdf/2305.14720.pdf)]
[[Project](https://dxli94.github.io/BLIP-Diffusion-website/)]
[[Code](https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion)]

**InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions**<br>
*Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka*<br>
arXiv 2023
[[Paper](https://arxiv.org/pdf/2305.18047.pdf)]
[[Project](https://qianwangx.github.io/InstructEdit/)]
[[Code](https://github.com/qianwangx/instructedit)]

**DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation**<br>
*Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman*<br>
CVPR 2023
[[Paper](https://arxiv.org/pdf/2208.12242.pdf)]
[[Project](https://dreambooth.github.io/)]
[[Code](https://github.com/google/dreambooth)]

**Multi-Concept Customization of Text-to-Image Diffusion**<br>
*Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu*<br>
CVPR 2023
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Kumari_Multi-Concept_Customization_of_Text-to-Image_Diffusion_CVPR_2023_paper.pdf)]
[[Project](https://www.cs.cmu.edu/~custom-diffusion/)]
[[Code](https://github.com/adobe-research/custom-diffusion)]

**Collaborative Diffusion for Multi-Modal Face Generation and Editing**<br>
*Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu*<br>
CVPR 2023
[[Paper](https://arxiv.org/pdf/2304.10530v1.pdf)]
[[Project](https://ziqihuangg.github.io/projects/collaborative-diffusion.html)]
[[Code](https://github.com/ziqihuangg/Collaborative-Diffusion)]

**Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation**<br>
*Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel*<br>
CVPR 2023
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Tumanyan_Plug-and-Play_Diffusion_Features_for_Text-Driven_Image-to-Image_Translation_CVPR_2023_paper.pdf)]
[[Project](https://pnp-diffusion.github.io/)]
[[Code](https://github.com/MichalGeyer/plug-and-play)]

**SINE: SINgle Image Editing with Text-to-Image Diffusion Models**<br>
*Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren*<br>
CVPR 2023
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_SINE_SINgle_Image_Editing_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.pdf)]
[[Project](https://zhang-zx.github.io/SINE/)]
[[Code](https://github.com/zhang-zx/SINE)]

**NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models**<br>
*Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or*<br>
CVPR 2023
[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Mokady_NULL-Text_Inversion_for_Editing_Real_Images_Using_Guided_Diffusion_Models_CVPR_2023_paper.pdf)]
[[Project](https://null-text-inversion.github.io/)]
[[Code](https://github.com/google/prompt-to-prompt/#null-text-inversion-for-editing-real-images)]
2023\n[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Yang_Paint_by_Example_Exemplar-Based_Image_Editing_With_Diffusion_Models_CVPR_2023_paper.pdf)]\n[[Demo](https://huggingface.co/spaces/Fantasy-Studio/Paint-by-Example)]\n[[Code](https://github.com/Fantasy-Studio/Paint-by-Example)]\n\n**SpaText: Spatio-Textual Representation for Controllable Image Generation**\u003cbr\u003e\n*Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, Xi Yin*\u003cbr\u003e\nCVPR 2023\n[[Paper](https://arxiv.org/pdf/2211.14305.pdf)]\n[[Project](https://omriavrahami.com/spatext/)]\n\n**Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models**\u003cbr\u003e\n*Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis*\u003cbr\u003e\nCVPR 2023\n[[Paper](https://arxiv.org/pdf/2304.08818.pdf)]\n[[Project](https://research.nvidia.com/labs/toronto-ai/VideoLDM/)]\n\n\n**InstructPix2Pix Learning to Follow Image Editing Instructions**\u003cbr\u003e\n*Tim Brooks, Aleksander Holynski, Alexei A. Efros*\u003cbr\u003e\nCVPR 2023\n[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Brooks_InstructPix2Pix_Learning_To_Follow_Image_Editing_Instructions_CVPR_2023_paper.pdf)]\n[[Project](https://www.timothybrooks.com/instruct-pix2pix/)]\n[[Code](https://github.com/timothybrooks/instruct-pix2pix)]\n\n**Unite and Conquer: Plug \u0026 Play Multi-Modal Synthesis using Diffusion Models**\u003cbr\u003e\n*Nithin Gopalakrishnan Nair, Chaminda Bandara, Vishal M Patel*\u003cbr\u003e\nCVPR 2023\n[[Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Nair_Unite_and_Conquer_Plug__Play_Multi-Modal_Synthesis_Using_Diffusion_CVPR_2023_paper.pdf)]\n[[Project](https://nithin-gk.github.io/projectpages/Multidiff/index.html)]\n[[Code](https://github.com/Nithin-GK/UniteandConquer)]\n\n**DiffEdit: Diffusion-based semantic image editing with mask guidance**\u003cbr\u003e\n*Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord*\u003cbr\u003e\nCVPR 2023\n[[Paper](https://arxiv.org/pdf/2210.11427.pdf)]\n\n\n**eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers**\u003cbr\u003e\n*Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu*\u003cbr\u003e\nArxiv 2022\n[[Paper](https://arxiv.org/pdf/2211.01324.pdf)]\n[[Project](https://research.nvidia.com/labs/dir/eDiff-I/)]\n\n\n**Prompt-to-Prompt Image Editing with Cross-Attention Control**\u003cbr\u003e\n*Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman1 Yael Pritch, Daniel Cohen-Or*\u003cbr\u003e\nArxiv 2022\n[[Paper](https://prompt-to-prompt.github.io/ptp_files/Prompt-to-Prompt_preprint.pdf)]\n[[Project](https://prompt-to-prompt.github.io/)]\n[[Code](https://github.com/google/prompt-to-prompt)]\n\n**An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion**\u003cbr\u003e\n*Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. 
**An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion**<br>
*Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or*<br>
arXiv 2022
[[Paper](https://arxiv.org/pdf/2208.01618.pdf)]
[[Project](https://textual-inversion.github.io/)]
[[Code](https://github.com/rinongal/textual_inversion)]

**Text2Human: Text-Driven Controllable Human Image Generation**<br>
*Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy, Ziwei Liu*<br>
SIGGRAPH 2022
[[Paper](https://arxiv.org/pdf/2205.15996.pdf)]
[[Project](https://yumingj.github.io/projects/Text2Human.html)]
[[Code](https://github.com/yumingj/Text2Human)]

**[DALL-E 2] Hierarchical Text-Conditional Image Generation with CLIP Latents**<br>
*Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen*<br>
[[Paper](https://cdn.openai.com/papers/dall-e-2.pdf)]
[[Code](https://github.com/lucidrains/DALLE2-pytorch)]

**High-Resolution Image Synthesis with Latent Diffusion Models**<br>
*Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer*<br>
CVPR 2022
[[Paper](https://arxiv.org/abs/2112.10752)]
[[Code](https://github.com/CompVis/latent-diffusion)]

**v objective diffusion**<br>
*Katherine Crowson*<br>
[[Code](https://github.com/crowsonkb/v-diffusion-pytorch)]

**GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models**<br>
*Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.10741)]
[[Code](https://github.com/openai/glide-text2im)]

**Vector Quantized Diffusion Model for Text-to-Image Synthesis**<br>
*Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2111.14822)]
[[Code](https://github.com/microsoft/VQ-Diffusion)]

**DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation**<br>
*Gwanghyun Kim, Jong Chul Ye*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2110.02711)]

**Blended Diffusion for Text-driven Editing of Natural Images**<br>
*Omri Avrahami, Dani Lischinski, Ohad Fried*<br>
CVPR 2022
[[Paper](https://arxiv.org/abs/2111.14818)]
[[Project](https://omriavrahami.com/blended-diffusion-page/)]
[[Code](https://github.com/omriav/blended-diffusion)]

<br>
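A sampling mechanism shared by many of the text-guided diffusion models above (GLIDE, for example) is classifier-free guidance, which mixes a text-conditional and an unconditional noise prediction at every denoising step. A minimal sketch, with `eps_model` and `null_emb` as hypothetical placeholders rather than any particular library's API:

```python
# Toy sketch of classifier-free guidance for a noise-prediction model.
def guided_eps(eps_model, x_t, t, text_emb, null_emb, scale=3.0):
    """eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    eps_cond = eps_model(x_t, t, text_emb)    # prediction with the text prompt
    eps_uncond = eps_model(x_t, t, null_emb)  # prediction with an empty prompt
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Setting `scale` to 1 recovers plain conditional sampling; larger values trade sample diversity for prompt fidelity.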
## Autoregressive-Methods

**MaskGIT: Masked Generative Image Transformer**<br>
*Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman*<br>
arXiv 2022
[[Paper](https://arxiv.org/abs/2202.04200)]

**ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation**<br>
*Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.15283)]
[[Project](https://wenxin.baidu.com/wenxin/ernie-vilg)]

**NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion**<br>
*Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2111.12417)]
[[Code](https://github.com/microsoft/NUWA)]
[[Video](https://youtu.be/C9CTnZJ9ZE0)]

**L-Verse: Bidirectional Generation Between Image and Text**<br>
*Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2111.11133)]
[[Code](https://github.com/tgisaturday/L-Verse)]

**M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis**<br>
*Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang*<br>
NeurIPS 2021
[[Paper](https://arxiv.org/abs/2105.14211v3)]

**ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis**<br>
*Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer*<br>
NeurIPS 2021
[[Paper](https://openreview.net/pdf?id=-1AAgrS5FF)]
[[Code](https://github.com/CompVis/imagebart)]
[[Project](https://compvis.github.io/imagebart/)]

**A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation**<br>
*Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu*<br>
ACM MM 2021
[[Paper](https://arxiv.org/abs/2110.09756)]
[[Code](https://github.com/researchmm/generate-it)]

**Unifying Multimodal Transformer for Bi-directional Image and Text Generation**<br>
*Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu*<br>
ACM MM 2021
[[Paper](https://arxiv.org/abs/2110.09753)]
[[Code](https://github.com/researchmm/generate-it)]

**Taming Transformers for High-Resolution Image Synthesis**<br>
*Patrick Esser, Robin Rombach, Björn Ommer*<br>
CVPR 2021
[[Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.pdf)]
[[Code](https://github.com/CompVis/taming-transformers)]
[[Project](https://compvis.github.io/taming-transformers/)]

**RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP**<br>
*Alex Shonenkov and Michael Konstantinov*<br>
arXiv 2022
[[Code](https://github.com/sberbank-ai/ru-dolph)]

**Generate Images from Texts in Russian (ruDALL-E)**<br>
[[Code](https://github.com/sberbank-ai/ru-dalle)]
[[Project](https://rudalle.ru/en/)]
2021\n[[Paper](https://arxiv.org/abs/2102.12092)]\n[[Code](https://github.com/openai/DALL-E)]\n[[Project](https://openai.com/blog/dall-e/)]\n\n**Compositional Transformers for Scene Generation**\u003cbr\u003e\n*Drew A. Hudson, C. Lawrence Zitnick*\u003cbr\u003e\nNeurIPS 2021\n[[Paper](https://openreview.net/pdf?id=YQeWoRnwTnE)]\n[[Code](https://github.com/dorarad/gansformer)]\n\n**X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers**\u003cbr\u003e\n*Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi*\u003cbr\u003e\nEMNLP 2020\n[[Paper](https://arxiv.org/abs/2009.11278)] \n[[Code](https://github.com/allenai/x-lxmert)] \n\n**One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning**\u003cbr\u003e\n*Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu*\u003cbr\u003e\nAAAI 2022\n[[Paper](https://arxiv.org/abs/2112.02749)]\n\n\u003cbr\u003e\n\n### Image-Quantizer\n\n**[TE-VQGAN] Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation**\u003cbr\u003e\n*Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2110.04627)]\n[[Code](https://github.com/wcshin-git/TE-VQGAN)]\n\n**[ViT-VQGAN] Vector-quantized Image Modeling with Improved VQGAN**\u003cbr\u003e\n*Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2110.04627)]\n\u003c!-- [[Code](https://github.com/CompVis/taming-transformers)] --\u003e\n\n**[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers**\u003cbr\u003e\n*Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2111.12710)]\n\u003c!-- [[Code](https://github.com/CompVis/taming-transformers)] --\u003e\n\n**[VQ-GAN] Taming Transformers for High-Resolution Image Synthesis**\u003cbr\u003e\n*Patrick Esser, Robin Rombach, Björn Ommer*\u003cbr\u003e\nCVPR 2021\n[[Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Esser_Taming_Transformers_for_High-Resolution_Image_Synthesis_CVPR_2021_paper.pdf)]\n[[Code](https://github.com/CompVis/taming-transformers)]\n\n**[Gumbel-VQ] vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations**\u003cbr\u003e\n*Alexei Baevski, Steffen Schneider, Michael Auli*\u003cbr\u003e\nICLR 2020\n[[Paper](https://openreview.net/pdf?id=rylwJxrYDS)]\n[[Code](https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/README.md)]\n\n**[EM VQ-VAE] Theory and Experiments on Vector Quantized Autoencoders**\u003cbr\u003e\n*Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar*\u003cbr\u003e\narxiv 2018\n[[Paper](https://arxiv.org/abs/1805.11063)]\n[[Code](https://github.com/jaywalnut310/Vector-Quantized-Autoencoders)]\n\n**[VQ-VAE] Neural Discrete Representation Learning**\u003cbr\u003e\n*Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu*\u003cbr\u003e\nNIPS 2017\n[[Paper](https://proceedings.neurips.cc/paper/2017/file/7a98af17e63a0ac09ce2e96d03992fbc-Paper.pdf)]\n[[Code](https://github.com/ritheshkumar95/pytorch-vqvae)]\n\n**[VQ-VAE2 or EMA-VQ] Generating Diverse High-Fidelity Images with VQ-VAE-2**\u003cbr\u003e\n*Ali Razavi, Aaron van den Oord, Oriol Vinyals*\u003cbr\u003e\nNIPS 
2019\n[[Paper](https://proceedings.neurips.cc/paper/2019/file/5f8e2fa1718d1bbcadf1cd9c7a54fb8c-Paper.pdf)]\n[[Code](https://github.com/rosinality/vq-vae-2-pytorch)]\n\n**[Discrete VAE] Discrete Variational Autoencoders**\u003cbr\u003e\n*Jason Tyler Rolfe*\u003cbr\u003e\nICLR 2017\n[[Paper](https://arxiv.org/abs/1609.02200)]\n[[Code](https://github.com/openai/DALL-E)]\n\n**[DVAE++] DVAE++: Discrete Variational Autoencoders with Overlapping Transformations**\u003cbr\u003e\n*Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash*\u003cbr\u003e\nICML 2018\n[[Paper](https://arxiv.org/abs/1802.04920)]\n[[Code](https://github.com/xmax1/dvae)]\n\n**[DVAE#] DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors**\u003cbr\u003e\n*Arash Vahdat, Evgeny Andriyash, William G. Macready*\u003cbr\u003e\nNIPS 2018\n[[Paper](https://arxiv.org/abs/1805.07445)]\n[[Code](https://github.com/xmax1/dvae)]\n\n\u003cbr\u003e\n\n\n\n\n\n\n\n\n## GAN-based-Methods\n\n**GauGAN2**\u003cbr\u003e\n*NVIDIA*\u003cbr\u003e\n[[Project](http://gaugan.org/gaugan2/)]\n[[Video](https://www.youtube.com/watch?v=p9MAvRpT6Cg)]\n\n**Multimodal Conditional Image Synthesis with Product-of-Experts GANs**\u003cbr\u003e\n*Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2112.05130)]\n\n**RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge**\u003cbr\u003e\n*Jun Cheng, Fuxiang Wu, Yanling Tian, Lei Wang, Dapeng Tao*\u003cbr\u003e\nTCSVT 2021\n[[Paper](https://ieeexplore.ieee.org/abstract/document/9656731/authors#authors)]\n\n**TRGAN: Text to Image Generation Through Optimizing Initial Image**\u003cbr\u003e\n*Liang Zhao, Xinwei Li, Pingda Huang, Zhikui Chen, Yanqi Dai, Tianyu Li*\u003cbr\u003e\nICONIP 2021\n[[Paper](https://link.springer.com/chapter/10.1007/978-3-030-92307-5_76)]\n\n\u003c!-- **Image Synthesis From Layout With Locality-Aware Mask Adaption [Layout2Image]**\u003cbr\u003e\n*Zejian Li, Jingyu Wu, Immanuel Koh, Yongchuan Tang, Lingyun Sun*\u003cbr\u003e\nGCPR 2021\n[[Paper](https://arxiv.org/pdf/2103.13722.pdf)]\n[[Code](https://github.com/stanifrolov/AttrLostGAN)]\n\n**AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style [Layout2Image]**\u003cbr\u003e\n*Stanislav Frolov, Avneesh Sharma, Jörn Hees, Tushar Karayil, Federico Raue, Andreas Dengel*\u003cbr\u003e\nICCV 2021\n[[Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Li_Image_Synthesis_From_Layout_With_Locality-Aware_Mask_Adaption_ICCV_2021_paper.pdf)] --\u003e\n\n**Audio-Driven Emotional Video Portraits [Audio2Image]**\u003cbr\u003e\n*Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu*\u003cbr\u003e\nCVPR 2021\n[[Paper](https://arxiv.org/abs/2104.07452)]\n[[Code](https://github.com/jixinya/EVP/)]\n[[Project](https://jixinya.github.io/projects/evp/)]\n\n**SketchyCOCO: Image Generation from Freehand Scene Sketches**\u003cbr\u003e\n*Chengying Gao, Qi Liu, Qi Xu, Limin Wang, Jianzhuang Liu, Changqing Zou*\u003cbr\u003e\nCVPR 2020\n[[Paper](https://arxiv.org/pdf/2003.02683.pdf)]\n[[Code](https://github.com/sysu-imsl/SketchyCOCO)]\n[[Project](https://mikexuq.github.io/test_building_pages/index.html)]\n\n**Direct Speech-to-Image Translation [Audio2Image]**\u003cbr\u003e\n*Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao*\u003cbr\u003e\nJSTSP 
2020\n[[Paper](https://ieeexplore.ieee.org/document/9067083/authors#authors)]\n[[Code](https://github.com/smallflyingpig/speech-to-image-translation-without-text)]\n[[Project](https://smallflyingpig.github.io/speech-to-image/main)]\n\n**MirrorGAN: Learning Text-to-image Generation by Redescription [Text2Image]**\u003cbr\u003e\n*Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao*\u003cbr\u003e\nCVPR 2019\n[[Paper](https://arxiv.org/abs/1903.05854)]\n[[Code](https://github.com/qiaott/MirrorGAN)]\n\n**AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [Text2Image]**\u003cbr\u003e\n*Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He*\u003cbr\u003e\nCVPR 2018\n[[Paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_AttnGAN_Fine-Grained_Text_CVPR_2018_paper.pdf)]\n[[Code](https://github.com/taoxugit/AttnGAN)]\n\n**Plug \u0026 Play Generative Networks: Conditional Iterative Generation of Images in Latent Space**\u003cbr\u003e\n*Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski*\u003cbr\u003e\nCVPR 2017\n[[Paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Nguyen_Plug__Play_CVPR_2017_paper.pdf)]\n[[Code](https://github.com/Evolving-AI-Lab/ppgn)]\n\n**StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]**\u003cbr\u003e\n*Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas*\u003cbr\u003e\nTPAMI 2018\n[[Paper](https://arxiv.org/abs/1710.10916)]\n[[Code](https://github.com/hanzhanggit/StackGAN-v2)]\n\n**StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Text2Image]**\u003cbr\u003e\n*Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas*\u003cbr\u003e\nICCV 2017\n[[Paper](https://arxiv.org/abs/1612.03242)]\n[[Code](https://github.com/hanzhanggit/StackGAN)]\n\n\u003cbr\u003e\n\n### GAN-Inversion-Methods \n\n**Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold**\u003cbr\u003e\n*Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt*\u003cbr\u003e\nSIGGRAPH 2023\n[[Paper](https://arxiv.org/abs/2305.10973)]\n[[Code](https://github.com/XingangPan/DragGAN)]\n\n**HairCLIP: Design Your Hair by Text and Reference Image**\u003cbr\u003e\n*Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2112.05142)]\n[[Code](https://github.com/wty-ustc/HairCLIP)]\n\n**FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+ GAN Space Optimization**\u003cbr\u003e\n*Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, Qiang Liu*\u003cbr\u003e\narxiv 2021\n[[Paper](https://arxiv.org/abs/2112.01573)]\n[[Code](https://github.com/gnobitab/FuseDream)]\n\n**StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation**\u003cbr\u003e\n*Umut Kocasari, Alara Dirik, Mert Tiftikci, Pinar Yanardag*\u003cbr\u003e\nWACV 2022\n[[Paper](https://arxiv.org/abs/2112.08493)]\n[[Code](https://github.com/catlab-team/stylemc)]\n[[Project](https://catlab-team.github.io/stylemc/)]\n\n**Cycle-Consistent Inverse GAN for Text-to-Image Synthesis**\u003cbr\u003e\n*Hao Wang, Guosheng Lin, Steven C. H. 
**Cycle-Consistent Inverse GAN for Text-to-Image Synthesis**<br>
*Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao*<br>
ACM MM 2021
[[Paper](https://dl.acm.org/doi/10.1145/3474085.3475226)]

**StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery**<br>
*Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski*<br>
ICCV 2021
[[Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Patashnik_StyleCLIP_Text-Driven_Manipulation_of_StyleGAN_Imagery_ICCV_2021_paper.pdf)]
[[Code](https://github.com/orpatashnik/StyleCLIP)]
[[Video](https://www.youtube.com/watch?v=PhR1gpXDu0w)]

**Talk-to-Edit: Fine-Grained Facial Editing via Dialog**<br>
*Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu*<br>
ICCV 2021
[[Paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Jiang_Talk-To-Edit_Fine-Grained_Facial_Editing_via_Dialog_ICCV_2021_paper.pdf)]
[[Code](https://github.com/yumingj/Talk-to-Edit)]
[[Project](https://www.mmlab-ntu.com/project/talkedit/)]

**TediGAN: Text-Guided Diverse Face Image Generation and Manipulation**<br>
*Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu*<br>
CVPR 2021
[[Paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xia_TediGAN_Text-Guided_Diverse_Face_Image_Generation_and_Manipulation_CVPR_2021_paper.pdf)]
[[Code](https://github.com/IIGROUP/TediGAN)]
[[Video](https://www.youtube.com/watch?v=L8Na2f5viAM)]

**Paint by Word**<br>
*David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, Antonio Torralba*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.01573)]

<br>

## Other-Methods

**Language-Driven Image Style Transfer**<br>
*Tsu-Jui Fu, Xin Eric Wang, William Yang Wang*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2106.00178)]

**CLIPstyler: Image Style Transfer with a Single Text Condition**<br>
*Gihyun Kwon, Jong Chul Ye*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.00374)]
[[Code](https://github.com/paper11667/CLIPstyler)]

**Wakey-Wakey: Animate Text by Mimicking Characters in a GIF**<br>
*Liwenhan Xie, Zhaoyu Zhou, Kerun Yu, Yun Wang, Huamin Qu, Siming Chen*<br>
UIST 2023
[[Paper](https://arxiv.org/pdf/2308.00224.pdf)]
[[Code](https://github.com/KeriYuu/Wakey-Wakey)]
[[Project](https://shellywhen.github.io/projects/Wakey-Wakey)]

<br>

## Text-Encoding

**FLAVA: A Foundational Language And Vision Alignment Model**<br>
*Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2112.04482)]

**Learning Transferable Visual Models From Natural Language Supervision (CLIP)**<br>
*Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever*<br>
arXiv 2021
[[Paper](https://arxiv.org/abs/2103.00020)]
[[Code](https://github.com/OpenAI/CLIP)]
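Since CLIP text embeddings are the conditioning signal behind many of the methods in this list, here is a minimal example of extracting them with the openai/CLIP package linked above (installable via `pip install git+https://github.com/openai/CLIP.git`); the model variant and prompts are arbitrary choices:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # pretrained CLIP

tokens = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)             # (2, 512) embeddings
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```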
<br>

## Audio-Encoding

**Wav2CLIP: Learning Robust Audio Representations From CLIP (Wav2CLIP)**<br>
*Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello*<br>
ICASSP 2022
[[Paper](https://arxiv.org/abs/2110.11499)]
[[Code](https://github.com/descriptinc/lyrebird-wav2clip)]

<br>

## Datasets

[Multimodal CelebA-HQ](https://github.com/IIGROUP/MM-CelebA-HQ-Dataset)

[DeepFashion MultiModal](https://github.com/yumingj/DeepFashion-MultiModal)

## Citation
If you use this code for your research, please cite our paper.
```bibtex
@article{zhan2023mise,
  title={Multimodal Image Synthesis and Editing: The Generative AI Era},
  author={Zhan, Fangneng and Yu, Yingchen and Wu, Rongliang and Zhang, Jiahui and Lu, Shijian and Liu, Lingjie and Kortylewski, Adam and Theobalt, Christian and Xing, Eric},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  publisher={IEEE}
}
```