{"id":29009903,"url":"https://github.com/tencentarc/blobctrl","last_synced_at":"2025-06-25T15:33:40.940Z","repository":{"id":283029014,"uuid":"949818111","full_name":"TencentARC/BlobCtrl","owner":"TencentARC","description":"[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing","archived":false,"fork":false,"pushed_at":"2025-03-20T05:20:02.000Z","size":53713,"stargazers_count":79,"open_issues_count":1,"forks_count":2,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-13T00:53:04.816Z","etag":null,"topics":["aigc","image-editing"],"latest_commit_sha":null,"homepage":"https://liyaowei-stu.github.io/project/BlobCtrl/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-17T07:29:29.000Z","updated_at":"2025-04-12T08:43:41.000Z","dependencies_parsed_at":"2025-04-13T01:03:12.620Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/BlobCtrl","commit_stats":null,"previous_names":["tencentarc/blobctrl"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/BlobCtrl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FBlobCtrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FBlobCtrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FBlobCtrl/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FBlobCtrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/BlobCtrl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FBlobCtrl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261901407,"owners_count":23227593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aigc","image-editing"],"created_at":"2025-06-25T15:33:40.087Z","updated_at":"2025-06-25T15:33:40.929Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BlobCtrl\n\n😃 This repository contains the implementation of \"BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing\".\n\nKeywords: Image Generation, Image Editing, Diffusion Models, Element-level\n\n\u003e TL;DR: BlobCtrl enables precise, user-friendly multi-round element-level visual manipulation.\u003cbr\u003e\n\u003e Main Features: 🦉Element-level Add/Remove/Move/Replace/Enlarge/Shrink.\n\n\u003e [Yaowei Li](https://github.com/liyaowei-stu) \u003csup\u003e1\u003c/sup\u003e, [Lingen Li](https://lg-li.github.io/) \u003csup\u003e3\u003c/sup\u003e, [Zhaoyang Zhang](https://zzyfd.github.io/#/) \u003csup\u003e2‡\u003c/sup\u003e, [Xiaoyu Li](https://github.com/zhuang2002) \u003csup\u003e2\u003c/sup\u003e, [Guangzhi Wang](http://gzwang.xyz/) \u003csup\u003e2\u003c/sup\u003e, [Hongxiang 
Li](https://lihxxx.github.io/) \u003csup\u003e1\u003c/sup\u003e, [Xiaodong Cun](https://vinthony.github.io/academic/) \u003csup\u003e2\u003c/sup\u003e, [Ying Shan](https://www.linkedin.com/in/YingShanProfile/) \u003csup\u003e2\u003c/sup\u003e, [Yuexian Zou](https://www.ece.pku.edu.cn/info/1046/2146.htm) \u003csup\u003e1✉\u003c/sup\u003e\u003cbr\u003e\n\u003e \u003csup\u003e1\u003c/sup\u003ePeking University \u003csup\u003e2\u003c/sup\u003eARC Lab, Tencent PCG \u003csup\u003e3\u003c/sup\u003eThe Chinese University of Hong Kong  \u003csup\u003e‡\u003c/sup\u003eProject Lead \u003csup\u003e✉\u003c/sup\u003eCorresponding Author\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://liyaowei-stu.github.io/project/BlobCtrl/\"\u003e🌐Project Page\u003c/a\u003e |\n  \u003ca href=\"http://arxiv.org/abs/2503.13434\"\u003e📜Arxiv\u003c/a\u003e |\n  \u003ca href=\"https://youtu.be/rdR4QRR-mbE\"\u003e📹Video\u003c/a\u003e |\n  \u003ca href=\"https://huggingface.co/spaces/Yw22/BlobCtrl\"\u003e🤗Hugging Face Demo\u003c/a\u003e |\n  \u003ca href=\"https://huggingface.co/Yw22/BlobCtrl\"\u003e🤗Hugging Model\u003c/a\u003e\n  \u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"\"\u003e🤗Hugging Data (TBD)\u003c/a\u003e |\n  \u003ca href=\"\"\u003e🤗Hugging Benchmark (TBD)\u003c/a\u003e\n\u003c/p\u003e\n\nhttps://github.com/user-attachments/assets/ec5fab3c-fa84-4f5d-baf9-1e744f577515\n\nYoutube Introduction Video: [Youtube](https://youtu.be/rdR4QRR-mbE).\n\n**📖 Table of Contents**\n\n- [BlobCtrl](#blobctrl)\n  - [🔥 Update Logs](#-update-logs)\n  - [🛠️ Method Overview](#️-method-overview)\n  - [🚀 Getting Started](#-getting-started)\n  - [🏃🏼 Running Scripts](#-running-scripts)\n  - [🤝🏼 Cite Us](#-cite-us)\n  - [💖 Acknowledgement](#-acknowledgement)\n  - [❓ Contact](#-contact)\n  - [🌟 Star History](#-star-history)\n\n## 🔥 Update Logs\n\n- [TBD] Release the data preprocessing code.\n- [TBD] Release the BlobData and BlobBench.\n- [TBD] Release the training code\n- [X] 
[20/03/2025] Release the inference code.\n- [X] [17/03/2025] Release the paper, webpage and Gradio demo.\n\n## 🛠️ Method Overview\n\nWe introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks, offering a practical solution for precise and flexible visual content creation.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"examples/blobctrl/assets/blobctrl_teaser.png\" width=\"80%\"\u003e\n\u003c/p\u003e\n\n## 🚀 Getting Started\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eEnvironment Requirement 🌍\u003c/b\u003e\u003c/summary\u003e\n\u003cbr\u003e\nBlobCtrl has been implemented and tested on CUDA 12.1, PyTorch 2.2.0, and Python 3.10.15.\n\nClone the repo:\n\n```\ngit clone git@github.com:TencentARC/BlobCtrl.git\n```\n\nWe recommend first using `conda` to create a virtual environment and install the required libraries. 
For example:\n\n```\nconda create -n blobctrl python=3.10.15 -y\nconda activate blobctrl\npython -m pip install --upgrade pip\npip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121\npip install xformers torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121\npip install -r requirements.txt\n```\n\nThen, you can install diffusers (implemented in this repo) with:\n\n```\npip install -e .\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eDownload Model Checkpoints 💾\u003c/b\u003e\u003c/summary\u003e\n\u003cbr\u003e\nDownload the corresponding checkpoints of BlobCtrl.\n\n```\nsh examples/blobctrl/scripts/download_models.sh\n```\n\n**The ckpt folder contains**\n\n- Our provided [BlobCtrl](https://huggingface.co/Yw22/BlobCtrl) checkpoints (`UNet LoRA` + `BlobNet`).\n- Pretrained [SD-v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint.\n- Pretrained [DINOv2](https://huggingface.co/facebook/dinov2-large) checkpoint.\n- Pretrained [SAM](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) checkpoint.\n\nThe checkpoint structure should be like:\n\n```\n|-- models\n    |-- blobnet\n        |-- config.json\n        |-- diffusion_pytorch_model.safetensors\n    |-- dinov2-large\n        |-- config.json\n        |-- model.safetensors\n        ...\n    |-- sam\n        |-- sam_vit_h_4b8939.pth\n    |-- unet_lora\n        |-- pytorch_lora_weights.safetensors\n```\n\n\u003c/details\u003e\n\n## 🏃🏼 Running Scripts\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eBlobCtrl demo 🤗\u003c/b\u003e\u003c/summary\u003e\n\u003cbr\u003e\nYou can run the demo using the script:\n\n```\nsh examples/blobctrl/scripts/run_app.sh\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eBlobCtrl Inference 🌠\u003c/b\u003e\u003c/summary\u003e\n\u003cbr\u003e\nYou can run the inference using the 
script:\n\n```\nsh examples/blobctrl/scripts/inference.sh\n```\n\n\u003c/details\u003e\n\n\n## 🤝🏼 Cite Us\n\n```\n@misc{li2024brushedit,\n  title={BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing},\n  author={Yaowei Li and Lingen Li and Zhaoyang Zhang and Xiaoyu Li and Guangzhi Wang and Hongxiang Li and Xiaodong Cun and Ying Shan and Yuexian Zou},\n  year={2025},\n  eprint={2503.13434},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV}\n}\n```\n\n## 💖 Acknowledgement\n\nOur implementation builds upon the [diffusers](https://github.com/huggingface/diffusers) library. We extend our sincere gratitude to all the contributors of the diffusers project!\n\nWe also acknowledge the [BlobGAN](https://github.com/dave-epstein/blobgan) project for providing valuable insights and inspiration for our blob-based representation approach.\n\n## ❓ Contact\n\nFor any questions, feel free to email `liyaowei01@gmail.com`.\n\n## 🌟 Star History\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://star-history.com/#TencentARC/BlobCtrl\" target=\"_blank\"\u003e\n        \u003cimg width=\"500\" src=\"https://api.star-history.com/svg?repos=TencentARC/BlobCtrl\u0026type=Date\" alt=\"Star History Chart\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fblobctrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fblobctrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fblobctrl/lists"}