{"id":13488821,"url":"https://github.com/yeungchenwa/FontDiffuser","last_synced_at":"2025-03-28T02:31:09.825Z","repository":{"id":213651394,"uuid":"729391325","full_name":"yeungchenwa/FontDiffuser","owner":"yeungchenwa","description":"[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning","archived":false,"fork":false,"pushed_at":"2024-03-14T09:15:35.000Z","size":17195,"stargazers_count":287,"open_issues_count":24,"forks_count":25,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-31T01:34:41.163Z","etag":null,"topics":["deep-learning","diffusers","diffusion","font-generation","image-generation"],"latest_commit_sha":null,"homepage":"https://yeungchenwa.github.io/fontdiffuser-homepage/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yeungchenwa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-09T04:48:34.000Z","updated_at":"2024-10-29T02:37:17.000Z","dependencies_parsed_at":"2024-10-31T01:30:30.156Z","dependency_job_id":"02973bc4-93c9-45f6-87c4-46e19b87f427","html_url":"https://github.com/yeungchenwa/FontDiffuser","commit_stats":null,"previous_names":["yeungchenwa/fontdiffuser"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeungchenwa%2FFontDiffuser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeungchenwa%2FFontDiffuser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/yeungchenwa%2FFontDiffuser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeungchenwa%2FFontDiffuser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yeungchenwa","download_url":"https://codeload.github.com/yeungchenwa/FontDiffuser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245957626,"owners_count":20700304,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusers","diffusion","font-generation","image-generation"],"created_at":"2024-07-31T18:01:22.363Z","updated_at":"2025-03-28T02:31:07.870Z","avatar_url":"https://github.com/yeungchenwa.png","language":"Python","readme":"\u003cdiv align=center\u003e\n\n# FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning\n\n\u003c/div\u003e\n\n![FontDiffuser_LOGO](figures/logo.png)  \n\n\u003cdiv align=center\u003e\n\n[![arXiv preprint](http://img.shields.io/badge/arXiv-2312.12142-b31b1b)](https://arxiv.org/abs/2312.12142) \n[![Gradio demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FontDiffuser-ff7c00)](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio)\n[![Homepage](https://img.shields.io/badge/Homepage-FontDiffuser-green)](https://yeungchenwa.github.io/fontdiffuser-homepage/)\n[![Code](https://img.shields.io/badge/Code-FontDiffuser-yellow)](https://github.com/yeungchenwa/FontDiffuser)\n\n\u003c/div\u003e\n\n\n\u003cp align=\"center\"\u003e\n   \u003cstrong\u003e\u003ca 
href=\"#🔥-model-zoo\"\u003e🔥 Model Zoo \u003c/a\u003e\u003c/strong\u003e •\n   \u003cstrong\u003e\u003ca href=\"#🛠️-installation\"\u003e🛠️ Installation \u003c/a\u003e\u003c/strong\u003e •\n   \u003cstrong\u003e\u003ca href=\"#🏋️-training\"\u003e🏋️ Training\u003c/a\u003e\u003c/strong\u003e •\n   \u003cstrong\u003e\u003ca href=\"#📺-sampling\"\u003e📺 Sampling\u003c/a\u003e\u003c/strong\u003e •\n   \u003cstrong\u003e\u003ca href=\"#📱-run-webui\"\u003e📱 Run WebUI\u003c/a\u003e\u003c/strong\u003e   \n\u003c/p\u003e\n\n## 🌟 Highlights\n![Vis_1](figures/vis_1.png)\n![Vis_2](figures/with_instructpix2pix.png)\n+ We propose **FontDiffuser**, which can generate unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.\n+ **FontDiffuser** excels at generating complex characters and handling large style variations, and it achieves state-of-the-art performance.\n+ The results generated by **FontDiffuser** can be used directly with **InstructPix2Pix** for decoration, as shown in the figure above.\n+ We release the 💻[Hugging Face Demo](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio) online! Welcome to try it out!  \n\n## 📅 News\n- **2024.01.27**: The phase 2 training code is released.\n- **2023.12.20**: Our repository is public! 👏🤗\n- **2023.12.19**: 🔥🎉 The 💻[Hugging Face Demo](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio) is public! Welcome to try it out!\n- **2023.12.16**: The Gradio app demo is released.   \n- **2023.12.10**: The source code, with phase 1 training and sampling, is released.   \n- **2023.12.09**: 🎉🎉 Our [paper](https://arxiv.org/abs/2312.12142) is accepted by AAAI2024.   \n- **Previously**: Our [Recommendations-of-Diffusion-for-Text-Image](https://github.com/yeungchenwa/Recommendations-Diffusion-Text-Image) repo is public, which contains a paper collection of recent diffusion models for text-image generation tasks. 
Welcome to check it out!\n\n## 🔥 Model Zoo\n| **Model**                                    | **Checkpoint** | **Status** |\n|----------------------------------------------|----------------|------------|\n| **FontDiffuser**                             | [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ) | Released  |\n| **SCR**                                      | [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ) | Released     |\n\n## 🚧 TODO List\n- [x] Add phase 1 training and sampling script.\n- [x] Add WebUI demo.\n- [x] Push demo to Hugging Face.\n- [x] Add phase 2 training script and checkpoint.\n- [ ] Add the pre-training of the SCR module.\n- [ ] Combine with InstructPix2Pix.\n\n## 🛠️ Installation\n### Prerequisites (Recommended)\n- Linux\n- Python 3.9\n- PyTorch 1.13.1\n- CUDA 11.7\n\n### Environment Setup\nClone this repo:\n```bash\ngit clone https://github.com/yeungchenwa/FontDiffuser.git\n```\n\n**Step 0**: Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).\n\n**Step 1**: Create a conda environment and activate it.\n```bash\nconda create -n fontdiffuser python=3.9 -y\nconda activate fontdiffuser\n```\n\n**Step 2**: Install the corresponding PyTorch version following the instructions [here](https://pytorch.org/get-started/previous-versions/).\n```bash\n# Suggested\npip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117\n```\n\n**Step 3**: Install the required packages.\n```bash\npip install -r requirements.txt\n```\n\n## 🏋️ Training\n### Data Construction\nThe training data file tree should be as follows (data examples are shown in the directory `data_examples/train/`):\n```\n├──data_examples\n│   └── train\n│       ├── 
ContentImage\n│       │   ├── char0.png\n│       │   ├── char1.png\n│       │   ├── char2.png\n│       │   └── ...\n│       └── TargetImage\n│           ├── style0\n│           │     ├──style0+char0.png\n│           │     ├──style0+char1.png\n│           │     └── ...\n│           ├── style1\n│           │     ├──style1+char0.png\n│           │     ├──style1+char1.png\n│           │     └── ...\n│           ├── style2\n│           │     ├──style2+char0.png\n│           │     ├──style2+char1.png\n│           │     └── ...\n│           └── ...\n```\n### Training Configuration\nBefore running the training script (in any of the following three modes), you should set the training configuration, such as distributed training, through:\n```bash\naccelerate config\n```\n\n### Training - Pretraining of SCR\n```bash\nComing Soon ...\n```\n\n### Training - Phase 1\n```bash\nsh train_phase_1.sh\n```\n- `data_root`: The data root, e.g. `./data_examples`.\n- `output_dir`: The directory where training logs and checkpoints are saved.\n- `resolution`: The resolution of the UNet in our diffusion model.\n- `style_image_size`: The resolution of the style image, which can differ from `resolution`.\n- `content_image_size`: The resolution of the content image, which should be the same as `resolution`.\n- `channel_attn`: Whether to use channel attention in the MCA block.\n- `train_batch_size`: The training batch size.\n- `max_train_steps`: The maximum number of training steps.\n- `learning_rate`: The training learning rate.\n- `ckpt_interval`: The checkpoint saving interval during training.\n- `drop_prob`: The drop probability for classifier-free guidance training.\n\n### Training - Phase 2\nAfter the phase 1 training, put the trained checkpoint files (`unet.pth`, `content_encoder.pth`, and `style_encoder.pth`) into the directory `phase_1_ckpt`. 
During phase 2, these parameters will be resumed.\n```bash\nsh train_phase_2.sh\n```\n- `phase_2`: Flag marking phase 2 training.\n- `phase_1_ckpt_dir`: The directory where model checkpoints are saved after phase 1 training.\n- `scr_ckpt_path`: The checkpoint path of the pre-trained SCR module; you can download it from the 🔥 Model Zoo above.\n- `sc_coefficient`: The coefficient of the style contrastive loss for supervision.\n- `num_neg`: The number of negative samples; defaults to `16`.\n\n## 📺 Sampling\n### Step 1 =\u003e Prepare the checkpoint   \nOption (1) Download the checkpoint from [GoogleDrive](https://drive.google.com/drive/folders/12hfuZ9MQvXqcteNuz7JQ2B_mUcTr-5jZ?usp=drive_link) / [BaiduYun:gexg](https://pan.baidu.com/s/19t1B7le8x8L2yFGaOvyyBQ), then put the `ckpt` folder in the root directory, including the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.  \nOption (2) Put your retrained checkpoint folder `ckpt` in the root directory, including the files `unet.pth`, `content_encoder.pth`, and `style_encoder.pth`.\n\n### Step 2 =\u003e Run the script  \n**(1) Sampling an image from a content image and a reference image.**  \n```bash\nsh script/sample_content_image.sh\n```\n- `ckpt_dir`: The directory where model checkpoints are saved.  
\n- `content_image_path`: The content/source image path.\n- `style_image_path`: The style/reference image path.\n- `save_image`: Set to `True` to save the outputs as images.\n- `save_image_dir`: The image saving directory; the saved files include `out_single.png` and `out_with_cs.png`.\n- `device`: The sampling device; GPU acceleration is recommended.\n- `guidance_scale`: The classifier-free sampling guidance scale.\n- `num_inference_steps`: The number of inference steps for DPM-Solver++.\n\n**(2) Sampling an image from a content character.**  \n**Note**: You may need a TTF file containing numerous Chinese characters; you can download one from [BaiduYun:wrth](https://pan.baidu.com/s/1LhcXG4tPcso9BLaUzU6KtQ).\n```bash\nsh script/sample_content_character.sh\n```\n- `character_input`: If set to `True`, use a character string as the content/source input.\n- `content_character`: The content/source character string.\n- The other parameters are the same as in option (1) above.\n\n## 📱 Run WebUI\n### (1) Sampling by FontDiffuser\n```bash\ngradio gradio_app.py\n```\n**Example**:   \n\u003cp align=\"center\"\u003e\n\u003cimg src=\"figures/gradio_fontdiffuer_new.png\" width=\"80%\" height=\"auto\"\u003e\n\u003c/p\u003e\n\n### (2) Sampling by FontDiffuser and Rendering by InstructPix2Pix\n```bash\nComing Soon ...\n```\n\n## 🌄 Gallery\n### Characters with a hard level of complexity\n![vis_hard](figures/vis_hard.png)\n\n### Characters with a medium level of complexity\n![vis_medium](figures/vis_medium.png)\n\n### Characters with an easy level of complexity\n![vis_easy](figures/vis_easy.png)\n\n### Cross-Lingual Generation (Chinese to Korean)\n![vis_korean](figures/vis_korean.png)\n\n## 💙 Acknowledgement\n- [diffusers](https://github.com/huggingface/diffusers)\n\n## Copyright\n- This repository can only be used for non-commercial research purposes.\n- For commercial use, please contact Prof. 
Lianwen Jin (eelwjin@scut.edu.cn).\n- Copyright 2023, [Deep Learning and Vision Computing Lab (DLVC-Lab)](http://www.dlvc-lab.net), South China University of Technology. \n\n## Citation\n```\n@inproceedings{yang2024fontdiffuser,\n  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},\n  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},\n  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n  year={2024}\n}\n```\n\n## ⭐ Star Rising\n[![Star Rising](https://api.star-history.com/svg?repos=yeungchenwa/FontDiffuser\u0026type=Timeline)](https://star-history.com/#yeungchenwa/FontDiffuser\u0026Timeline)\n","funding_links":[],"categories":["Text Generation"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyeungchenwa%2FFontDiffuser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyeungchenwa%2FFontDiffuser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyeungchenwa%2FFontDiffuser/lists"}