{"id":14769222,"url":"https://github.com/nahyeonkaty/textboost","last_synced_at":"2026-03-06T10:05:59.651Z","repository":{"id":256800008,"uuid":"856466413","full_name":"nahyeonkaty/textboost","owner":"nahyeonkaty","description":"TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder","archived":false,"fork":false,"pushed_at":"2025-01-24T07:00:38.000Z","size":2895,"stargazers_count":57,"open_issues_count":2,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-22T02:40:25.597Z","etag":null,"topics":["ai","deep-learning","diffusion","image-generation","pytorch","stable-diffusion","text2image","torch","txt2img"],"latest_commit_sha":null,"homepage":"https://textboost.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nahyeonkaty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-12T16:11:27.000Z","updated_at":"2025-06-18T09:12:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"4868d5f5-28f5-4c29-b9e1-d43b587538ae","html_url":"https://github.com/nahyeonkaty/textboost","commit_stats":{"total_commits":14,"total_committers":2,"mean_commits":7.0,"dds":0.4285714285714286,"last_synced_commit":"ae1b97faf8d9e29942e4425bacea69f87920431d"},"previous_names":["nahyeonkaty/textboost"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nahyeonkaty/textboost","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nahyeonkaty%2Ftextboost","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/nahyeonkaty%2Ftextboost/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nahyeonkaty%2Ftextboost/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nahyeonkaty%2Ftextboost/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nahyeonkaty","download_url":"https://codeload.github.com/nahyeonkaty/textboost/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nahyeonkaty%2Ftextboost/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30171657,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T07:56:45.623Z","status":"ssl_error","status_checked_at":"2026-03-06T07:55:55.621Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","deep-learning","diffusion","image-generation","pytorch","stable-diffusion","text2image","torch","txt2img"],"created_at":"2024-09-16T13:00:31.294Z","updated_at":"2026-03-06T10:05:59.631Z","avatar_url":"https://github.com/nahyeonkaty.png","language":"Python","funding_links":[],"categories":["New Concept Learning"],"sub_categories":[],"readme":"# TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text 
Encoder\n\n[![arXiv](https://img.shields.io/badge/arXiv-2409.08248-B31B1B.svg)](https://arxiv.org/abs/2409.08248)\n[![Project page](https://img.shields.io/badge/Project-Page-brightgreen)](https://textboost.github.io)\n\n\u003cdiv style=\"text-align: center;\"\u003e\n  \u003cimg src=\"assets/teaser.jpg\" alt=\"Alt text\"\u003e\n\u003c/div\u003e\n\nAbstract: *Recent breakthroughs in text-to-image models have opened up promising research avenues in personalized image generation, enabling users to create diverse images of a specific subject using natural language prompts. However, existing methods often suffer from performance degradation when given only a single reference image. They tend to overfit the input, producing highly similar outputs regardless of the text prompt. This paper addresses the challenge of one-shot personalization by mitigating overfitting, enabling the creation of controllable images through text prompts. Specifically, we propose a selective fine-tuning strategy that focuses on the text encoder. Furthermore, we introduce three key techniques to enhance personalization performance: (1) augmentation tokens to encourage feature disentanglement and alleviate overfitting, (2) a knowledge-preservation loss to reduce language drift and promote generalizability across diverse prompts, and (3) SNR-weighted sampling for efficient training. Extensive experiments demonstrate that our approach efficiently generates high-quality, diverse images using only a single reference image while significantly reducing memory and storage requirements.*\n\n\n## Installation\n\nOur code has been tested on Python 3.10 with an NVIDIA A6000 GPU. It should also work with other recent Python versions and NVIDIA GPUs.\n\n### Installing Python Packages\n\nWe recommend using a Python virtual environment or Anaconda for managing dependencies. 
You can install the required packages using one of the following methods:\n\n#### Using `pip`:\n\n```sh\npython -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\n#### Using `conda`:\n\n```sh\nconda env create -f environment.yml\nconda activate textboost\n```\n\nFor the exact package versions we used, please refer to the [requirements.txt](requirements.txt) file.\n\n\n## Training\n\nTo get started, you will need to download the human-written prompts dataset. Follow the instructions from [InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix) to download `human-written-prompts.jsonl`, and then place it in the `data` directory.\n\nWe used a single image from each instance of the [DreamBooth](https://github.com/google/dreambooth) benchmark.\nYou can find the image used for each instance in [data/dreambooth_n1.txt](data/dreambooth_n1.txt).\nWe provide a simple [script](split_dreambooth.py) to automate this.\n\n```sh\ngit clone https://github.com/google/dreambooth\npython split_dreambooth.py --dreambooth-dir dreambooth/dataset\n```\n\nIf not specified, the code will attempt to use the first `--num_samples` images in the directory.\n\n**Notice**: Our method was primarily tested using Stable Diffusion v1.5; however, this version is currently unavailable. 
You can use another version such as v1.4.\n\nTo train the model, use the following command:\n\n```sh\naccelerate launch train_textboost.py \\\n--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 \\\n--instance_data_dir data/dreambooth_n1_train/dog \\\n--output_dir=output/tb/dog \\\n--instance_token '\u003cdog\u003e dog' \\\n--class_token 'dog' \\\n--validation_prompt 'a \u003cdog\u003e dog in the jungle' \\\n--validation_steps=50 \\\n--placeholder_token '\u003cdog\u003e' \\\n--initializer_token 'dog' \\\n--learning_rate=5e-5 \\\n--emb_learning_rate=1e-3 \\\n--train_batch_size=8 \\\n--max_train_steps=250 \\\n--checkpointing_steps=50 \\\n--num_samples=1 \\\n--augment=paug \\\n--lora_rank=4 \\\n--augment_inversion\n```\n\nAlternatively, you can use the `torchrun` command:\n\n```sh\nCUDA_VISIBLE_DEVICES=0 torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nproc-per-node=1 train_textboost.py \\\n--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 \\\n--instance_data_dir data/dreambooth_n1_train/dog \\\n--output_dir=output/tb/dog \\\n--instance_token '\u003cdog\u003e dog' \\\n--class_token 'dog' \\\n--validation_prompt 'a \u003cdog\u003e dog in the jungle' \\\n--validation_steps=50 \\\n--placeholder_token '\u003cdog\u003e' \\\n--initializer_token 'dog' \\\n--learning_rate=5e-5 \\\n--emb_learning_rate=1e-3 \\\n--train_batch_size=8 \\\n--max_train_steps=250 \\\n--checkpointing_steps=50 \\\n--num_samples=1 \\\n--augment=paug \\\n--lora_rank=4 \\\n--augment_inversion\n```\n\n### Training on All Instances\n\nTo train the model on all DreamBooth instances, run the following command:\n\n```sh\npython run_textboost_db.py\n```\n\n## Inference\n\nAfter training, you can generate images using the following command:\n\n```sh\npython inference.py output/tb/dog --model CompVis/stable-diffusion-v1-4 --prompt \"photo of a \u003cdog\u003e dog\" --output test.jpg\n```\n\n## Evaluation\n\nTo evaluate the trained model, ensure that 
the folder structure follows the format shown below:\n\n```\n.\n├── output\n│   └── tb-sd1.5-n1\n│      ├── backpack\n│      ├── backpack_dog\n│      ...\n│      └── wolf_plushie\n└── ...\n```\n\nOnce the folder structure is correctly set up, run the following command:\n\n```sh\nCUDA_VISIBLE_DEVICES=0 python eval_dreambooth.py output/tb-sd1.5-n1 --token-format '\u003cINSTANCE\u003e SUBJECT'\n```\n\n* Here, `\u003cINSTANCE\u003e` can be replaced with your own modifier token (e.g. `\u003cnew\u003e`).\n\n## Citation\n\n```bibtex\n@article{park2024textboost,\n  title   = {TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder},\n  author  = {Park, NaHyeon and Kim, Kunhee and Shim, Hyunjung},\n  journal = {arXiv preprint},\n  year    = {2024},\n  eprint  = {arXiv:2409.08248}\n}\n```\n\n## License\n\nAll materials in this repository are available under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnahyeonkaty%2Ftextboost","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnahyeonkaty%2Ftextboost","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnahyeonkaty%2Ftextboost/lists"}