{"id":11439421,"url":"https://github.com/Litalby1/make-it-count","last_synced_at":"2025-09-30T03:30:24.237Z","repository":{"id":245221599,"uuid":"815928605","full_name":"Litalby1/make-it-count","owner":"Litalby1","description":"Official implemention of \"Make It Count: Text-to-Image Generation with an Accurate Number of Objects\"","archived":false,"fork":false,"pushed_at":"2024-06-16T15:04:51.000Z","size":226,"stargazers_count":55,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-09-27T17:31:24.545Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Litalby1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-16T15:00:58.000Z","updated_at":"2024-09-26T13:30:25.000Z","dependencies_parsed_at":"2024-06-20T16:32:19.116Z","dependency_job_id":"fa830a21-f590-4fe9-9b44-68fa84c45c44","html_url":"https://github.com/Litalby1/make-it-count","commit_stats":null,"previous_names":["litalby1/make-it-count"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Litalby1%2Fmake-it-count","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Litalby1%2Fmake-it-count/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Litalby1%2Fmake-it-count/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Litalby1%2Fmake-it-count/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/Litalby1","download_url":"https://codeload.github.com/Litalby1/make-it-count/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234695459,"owners_count":18872984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-06-20T16:04:47.191Z","updated_at":"2025-09-30T03:30:24.231Z","avatar_url":"https://github.com/Litalby1.png","language":"Python","funding_links":[],"categories":["T2I Diffusion Model augmentation"],"sub_categories":[],"readme":"\n# Make It Count: Text-to-Image Generation with an Accurate Number of Objects (CVPR 2025)\n\u003e **Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik**\n\u003e \n\u003e Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This is important for various applications from technical documents, to children’s books to illustrating cooking recipes. Generating object-correct counts is fundamentally challenging because the generative model needs to keep a sense of separate identity for every instance of the object, even if several objects look identical or overlap, and then carry out a global computation implicitly during generation. It is still unknown if such representations exist. 
To address count-correct generation, we first identify\nfeatures within the diffusion model that can carry the object identity information.\nWe then use them to separate and count instances of objects during the denoising\nprocess and detect over-generation and under-generation. We fix the latter by\ntraining a model that proposes the right location for missing objects, based on\nthe layout of existing ones, and show how it can be used to guide denoising\nwith correct object count. Our approach, CountGen, does not depend on external\nsource to determine object layout, but rather uses the prior from the diffusion\nmodel itself, creating prompt-dependent and seed-dependent layouts. Evaluated on\ntwo benchmark datasets, we find that CountGen strongly outperforms the count accuracy of existing baselines.\n\n\u003ca href=\"https://make-it-count-paper.github.io/\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=Project\u0026message=Website\u0026color=red\" height=20.5\u003e\u003c/a\u003e \n\u003ca href=\"https://arxiv.org/abs/2406.10210\"\u003e\u003cimg src=\"https://img.shields.io/badge/arXiv-2406.10210-b31b1b.svg\" height=20.5\u003e\u003c/a\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"figures/teaser.jpg\" width=\"800px\"/\u003e\n\u003c/p\u003e\n\n## Description  \nOfficial implementation of our \"Make It Count: Text-to-Image Generation with an Accurate Number of Objects\" paper.\n\n\n## Setup\nClone the repository and navigate into the directory:\n```\ngit clone https://github.com/Litalby1/make-it-count.git\ncd make-it-count\n```\n\n## Environment\nInstall necessary packages:\n```\npip install -r requirements.txt\npython -m spacy download en_core_web_trf\n```\n\n## Relayout Checkpoints \u0026 Datasets\nFor inference, download the checkpoints from [here](https://drive.google.com/file/d/1xyfkwmX9plMB5-c0VDwl7WiPuQ2qt5yb/view?usp=drive_link).\nAdditionally, the Relayout training dataset can be accessed 
[here](https://drive.google.com/drive/folders/1BoXN2KQZQZ7fCeD3aTMCrZByiVPzeyIM?usp=drive_link).\n\nAfter downloading the weights, place them in `pipeline/mask_extraction/relayout_weights/relayout_checkpoint.pth`.\n#### Datasets Creation:\n1. CoCoCount Dataset:\n   To generate the CoCoCount dataset, use the following command:\n\n```\npython dataset/create_data_CoCoCount.py \\\n--output_directory \u003coutput_path_for_json\u003e \\\n--N_samples \u003cnumber_of_data_samples\u003e\n```\nYou can also adjust the proportion of samples that contain no scenes by using the option: --no_scene_percent \u003cp_between_0-1\u003e.\n\n2. Compbench Dataset:\n   First, download the necessary CSV file from [here](https://drive.google.com/file/d/1Lya24Qc1D36wlcXeUZHEi5TwItQg2Hen/view?usp=drive_link).\n   After downloading the CSV, use the command below to generate the dataset:\n```\npython dataset/create_data_compbench.py \\\n--output_directory \u003coutput_path_for_json\u003e \\\n--compbench_csv \u003cpath_to_compbench_csv\u003e\n```\nReplace \u003coutput_path_for_json\u003e with the path where you want to save the JSON files and \u003cpath_to_compbench_csv\u003e with the path to the downloaded CSV file.\n\n## Run CountGen 🌠\n\n1. Begin by downloading the necessary relayout checkpoints. Once downloaded, place them in `pipeline/mask_extraction/relayout_weights/relayout_checkpoint.pth`. Additionally, set `output_path` in the pipeline configuration file to designate where the results should be saved.\n2. 
To run CountGen, use the following command template:\n```\npython pipeline/run_countgen.py \\\n--prompt \u003cyour_prompt\u003e \\\n--seed \u003cseed\u003e \\\n--config \u003coptional:config_path\u003e \\\n--dataset_file \u003coptional:path_to_dataset\u003e\n```\n\nSingle Prompt Example:\n```\npython pipeline/run_countgen.py \\\n--prompt \"A photo of six kittens sitting on a branch\" \\\n--seed 1\n```\n\nDataset Run Example:\n```\npython pipeline/run_countgen.py \\\n--prompt \"A photo of six kittens sitting on a branch\" \\\n--seed 1 \\\n--dataset_file \"dataset/CoCoCount.json\"\n```\n\nModify the settings in pipeline/pipeline_config.yaml as needed for custom configurations.\n\n### Run Using the Notebook\nRun the notebook at `pipeline/countgen.ipynb`\n\n\n## Train Relayout 📉\nFirst, install necessary packages:\n```\npip install -r train_relayout/relayout_requirements.txt\n```\n\n\n#### 1️⃣ Data Creation:\nGenerate training datasets for ReLayout:\n```\npython train_relayout/data_creation/generate_counting_unet_data.py \\\n--output_dir \"path/to/your/output_directory\"\n```\nThis will save the generated data to the specified output directory. \n\n#### 2️⃣ Data Preparation:\nTo prepare the data, run the following matching algorithm:\n\n```\npython train_relayout/data_creation/matching_algorithm.py \\\n--data_dir \u003cpath_for_dataset_created_from_previous_code\u003e \\\n--output_dir \u003csave_training_data_results_after_matching\u003e\n```\nMake sure to apply this algorithm to both the training and test sets.\n\n#### 3️⃣ Training:\nConfigure the paths in train_relayout/unet_config.yaml:\n1. train_data_dir: [\"\u003cpath_to_train_data_dir_after_matching_algorithm\u003e\"]\n2. test_data_dir: [\"\u003cpath_to_test_data_dir_after_matching_algorithm\u003e\"]\n\nYou may list multiple directories, for example: [\"\u003cpath_to_train_dir_1\u003e\", \"\u003cpath_to_train_dir_2\u003e\"]\nAdjust other settings and hyperparameters in train_relayout/unet_config.yaml as needed. 
Start the training process:\n\n```\npython train_relayout/train_unet.py \n```\n\n## Evaluation\n\nFor evaluation, a script based on YOLOv9 is provided. Given a directory of images, the script predicts the number of objects in each image and compares it to the expected count.\n\nTo run the evaluation script:\n\n1. Download yolov9e.pt (model weights):\n```\nwget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov9e.pt\n```\n\n2. Then, install evaluation requirements:\n```\npip install -r evaluation_requirements.py\n```\n\n3. Finally, run the script.\n```\npython evaluation_script.py --images_dir examples --output evaluation_output\n```\n\n* `images_dir` : a path to a directory with images, in the format described below. For an example directory, see `examples/`.\n* `output_dir` : a path to an output directory (automatically created if it does not exist). The output of the script is a `results.csv` file with the predictions, as well as annotated images with labeled bounding boxes.\n \nIn the directory with images, each image should have the following name format: `{expected_count}__{class_name}__{...}.png`, such as `4__donut__A_photo_of_four_donuts_on_the_road.png`.\n* `{expected_count}` : an integer, representing the number of expected objects (e.g., 4). 
\n* `{class_name}` : a string, representing the class that is to be generated (e.g., donut).\n* `{...}` : optional; may contain any text, such as the prompt (e.g., A_photo_of_four_donuts_on_the_road).\n\n## Citation\n\nIf you use this code for your research, please cite our paper:\n```\n@article{binyamin2024count,\n    title={Make It Count: Text-to-Image Generation with an Accurate Number of Objects},\n    author={Binyamin, Lital and Tewel, Yoad and Segev, Hilit and Hirsch, Eran and Rassin, Royi and Chechik, Gal},\n    journal={arXiv preprint arXiv:2406.10210},\n    year={2024}\n  }\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLitalby1%2Fmake-it-count","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLitalby1%2Fmake-it-count","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLitalby1%2Fmake-it-count/lists"}