{"id":31211518,"url":"https://github.com/cyberagentailab/type-r","last_synced_at":"2025-09-21T05:30:27.643Z","repository":{"id":304102489,"uuid":"998164689","full_name":"CyberAgentAILab/Type-R","owner":"CyberAgentAILab","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-08T07:13:31.000Z","size":5786,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-10T07:42:49.538Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CyberAgentAILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-08T02:13:52.000Z","updated_at":"2025-09-08T07:13:34.000Z","dependencies_parsed_at":"2025-07-11T08:53:58.703Z","dependency_job_id":null,"html_url":"https://github.com/CyberAgentAILab/Type-R","commit_stats":null,"previous_names":["cyberagentailab/type-r"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CyberAgentAILab/Type-R","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FType-R","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FType-R/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FType-R/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FType-R/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CyberAgentAILa
b","download_url":"https://codeload.github.com/CyberAgentAILab/Type-R/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FType-R/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276195627,"owners_count":25601152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-21T02:00:07.055Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-21T05:30:25.341Z","updated_at":"2025-09-21T05:30:27.617Z","avatar_url":"https://github.com/CyberAgentAILab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e \u003ca href=\"https://arxiv.org/abs/2411.18159\"\u003eType-R: Automatically Retouching Typos for Text-to-Image Generation\u003c/a\u003e \u003c/h1\u003e\n\n\u003ch4 align=\"center\"\u003e\n    \u003ca href=\"https://scholar.google.co.jp/citations?user=fdXoV1UAAAAJ\"\u003eWataru Shimoda\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e\u0026emsp;\n    \u003ca href=\"https://naoto0804.github.io/\"\u003eNaoto Inoue\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e\u0026emsp;\n    \u003ca href=\"https://sites.google.com/view/daichiharaguchi/english\"\u003eDaichi Haraguchi\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e\u0026emsp;\u003cbr\u003e\n    \u003ca 
href=\"https://scholar.google.com/citations?user=rSAChi4AAAAJ\"\u003eHayato Mitani\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e\u0026emsp;\n    \u003ca href=\"https://human.ait.kyushu-u.ac.jp/~uchida/index-e.html\"\u003eSeiichi Uchida\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e\u0026emsp;\n    \u003ca href=\"https://sites.google.com/view/kyamagu\"\u003eKota Yamaguchi\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e\u0026emsp;\n    \u003cbr\u003e\n    \u003cbr\u003e\n    \u003csup\u003e1\u003c/sup\u003eCyberAgent, \u003csup\u003e2\u003c/sup\u003eKyushu University\n\u003c/h4\u003e\n\n\u003ch4 align=\"center\"\u003e\nAccepted to CVPR 2025 as a highlight paper\n\u003c/h4\u003e\n\n\u003c!-- ![alt text](figs/main_results.png) --\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![arxiv paper](https://img.shields.io/badge/arxiv-paper-orange)](https://arxiv.org/abs/2411.18159)\n\u003ca href='https://cyberagentailab.github.io/Type-R/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\n\u003c/div\u003e\n\n\n![teaser](images/teaser.png)\n\nThis repository is the official implementation of the paper Type-R: Automatically Retouching Typos for Text-to-Image Generation.  \n\n\n# Pipeline\nThe implementation of Type-R in this repository is a three-step pipeline:\n- Text-to-image generation\n  - Generates images from prompts.\n- Layout correction\n  - Refines the layout by detecting errors, erasing text, and regenerating the layout.\n- Typo correction\n  - Renders corrected raster text using a text editing model with OCR-based verification.\n\nThe pipeline is designed to be plug-and-play, with each module configured using [Hydra](https://github.com/facebookresearch/hydra).  
\nAll configuration files are located in [src/type_r_app/config](src/type_r_app/config).\n\n![pipeline](images/pipeline.png)\n\n\n# Requirements\n\n## 📘 Environment\nWe have verified reproducibility in the following environment:\n- Ubuntu 24.04\n- Python 3.12\n- CUDA 12.6\n- [PyTorch](https://pytorch.org/get-started/locally/) 2.7.0\n- [uv](https://docs.astral.sh/uv/) 0.7.6\n\n\n## 📘 Install\nThis project manages the Python runtime via [uv](https://docs.astral.sh/uv/).  \nIt depends on several packages that involve heavy compilation, such as [Apex](https://github.com/NVIDIA/apex), [MaskTextSpotterV3](https://github.com/MhLiao/MaskTextSpotterV3), [DeepSolo](https://github.com/ViTAE-Transformer/DeepSolo), and [Detectron2](https://github.com/facebookresearch/detectron2).  \n\nThis project assumes that the environment includes a GPU and CUDA support.\nIf your system does not have CUDA installed, you can install the required CUDA components using the following commands:\n```bash\nwget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\n```\n\nThen, install the required build tools using the commands below:\n```bash\napt-mark unhold $(apt-mark showhold)\napt update\napt -y install \\\n  libfontconfig1 \\\n  libglib2.0-0 \\\n  cuda-nvcc-12-6 \\\n  cuda-profiler-api-12-6 \\\n  libcusparse-dev-12-6 \\\n  libcublas-dev-12-6 \\\n  libcusolver-dev-12-6 \\\n  python3-dev \\\n  libgl1 \n```\nFor more details, see the [Dockerfile](Dockerfile).\n\n\u003e ⚠️ The above command assumes that CUDA 12.6 is already installed.  \n\u003e If you're using a different CUDA version, replacing `12-6` with the appropriate version number should work.  
\n\n\nOnce the build dependencies are installed, run the following commands:\n```bash\ngit clone --recursive https://github.com/CyberAgentAILab/Type-R\ncd Type-R\n./script/apply_patch.sh\nuv sync --extra full\n```\n\n\u003e ⚠️ `uv sync` may take up to 30 minutes because some dependencies must be built. If it completes instantly, your environment might be misconfigured. In that case, refer to the Dockerfile, or try building within a Docker container. To reduce dependencies, you may omit the `--extra full` option if you do not run the evaluation pipeline.\n\u003e\n\u003e ⚠️ This project uses a namespace package, which is currently incompatible with editable installs. Be sure to pass the `--no-editable` option to uv when syncing dependencies.  \n\nTo reset the applied patch:\n```bash\n./script/clean_patch.sh\n```\n\n\n## 📘 Data resources\n\nWe provide the data resources via [Hugging Face Datasets](https://huggingface.co/cyberagent/type-r).\nYou can download them using the following command:\n\n```bash\nuv run python tools/dl_resources.py\n```\n\nOr, add the `--full` option to download all resources:\n\n```bash\nuv run python tools/dl_resources.py --full\n```\n\nThese resources include pretrained model weights and font files, plus the MarioEval benchmark dataset when `--full` is specified.\n\n\u003e ⚠️ Some resources with stricter licenses must be downloaded manually.\n\u003e Please refer to the [Hugging Face page](https://huggingface.co/cyberagent/type-r) for details.\n\n## 📘 GPU resources\nEach step of Type-R has different hardware requirements:\n- text-to-image generation\n  - The text-to-image generation step requires a large amount of VRAM (more than an A100 40GB GPU provides), especially when using Flux.\n  - \u003e ⚠️ The `run_on_low_vram_gpus` option in [src/type_r_app/config/t2i/flux.yaml](src/type_r_app/config/t2i/flux.yaml) allows the model to run on an L4 machine, but inference may take a few minutes.\n- layout correction\n  - Layout correction is relatively lightweight in terms of computational cost compared to the 
other steps.\n- typo correction\n  - Typo correction requires a GPU with L4-level specifications when using AnyText.\n\n\n## 📘 Permissions of text-to-image models\nFlux requires you to authenticate with your Hugging Face account in order to download the model files.\nSee the [model card](https://huggingface.co/black-forest-labs/FLUX.1-dev) for more information.\nAuthenticate your Hugging Face account before running the text-to-image generation step by executing:\n```bash\nuv run huggingface-cli login\n```\n\n# Usage\n\n## 📘 Type-R\n### 🔹 Demo\nType-R is designed to be plug-and-play, and module selection is managed via [Hydra](https://github.com/facebookresearch/hydra) configuration.  \nWe provide a convenient script to try Type-R using a sample prompt.\n\nTo run the demo (configured via [src/type_r_app/config/demo.yaml](src/type_r_app/config/demo.yaml)):\n```bash\nbash script/demo.sh\n```\n- Default output directory: `results/demo`\n- Input prompts are read from [resources/prompt/example.txt](https://huggingface.co/cyberagent/type-r/blob/main/prompt/example.txt)\n- Prompts should be separated by line breaks, with renderable text enclosed in double quotes (\")\n\n\n### 🔹 Mario-Eval Benchmark (Trial version)\n\nA script is also provided for running Type-R on the [Mario-Eval benchmark](https://github.com/microsoft/unilm/tree/master/textdiffuser#chart_with_upwards_trendevaluation) using only components with permissive licenses and no paid APIs.\n```bash\nbash script/marioevalbench_trial.sh  \n```\n\n- Config file: [src/type_r_app/config/marioevalbench_trial.yaml](src/type_r_app/config/marioevalbench_trial.yaml)\n- Output directory: `results/marioevalbench_trial`\n- Prompt data (including GPT-4o augmented versions) is provided in: [resources/data/marioevalbench/hfds](https://huggingface.co/cyberagent/type-r/tree/main/data/marioevalbench/hfds)\n\nThis script is configured to process a subset of 10 images for the ablation study in 
the MarioEval benchmark.  \nSee [src/type_r_app/config/dataset/marioeval_trial.yaml](src/type_r_app/config/dataset/marioeval_trial.yaml) for details.\n\n\n\n### 🔹 Mario-Eval Benchmark (Best configuration)\nThis configuration achieves the best results reported in the paper. It uses an external model with a non-commercial license and accesses a paid API.\n```bash\nbash script/marioevalbench_best.sh  \n```\n- Config file: [src/type_r_app/config/marioevalbench_best.yaml](src/type_r_app/config/marioevalbench_best.yaml)\n- Output directory: `results/marioevalbench_best`\n\n\u003e ⚠️ Layout correction assumes that the OpenAI API is used. For setup, see [OpenAI API configuration](https://github.com/CyberAgentAILab/Type-R-dev#-openai-api-configuration).   \n\u003e To use Azure OpenAI instead, set `use_azure: true` in [src/type_r_app/config/marioevalbench_best.yaml](src/type_r_app/config/marioevalbench_best.yaml).\n\nThis script is configured to process the full 500-image subset used for the ablation study in the MarioEval benchmark.  \nSee [src/type_r_app/config/dataset/marioeval.yaml](src/type_r_app/config/dataset/marioeval.yaml) for details.\n\nTo run the test set of the MarioEval benchmark, set `sub_set: test` in [src/type_r_app/config/dataset/marioeval.yaml](src/type_r_app/config/dataset/marioeval.yaml).  
\nPlease note that this will process 5,000 images.\n\n\n\n## 📘 Evaluation\nWe provide evaluation scripts in this repository.\nTo run the evaluation scripts on images generated with the best setting:\n\n```bash\nuv run python -m type_r_app --config-name marioevalbench_best command=evaluation\n```\n- You can change the evaluation target by editing the YAML config.\n- By default, evaluation includes VLM evaluation, OCR accuracy, FID score, and CLIPScore.\n\n\n\n### VLM evaluation options\n- VLM evaluation requires a paid API.\n- By default, the system evaluates graphic design quality using `rating_design_quality`.\n- To evaluate other criteria, modify the `evaluation` field in [src/type_r_app/config/evaluation.yaml](src/type_r_app/config/evaluation.yaml).\n\n\u003e ⚠️ The VLM evaluation assumes that the OpenAI API is used. For setup, see [OpenAI API configuration](https://github.com/CyberAgentAILab/Type-R-dev#-openai-api-configuration).   \n\u003e To use Azure OpenAI instead, set `use_azure: true` in [src/type_r_app/config/evaluation.yaml](src/type_r_app/config/evaluation.yaml).\n\n\n\n## 📘 Prompt augmentation\nWe provide both the data and the code for prompt augmentation. This process requires a paid API.\n```bash\nuv run python -m type_r_app --config-name demo command=prompt-augmentation\n```\n- Input: [resources/prompt/example.txt](https://huggingface.co/cyberagent/type-r/blob/main/prompt/example.txt)\n- Output: `prompt/augmented.txt` under the configured results directory\n- Optionally, HFDS format output is also supported (see [src/type_r_app/launcher/prompt_augmentation.py](src/type_r_app/launcher/prompt_augmentation.py))\n\n\u003e ⚠️ Prompt augmentation assumes that the OpenAI API is used. For setup, see [OpenAI API configuration](https://github.com/CyberAgentAILab/Type-R-dev#-openai-api-configuration). 
\n\u003e To use Azure OpenAI instead, set `use_azure: true` in [src/type_r_app/config/prompt_augmentation.yaml](src/type_r_app/config/prompt_augmentation.yaml).\n\n\n## 📘 OpenAI API configuration\nThis repository configures the OpenAI API via environment variables.\nPlease set the following variable:\n- `OPENAI_API_KEY`\n\nTo use the Azure OpenAI API instead, set the following environment variables:\n- `OPENAI_API_VERSION`\n- `AZURE_OPENAI_DEPLOYMENT_NAME`\n- `AZURE_OPENAI_GPT4_DEPLOYMENT_NAME`\n- `AZURE_OPENAI_ENDPOINT`\n- `AZURE_OPENAI_API_KEY`\n\nNote that we have only verified the basic functionality of the Azure OpenAI API.\n\n\n## 📘 Result\nThe output directory is expected to be structured as follows:\n\u003cpre\u003e\nresults/\n├── ref_img               # T2I-generated images\n├── layout_corrected_img  # Images with surplus text removed\n├── typo_corrected_img    # Final output\n├── word_mapping          # JSON files with OT-based mapping\n└── evaluation            # Evaluation results\n\u003c/pre\u003e\nTo convert the results into an Excel file for easier viewing:\n```bash\nuv run python tools/result2xlsx.py\n```\n\n## 📘 Test\n\nTo run the tests:\n```bash\nuv run pytest tests --gpufunc\n```\n\n\n# License\n\nThis project is licensed under the Apache License 2.0.  
\nSee [LICENSE](./LICENSE) for details.\n\n### Third-party licenses\n\nThis project depends on the following third-party libraries/components, each of which has its own license:\n\n#### OCR-related projects\n\n- [Deepsolo](https://github.com/ViTAE-Transformer/DeepSolo) — Licensed under [Adelaidet](https://github.com/ViTAE-Transformer/DeepSolo/blob/main/LICENSE)\n- [MaskTextSpotterV3](https://github.com/MhLiao/MaskTextSpotterV3) — Licensed under [CC BY-NC 4.0](https://github.com/MhLiao/MaskTextSpotterV3/blob/master/LICENSE.md)\n- [Apex](https://github.com/NVIDIA/apex) — Licensed under [BSD 3-Clause](https://github.com/NVIDIA/apex/blob/master/LICENSE)\n- [CRAFT](https://github.com/clovaai/CRAFT-pytorch) — Licensed under [MIT License](https://github.com/clovaai/CRAFT-pytorch/blob/master/LICENSE)\n- [MaskRCNN Benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) — Licensed under [MIT License](https://github.com/facebookresearch/maskrcnn-benchmark/blob/main/LICENSE)\n- [Clova Recognition](https://github.com/clovaai/deep-text-recognition-benchmark) — Licensed under [Apache 2.0](https://github.com/clovaai/deep-text-recognition-benchmark/blob/master/LICENSE.md)\n- [Detectron2](https://github.com/facebookresearch/detectron2) — Licensed under [Apache 2.0](https://github.com/facebookresearch/detectron2/blob/main/LICENSE)\n- [Hi-SAM](https://github.com/ymy-k/Hi-SAM) — Licensed under [Apache 2.0](https://github.com/ymy-k/Hi-SAM/blob/main/LICENSE)\n- [Paddle](https://github.com/PaddlePaddle/PaddleOCR) — Licensed under [Apache 2.0](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE)\n\n\n#### Text editor\n\n- [AnyText](https://github.com/tyxsspa/AnyText) — Licensed under [Apache 2.0](https://github.com/tyxsspa/AnyText/blob/main/LICENSE)\n- [UDiffText](https://github.com/ZYM-PKU/UDiffText) — Licensed under [MIT License](https://github.com/ZYM-PKU/UDiffText/blob/main/LICENSE)\n\n#### Text remover\n\n- [Lama](https://github.com/advimman/lama) — Licensed under 
[Apache 2.0](https://github.com/advimman/lama/blob/main/LICENSE)\n- [Garnet](https://github.com/naver/garnet) — Licensed under [Apache 2.0](https://github.com/naver/garnet/blob/master/LICENSE)\n\n\n#### Evaluation metrics\n\n- [CLIP score](https://github.com/jmhessel/clipscore) — Licensed under [MIT License](https://github.com/jmhessel/clipscore/blob/main/LICENSE)\n- [Pytorch FID](https://github.com/mseitzer/pytorch-fid) — Licensed under [Apache 2.0](https://github.com/mseitzer/pytorch-fid/blob/master/LICENSE)\n- [VLMEval](https://github.com/open-compass/VLMEvalKit) — Licensed under [Apache 2.0](https://github.com/open-compass/VLMEvalKit/blob/main/LICENSE)\n\n#### Data\n- [Mario-Eval Benchmark](https://github.com/microsoft/unilm/tree/master/textdiffuser#chart_with_upwards_trendevaluation) — Licensed under [MIT License](https://github.com/microsoft/unilm/blob/master/LICENSE)\n\n\n### No license projects\nOur repository does not contain code from the following repositories due to the absence of a license.  
\nPlease obtain the code and weights from the following links.\n- [CLIP4str](https://github.com/large-ocr-model/large-ocr-model.github.io) — Licensed under N/A\n- [Mostel](https://github.com/qqqyd/MOSTEL) — Licensed under N/A\n- [TextCtrl](https://github.com/weichaozeng/TextCtrl) — Licensed under N/A\n\n\n\n# Citation\n\nIf you find this code useful for your research, please cite our paper:\n\n```\n@inproceedings{shimoda2025typer,\n  title={{Type-R: Automatically Retouching Typos for Text-to-Image Generation}},\n  author={Wataru Shimoda and Naoto Inoue and Daichi Haraguchi and Hayato Mitani and Seiichi Uchida and Kota Yamaguchi},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2025},\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Ftype-r","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberagentailab%2Ftype-r","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Ftype-r/lists"}