{"id":19932305,"url":"https://github.com/amazon-science/glass-text-spotting","last_synced_at":"2025-09-03T18:31:43.664Z","repository":{"id":47479778,"uuid":"512396433","full_name":"amazon-science/glass-text-spotting","owner":"amazon-science","description":"Official implementation for \"GLASS: Global to Local Attention for Scene-Text Spotting\" (ECCV'22)","archived":false,"fork":false,"pushed_at":"2024-06-28T07:35:37.000Z","size":2645,"stargazers_count":102,"open_issues_count":15,"forks_count":12,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-04T19:32:36.587Z","etag":null,"topics":["attention","deep-learning","detection","ocr","text-spotting"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-10T09:39:58.000Z","updated_at":"2024-11-02T11:05:55.000Z","dependencies_parsed_at":"2024-12-23T19:11:51.524Z","dependency_job_id":"adf88301-99c4-4848-85eb-3cfd77026607","html_url":"https://github.com/amazon-science/glass-text-spotting","commit_stats":{"total_commits":9,"total_committers":2,"mean_commits":4.5,"dds":"0.11111111111111116","last_synced_commit":"15b3a348fa6d75913e153e774071b114f3d5c9a0"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amazon-science/glass-text-spotting","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fglass-text-spotting","tags_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fglass-text-spotting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fglass-text-spotting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fglass-text-spotting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/glass-text-spotting/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fglass-text-spotting/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273489741,"owners_count":25115013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-03T02:00:09.631Z","response_time":76,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","deep-learning","detection","ocr","text-spotting"],"created_at":"2024-11-12T23:09:38.178Z","updated_at":"2025-09-03T18:31:43.232Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GLASS: Global to Local Attention for Scene-Text Spotting\n\n\nThis is a PyTorch implementation of the following paper:\n\n[**GLASS: Global to Local Attention for Scene-Text Spotting**](https://arxiv.org/abs/2208.03364), 
ECCV 2022.\n\nRoi Ronen*, Shahar Tsiper*, Oron Anschel, Inbal Lavi, Amir Markovitz and R. Manmatha.\n\n[Paper](https://arxiv.org/pdf/2208.03364) \n| [Pretrained Models](#Models-and-Configs)\n|  [Citation](#citation) | [Demo](#demo)\n\n![Intro Figure](readme/architecture.png)\n\n**Abstract:**\u003cbr\u003e\nIn recent years, the dominant paradigm for text spotting is to combine the tasks of text detection and recognition into a single end-to-end framework. \nUnder this paradigm, both tasks are accomplished by operating over a shared global feature map extracted from the input image.\nAmong the main challenges end-to-end approaches face is the performance degradation when recognizing text across scale variations (smaller or larger text), and arbitrary word rotation angles.\nIn this work, we address these challenges by proposing a novel global-to-local attention mechanism for text spotting, termed GLASS, that fuses together global and local features.\nThe global features are extracted from the shared backbone, preserving contextual information from the entire image, while the local features are computed individually on resized, high resolution rotated word crops. 
\nThe information extracted from the local crops alleviates much of the inherent difficulties with scale and word rotation.\nWe show a performance analysis across scales and angles, highlighting improvement over scale and angle extremities.\nIn addition, we introduce a periodic, orientation-aware loss term supervising the detection task, and show its contribution on both detection and recognition performance across all angles.\nFinally, we show that GLASS is agnostic to architecture choice, and apply it to other leading text spotting algorithms, improving their text spotting performance.\nOur method achieves state-of-the-art results on multiple benchmarks, including the newly released TextOCR.\n\n\n**Result on Total-Text test dataset:**\n\n![Results Figure](readme/results.png)\n\n\n\n## Installation\nThis package requires Detectron2==0.6 to compile.\nInstallation has been tested on Linux using Anaconda for package management.\n\nClone the repository to your local machine\n```bash\ngit clone https://github.com/amazon-research/glass-text-spotting\ncd glass-text-spotting\n```\n\nCreate and activate a clean conda environment\n```bash\nconda create -n glass python=3.8\nconda activate glass\n```\n\nInstall required packages\n```bash\npip install -e .\n```\n\n## Running Inference\n\n### Demo\n\nA Google Colab demo is available for running inference:\n\n\u003ch4 id=\"demo\"\u003e \n    \u003ca href=\"https://colab.research.google.com/github/amazon-science/glass-text-spotting/blob/master/demo/glass_demo.ipynb\" target=\"_parent\"\u003e\n    Glass Demo \u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\n    \u003c/a\u003e \n\u003c/h4\u003e\n\n### Models and Configs\n\nYou can check out all of our fine-tuned models and configs here:\n\n1. 
Pretrained + fine-tuned on ICDAR'15: [IC'15 Model](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/models/glass_250k_icdar15_fintune.pth), [IC'15 config](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/configs/glass_config_icdar15.yaml)\n2. Pretrained + fine-tuned on TotalText: [TotalText Model](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/models/glass_250k_totaltext_finetune.pth), [TotalText config](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/configs/glass_config_totaltext.yaml)\n3. Pretrained + fine-tuned on all datasets, including TextOCR: [TextOCR Model](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/models/glass_250k_full_textocr_finetune.pth), [TextOCR config](https://glass-text-spotting.s3.eu-west-1.amazonaws.com/configs/glass_config_textocr.yaml)\n\nAll of these models can be run together with the default pre-training config, or invoked using the demo above.\n\n## Training\n\nPretraining on the SynthText dataset\n```bash\n# --datasets: the dataset configuration\n# --config:   the architecture config\n# --output:   the output path of the training artifacts\npython ./tools/train_glass.py \\\n  --datasets ./data_configs/data_config_pretrain.yaml \\\n  --config ./configs/glass_pretrain \\\n  --output \u003coutput_path\u003e\n```\n\nFine-tuning the model\n```bash\npython ./tools/train_glass.py  \\\n   --datasets \u003cpath_to_dataset_config\u003e \\\n   --resume \u003cpretrained_weights_path\u003e \\\n   --output \u003coutput_path\u003e\n```\n\n## Data\n\n### Data Preparation\n\nSee [DATA.md](DATA.md) for instructions on data preparation and ingestion.\n\n\n### Datasets\n\nThe models in this work were trained on the following datasets:\n\n1. [SynthText](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)\n2. [ICDAR2013](https://rrc.cvc.uab.es/)\n3. [ICDAR2015](https://rrc.cvc.uab.es/)\n4. [Total-Text](https://www.robots.ox.ac.uk/~vgg/data/scenetext/)\n5. 
[TextOCR](https://textvqa.org/textocr/dataset/)\n\n\n## Updates\n\n* December 12th, 2022 - Added Google Colab demo, numerous bug fixes and improvements, and code cleanup\n* November 14th, 2022 - Evaluation code and updated post-processing code are included\n* September 20th, 2022 - Added the models, additional components for rotated box training, and bug fixes\n* July 10th, 2022 - Initial commit, getting things ready towards ECCV'22, main training code and architecture are included\n\n## Citation\nPlease consider citing our work if you find it useful for your research.\n\n```bibtex\n@article{ronen2022glass,\n  title={GLASS: Global to Local Attention for Scene-Text Spotting},\n  author={Ronen, Roi and Tsiper, Shahar and Anschel, Oron and Lavi, Inbal and Markovitz, Amir and Manmatha, R},\n  journal={arXiv preprint arXiv:2208.03364},\n  year={2022}\n}\n```\n\n## Contribution and License\n\n\nSee [CONTRIBUTING](CONTRIBUTING.md) for more information on contributions.\n\nThis project is licensed under the Apache-2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fglass-text-spotting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fglass-text-spotting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fglass-text-spotting/lists"}