{"id":20464553,"url":"https://github.com/bytedance/sptsv2","last_synced_at":"2025-08-18T16:03:27.919Z","repository":{"id":170927370,"uuid":"647179012","full_name":"bytedance/SPTSv2","owner":"bytedance","description":"The official implementation of SPTS v2: Single-Point Text Spotting","archived":true,"fork":false,"pushed_at":"2023-06-29T07:40:48.000Z","size":691,"stargazers_count":136,"open_issues_count":15,"forks_count":18,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-08-18T16:02:13.840Z","etag":null,"topics":["artificial-intelligence","computer-vision","deep-learning","ocr","research"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-05-30T08:21:05.000Z","updated_at":"2025-08-04T19:00:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"26fd4fa5-8efc-4b21-ade3-5d884c9dd5ae","html_url":"https://github.com/bytedance/SPTSv2","commit_stats":null,"previous_names":["bytedance/sptsv2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bytedance/SPTSv2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FSPTSv2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FSPTSv2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FSPTSv2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FSPTSv2/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/SPTSv2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FSPTSv2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271019327,"owners_count":24685677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-18T02:00:08.743Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","computer-vision","deep-learning","ocr","research"],"created_at":"2024-11-15T13:15:36.912Z","updated_at":"2025-08-18T16:03:27.872Z","avatar_url":"https://github.com/bytedance.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# SPTS v2: Single-Point Scene Text Spotting\n\nThe official implementation of [SPTS v2: Single-Point Text Spotting](https://arxiv.org/pdf/2301.01635.pdf). The SPTSv2 which achieves 19× faster inference speed tackles scene text spotting as an end-to-end sequence prediction task and requires only extremely low-cost single-point annotations. Below is the overall architecture of SPTSv2.  \n\n![Image text](IMG/pipeline.png)\n\n## Environment\nWe recommend using [Anaconda](https://www.anaconda.com/) to manage environments. 
Run the following commands to install dependencies.\n```\nconda create -n sptsv2 python=3.7 -y\nconda activate sptsv2\nconda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -c pytorch\ngit clone git@github.com:bytedance/SPTSv2.git\ncd SPTSv2\npip install -r requirements.txt\n```\n\n## Dataset \n\n- CurvedSynText150k [[paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_ABCNet_Real-Time_Scene_Text_Spotting_With_Adaptive_Bezier-Curve_Network_CVPR_2020_paper.pdf): \n  - Part1 (94,723) Download (15.8G) ([Origin](https://universityofadelaide.box.com/s/xyqgqx058jlxiymiorw8fsfmxzf1n03p), [Google](https://drive.google.com/file/d/1OSJ-zId2h3t_-I7g_wUkrK-VqQy153Kj/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1Y5pqVqfjcc4FKxW4y8R5jw) password: 4k3x) \n  - Part2 (54,327) Download (9.7G) ([Origin](https://universityofadelaide.box.com/s/e0owoic8xacralf4j5slpgu50xfjoirs), [Google](https://drive.google.com/file/d/1EzkcOlIgEp5wmEubvHb7-J5EImHExYgY/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1gRv-IjqAUu6qnXN5BXlOzQ) password: a5f5)\n\n- Totaltext [[paper]](https://ieeexplore.ieee.org/abstract/document/8270088/) [[source]](https://github.com/cs-chan/Total-Text-Dataset). 
\n  - Download (0.4G) ([Google](https://drive.google.com/file/d/1jfBYrAmh6Zshb7Jc0bctRjQKpK839SFq/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/18brRQAwnqGd4A_uwPRYRng) password: 5nhw) \n  \n- SCUT-CTW1500 [[paper]](https://www.sciencedirect.com/science/article/pii/S0031320319300664) [[source]](https://github.com/Yuliang-Liu/Curve-Text-Detector).\n  - Download (0.8G) ([Google](https://drive.google.com/file/d/1yjpsNmcjNHBPAeFNvSpYJOQPb1gRkV0K/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/193y6N_Ek1184PZ7PbEljmA) password: 82vs)\n   \n- MLT [[paper]](https://ieeexplore.ieee.org/abstract/document/8270168).\n  - Download (6.8G) ([Origin](https://universityofadelaide.box.com/s/qu2wctdcsxh73bb94krdredpmx9nzf8m), [Google](https://drive.google.com/file/d/1nE2d_sIfcAejgVIv6-UjGNcBXgxc4QfD/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1rjqmb3uuki_Ppcxq-tl7oQ) password: zqrm)\n\n- ICDAR2013 [[paper]](https://rrc.cvc.uab.es/?ch=2) [[source]](https://rrc.cvc.uab.es/?ch=2). \n  - Download (0.2G) ([Google](https://drive.google.com/file/d/1dMffINYhIRa9UD_3pzTFllVwL6PK7KXD/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1PiSZxZlG38qjj7Xb05cXdg) password: 5ddh) \n \n- ICDAR2015 [[paper]](https://rrc.cvc.uab.es/?ch=4) [[source]](https://rrc.cvc.uab.es/?ch=4). \n  - Download (0.1G) ([Google](https://drive.google.com/file/d/1THhzo_WH1RY5DlGdBfjRA_dwu9tAmQUE/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1x3EpYLRa4EtSMNg5JqszVg) password: wjrh) \n\n- Inverse-Text (images): [OneDrive](https://1drv.ms/u/s!AimBgYV7JjTlgccVhlbD4I3z5QfmsQ?e=myu7Ue), [BaiduNetdisk](https://pan.baidu.com/s/1A0JaNameuM0GZxch8wdm6g)(6a2n). 
\n\nPlease download and extract the above datasets into the `data` folder following the file structure below.\n\n```\ndata\n├─CTW1500\n│  ├─annotations\n│  │      test_ctw1500_maxlen25.json\n│  │      train_ctw1500_maxlen25_v2.json\n│  ├─ctwtest_text_image\n│  └─ctwtrain_text_image\n├─icdar2013\n│  │  ic13_test.json\n│  │  ic13_train.json\n│  ├─test_images\n│  └─train_images\n├─icdar2015\n│  │  ic15_test.json\n│  │  ic15_train.json\n│  ├─test_images\n│  └─train_images\n├─inversetext\n│  │  test_poly.json\n│  └─test_images\n├─mlt2017\n│  │  train.json\n│  └─MLT_train_images\n├─syntext1\n│  │  train.json\n│  └─syntext_word_eng\n├─syntext2\n│  │  train.json\n│  └─emcs_imgs\n└─totaltext\n    │  test.json\n    │  train.json\n    ├─test_images\n    └─train_images\n```\n\n## Train and finetune\n\nThe original paper trains the model on 16 GPUs (2 nodes, 8 A100 GPUs per node). The instructions below are for training on a single machine with 8 GPUs and can be adapted to multi-node training following the [PyTorch Distributed Docs](https://pytorch.org/docs/1.8.0/distributed.html).\n\nYou can download our pretrained weights from [Google Drive](https://drive.google.com/file/d/1tzaq8XCR72FzPMzPiY-ooOfubqzbxtD7/view?usp=share_link) or [BaiduNetDisk](https://pan.baidu.com/s/1v0WreR5yZtKa_XHMjX_3wQ?pwd=3pcu), password: 3pcu, or pretrain the model from scratch using the `run.sh` file. To finetune, set `--resume` and `--finetune` in `run.sh`.\n\n## Inference and visualization\nThe steps above produce the trained models. You can also download the models for the Total-Text, SCUT-CTW1500, ICDAR2013, ICDAR2015 and inversetext datasets from [GoogleDrive](https://drive.google.com/drive/folders/18sTx9hPBXZuD1_pURLZiYxa4xMLOK193?usp=share_link) or [BaiduNetDisk](https://pan.baidu.com/s/1c0-4QYAWD8huKBrL_Yp6VQ?pwd=2k2m) password: 2k2m. Then use `test.sh` or `predict.py` to output results and visualizations.\n\n![Image text](IMG/test_0000095.jpg)\n\n## Evaluation\n\nFirst, download the ground-truth files ([GoogleDrive](https://drive.google.com/file/d/1ztyjczfn3YdBf6hpLuV2Vs2UJPlRdAjm/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1ERkKR8L58ZVlB12SpCwEVQ) password: 35tr) and lexicons ([GoogleDrive](https://drive.google.com/file/d/1JxmuDsOZ-x_WO5lck2ZQZHRcjoUtUiLo/view?usp=sharing), [BaiduNetDisk](https://pan.baidu.com/s/1so_s94_XysLjlcWasos8mA) password: 9eml), and extract them into the `evaluation` folder.\n\n```\nevaluation\n│  eval.py\n├─gt\n│  ├─gt_ctw1500\n│  ├─gt_ic13\n│  ├─gt_ic15\n│  └─gt_totaltext\n└─lexicons\n    ├─ctw1500\n    ├─ic13\n    ├─ic15\n    └─totaltext\n``` \nWe provide two evaluation scripts: `eval_ic15.py` for the ICDAR2015 dataset and `eval.py` for the other benchmarks. The command for evaluating the inference result on Total-Text is:\n```\npython evaluation/eval.py \\\n       --result_path ./output/totaltext_val.json \\\n       # --with_lexicon \\ # uncomment this line if you want to evaluate with lexicons.\n       # --lexicon_type 0 # used for ICDAR2013 and ICDAR2015. 0: Generic; 1: Weak; 2: Strong.\n```\n\n## Performance\n\nThe end-to-end recognition performance of SPTSv2 on five public benchmarks is:\n\n| Dataset | Strong | Weak | Generic |\n| ------- | ------ | ---- | ------- |\n| ICDAR 2013 | 93.9 | 91.8 | 88.6 |\n| ICDAR 2015 | 82.3 | 77.7 | 72.6 |\n\n| Dataset | None | Full |\n| ------- | ---- | ---- |\n| Total-Text | 75.5 | 84.0 |\n| inversetext | 63.5 | 74.9 |\n| SCUT-CTW1500 | 63.6 | 84.3 |\n\n## Citation\n```\n@inproceedings{peng2022spts,\n  title={SPTS: Single-Point Text Spotting},\n  author={Peng, Dezhi and Wang, Xinyu and Liu, Yuliang and Zhang, Jiaxin and Huang, Mingxin and Lai, Songxuan and Zhu, Shenggao and Li, Jing and Lin, Dahua and Shen, Chunhua and Bai, Xiang and Jin, Lianwen},\n  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},\n  year={2022}\n}\n\n@article{liu2023spts,\n  title={SPTS v2: Single-Point Scene Text Spotting},\n  author={Liu, Yuliang and Zhang, Jiaxin and Peng, Dezhi and Huang, Mingxin and Wang, Xinyu and Tang, Jingqun and Huang, Can and Lin, Dahua and Shen, Chunhua and Bai, Xiang and Jin, Lianwen},\n  journal={arXiv preprint arXiv:2301.01635},\n  year={2023}\n}\n```\n\n## Copyright\nThis repository can only be used for non-commercial research purposes.\n\nFor commercial use, please contact Jiaxin Zhang (zhangjiaxin.zjx1995@bytedance.com).\n\n## Acknowledgement\nWe sincerely thank [Stable-Pix2Seq](https://github.com/gaopengcuhk/Stable-Pix2Seq), [Pix2Seq](https://github.com/google-research/pix2seq), [DETR](https://github.com/facebookresearch/detr), [Swin-Transformer](https://github.com/microsoft/Swin-Transformer), [SPTS](https://github.com/shannanyinxiang/SPTS) and [ABCNet](https://github.com/aim-uofa/AdelaiDet) for their excellent work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fsptsv2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2Fsptsv2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fsptsv2/lists"}