{"id":13543005,"url":"https://github.com/Media-Smart/vedastr","last_synced_at":"2025-04-02T12:31:00.017Z","repository":{"id":41151531,"uuid":"242113168","full_name":"Media-Smart/vedastr","owner":"Media-Smart","description":"A scene text recognition toolbox based on PyTorch","archived":false,"fork":false,"pushed_at":"2021-09-07T08:52:20.000Z","size":381,"stargazers_count":535,"open_issues_count":23,"forks_count":100,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-11-03T09:33:37.678Z","etag":null,"topics":["ocr","ocr-recognition","pytorch","scene-text-recognition","text-recognition","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Media-Smart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-21T10:27:40.000Z","updated_at":"2024-10-07T09:04:07.000Z","dependencies_parsed_at":"2022-07-10T15:32:20.897Z","dependency_job_id":null,"html_url":"https://github.com/Media-Smart/vedastr","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Media-Smart%2Fvedastr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Media-Smart%2Fvedastr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Media-Smart%2Fvedastr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Media-Smart%2Fvedastr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Media-Smart","download_url":"https://codeload.github.com/Media-Smart/vedast
r/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246815362,"owners_count":20838430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr","ocr-recognition","pytorch","scene-text-recognition","text-recognition","transformer"],"created_at":"2024-08-01T11:00:21.176Z","updated_at":"2025-04-02T12:30:55.008Z","avatar_url":"https://github.com/Media-Smart.png","language":"Python","funding_links":[],"categories":["Text detection and localization"],"sub_categories":["Form Segmentation"],"readme":"## Introduction\nvedastr is an open-source scene text recognition toolbox based on PyTorch. It is designed to be flexible\nin order to support rapid implementation and evaluation of scene text recognition tasks.  \n\n## Features\n- **Modular design**\\\n  We decompose the scene text recognition framework into different components, so one can \n  easily construct a customized scene text recognition framework by combining different modules.\n  \n- **Flexibility**\\\n  The components within a module can be changed easily.\n\n- **Module extensibility**\\\n  It is easy to integrate a new module into the vedastr project. 
\n\n- **Support of multiple frameworks**\\\n  The toolbox supports several popular scene text recognition frameworks, e.g., [CRNN](https://arxiv.org/abs/1507.05717),\n   [TPS-ResNet-BiLSTM-Attention](https://github.com/clovaai/deep-text-recognition-benchmark), Transformer, etc.\n\n- **Good performance**\\\n  We re-implement the best model in [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark)\n  and obtain a better average accuracy. We also implement a simple baseline (ResNet-FC)\n   whose performance is acceptable.\n  \n\n## License\nThis project is released under the [Apache 2.0 license](https://github.com/Media-Smart/vedastr/blob/master/LICENSE).\n\n## Benchmark and model zoo\nNote: \n- We use [MJSynth(MJ)](http://www.robots.ox.ac.uk/~vgg/data/text/) and\n [SynthText(ST)](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) as training data, and test the models on \n [IIIT5K_3000](http://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset),\n [SVT](http://vision.ucsd.edu/~kai/svt/),\n  [IC03_867](http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions), \n  [IC13_1015](http://dagdata.cvc.uab.es/icdar2013competition/?ch=2\u0026com=downloads),\n[IC15_2077](https://rrc.cvc.uab.es/?ch=4\u0026com=downloads), SVTP,\n[CUTE80](http://cs-chan.com/downloads_CUTE80_dataset.html). 
You can find the \n datasets [below](https://github.com/Media-Smart/vedastr/tree/opencv-version#prepare-data).\n  \n| MODEL | CASE SENSITIVE | IIIT5K_3000 | SVT | IC03_867 | IC13_1015 | IC15_2077 | SVTP | CUTE80 | AVERAGE |\n|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|\n| [ResNet-CTC](https://drive.google.com/file/d/1gtTcc5kpVs_s5a6OR7eBh431Otk_-NrE/view?usp=sharing) | False | 87.97 | 84.54 | 90.54 | 88.28 | 67.99 | 72.71 | 77.08 | 81.58 |\n| [ResNet-FC](https://drive.google.com/file/d/1OnUGdv9RFhFbQGXUUkWMcxUZg0mPV0kK/view?usp=sharing) | False | 88.80 | 88.41 | 92.85 | 90.34 | 72.32 | 79.38 | 76.74 | 84.24 |\n| [TPS-ResNet-BiLSTM-Attention](https://drive.google.com/file/d/1YUOAU7xcrrsAtEqEGtI5ZD7eryP7Zr04/view?usp=sharing) | False | 90.93 | 88.72 | 93.89 | 92.12 | 76.41 | 80.31 | 79.51 | 86.49 |\n| [Small-SATRN](https://drive.google.com/file/d/1bcKtEcYGIOehgPfGi_TqPkvrm6rjOUKR/view?usp=sharing) | False | 91.97 | 88.10 | 94.81 | 93.50 | 75.64 | 83.88 | 80.90 | 87.19 |\n\nTPS: [Spatial transformer network](https://arxiv.org/abs/1603.03915)\n\nSmall-SATRN: [On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention](https://arxiv.org/abs/1910.04396); \nthe training phase is case sensitive while the testing phase is case insensitive.\n\nAVERAGE: Average accuracy over all test datasets.\n\nCASE SENSITIVE: If true, the output is case sensitive and contains common characters.\nIf false, the output is not case sensitive and contains only numbers and letters. \n\n## Installation\n### Requirements\n\n- Linux\n- Python 3.6+\n- PyTorch 1.4.0 or higher\n- CUDA 9.0 or higher\n\nWe have tested the following OS and software versions:\n\n- OS: Ubuntu 16.04.6 LTS\n- CUDA: 10.2\n- Python: 3.6.9\n- PyTorch: 1.5.1\n\n### Install vedastr\n\n1. Create a conda virtual environment and activate it.\n\n```shell\nconda create -n vedastr python=3.6 -y\nconda activate vedastr\n```\n\n2. 
Install PyTorch and torchvision following the [official instructions](https://pytorch.org/),\n *e.g.*,\n\n```shell\nconda install pytorch torchvision -c pytorch\n```\n\n3. Clone the vedastr repository.\n\n```shell\ngit clone https://github.com/Media-Smart/vedastr.git\ncd vedastr\nvedastr_root=${PWD}\n```\n\n4. Install dependencies.\n\n```shell\npip install -r requirements.txt\n```\n\n## Prepare data\n1. Download the LMDB data from [deep-text-recognition-benchmark](https://github.com/clovaai/deep-text-recognition-benchmark),\n which contains training, validation and evaluation data. \n **Note: we use the ST dataset released by [ASTER](https://github.com/ayumiymk/aster.pytorch#data-preparation).**  \n\n2. Create the data directory as follows:\n\n```shell\ncd ${vedastr_root}\nmkdir ${vedastr_root}/data\n```\n\n3. Put the downloaded LMDB data into this data directory. The directory structure should look as follows: \n\n```shell\ndata\n└── data_lmdb_release\n    ├── evaluation\n    ├── training\n    │   ├── MJ\n    │   │   ├── MJ_test\n    │   │   ├── MJ_train\n    │   │   └── MJ_valid\n    │   └── ST\n    └── validation\n```\n\n\n\n## Train\n\n1. Config\n\nModify the configuration files in [configs/](configs) according to your needs (e.g. [configs/tps_resnet_bilstm_attn.py](configs/tps_resnet_bilstm_attn.py)). \n\n2. Run\n\n```shell\n# train using GPUs with gpu_id 0, 1, 2, 3\npython tools/train.py configs/tps_resnet_bilstm_attn.py \"0, 1, 2, 3\" \n```\n\nBy default, snapshots and logs are generated at `${vedastr_root}/workdir/name_of_config_file` (you can specify workdir in the config files).\n\n## Test\n\n1. Config\n\nModify the configuration as you wish (e.g. [configs/tps_resnet_bilstm_attn.py](configs/tps_resnet_bilstm_attn.py)).\n\n2. Run\n\n```shell\n# test using GPUs with gpu_id 0, 1\n./tools/dist_test.sh configs/tps_resnet_bilstm_attn.py path/to/checkpoint.pth \"0, 1\" \n```\n\n## Inference\n1. 
Run\n\n```shell\n# inference using the GPU with gpu_id 0\npython tools/inference.py configs/tps_resnet_bilstm_attn.py checkpoint_path img_path \"0\"\n```\n\n## Deploy\n1. Install [volksdep](https://github.com/Media-Smart/volksdep) following the \n[official instructions](https://github.com/Media-Smart/volksdep#installation).\n\n2. Benchmark (optional)\n```shell\n# Benchmark model using GPU with gpu_id 0\nCUDA_VISIBLE_DEVICES=\"0\" python tools/benchmark.py configs/resnet_ctc.py checkpoint_path out_path --dummy_input_shape \"3,32,100\"\n```\n\nAdditional arguments are documented in [tools/deploy/benchmark.py](https://github.com/Media-Smart/vedastr/blob/master/tools/deploy/benchmark.py).\n\nThe results for resnet_ctc are as follows (test device: Jetson AGX Xavier, CUDA: 10.2):\n\n| framework  |  version   |     input shape      |         data type         |   throughput(FPS)    |   latency(ms)   |\n|   :---:    |   :---:    |        :---:         |           :---:           |        :---:         |      :---:      |\n|  PyTorch   |   1.5.0    |   (1, 1, 32, 100)    |           fp32            |          64          |      15.81      |\n|  TensorRT  |  7.1.0.16  |   (1, 1, 32, 100)    |           fp32            |         109          |      9.66       |\n|  PyTorch   |   1.5.0    |   (1, 1, 32, 100)    |           fp16            |         113          |      10.75      |\n|  TensorRT  |  7.1.0.16  |   (1, 1, 32, 100)    |           fp16            |         308          |      3.55       |\n|  TensorRT  |  7.1.0.16  |   (1, 1, 32, 100)    |      int8(entropy_2)      |         449          |      2.38       |\n\n\n\n3. 
Export model to ONNX format\n\n```shell\n# export model to onnx using GPU with gpu_id 0\nCUDA_VISIBLE_DEVICES=\"0\" python tools/torch2onnx.py configs/resnet_ctc.py checkpoint_path --dummy_input_shape \"3,32,100\" --dynamic_shape\n```\n\n  Additional arguments are documented in [tools/torch2onnx.py](https://github.com/Media-Smart/vedastr/blob/master/tools/torch2onnx.py).\n\n4. Inference SDK\n\n  See [FlexInfer](https://github.com/Media-Smart/flexinfer) for details.\n\n## Citation\n\nIf you use this toolbox or benchmark in your research, please cite this project.\n\n```\n@misc{2020vedastr,\n    title  = {vedastr: A Toolbox for Scene Text Recognition},\n    author = {Sun, Jun and Cai, Hongxiang and Xiong, Yichao},\n    url    = {https://github.com/Media-Smart/vedastr},\n    year   = {2020}\n}\n```\n\n## Contact\n\nThis repository is currently maintained by Jun Sun ([@ChaseMonsterAway](https://github.com/ChaseMonsterAway)), Hongxiang Cai ([@hxcai](http://github.com/hxcai)), and Yichao Xiong ([@mileistone](https://github.com/mileistone)).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMedia-Smart%2Fvedastr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMedia-Smart%2Fvedastr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMedia-Smart%2Fvedastr/lists"}