{"id":13543074,"url":"https://github.com/dafanghe/DeepSceneTextReader","last_synced_at":"2025-04-02T12:31:11.023Z","repository":{"id":201792894,"uuid":"128553753","full_name":"dafanghe/DeepSceneTextReader","owner":"dafanghe","description":"This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box.","archived":false,"fork":false,"pushed_at":"2018-10-31T02:02:04.000Z","size":1113,"stargazers_count":49,"open_issues_count":3,"forks_count":19,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-11-03T09:33:41.116Z","etag":null,"topics":["deep-learning","deployment","end-to-end-ocr","scene-text","scene-text-detection","scene-text-recognition"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dafanghe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-04-07T17:49:58.000Z","updated_at":"2024-03-30T19:43:25.000Z","dependencies_parsed_at":"2023-10-20T05:51:06.487Z","dependency_job_id":null,"html_url":"https://github.com/dafanghe/DeepSceneTextReader","commit_stats":null,"previous_names":["dafanghe/deepscenetextreader"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dafanghe%2FDeepSceneTextReader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dafanghe%2FDeepSceneTextReader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/dafanghe%2FDeepSceneTextReader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dafanghe%2FDeepSceneTextReader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dafanghe","download_url":"https://codeload.github.com/dafanghe/DeepSceneTextReader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246815451,"owners_count":20838441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deployment","end-to-end-ocr","scene-text","scene-text-detection","scene-text-recognition"],"created_at":"2024-08-01T11:00:22.586Z","updated_at":"2025-04-02T12:31:06.017Z","avatar_url":"https://github.com/dafanghe.png","language":"C++","funding_links":[],"categories":["Text detection and localization"],"sub_categories":["Form Segmentation"],"readme":"# DeepSceneTextReader\nThis is a C++ project deploying a deep scene text reading pipeline. It reads text from natural scene images.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/1.jpg\" width=1280 height=720\u003e\n  \u003cimg src=\"images/2.jpg\" width=288 height=200\u003e\n  \u003cimg src=\"images/3.jpg\" width=256 height=200\u003e\n  \u003cimg src=\"images/4.jpg\" width=256 height=200\u003e\n\u003c/p\u003e\n\n\n# Prerequisites\n\nThe project is written in C++ using the TensorFlow framework. It was tested with TensorFlow 1.4. 
Newer versions should work too, but have not been tested.\nPlease install:\n\n* TensorFlow\n\n* nsync: https://github.com/google/nsync.git (needed for building TensorFlow)\n\n* OpenCV 3.3\n\n* protobuf\n\n* Eigen\n\nSee this project for how to build a project that uses TensorFlow with CMake:\nhttps://github.com/cjweeks/tensorflow-cmake\nIt greatly helped in building this project.\nWhen building the TensorFlow library, be careful, because this project also uses OpenCV. There still appears to be a problem when TensorFlow and OpenCV are included together:\nit can leave OpenCV unable to read images.\nSee this issue: https://github.com/tensorflow/tensorflow/issues/14267\nThe answer by allenlavoie solved the problem for me, so it is quoted here:\n\n\"In the meantime, as long as you're not using any custom ops you can build libtensorflow_cc.so with bazel build --config=monolithic, which will condense everything together into one shared object (no libtensorflow_framework dependence) and seal off non-TensorFlow symbols. That shared object will have protocol buffer symbols.\"\n\n# Status\nCurrently two pretrained models are provided: one for scene text detection and one for scene text recognition.\nMore models will be provided.\nNote that the current models are not very robust. 
You can easily swap in your own trained models.\nThe models will be continuously updated.\n\n# Build Process\n\ncd build\n\ncmake ..\n\nmake\n\nThis creates an executable named **DetectText** in the bin folder.\n\n# Usage:\nThe executable can be run in three modes:  (1) Detect  (2) Recognize  (3) Detect and Recognize\n\n## Detect\nDownload the pretrained detector model and put it in model/\n\n./DetectText --detector_graph='model/Detector_model.pb' \\\n   --image_filename='test_images/test_img1.jpg' --mode='detect' --output_filename='results/output_image.jpg'\n\n## Recognize\nDownload the pretrained recognizer model and put it in model/\nDownload the dictionary file and put it in model/\n\n\n./DetectText --recognizer_graph='model/Recognizer_model.pb'  \\\n   --image_filename='test_images/recognize_image1.jpg' --mode='recognize' \\\n   --im_height=32  --im_width=128\n\n## Detect and Recognize\nDownload the pretrained detector and recognizer models and put them in model/ as described previously.\n\n./DetectText --recognizer_graph='model/Recognizer_model.pb' --detector_graph='model/Detector_model.pb' \\\n   --image_filename='test_images/test_img1.jpg' --mode='detect_and_read' --output_filename='results/output_image.jpg' \n\n# Model Description\n### *Detector*\n1. Faster RCNN Detector Model\nThe detector is trained with a modified TensorFlow [object detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).\nI modified it by changing the proposal scheme to regress to the 4 coordinates of the oriented bounding box rather than a regular rectangular bounding box.\nCheck out this [repo](https://github.com/dafanghe/Tensorflow_SceneText_Oriented_Box_Predictor) for the training code.\nPretrained model: FasterRCNN_detector_model.pb\n\n2. R2CNN will be updated. 
See [R2CNN](https://arxiv.org/abs/1706.09579) for details.\nThe code is also modified from the TensorFlow [object detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).\nThe training code will be released soon.\n\n\n### *Recognizer*\n1. CTC scene text recognizer.\nThe recognizer model follows the well-known [CRNN model](https://arxiv.org/abs/1507.05717) for scene text recognition.\n\n2. Spatial Attention OCR will be added soon. It is based on [GoogleOCR](https://github.com/tensorflow/models/tree/master/research/attention_ocr)\n\n### *Detect and Recognize*\nThe whole scene text reading pipeline detects the text, rotates each detected region to horizontal, and reads it with the recognizer.\nThe pipeline is shown here:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/pipeline.jpg\" width=1280 height=436\u003e\n\u003c/p\u003e\n\n### *Pretrained Models*\nYou can try out the code with the provided pretrained models. \\\nThey are not fully optimized yet, but can be used to get familiar with the code. \\\nCheck them out here: [models](https://drive.google.com/drive/folders/1Ao0ZrSVf0YjU6pnzGY0C3QJ2Qz0ljRIU?usp=sharing) \n\nYou will find two detection models: (1) **FasterRCNN_detector_model.pb** (2) **R2CNN_detector_model.pb** \\\nTwo recognition models with their charsets: (1) **Recognizer_model.pb + charset_full.txt** and (2) **Recognizer_model_case_insen.pb + charset_case_insen.txt**. 
\\\nFull charset means English letters + digits; case insen means case-insensitive English letters + digits.\nLet me know if you have any problems using them.\n\n\n# Reference and Related Projects\n- [Faster RCNN](https://arxiv.org/abs/1506.01497), the Faster RCNN paper.\n- [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).\n- [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/abs/1507.05717), the reference paper for the CRNN model.\n- [tensorflow-cmake](https://github.com/cjweeks/tensorflow-cmake), a tutorial on building a project with TensorFlow using CMake.\n- [R2CNN](https://arxiv.org/abs/1706.09579), the reference paper for R2CNN.\n\n# Contact:\n\n* Dafang He. The Penn State University.  hdfcraig@gmail.com   http://personal.psu.edu/duh188/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdafanghe%2FDeepSceneTextReader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdafanghe%2FDeepSceneTextReader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdafanghe%2FDeepSceneTextReader/lists"}