# Tensorflow_SceneText_Oriented_Box_Predictor

This is an oriented object detector based on the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).
Most of the code is unchanged. The exceptions are the parts needed to predict oriented bounding boxes rather than regular horizontal bounding boxes.

Many tasks require predicting oriented bounding boxes. Scene text detection is one example.
Check out the detection results below. (Note that this code does not train a model to recognize text. Only the bounding boxes are predicted.)

<p align="center">
  <img src="pics/1.jpg" width=1280 height=720>
  <img src="pics/2.jpg" width=288 height=200>
  <img src="pics/3.jpg" width=256 height=200>
  <img src="pics/4.jpg" width=256 height=200>
</p>

# Goals
For each detection, the model predicts a regular horizontal bounding box and also one oriented bounding box.
In other words, the model must also regress an oriented bounding box.
In this project, we simply regress the encoded coordinates of its 4 corners (8 values).
The encoding function is given by the equation below. In it, j is the corner index, g denotes the ground-truth oriented bounding box, and w_a and h_a are the anchor width and height, respectively.

<p align="center">
  <img src="pics/encoding.png" height=120>
</p>

# Reasons for adopting this Faster R-CNN/SSD framework
Many object detection frameworks are available. We chose this one as our basis for the following reasons:

### Highly modular code
The encoding scheme is easy to change: simply modify the code in the box_coders folder.
An encoding based on [R2CNN](https://arxiv.org/abs/1706.09579) will be released soon.
It is also easy to switch between training a Faster R-CNN model and an SSD model.

### Natural integration with slim nets
The CNN backbone used for feature extraction can easily be swapped using slim nets.

### Easy and clear configuration with Google protobuf
Changing the network configuration is easy. For example, to change the anchor aspect ratios, simply edit the grid_anchor_generator in the configuration file.

### Plenty of supporting code
The API provides a lot of supporting code. For example, it can export a trained model to a frozen graph for use in production, such as in a C++ project.
Check out my other project, [DeepSceneTextReader](https://github.com/dafanghe/DeepSceneTextReader), which uses a frozen graph trained with this code.

# Code changes compared to the original Object Detection API

### Import paths in each Python file
The import paths were adjusted so that you do not need to build the code with Bazel (blaze build). Simply run the code from the repository root directory for fast experimentation.

### Proto files
Added fields for oriented bounding boxes to the proto files. Compile them from the repository root with:

```
protoc protos/*.proto --python_out=.
```

### Box encoding scheme
Added code to encode and decode oriented bounding boxes.

### Added code in the meta architecture to support oriented bounding box prediction
Added code to predict an oriented bounding box for each proposal.
Also added code to compute the oriented bounding box regression loss.

### Other changes to data reading, data decoding, etc.


# Usage

## Create the TFRecord data
Use create_text_dataset.py to create the tf.Example data files (TFRecords) used for training.
It can generate training data for ICDAR 2015 and ICDAR 2013.

## Download the pretrained weights
If you are training the Faster R-CNN Inception-ResNet-v2 model, you can download the [pretrained weights](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz) from the TensorFlow [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md).

## Adjust the configuration
See data/faster_rcnn_inception_resnet_v2_atrous_text.config for an example configuration.
The parameter second_stage_localization_loss_weight_oriented sets the loss weight for the oriented bounding box prediction.

## Train the model
An example training script is provided: train_faster_rcnn_inception_resnet_v2.sh

# Evaluation
Models were trained with the default configuration on the combined ICDAR 2013 and ICDAR 2015 training sets. The backbone was either Inception-ResNet-v2 or ResNet-101.
Results on the ICDAR 2015 dataset:

| Backbone | Recall | Precision | F1 |
| --- | --- | --- | --- |
| Inception-ResNet-v2 | 0.7371 | 0.8057 | 0.7699 |
| ResNet-101 | 0.6861 | 0.8213 | 0.7476 |

To improve performance, try changing the configuration settings.
For example, many scene text detectors use more anchor aspect ratios per location than are typical for generic object detection. Words and text lines are often much wider than they are tall.

# TODO
1. Provide support for R2CNN training.


# References and Related Projects
- [Faster R-CNN](https://arxiv.org/abs/1506.01497): the Faster R-CNN paper.
- [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection).
- [R2CNN](https://arxiv.org/abs/1706.09579): the R2CNN paper.

# Contact

* Dafang He, Penn State University. hdfcraig@gmail.com, http://personal.psu.edu/duh188/
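The corner encoding described in the Goals section can be sketched in Python as follows. This is a minimal sketch, not the repository's box coder. The exact equation is only available as an image (pics/encoding.png), so the sketch assumes a common form:

- t_xj = (x_j^g - x_a) / w_a
- t_yj = (y_j^g - y_a) / h_a

Here (x_a, y_a) is assumed to be the anchor center. The sketch also omits any scale factors. The functions encode_corners and decode_corners are hypothetical names. Anchors are given in the (ymin, xmin, ymax, xmax) order used by the TensorFlow Object Detection API.

```python
from typing import List, Sequence, Tuple

# Anchor in (ymin, xmin, ymax, xmax) order, as in the TF Object Detection API.
Anchor = Tuple[float, float, float, float]
Point = Tuple[float, float]


def _anchor_center_and_size(anchor: Anchor) -> Tuple[float, float, float, float]:
    """Return (x_a, y_a, w_a, h_a): the anchor center, width and height."""
    ymin, xmin, ymax, xmax = anchor
    w_a = xmax - xmin
    h_a = ymax - ymin
    return xmin + 0.5 * w_a, ymin + 0.5 * h_a, w_a, h_a


def encode_corners(corners: Sequence[Point], anchor: Anchor) -> List[float]:
    """Hypothetical encoder: 4 ground-truth corners -> 8 regression targets.

    Assumed form (no scale factors):
        t_xj = (x_j - x_a) / w_a,  t_yj = (y_j - y_a) / h_a
    """
    if len(corners) != 4:
        raise ValueError("expected exactly 4 corners")
    x_a, y_a, w_a, h_a = _anchor_center_and_size(anchor)
    targets: List[float] = []
    for x_j, y_j in corners:  # j indexes the 4 corners
        targets.append((x_j - x_a) / w_a)
        targets.append((y_j - y_a) / h_a)
    return targets


def decode_corners(targets: Sequence[float], anchor: Anchor) -> List[Point]:
    """Inverse of encode_corners: 8 targets -> 4 (x, y) corners."""
    if len(targets) != 8:
        raise ValueError("expected exactly 8 targets")
    x_a, y_a, w_a, h_a = _anchor_center_and_size(anchor)
    return [
        (targets[2 * j] * w_a + x_a, targets[2 * j + 1] * h_a + y_a)
        for j in range(4)
    ]


if __name__ == "__main__":
    anchor = (0.0, 0.0, 10.0, 20.0)  # 20 wide, 10 tall, centered at (10, 5)
    corners = [(2.0, 1.0), (18.0, 2.0), (17.0, 9.0), (3.0, 8.0)]
    targets = encode_corners(corners, anchor)
    print([round(t, 2) for t in targets])
    # -> [-0.4, -0.4, 0.4, -0.3, 0.35, 0.4, -0.35, 0.3]
    print(decode_corners(targets, anchor))
```

Decoding inverts encoding, so a round trip recovers the original corners up to floating-point error. Dividing by the anchor width and height keeps the 8 targets on a similar scale regardless of anchor size.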