{"id":13543081,"url":"https://github.com/Bartzi/see","last_synced_at":"2025-04-02T12:31:12.015Z","repository":{"id":49370255,"uuid":"110142990","full_name":"Bartzi/see","owner":"Bartzi","description":"Code for the AAAI 2018 publication \"SEE: Towards Semi-Supervised End-to-End Scene Text Recognition\"","archived":false,"fork":false,"pushed_at":"2019-04-26T09:42:49.000Z","size":547,"stargazers_count":575,"open_issues_count":53,"forks_count":147,"subscribers_count":34,"default_branch":"master","last_synced_at":"2024-11-03T09:33:43.750Z","etag":null,"topics":["chainer","cnn","computer-vision","deep-learning","scene-text-recognition","semi-supervised-learning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Bartzi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-09T17:01:55.000Z","updated_at":"2024-09-29T07:21:15.000Z","dependencies_parsed_at":"2022-08-28T15:42:11.433Z","dependency_job_id":null,"html_url":"https://github.com/Bartzi/see","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bartzi%2Fsee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bartzi%2Fsee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bartzi%2Fsee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Bartzi%2Fsee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Bartzi","download_url":"https://codeload.github.com/Bartzi/see/tar.gz/refs/heads/master
","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246815453,"owners_count":20838441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chainer","cnn","computer-vision","deep-learning","scene-text-recognition","semi-supervised-learning"],"created_at":"2024-08-01T11:00:22.726Z","updated_at":"2025-04-02T12:31:07.005Z","avatar_url":"https://github.com/Bartzi.png","language":"Python","funding_links":[],"categories":["Papers","Papers \u0026 Code","Text detection and localization"],"sub_categories":["Others","Form Segmentation"],"readme":"# SEE: Towards Semi-Supervised End-to-End Scene Text Recognition\nCode for the AAAI 2018 publication \"SEE: Towards Semi-Supervised End-to-End Scene Text Recognition\". You can read a preprint on [Arxiv](http://arxiv.org/abs/1712.05404)\n\n\n# Installation\n\nYou can install the project directly on your PC or use a Docker container\n\n## Directly on your PC\n1. Make sure to use Python 3\n2. It is a good idea to create a virtual environment ([example for creating a venv](http://docs.python-guide.org/en/latest/dev/virtualenvs/))\n3. Make sure you have the latest version of [CUDA](https://developer.nvidia.com/cuda-zone) (\u003e= 8.0) installed\n4. Install [CUDNN](https://developer.nvidia.com/cudnn) (\u003e 6.0)\n5. Install [NCCL](https://developer.nvidia.com/nccl) (\u003e 2.0) [installation guide](https://docs.nvidia.com/deeplearning/sdk/nccl-archived/nccl_2212/nccl-install-guide/index.html)\n6. Install all requirements with the following command: `pip install -r requirements.txt`\n7. 
Check that chainer can use the GPU:\n    - start the python interpreter: `python`\n    - import chainer: `import chainer`\n    - check that cuda is available: `chainer.cuda.available`\n    - check that cudnn is enabled: `chainer.cuda.cudnn_enabled`\n    - the output of both commands should be `True`\n\n## Using Docker\n1. Install `Docker`\n   - Windows: Get it [here](https://www.docker.com/community-edition)\n   - Mac: Get it [here](https://www.docker.com/community-edition)\n   - Linux: Use your favourite package manager, e.g. `pacman -S docker`, or use [this guide](https://docs.docker.com/install/linux/docker-ce/ubuntu/) for Ubuntu.\n2. Install the CUDA-related components:\n    - [CUDA](https://developer.nvidia.com/cuda-zone) (\u003e= 8.0)\n    - [CUDNN](https://developer.nvidia.com/cudnn) (\u003e 6.0)\n    - nvidia-docker ([Ubuntu](https://gist.github.com/dsdenes/d9c66361df96bce3fca8f1414bb14bce), [Arch Like OS](https://aur.archlinux.org/packages/nvidia-docker2/))\n3. Get [NCCL](https://developer.nvidia.com/nccl)\n    - make sure to download the version for Ubuntu 16.04 that matches your local CUDA configuration (e.g. if you have installed CUDA 9.1, take the version for CUDA 9.1; if you have CUDA 8, take the version for CUDA 8)\n    - place it in the root folder of the project\n4. Build the Docker image\n    - `docker build -t see .`\n    - If your host system uses CUDA with a version earlier than 9.1, specify the corresponding docker image to match the configuration of your machine (see [this list](https://hub.docker.com/r/nvidia/cuda/) for available options).\n    For example, for CUDA 8 and CUDNN 6 use the following instead:\n    ```\n    docker build -t see --build-arg FROM_IMAGE=nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04 .\n    ```\n    - if you did not download a file called `nccl-repo-ubuntu1604-2.1.15-ga-cuda9.1_1-1_amd64.deb`, set the argument `NCCL_NAME` to the name of the file you downloaded. 
For example:\n    ```\n    docker build -t see --build-arg NCCL_NAME=nccl-repo-ubuntu1604-2.1.15-ga-cuda9.0_1-1_amd64.deb .\n    ```\n5. Check that everything is okay by entering a shell in the container and doing the following:\n    - run the container with: `nvidia-docker run -it see`\n    - start the python interpreter: `python3`\n    - import chainer: `import chainer`\n    - check that cuda is available: `chainer.cuda.available`\n    - check that cudnn is enabled: `chainer.cuda.cudnn_enabled`\n    - the output of both commands should be `True`\n6. **Hint:** make sure to mount all data folders you need into the container with the `-v` option when running a container.\n\n# General Training Hints\n\nIf you would like to train a network with more than 4 words per image, you will need to adjust or delete the `loss_weights` (see [this](https://github.com/Bartzi/see/blob/master/chainer/metrics/loss_metrics.py#L206) line). Otherwise, the code will throw errors. These weights are mainly meant for training FSNS models and should be discarded when training other models.\n\n# SVHN Experiments\n\nWe performed several experiments on the SVHN dataset.\nFirst, we tried to see whether our architecture is able to reach competitive results\non the SVHN recognition challenge.\nSecond, we wanted to determine whether our localization network can find text\ndistributed on a given grid.\nIn our last experiment, we created a dataset in which we randomly distributed\nthe text samples on the image.\n\n## Datasets\n\nThis section describes what needs to be done in order to get/prepare the data.\nThere is no need to create the custom datasets yourself; we also offer them for download.\nThe information on how to create the datasets is included here for reference.\n\n### Original SVHN data\n1. Get the original SVHN dataset from [here](http://ufldl.stanford.edu/housenumbers/).\n2. Extract the label data using the script `datasets/svhn/svhn_dataextract_to_json.py`.\n3. 
use the script `datasets/svhn/prepare_svhn_crops.py` to crop all bounding boxes,\nincluding some background from the SVHN images. Use the script like that:\n`python prepare_svhn_crops.py \u003cpath to svhn json\u003e 64 \u003cwhere to save the cropped images\u003e \u003cname of stage\u003e`.\nFor more information about possible commands you can use `python prepare_svhn_crops.py -h`.\n\n### Grid Dataset\n1. Follow steps 1 and 2 of the last subsection in order to get all SVHN images and the corresponding groundtruth.\n2. The script `datasets/svhn/create_svhn_dataset_4_images.py` can be used to create the dataset.\n3. The command `python create_svhn_dataset_4_images.py -h` shows all available command line options for this script\n\n### Random Dataset\n1. Follow steps 1 and 2 of the first subsection in order to get all SVHN images and the corresponding groundtruth.\n2. The script `datasets/svhn/create_svhn_dataset.py` can be used to create the dataset.\n3. The command `python create_svhn_dataset.py -h` shows all available command line options for this script.\n\n### Dataset Download\n\nYou can also download already created datasets [here](https://bartzi.de/research/see).\n\n## Training the model\n\nYou can use the script `train_svhn.py` to train a model that can detect and recognize SVHN like text.\nThe script is tuned to use the custom datasets and should enable you to redo these experiments.\n\n### Preparations\n\n1. Make sure that you have one of the datasets.\n2. For training you will need:\n    1. the file `svhn_char_map.json` (you can find it in the folder `datasets/svhn`)\n    2. the ground truth files of the dataset you want to use\n3. Add one line to the beginning of each ground truth file: `\u003cnumber of house numbers in image\u003e \u003cmax number of chars per house number\u003e`\n(both values need to be separated by a tab character). If you are using the grid dataset it could look like that: `4    4`.\n3. 
prepare the curriculum specification as a `json` file, by following this template:\n    ```\n    [\n        {\n            \"train\": \"\u003cpath to train file\u003e\",\n            \"validation\": \"\u003cpath to validation file\u003e\"\n        }\n    ]\n    ```\n    if you want to train using the curriculum learning strategy, you just need to add\n    further dicts to this list.\n3. use the script `chainer/train_svhn.py` for training the network.\n\n### Starting the training\n\nThe training can be run on GPU or CPU. You can also use multiple GPUs in a data parallel fashion.\nIn order to specify which GPU to use just add the command line parameter `-g \u003cid of gpu to use\u003e` e.g. `-g 0` for using the first GPU.\n\nYou can get a brief explanation of each command line option of the script `train_svhn.py` by running\nthe script like this: `python train_svhn.py -h`\n\nYou will need to specify at least the following parameters:\n- `dataset_specification` - this is the path to the `json` file you just created\n- `log_dir` - this is the path to the directory where the logs shall be saved\n- `--char-map ../datasets/svhn/svhn_char_map.json` - path to the char map for mapping classes to labels.\n- `--blank-label 0` - indicates that class 0 is the blank label\n- `-b \u003cbatch-size\u003e` - set the batch size used for training\n\n# FSNS Experiments\n\nIn order to see, whether our idea is applicable in practice, we also\ndid experiments on the FSNS dataset. The FSNS dataset contains\nimages of French street name signs. The most notable characteristic of this\ndataset is, that this dataset does not contain any annotation for text\nlocalization. This fact makes this dataset quite suitable for our method,\nas we claim that we can locate and recognize text, even without the corresponding\nground truth for localization.\n\n## Preparing the Dataset\n\nGetting the dataset and making it usable with deep learning frameworks\nlike Chainer is not an easy task. 
We provide some scripts that will download\nthe dataset, convert it from the tensorflow format to single images and\ncreate a ground truth file, that is usable by our train code.\n\nThe folder `datasets/fsns` contains all scripts that are necessary for preparing\nthe dataset. These steps need to be done:\n\n1. use the script `download_fsns.py` for getting the dataset.\nYou will need to specify a directory, where the data shall be saved.\n2. the script `tfrecord_to_image.py` extracts all images and labels from\nthe downloaded dataset.\n3. We advise you to use the script `swap_classes.py`.\nWith this script we will set the class of the blank label to be `0`, as it is defined in\nthe class to label map `fsns_char_map.json`. You can invoke the script like this:\n`python swap_classes.py \u003cgt_file\u003e \u003coutput_file_name\u003e 0 133`\n4. next, you will need to transform the original ground truth, to the ground truth\nformat we used for training. Our ground truth format differs because we\nfound that it is not possible to train the model if the word boundaries are not\nexplicitly given to the model. We, therefore, transform the line based ground truth\nto a word based ground truth. You can use the script `transform_gt.py` for doing that.\nYou could call the script like that:\n`python transform_gt.py \u003cpath to original gt\u003e fsns_char_map.json \u003cpath to new gt\u003e`.\n\n## Training the Network\n\nBefore you can start training the network, you will need to do the following preparations:\n\nIn the last section, we already introduced the `transform_gt.py` script.\nAs we found that it is only possible to train a new model on the FSNS dataset,\nwhen using a curriculum learning strategy, we need to create a learning curriculum\nprior to starting the training. You can do this by following these steps:\n\n1. create ground truth files for each step of the curriculum with the `transform_gt.py`\nscript.\n    1. 
start with a reasonable number of maximum words (2 is a good choice here)\n    2. create a ground truth file with all images that contain max. 2 words by using the `transform_gt.py`\n    script: `python transform_gt.py \u003cpath to downloaded gt\u003e fsns_char_map.json \u003cpath to 2 word gt\u003e --max-words 2 --blank-label 0`\n    3. Repeat this step with 3 and 4 words (you can also take 5 and 6, too), but make sure\n    to only include images with the corresponding amount of words (`--min-words` is the flag to use)\n2. Add the path to your files to a `.json` file that could be called `curriculum.json`\nThis file works exactly the same as the file discussed in step 3 in the preparations section\nfor the SVHN experiments.\n\nOnce you are done with this, you can actually train the network :tada:\n\nTraining the network happens, by using the `train_fsns.py` script.\n`python train_fsns.py -h` shows all available command-line options.\nThis script works very similarly to the `train_svhn.py` script\n\nYou will need to specify at least the following parameters:\n- `dataset_specification` - this is the path to the `json` file you just created\n- `log_dir` - this is the path to the directory where the logs shall be saved\n- `--char-map ../datasets/fsns/fsns_char_map.json` - path to the char map for mapping classes to labels.\n- `--blank-label 0` - indicates that class 0 is the blank label\n- `-b \u003cbatch-size\u003e` - set the batch size used for training\n\n## FSNS Demo\n\nIn case you only want to see how the model behaves on a given image, you can use the `fsns_demo.py` script.\nThis script expects a trained model, an image and a char map and prints you the predicted words in the\nimage + the predicted bounding boxes.\nIf you download the model provided [here](https://bartzi.de/research/see), you could call the script like this:\n`python fsns_demo.py \u003cpath to log directory\u003e model_35000.npz \u003cpath to example image\u003e 
../datasets/fsns/fsns_char_map.json`\nIt should be fairly easy to extend this script to also work with other models. Just have a look at how the different evaluators create the network\nand how they extract the characters from the predictions and you should be good to go!\n\n# Text Recognition\n\nAlthough not mentioned in the paper, we also provide a model with which you can perform text recognition\non already cropped text lines. We also provide code for training such a model.\nEverything works very similarly to the scripts provided for SVHN and FSNS.\n\n## Dataset\n\nUnfortunately, we cannot offer our entire training dataset for download, as it is far too large.\nBut if you want to train a text recognition model on your own, you can use the \"Synthetic Word Dataset\" (download it [here](http://www.robots.ox.ac.uk/~vgg/data/text/)).\nAfter you've downloaded the dataset, you will need to do some post-processing and create\na ground truth similar to the one for the FSNS dataset. We provide a sample dataset at the location\nwhere you can also download the text recognition model (which is [here](https://bartzi.de/research/see)).\n\n## Training\nAfter you are done with preparing the dataset, you can start training.\n\nTraining is done with the `train_text_recognition.py` script.\n`python train_text_recognition.py -h` shows all available command-line options.\nThis script works very similarly to the `train_svhn.py` and `train_fsns.py` scripts.\n\nYou will need to specify at least the following parameters:\n- `dataset_specification` - this is the path to the `json` file you just created\n- `log_dir` - this is the path to the directory where the logs shall be saved\n- `--char-map ../datasets/textrec/ctc_char_map.json` - path to the char map for mapping classes to labels.\n- `--blank-label 0` - indicates that class 0 is the blank label\n- `-b \u003cbatch-size\u003e` - set the batch size used for training\n\n## Text Recognition Demo\n\nAnalogous to the 
`fsns_demo.py` script, we offer a demo script for text recognition named `text_recognition_demo.py`.\nThis script expects a trained model, an image and a char map and prints you the predicted words in the\nimage + the predicted bounding boxes.\nIf you download the model provided [here](https://bartzi.de/research/see), you could call the script like this:\n`python text_recognition_demo.py \u003cpath to log directory\u003e model_190000.npz \u003cpath to example image\u003e ../datasets/textrec/ctc_char_map.json`\nIt should be fairly easy to extend this script to also work with other models. Just have a look at how the different evaluators create the network\nand how they extract the characters from the predictions and you should be good to go!\n\n# Pretrained Models\n\nYou can download our best performing model on the FSNS dataset, a model\nfor our SVHN experiments and also a model for our text recognition experiments [here](https://bartzi.de/research/see).\n\n\n# General Notes on Training\n\nThis section contains information about things that happen while a network is training.\nIt includes a description of all data that is being logged and backed up for each training run\nand a description of a tool that can be used to inspect the training, while\nit is running.\n\n## Contents of the log dir\n\nThe code will create a new subdirectory in the log dir, where it puts all\ndata that is to be logged. The code logs the following pieces of data:\n- it creates a backup of the currently used network definition files\n- it saves a snapshot of the model at each epoch, or after `snapshot_interval` iterations (default 5000)\n- it saves loss and accuracy values at the configured print interval (each time after 100 iterations)\n- it will save the prediction of the model on a given or randomly chosen sample. This visualization\nhelps with assessing, whether the network is converging or not. 
It also enables you to inspect the training progress\nwhile the network is training.\n\n## Inspecting the training progress\n\nIf you leave the default settings, you can inspect the progress of the\ntraining in real time by using the script `show_progress.py`. This script\nis located in the folder `utils`. You can get all supported command line arguments\nwith this command: `python show_progress.py -h`. Normally you will want to start\nthe program like this: `python show_progress.py`. It will open a Tk window.\nIn case the program complains that it is not able to find Tk-related libraries,\nyou will need to install them.\n\nAlternatively, you can use `ChainerUI`. Execute the following commands to set up `ChainerUI`:\n- `chainerui db create`\n- `chainerui db upgrade`\n\nCreate a project using the following command from the project directory:\n- `chainerui project create -d ./ -n see-ocr`\n\nTo check the progress, start the server:\n- `chainerui server`\n\n## Creating an animation of plotted train steps\n\nThe training script contains a little helper that applies the current\nstate of the model to an image and saves the result of this application\nfor each iteration (or however you configure it).\n\nYou can use the script `create_video.py` to create an animation out of these images.\nIn order to use the script, you will need to install ffmpeg (and have the `ffmpeg` command in your path)\nand you will need to install imagemagick (and have the `convert` command in your path).\nYou can then create a video with this command line call:\n`python create_video.py \u003cpath to directory with images\u003e \u003cpath to destination video\u003e`.\nYou can learn about further command line arguments with `python create_video.py -h`.\n\n\n# Evaluation\n\nYou can evaluate all models (svhn/fsns/textrecognition) with the script `evaluate.py` in the `chainer` directory.\n\n## Usage\n\nYou will need a directory containing the following items:\n- log_file of the training\n- saved model\n- 
network definition files that have been backed up by the training script\n- you also need to set the GPU to use with `--gpu \u003cid of gpu\u003e`; the code currently does not work on CPU.\n- you also need to pass the number of labels per timestep (typically max. 5 for SVHN and 21 for FSNS)\n\n### Evaluating a SVHN model\n\nIn order to evaluate an SVHN model, you will need to invoke the script like this:\n`python evaluate.py svhn \u003cpath to dir with specified items\u003e \u003cname of snapshot to evaluate\u003e \u003cpath to ground truth file\u003e \u003cpath to char map (e.g. svhn_char_map.json)\u003e --target-shape \u003cinput shape for recognition net (e.g. 50,50)\u003e \u003cnumber of labels per timestep\u003e`\n\n### Evaluating a FSNS model\n\nIn order to evaluate an FSNS model, you will need to invoke the script like this:\n`python evaluate.py fsns \u003cpath to dir with specified items\u003e \u003cname of snapshot to evaluate\u003e \u003cpath to ground truth file\u003e \u003cpath to char map (e.g. fsns_char_map.json)\u003e \u003cnumber of labels per timestep\u003e`\n\n### Evaluating a Text Recognition model\n\nIn order to evaluate a text recognition model, you will need to invoke the script like this:\n`python evaluate.py textrec \u003cpath to dir with specified items\u003e \u003cname of snapshot to evaluate\u003e \u003cpath to ground truth file\u003e \u003cpath to char map (e.g. 
ctc_char_map.json)\u003e 23`\n\n\n# Citation\n\nIf you find this code useful, please cite our paper:\n\n    @inproceedings{bartz2018see,\n        title={SEE: towards semi-supervised end-to-end scene text recognition},\n        author={Bartz, Christian and Yang, Haojin and Meinel, Christoph},\n        booktitle={Thirty-Second AAAI Conference on Artificial Intelligence},\n        year={2018}\n    }\n\n\n# Notes\n\nIf there is anything totally unclear, or not working, please feel free to file an issue.\nIf you did anything with the code, feel free to file a PR.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBartzi%2Fsee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBartzi%2Fsee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBartzi%2Fsee/lists"}