{"id":17641328,"url":"https://github.com/raphael-baena/DTLR","last_synced_at":"2025-03-10T10:30:47.556Z","repository":{"id":260156624,"uuid":"856330072","full_name":"raphael-baena/DTLR","owner":"raphael-baena","description":"Handwritten Text Recognition and Character Detection","archived":false,"fork":false,"pushed_at":"2024-11-06T15:37:44.000Z","size":16740,"stargazers_count":92,"open_issues_count":1,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-06T16:38:07.784Z","etag":null,"topics":["chinese-characters","chinese-handwriting-recogniting","cipher","htr","ocr"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raphael-baena.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-12T11:56:55.000Z","updated_at":"2024-11-06T15:37:48.000Z","dependencies_parsed_at":null,"dependency_job_id":"680b6e29-14bb-419b-9325-4cb5026b4107","html_url":"https://github.com/raphael-baena/DTLR","commit_stats":null,"previous_names":["raphael-baena/dtlr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphael-baena%2FDTLR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphael-baena%2FDTLR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphael-baena%2FDTLR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphael-baena%2FDTLR/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/raphael-baena","download_url":"https://codeload.github.com/raphael-baena/DTLR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242833141,"owners_count":20192691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese-characters","chinese-handwriting-recogniting","cipher","htr","ocr"],"created_at":"2024-10-23T07:01:58.730Z","updated_at":"2025-03-10T10:30:47.532Z","avatar_url":"https://github.com/raphael-baena.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003e\u003ca href=\"https://detection-based-text-line-recognition.github.io/\"\u003eGeneral Detection-based Text Line Recognition (DTLR)\u003c/a\u003e \u003cbr\u003eNeurIPS 2024\u003c/h1\u003e\n\n\u003cfont size=\"4\"\u003e\n\u003ca href=\"https://raphael-baena.github.io/\"\u003eRaphael Baena\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://imagine-lab.enpc.fr/staff-members/syrine-kalleli/\"\u003eSyrine Kalleli\u003c/a\u003e\u0026emsp;\n\u003ca href=\"https://imagine.enpc.fr/~aubrym/\"\u003eMathieu Aubry\u003c/a\u003e\u0026emsp;\n\u003c/font\u003e\n\u003cbr\u003e\n\u003cimg src=\"figures/teaser.png\"\u003e\n\u003c/div\u003e\n\n\n## Description\n\nThis repository is the official implementation for [General Detection-based Text Line Recognition](https://detection-based-text-line-recognition.github.io/), \nthe paper is available on [arXiv](https://arxiv.org/pdf/2409.17095).\n\nThis repository builds on the code for 
[DINO-DETR](https://github.com/IDEA-Research/DINO), the official implementation of the paper \"[DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection](https://arxiv.org/abs/2203.03605)\". We present a model that adapts DINO-DETR for text recognition as a detection and recognition task. The model is pretrained on synthetic data using the same loss as DINO-DETR and then fine-tuned on a real dataset with CTC loss.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figures/architecture.jpg\"\u003e\n\u003c/p\u003e\n\n## Content\n\u003cdetails\u003e\n\u003csummary\u003eInstallation, Datasets, and Weights\u003c/summary\u003e\n\n\n## Installation, Datasets, and Weights\n### 1. Installation\nThe model was trained with `python=3.11.0`, `pytorch=2.1.0`, `cuda=11.8` and builds on the DETR-variants [DINO](https://arxiv.org/abs/2203.03605)/[DN](https://arxiv.org/abs/2203.01305)/[DAB](https://arxiv.org/abs/2201.12329) and [Deformable-DETR](https://arxiv.org/abs/2010.04159).\n\n1. Clone this repository and create a virtual environment\n2. Follow instructions to install a [Pytorch](https://pytorch.org/get-started/locally/) version compatible with your system and CUDA version\n3. Install other dependencies\n    ```bash\n    pip install -r requirements.txt\n    ```\n4. Compiling CUDA operators\n    ```bash\n    python models/dino/ops/setup.py build install # 'cuda not available', run =\u003e export CUDA_HOME=/usr/local/cuda-\u003cversion\u003e\n    # unit test (should see all checking is True) # could output an outofmemory error\n    python models/dino/ops/test.py\n    ```\n### 2. Datasets\nDatasets should be placed in the appropriate folder specified in **datasets/config.json**. We preprocess the images and annotations for the IAM dataset, while all other datasets are used in their original form.\nFor each dataset (except IAM), a charset file (.pkl) is required. Charset files can be found in the folder [data](data).\n\n**Handwritten**\n1. 
IAM: the official website is [here](http://www.fki.inf.unibe.ch/databases/iam-handwriting-database). We preprocess the images and annotations following the instructions in the [PyLaia repository](https://github.com/carmocca/PyLaia-examples/tree/master/iam-htr). The annotations are stored in [data/IAM_new/labels.pkl](data/IAM_new).\n2. RIMES: TEKLIA provides the dataset [here](https://teklia.com/research/rimes-database/). After downloading, place the charset file in the same folder as the dataset.\n3. READ: the dataset is available [here](https://zenodo.org/records/1297399). After downloading, place the charset file in the same folder as the dataset.\n\n**Chinese**\nThe official website is [here](https://nlpr.ia.ac.cn/databases/handwriting/Download.html). Images and annotations are provided only in byte format for these datasets.\n1. CASIA v1: Download the dataset in byte format from the link above and place the charset file in the same folder as the dataset.\n2. CASIA v2: We directly provide a version of the dataset with images (PNG) and annotations (TXT). Download the dataset [here](https://drive.google.com/file/d/1ZfrsxBM2uhnqa0vps-8950ZFflYgMHin/view?usp=sharing).\n\n**Ciphers**\nThe Borg and Copiale ciphers are available [here](https://pages.cvc.uab.es/abaro/datasets.html). The charset files are provided in the folder [data](data).\n### 3. Weights\nPretrained checkpoints can be found [here](https://drive.google.com/file/d/1sr-CSCdiVhCuUmZa3danqSvdzIvj8Pdl/view?usp=sharing). The folder includes the weights of the following **pretrained** models:\n\n- **General model**: Trained on random Latin characters. Typically used for finetuning on ciphers.\n- **English model**: Trained on English text with random erasing. Typically used for finetuning on IAM.\n- **French model**: Trained on French text with random erasing. Typically used for finetuning on RIMES.\n- **German model**: Trained on German text with random erasing. 
Typically used for finetuning on READ.\n- **Chinese model**: Trained on random handwritten Chinese characters from HWDB 1. Typically used for finetuning on HWDB 2.\n\nFinetuned checkpoints can be found [here](https://drive.google.com/file/d/11UXYJHBKhgI6DhhkqQ6UpHFRXt3XNFQA/view?usp=sharing).\n\nCheckpoints should be organized as follows:\n```bash\n  logs/\n    └── IAM/\n      └── checkpoint.pth\n    └── other_model/\n      └── checkpoint.pth\n    ...\n```\n\u003c/details\u003e \n\u003cdetails\u003e\n\u003csummary\u003ePretraining\u003c/summary\u003e\n\n# Pretraining\nPretraining scripts are available in **scripts/pretraining**.\n## Latin scripts \nYou need to download the folder [resources](https://drive.google.com/file/d/1XxeizTec4XOsLfyV_Q_dVQ1rMNWzbmoO/view?usp=sharing) (background, fonts, noises, texts) and place it in the folder **dataset**.\n\nTo train models with random erasing:\n```bash\nsh scripts/pretraining/Synthetic_english_w_masking.sh\nsh scripts/pretraining/Synthetic_german_w_masking.sh\nsh scripts/pretraining/Synthetic_french_w_masking.sh\nsh scripts/pretraining/Synthetic_general.sh\n```\n## Chinese scripts \nYou need the dataset CASIA v1 [here]\n\nTo train a model with random erasing\n```bash\nsh scripts/pretraining/Synthetic_english.sh\n```\nThen, for instance, to train a model for Chinese with random erasing:\n```bash\nbash scripts/pretraining/Synthetic_chinese_w_masking.sh\n```\n\u003c/details\u003e \n\u003cdetails\u003e\n\u003csummary\u003eFinetuning\u003c/summary\u003e\n\n# Finetuning\nFinetuning occurs in two stages. The scripts are available in **scripts/finetuning**. For Step 1, a pretrained model is expected to be placed in the folder **logs/your_model_name**.\n\n\u003c/details\u003e \n\u003cdetails\u003e\n\u003csummary\u003eEvaluation\u003c/summary\u003e \n\n# Evaluation \nUse the scripts in **scripts/evaluating** to evaluate the model on the different datasets. 
\n\n\u003c/details\u003e \n\u003cdetails\u003e\n\u003csummary\u003eNgram\u003c/summary\u003e\n\n# Ngram\n## Evaluation\nWe provide our N-gram models for RIMES, READ and IAM [here](). We strongly advise creating a separate environment for the ngram model and installing the libraries listed in [ngram/mini_guide.md](ngram/mini_guide.md).\nTo run an evaluation with the ngram model:\n```bash\npython ngram/clean_gen_ngram_preds.py --config_path ngram/IAM.yaml\npython ngram/clean_gen_ngram_preds.py --config_path ngram/READ.yaml\npython ngram/clean_gen_ngram_preds.py --config_path ngram/RIMES.yaml\n```\n## Training an ngram model\nTo train your own ngram model, follow the instructions in [ngram/mini_guide.md](ngram/mini_guide.md).\n\u003c/details\u003e \n  \n## Citation\n\nIf you find this code useful, don't forget to \u003cb\u003estar the repo :star:\u003c/b\u003e and \u003cb\u003ecite the paper :point_down:\u003c/b\u003e\n\n```\n@inproceedings{baena2024DTLR,\n  title={General Detection-based Text Line Recognition},\n  author={Raphael Baena and Syrine Kalleli and Mathieu Aubry},\n  booktitle={NeurIPS},\n  year={2024},\n  url={https://arxiv.org/abs/2409.17095}\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphael-baena%2FDTLR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraphael-baena%2FDTLR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphael-baena%2FDTLR/lists"}