# PyramidTabNet

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![PyTorch](https://img.shields.io/badge/PyTorch-1.11.0-orange.svg)](https://pytorch.org/)

[<img align="right" width="250" height="395" src="https://media.springernature.com/full/springer-static/cover-hires/book/978-3-031-41734-4?as=webp"/>](https://link.springer.com/book/10.1007/978-3-031-41676-7)

> **PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents**<br>
> [Muhammad Umer](https://github.com/muhd-umer),
> [Muhammad Ahmed Mohsin](https://github.com/ahmd-mohsin),
> [Adnan Ul-Hasan](https://dll.seecs.nust.edu.pk/author/adnan_ul_hassan/),
> and [Faisal Shafait](https://tukl.seecs.nust.edu.pk/members/faisal_shafait.html)<br>
> Presented at [ICDAR 2023: International Conference on Document Analysis and Recognition](https://icdar2023.org/)<br>
> [Springer Link](https://link.springer.com/chapter/10.1007/978-3-031-41734-4_26)<br>

In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the convolution-less Pyramid Vision Transformer to detect tables in document images. We also present a generative augmentation technique for tabular images to train the architecture effectively. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, which together generate new document images containing tables. The proposed pipeline yields significant performance improvements for table detection on several standard datasets, and achieves performance comparable to state-of-the-art methods on structure recognition tasks.
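The three-step augmentation above can be pictured with a minimal, dependency-free sketch. Everything below (the record shapes, bin edges, and page size) is invented for exposition; the actual pipeline in the [augmentation](augmentation) directory operates on document images, not on these toy records.

```python
import random

def cluster_by_size(tables, bins=((0, 200), (200, 400), (400, 10**9))):
    """Step 1 (clustering): group table crops into coarse bins by height."""
    clusters = {b: [] for b in bins}
    for t in tables:
        for lo, hi in bins:
            if lo <= t["h"] < hi:
                clusters[(lo, hi)].append(t)
                break
    return [group for group in clusters.values() if group]

def fuse(a, b):
    """Step 2 (fusion): combine two same-cluster tables into one composite."""
    return {"w": max(a["w"], b["w"]), "h": a["h"] + b["h"]}

def patch(page_w, page_h, table, rng):
    """Step 3 (patching): place the composite on a blank page, if it fits."""
    if table["w"] > page_w or table["h"] > page_h:
        return None
    x = rng.randrange(page_w - table["w"] + 1)
    y = rng.randrange(page_h - table["h"] + 1)
    return {"bbox": (x, y, x + table["w"], y + table["h"])}

# Toy run: two small tables land in the same cluster, get fused, and are
# patched onto a blank "page" (page size chosen arbitrarily for the example).
tables = [{"w": 300, "h": 150}, {"w": 280, "h": 160}, {"w": 500, "h": 450}]
clusters = cluster_by_size(tables)
fused = fuse(*max(clusters, key=len))
sample = patch(1240, 1754, fused, random.Random(0))
```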
## Dependencies

_It is recommended to create a new virtual environment so that package updates/downgrades do not break other projects._

- Environment characteristics
  <br/>`python = 3.9.12` `torch = 1.11.0` `cuda = 11.3` `torchvision = 0.12.0`

  ```shell
  conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
  ```

- This repo uses toolboxes provided by `OpenMMLab` to train and test models. Head over to the official [MMDetection](https://github.com/open-mmlab/mmdetection) documentation for [installation instructions](https://mmdetection.readthedocs.io/en/latest/) if you want to train your own model.

- Alternatively, if all you want to do is test the model, you can install `mmdet` as a third-party package:

  ```shell
  pip install -r requirements.txt
  ```

- After all the packages have been installed successfully, install `mmcv` by executing:

  ```shell
  pip install -U openmim
  mim install mmcv-full==1.6.0
  ```

- Alternatively, you can install `mmcv` directly with pip:

  ```shell
  pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11/index.html
  ```
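Because the toolchain is pinned to exact versions, a quick sanity check can save debugging time. The helper below is illustrative, not part of the repo; the pins are the ones listed in this section.

```python
# Sanity-check the pinned environment. Package names and versions come from
# this README; the helper itself is a hypothetical convenience, not repo code.
from importlib import metadata

PINNED = {"torch": "1.11.0", "torchvision": "0.12.0", "mmcv-full": "1.6.0"}

def check_pins(pins):
    """Map each pinned distribution to 'ok', 'missing', or the installed version."""
    report = {}
    for dist, wanted in pins.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
        else:
            report[dist] = "ok" if installed == wanted else installed
    return report

print(check_pins(PINNED))
```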
## Datasets

We provide the test set of `cTDaR - TRACK A` in `COCO JSON format` by default (for evaluation). You can access the full cTDaR dataset from the publicly available GitHub repo [cTDaR - All Tracks](https://github.com/cndplab-founder/ICDAR2019_cTDaR). Other public datasets can be downloaded and placed in the [data](data/) directory for training/evaluation.

## Data Augmentation

Refer to the [augmentation](augmentation#data-augmentation) directory for instructions on how to use the scripts to generate new document images.

## Run <a href="https://colab.research.google.com/github/muhd-umer/pyramidtabnet/blob/main/resources/pyramidtabnet.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The following sections provide instructions to evaluate and/or train PyramidTabNet on your own data.<br/>
_Note: It is recommended to execute the scripts from the project root in order to use the relative paths to the test set._
### Training

- Refer to [Data Augmentation](augmentation) to generate additional training samples to improve model performance. ❤️
- Before firing up the `train.py` script, make sure to configure the data keys in the config file.
  _Refer to the [MMDetection documentation](https://mmdetection.readthedocs.io/en/latest/2_new_data_model.html#train-with-customized-datasets) for details on how to modify the keys._

  ```shell
  python model/train.py path/to/config/file --gpu-id 0
  ```

- Alternatively, you can launch training on multiple GPUs:

  ```shell
  bash model/dist_train.sh ${CONFIG_FILE} \
                          ${GPU_NUM} \
                          [optional args]
  ```

### Evaluation

- Download links for the fine-tuned weights are available in [Weights & Metrics](#weights--metrics).
- Execute `test.py` with the appropriate command line arguments. Example usage:

  ```shell
  python model/test.py --config-file path/to/config/file \
                      --input path/to/directory \
                      --weights path/to/finetuned/checkpoint \
                      --device "cuda"
  ```

### Inference

- To perform end-to-end table analysis (and visualize detections) on a single image or test directory, execute `main.py`. Download the weights from [Weights & Metrics](#weights--metrics) and place them in the [weights/](weights/) directory. Example usage:

  ```shell
  python main.py --config-file path/to/config/file \
                --input path/to/input/image or directory \
                --weights-dir path/to/weights/directory \
                --device "cuda"
  ```
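Each script accepts either a single image or a whole directory via `--input`. One way such an argument can be resolved is sketched below; the helper name and extension list are assumptions for illustration, not the repo's actual code.

```python
from pathlib import Path

# Hypothetical extension whitelist; adjust to whatever formats you feed in.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def resolve_input(input_path):
    """Return the image files referred to by --input: the file itself,
    or every image directly inside a directory (sorted for determinism)."""
    p = Path(input_path)
    if p.is_dir():
        return sorted(q for q in p.iterdir() if q.suffix.lower() in IMAGE_EXTS)
    if p.suffix.lower() not in IMAGE_EXTS:
        raise ValueError(f"not an image file: {p}")
    return [p]
```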
### Detection Inference

- To perform table detection on a single image or test directory, execute `td.py`. Example usage:

  ```shell
  python model/td.py --config-file path/to/config/file \
                    --input path/to/input/image or directory \
                    --weights path/to/detection/weights \
                    --device "cuda" \
                    --save
  ```

### Recognition Inference

- To perform structure recognition on a single image or test directory, execute `tsr.py`. Example usage:

  ```shell
  python model/tsr.py --config-file path/to/config/file \
                      --input path/to/input/image or directory \
                      --structure-weights path/to/structure/weights \
                      --cell-weights path/to/cell/weights \
                      --device "cuda" \
                      --save
  ```
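Structure recognition predicts cell boxes; assembling them into a table grid is a post-processing step. Below is a toy sketch of one common heuristic, grouping cells into rows by the vertical position of their box centers. It is illustrative only and not the repo's implementation; the function name and tolerance are assumptions.

```python
def rows_from_cells(cells, tol=10):
    """Group cell boxes (x0, y0, x1, y1) into rows: cells whose vertical
    centers lie within `tol` pixels of the row anchor share a row.
    Rows come out top-to-bottom, cells within a row left-to-right."""
    rows = []  # each entry: [anchor_center_y, [boxes]]
    for box in sorted(cells, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        if rows and abs(cy - rows[-1][0]) <= tol:
            rows[-1][1].append(box)
        else:
            rows.append([cy, [box]])
    return [sorted(r, key=lambda b: b[0]) for _, r in rows]
```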
## Weights & Metrics

Evaluation metrics are listed in the tables below. _Note: Place the downloaded weights in the [weights/](weights/) directory so the evaluation scripts can find them._

- To download all the weights, execute:

  ```shell
  bash weights/get_weights.sh
  bash weights/fine_tuned.sh
  ```

<div align="center">

**Table Detection**

| <div align="center">Model</div> | <div align="center">Dataset</div> | <div align="center">Precision</div> | <div align="center">Recall</div> | <div align="center">F1</div> | <div align="center">Link</div> |
| --- | --- | --- | --- | --- | --- |
| PyramidTabNet | ICDAR 2017-POD <br> ICDAR 2019 <br> UNLV <br> Marmot <br> TableBank | 99.8 <br> - <br> 97.7 <br> 92.1 <br> 98.9 | 99.3 <br> - <br> 94.9 <br> 98.2 <br> 98.2 | 99.5 <br> 98.7 <br> 96.3 <br> 95.1 <br> 98.5 | [Link](https://github.com/muhd-umer/pyramidtabnet/releases/download/v0.1.0/icdar2017.pth) <br> [Link](https://github.com/muhd-umer/pyramidtabnet/releases/download/v0.1.0/icdar2019.pth) <br> [Link](https://github.com/muhd-umer/pyramidtabnet/releases/download/v0.1.0/unlv.pth) <br> [Link](https://github.com/muhd-umer/pyramidtabnet/releases/download/v0.1.0/marmot.pth) <br> [Link](https://github.com/muhd-umer/pyramidtabnet/releases/download/v0.1.0/tablebank.pth) |

</div>

<div align="center">

**Table Structure Recognition**

| <div align="center">Model</div> | <div align="center">Dataset</div> | <div align="center">Precision</div> | <div align="center">Recall</div> | <div align="center">F1</div> |
| --- | --- | --- | --- | --- |
| PyramidTabNet | ICDAR 2013 <br> SciTSR <br> FinTabNet | 92.3 <br> 98.4 <br> 93.2 | 95.3 <br> 99.1 <br> 88.6 | 93.8 <br> 98.7 <br> 90.8 |

</div>

_Note: The FinTabNet fine-tuned model is for cell detection._
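For readers reproducing the tables: F1 here is the harmonic mean of precision and recall. Minor rounding differences against the published numbers are expected.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. the ICDAR 2017-POD row: f1(99.8, 99.3) rounds to 99.5
```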
## Common Issues

- Machines running variants of Microsoft Windows may encounter issues with `mmcv` imports, for example:

  ```
  ModuleNotFoundError: No module named 'mmcv._ext'
  ```

  Follow the [installation guide](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) in the official MMCV documentation to resolve such issues.

- For table detection, if you get an error of the following form:

  ```
  Error(s) in loading state_dict for TDModel; Missing key(s) in state_dict
  ```

  resolve it by passing a `--config-file` that matches the downloaded checkpoint.

## BibTeX

If you find this work useful for your research, please cite our paper:

```bibtex
@inproceedings{umer2023pyramidtabnet,
  title={PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents},
  author={Umer, Muhammad and Mohsin, Muhammad Ahmed and Ul-Hasan, Adnan and Shafait, Faisal},
  booktitle={International Conference on Document Analysis and Recognition},
  pages={420--437},
  year={2023},
  organization={Springer}
}
```

## Acknowledgements

**Special thanks to the following contributors, without whom this repo would not be possible:**

1. The [MMDetection](https://github.com/open-mmlab/mmdetection) team, for creating an amazing framework that pushes state-of-the-art computer vision research forward and let us experiment with and build various models easily.
<p align="center">
   <a href="https://github.com/open-mmlab/mmdetection"><img width="220" height="75" src="https://raw.githubusercontent.com/open-mmlab/mmdetection/master/resources/mmdet-logo.png"/></a>
</p>

2. The authors of [Pyramid Vision Transformer (PVT v2)](https://arxiv.org/pdf/2106.13797.pdf), for their wonderful contribution to advancing computer vision.

3. The authors of the [CRAFT text detector](https://arxiv.org/abs/1904.01941), for their awesome text detection repository.

4. The author of the [mAP repo](https://github.com/Cartucho/mAP), for providing a straightforward script to compute object detection metrics.
5. [Google Colaboratory](https://github.com/googlecolab), for providing free high-end GPU resources for research and development. The entire code base was developed on their platform and would not have been possible without it.