https://github.com/abdoelsayed2016/TNCR_Dataset
Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection, Table classification. https://www.sciencedirect.com/science/article/pii/S0925231221018142
- Host: GitHub
- URL: https://github.com/abdoelsayed2016/TNCR_Dataset
- Owner: abdoelsayed2016
- License: mit
- Created: 2021-04-21T02:11:27.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-02-24T19:39:28.000Z (about 1 year ago)
- Last Synced: 2024-11-14T03:34:30.159Z (6 months ago)
- Topics: classification, cnn, convolutional-neural-networks, deep-learning, image-classification, mmdetection, object-detection, object-recognition, pytorch, table, table-detection, table-recognition
- Language: Python
- Homepage:
- Size: 750 KB
- Stars: 65
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Table-Recognition - TNCR
README
# TNCR: Table Net Detection and Classification Dataset
[PyTorch](https://pytorch.org/)
[MMDetection](https://github.com/open-mmlab/mmdetection)

> **TNCR: Table Net Detection and Classification Dataset**
> [Abdelrahman Abdallah](https://github.com/abdoelsayed2016),
> Alexander Berendeev,
> Islam Nuradin,
> Daniyar Nurseitov
>
## Abstract
We present TNCR, a new table dataset with varying image quality collected from open-access websites. The TNCR dataset can be used for table detection in scanned document images and for classifying the detected tables into 5 different classes. TNCR contains 9428 labeled tables in approximately 6621 images. In this paper, we have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines. Deformable DETR with a ResNet-50 backbone achieves the highest performance among the compared methods, with a precision of 86.7%, recall of 89.6%, and F1 score of 88.1% on the TNCR dataset. We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification, and structure recognition.
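The F1 score quoted above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick check that the three reported numbers are consistent with each other:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Deformable DETR (ResNet-50) figures from the abstract
p, r = 0.867, 0.896
print(f"F1 = {f1_score(p, r):.3f}")  # F1 = 0.881
```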
## Keywords
Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection, Table classification
## Getting Started
### Install MMDetection v2.10.0+
TNCR has been implemented and tested with Python 3.7 and PyTorch 1.8.1.
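To confirm that an environment matches the versions TNCR was tested with (Python 3.7, PyTorch 1.8.1, and the pins listed under Requirements), a small check like the one below can be run after installation. It is a sketch, not part of the repository: the PyPI distribution names in `TESTED` are assumptions (OpenCV, for example, may be installed as `opencv-python-headless`), and `importlib.metadata` needs Python 3.8+ or the `importlib_metadata` backport on 3.7.

```python
import sys

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib_metadata
    import importlib_metadata as metadata

# Tested versions from this README; the distribution names are assumptions
TESTED = {
    "torch": "1.8.1",
    "opencv-python": "4.5.2",
    "mmcv-full": "1.3.5",
    "mmdet": "2.10.0",
}


def version_mismatches(expected):
    """Return {dist: (installed_or_None, expected)} for every package that differs."""
    mismatches = {}
    for dist, want in expected.items():
        try:
            have = metadata.version(dist)
        except metadata.PackageNotFoundError:
            mismatches[dist] = (None, want)
            continue
        # Ignore local build tags such as "+cu101" and extra components such as
        # the fourth field of opencv-python versions ("4.5.2.52")
        have_parts = have.split("+")[0].split(".")
        want_parts = want.split(".")
        if have_parts[: len(want_parts)] != want_parts:
            mismatches[dist] = (have, want)
    return mismatches


print("Python", sys.version.split()[0], "(tested with 3.7)")
for dist, (have, want) in version_mismatches(TESTED).items():
    print(f"{dist}: installed {have}, tested with {want}")
```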
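```
# Notebook (Colab/Jupyter) cells; replace $project_dir$ with your working directory
%cd $project_dir$
!pip install -q terminaltables pillow
# MMDetection v2.x needs mmcv-full; installing the lite `mmcv` package alongside it conflicts
!pip install mmcv-full==1.3.5
# Pin the MMDetection release listed under Requirements (the default branch is a newer, incompatible major version)
!git clone --branch v2.10.0 https://github.com/open-mmlab/mmdetection.git
%cd mmdetection/
!pip install -r requirements.txt
!pip install -r requirements/optional.txt
# Editable install; a separate `setup.py install` is not needed
!python setup.py develop
%cd ..
# Swap pycocotools for mmpycocotools (-y skips the interactive confirmation prompt)
!pip uninstall -y pycocotools mmpycocotools
!pip install mmpycocotools
```

## Requirements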
```
Python: 3.7
PyTorch: 1.8.1
OpenCV: 4.5.2
MMCV: 1.3.5
MMDetection: v2.10.0
```

## TNCR Dataset
You can download the dataset through this link or from Google Drive, split into 5 parts:
* Part #1
* Part #2
* Part #3
* Part #4
* Part #5

The tables are labeled with five classes:
* Full Lined
* Merged Cells
* No Lines
* Partial Lined
* Partial Lined Merged Cells
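This README does not document the annotation format. Because the baselines are trained with MMDetection, COCO-style JSON annotations are a plausible assumption; under that assumption, labeled tables can be tallied per class as sketched below. The category names, the inline `example` data, and the `load_annotations` path are hypothetical placeholders, not the dataset's actual labels or files.

```python
import json
from collections import Counter


def load_annotations(path: str) -> dict:
    """Read a COCO-style annotation file (the path is a hypothetical placeholder)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tables_per_class(coco: dict) -> Counter:
    """Count annotated tables per category in a COCO-style annotation dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


# Tiny inline stand-in for a real annotation file (hypothetical names and boxes)
example = {
    "images": [{"id": 1, "file_name": "page_0001.jpg"}],
    "categories": [{"id": 1, "name": "full_lined"}, {"id": 2, "name": "no_lines"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 300, 150]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [10, 200, 300, 120]},
        {"id": 3, "image_id": 1, "category_id": 1, "bbox": [10, 340, 300, 100]},
    ],
}
print(tables_per_class(example))  # Counter({'full_lined': 2, 'no_lines': 1})
```

With a real file, `tables_per_class(load_annotations("..."))` would give the per-class breakdown of the labeled tables.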
## Models Zoo
All config and checkpoint files are available at this link. Check out our demo notebook for loading checkpoints and performing inference:
[Open in Colab](https://colab.research.google.com/drive/1lI0ghISktOYkEpaxDEsnEyQ2botA34oB)
#### 1. Cascade Mask R-CNN
| Backbone | Config Files | Checkpoint File |
| --- | --- | --- |
| Resnet-50_1x | Config Files | Checkpoint |
| Resnet-50_20e | Config Files | Checkpoint |
| Resnet-101_1x | Config Files | Checkpoint |
| Resnet-101_20e | Config Files | Checkpoint |
| ResNeXt-101-32x4d_1x | Config Files | Checkpoint |
| ResNeXt-101-64x4d_1x | Config Files | Checkpoint |
#### 2. Cascade R-CNN
| Backbone | Config Files | Checkpoint File |
| --- | --- | --- |
| Resnet-50_1x | Config Files | Checkpoint |
| Resnet-50_20e | Config Files | Checkpoint |
| Resnet-101_1x | Config Files | Checkpoint |
| Resnet-101_20e | Config Files | Checkpoint |
| ResNeXt-101-32x4d_1x | Config Files | Checkpoint |
| ResNeXt-101-64x4d_1x | Config Files | Checkpoint |
#### 3. Cascade RPN
| Method | Backbone | Config Files | Checkpoint File |
| --- | --- | --- | --- |
| Fast R-CNN | Resnet-50_1x | Config Files | Checkpoint |
| CRPN | Resnet-50_1x | Config Files | Checkpoint |
#### 4. Hybrid Task Cascade
| Backbone | Config Files | Checkpoint File |
| --- | --- | --- |
| Resnet-50_1x | Config Files | Checkpoint |
| Resnet-50_20e | Config Files | Checkpoint |
| Resnet-101_1x | Config Files | Checkpoint |
#### 5. YOLO
| Backbone | Config Files | Checkpoint File |
| --- | --- | --- |
| DarkNet-53_320 | Config Files | Checkpoint |
| DarkNet-53_416 | Config Files | Checkpoint |
| DarkNet-53_608 | Config Files | Checkpoint |
#### 6. Deformable DETR
| Backbone | Config Files | Checkpoint File |
| --- | --- | --- |
| R-50_1 | Config Files | Checkpoint |
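The demo notebook is the reference for inference. As a rough sketch of the same idea with the MMDetection v2.x Python API (`init_detector` and `inference_detector` from `mmdet.apis`), the helper below loads one config/checkpoint pair and flattens the per-class box arrays into readable detections. The class names and their order in `CLASSES` are assumptions; take the real ones from the downloaded config. The file paths passed to `run_inference` are placeholders.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[str, Tuple[float, float, float, float], float]


def run_inference(config_file: str, checkpoint_file: str, image_path: str, device: str = "cuda:0"):
    """Run one model-zoo config/checkpoint pair on one image (MMDetection v2.x API)."""
    # Imported lazily so filter_detections can be used without MMDetection installed
    from mmdet.apis import init_detector, inference_detector

    model = init_detector(config_file, checkpoint_file, device=device)
    return inference_detector(model, image_path)


def filter_detections(result, class_names: Sequence[str], score_thr: float = 0.5) -> List[Detection]:
    """Flatten a per-class list of [x1, y1, x2, y2, score] rows, keeping scores >= score_thr."""
    # Mask models (Cascade Mask R-CNN, HTC) return (bbox_result, segm_result); keep the boxes
    if isinstance(result, tuple):
        result = result[0]
    kept = []
    for class_name, dets in zip(class_names, result):
        for x1, y1, x2, y2, score in dets:
            if score >= score_thr:
                kept.append((class_name, (float(x1), float(y1), float(x2), float(y2)), float(score)))
    # Highest-confidence tables first
    return sorted(kept, key=lambda d: d[2], reverse=True)


# Hypothetical class names/order, and a fake result in place of real model output
CLASSES = ("full_lined", "merged_cells", "no_lines", "partial_lined", "partial_lined_merged_cells")
fake = [[[10, 20, 300, 150, 0.92]], [], [[10, 200, 300, 320, 0.31]], [], []]
print(filter_detections(fake, CLASSES))  # [('full_lined', (10.0, 20.0, 300.0, 150.0), 0.92)]
```

Keeping the MMDetection import inside `run_inference` separates model loading from post-processing, so the filtering step can be checked on hand-made results.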
## License
The code of TNCR is open source under the [MIT License](LICENSE), which permits both academic and commercial use.

## Cite as
If you find this work useful for your research, please cite our paper:
```
@misc{https://doi.org/10.48550/arxiv.2211.08469,
doi = {10.48550/ARXIV.2211.08469},
url = {https://arxiv.org/abs/2211.08469},
author = {Kasem, Mahmoud and Abdallah, Abdelrahman and Berendeyev, Alexander and Elkady, Ebrahem and Abdalla, Mahmoud and Mahmoud, Mohamed and Hamada, Mohamed and Nurseitov, Daniyar and Taj-Eddin, Islam},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Deep learning for table detection and structure recognition: A survey},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```
```
@article{ABDALLAH2021,
title = {TNCR: Table Net Detection and Classification Dataset},
journal = {Neurocomputing},
year = {2021},
issn = {0925-2312},
doi = {https://doi.org/10.1016/j.neucom.2021.11.101},
url = {https://www.sciencedirect.com/science/article/pii/S0925231221018142},
author = {Abdelrahman Abdallah and Alexander Berendeyev and Islam Nuradin and Daniyar Nurseitov},
keywords = {Deep learning, Convolutional neural networks, Image processing, Document processing, Table detection, Page object detection},
abstract = {We present TNCR, a new table dataset with varying image quality collected from open access websites. TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes. TNCR contains 9428 labeled tables with approximately 6621 images . In this paper, we have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines. Deformable DERT with Resnet-50 Backbone Network achieves the highest performance compared to other methods with a precision of 86.7%, recall of 89.6%, and f1 score of 88.1% on the TNCR dataset. We have made TNCR open source in the hope of encouraging more deep learning approaches to table detection, classification and structure recognition. The dataset and trained model checkpoints are available at https://github.com/abdoelsayed2016/TNCR_Dataset.}
}
```