https://github.com/jainammm/TableNet
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"
- Host: GitHub
- URL: https://github.com/jainammm/TableNet
- Owner: jainammm
- Created: 2020-07-11T18:08:35.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-25T00:56:17.000Z (over 1 year ago)
- Last Synced: 2024-08-02T11:14:43.740Z (3 months ago)
- Language: Jupyter Notebook
- Size: 2.55 MB
- Stars: 314
- Watchers: 4
- Forks: 72
- Open Issues: 14
Metadata Files:
- Readme: README.md
# TableNet
Unofficial implementation of the ICDAR 2019 paper _TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images._
[__Paper__](https://arxiv.org/abs/2001.01469)
## Overview
**Paper: TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images**

TableNet is a deep learning architecture proposed by a team from TCS Research in 2019. The main motivation was to extract information from tables in documents scanned with mobile phones or cameras.
They proposed a solution that includes accurate detection of the tabular region within an image and subsequently detecting and extracting information from the rows and columns of the detected table.
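
The two-stage idea above (find the table region, then the columns inside it) can be illustrated with a small post-processing sketch. It assumes the model has already produced binary table and column masks as nested lists. The helper names `bounding_box` and `column_spans` are hypothetical and are not taken from this repository or the paper.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # 1 = foreground pixel, 0 = background
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive


def bounding_box(mask: Mask) -> Optional[Box]:
    """Return the tight box around all foreground pixels, or None if empty."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def column_spans(column_mask: Mask, table_box: Box) -> List[Tuple[int, int]]:
    """Find horizontal runs of column pixels inside the table box.

    A pixel column counts as "on" if any pixel in it (within the table's
    vertical extent) is foreground; consecutive "on" columns form one span.
    """
    x0, y0, x1, y1 = table_box
    spans = []
    start = None
    for x in range(x0, x1 + 1):
        on = any(column_mask[y][x] for y in range(y0, y1 + 1))
        if on and start is None:
            start = x
        elif not on and start is not None:
            spans.append((start, x - 1))
            start = None
    if start is not None:
        spans.append((start, x1))
    return spans


# Toy 8x4 masks standing in for model output (made-up data).
table_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
column_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

table_box = bounding_box(table_mask)
print(table_box)                             # (1, 1, 6, 2)
print(column_spans(column_mask, table_box))  # [(1, 2), (5, 6)]
```

Each span, combined with the table's vertical extent, gives a crop that could then be passed to an OCR engine to read one column at a time.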
**Architecture:** The architecture is based on the encoder-decoder model for semantic segmentation by Long et al. The same encoder-decoder network serves as the FCN architecture for table extraction. The images are preprocessed and modified using Tesseract OCR.
Source: [Nanonets](https://nanonets.com/blog/table-extraction-deep-learning/#tablenet?&utm_source=nanonets.com/blog/&utm_medium=blog&utm_content=Table%20Detection,%20Information%20Extraction%20and%20Structuring%20using%20Deep%20Learning)
![architecture](https://github.com/jainammm/TableNet/raw/master/architecture.png)
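
To make the encoder-decoder idea concrete, here is a toy sketch of how resolution flows through such a network: the encoder repeatedly shrinks the image, and the decoder upsamples it back so that the output mask has one value per input pixel. It is only an illustration under stated assumptions: there are no learned convolutions, the functions `max_pool_2x2` and `upsample_2x` are hypothetical helpers, and the number of stages is arbitrary rather than the one used by TableNet.

```python
from typing import List

Grid = List[List[int]]


def max_pool_2x2(grid: Grid) -> Grid:
    """Halve height and width by taking the max of each 2x2 block.

    Assumes both dimensions are even.
    """
    return [
        [
            max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
            for x in range(0, len(grid[0]), 2)
        ]
        for y in range(0, len(grid), 2)
    ]


def upsample_2x(grid: Grid) -> Grid:
    """Double height and width with nearest-neighbour repetition."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


image = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

# "Encoder": two downsampling stages, 4x4 -> 2x2 -> 1x1.
encoded = max_pool_2x2(max_pool_2x2(image))

# "Decoder": two upsampling stages, 1x1 -> 2x2 -> 4x4.
decoded = upsample_2x(upsample_2x(encoded))

print(len(encoded), len(encoded[0]))  # 1 1
print(len(decoded), len(decoded[0]))  # 4 4
```

In a real segmentation network, learned convolution layers sit between these resizing steps, so the decoder recovers a detailed mask rather than a blocky copy of the coarsest features.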
## How to run
```
pip install -r requirements.txt
```

1. Download the Marmot Dataset from the link in the Dataset section of this README.
1. Run `data_preprocess/generate_mask.py` to generate Table and Column Mask of corresponding images.
1. Follow the `TableNet.ipynb` notebook to train and test the model.

## Challenges
* Requires a capable system with a good GPU for accurate results on high-resolution images.

## Dataset
Download the dataset provided in the paper: [Marmot Dataset](https://drive.google.com/drive/folders/1QZiv5RKe3xlOBdTzuTVuYRxixemVIODp).
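
The mask-generation step in "How to run" turns annotated regions into table and column masks. The sketch below shows the general idea only. It is not the code in `data_preprocess/generate_mask.py`: the function `make_mask`, the box format `(x_min, y_min, x_max, y_max)`, and the sample coordinates are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def make_mask(width: int, height: int, boxes: Iterable[Box]) -> List[List[int]]:
    """Return a height x width mask with 255 inside every box and 0 elsewhere.

    Boxes that extend past the image border are clipped to it.
    """
    mask = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(height, y_max + 1)):
            for x in range(max(0, x_min), min(width, x_max + 1)):
                mask[y][x] = 255
    return mask


# Made-up annotations for a 6x4 image: one table containing two columns.
table_boxes = [(1, 1, 4, 2)]
column_boxes = [(1, 1, 2, 2), (4, 1, 4, 2)]

table_mask = make_mask(6, 4, table_boxes)
column_mask = make_mask(6, 4, column_boxes)

for row in column_mask:
    print(row)
```

The two masks have the same size as the image, so each can serve as a per-pixel training target for a segmentation model.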