Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sdpdas/document-layout-generator-and-segmentation-tool
Lists all parts of a PDF document; highly scalable, with robust code.
analysis document-classification numpy opencv-python pdf2image python
- Host: GitHub
- URL: https://github.com/sdpdas/document-layout-generator-and-segmentation-tool
- Owner: SDpDas
- Created: 2024-06-23T21:16:40.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-06-24T08:04:06.000Z (6 months ago)
- Last Synced: 2024-06-24T22:51:56.730Z (6 months ago)
- Topics: analysis, document-classification, numpy, opencv-python, pdf2image, python
- Language: Python
- Homepage:
- Size: 58.7 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Document Layout Generation
Note: Install the required libraries, such as opencv-python, pdf2image, img2pdf, and numpy.
Download Poppler: https://github.com/oschwartz10612/poppler-windows/releases/

## Description
This tool converts a PDF document into a segmented version of the same PDF.
All intermediate images can be found in the input and output image folders.
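
The flow described above (PDF pages to images, a processing step, then images back to a PDF) can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's layout.py: the function names, the `dpi` value, the output file naming, and the pass-through `segment_page` placeholder are all hypothetical. The folder names are taken from this README.

```python
from pathlib import Path


def page_paths(folder, count, stem="page"):
    """Return the image paths used for `count` pages, e.g. page_1.png."""
    return [Path(folder) / f"{stem}_{i}.png" for i in range(1, count + 1)]


def segment_page(image):
    """Placeholder for the segmentation step (assumption: returns the image unchanged)."""
    return image


def pdf_to_segmented_pdf(pdf_file, work_dir=".", dpi=200):
    """Convert a PDF to page images, process them, and merge them into a new PDF."""
    # Third-party imports are local so the helpers above work without them.
    import cv2
    import img2pdf
    import numpy as np
    from pdf2image import convert_from_path  # needs Poppler installed

    work_dir = Path(work_dir)
    in_dir = work_dir / "input_img"
    out_dir = work_dir / "output_img"
    pdf_dir = work_dir / "output_pdf"
    for folder in (in_dir, out_dir, pdf_dir):
        folder.mkdir(parents=True, exist_ok=True)

    pages = convert_from_path(str(pdf_file), dpi=dpi)  # list of PIL images
    inputs = page_paths(in_dir, len(pages))
    outputs = page_paths(out_dir, len(pages))

    for page, src, dst in zip(pages, inputs, outputs):
        # PIL pages are RGB; OpenCV expects BGR channel order.
        bgr = cv2.cvtColor(np.array(page.convert("RGB")), cv2.COLOR_RGB2BGR)
        cv2.imwrite(str(src), bgr)
        cv2.imwrite(str(dst), segment_page(bgr))

    # Merge the processed page images into a single PDF.
    result = pdf_dir / f"{Path(pdf_file).stem}_segmented.pdf"
    result.write_bytes(img2pdf.convert([str(p) for p in outputs]))
    return result
```

Keeping the intermediate images on disk, as the tool does, makes it easy to inspect each stage when a page is segmented badly.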
## How to Use?
1. Download the ZIP file of the project and open its source folder.
2. Navigate to layout.py and run it directly or by using
   ```shell
   python layout.py
   ```
3. If you want to use your own document, add it to the folder and change the path in layout.py
   ```python
   pdf_file = '...\\(PDF_name).pdf'
   ```
4. Observe the results in the Processed Files folder.
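
On Windows, backslashes in the `pdf_file` string must be doubled or the string written as a raw string. One hedged way to check the path before running the tool is sketched below; `resolve_pdf` is a hypothetical helper, not part of layout.py, and the file name in the usage comment is only an illustration.

```python
from pathlib import Path


def resolve_pdf(path_str):
    """Return an absolute Path to an existing .pdf file, or raise a clear error."""
    path = Path(path_str).expanduser()
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF: {path}")
    return path.resolve()


# Usage (hypothetical file name): a raw string avoids doubling the backslashes.
# pdf_file = str(resolve_pdf(r"docs\sample.pdf"))
```

Failing early with a clear message is usually easier to diagnose than an error raised later inside the PDF conversion step.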
## Glossary
1. input_img: contains the page images extracted from the PDF, in OpenCV format.
2. output_img: contains the output images after resizing and object detection.
3. output_pdf: contains the resulting PDFs built by merging the output images.

# Screenshots:
![arxiv1](working/output_1.png)
![arxiv1](working/output_2.png)
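
The README does not say how layout.py detects regions. As one illustrative possibility, boxes around blocks of content can be drawn with a classical OpenCV pass: resize, threshold, dilate, then take the bounding rectangles of the contours. The function names, kernel size, `min_area`, and target width below are assumptions, and the two-value `findContours` return assumes OpenCV 4.

```python
def fit_width(width, height, target_width=1000):
    """Return (w, h) scaled to target_width while keeping the aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


def draw_blocks(bgr, min_area=500):
    """Draw a rectangle around each block of content (assumed approach)."""
    import cv2  # local import: opencv-python is only needed for this step

    h, w = bgr.shape[:2]
    image = cv2.resize(bgr, fit_width(w, h))  # dsize is (width, height)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu threshold, inverted so text and figures become white foreground.
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
    )
    # Dilation merges nearby characters into paragraph-sized blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    blobs = cv2.dilate(binary, kernel, iterations=2)
    contours, _ = cv2.findContours(
        blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    boxes = []
    for contour in contours:
        x, y, bw, bh = cv2.boundingRect(contour)
        if bw * bh >= min_area:  # skip specks and noise
            boxes.append((x, y, bw, bh))
            cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 0, 255), 2)
    return image, boxes
```

A larger kernel or more dilation iterations merges more text into each box, so these values would need tuning per document type.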
# Created by Sagardeep Das