https://github.com/sdpdas/document-layout-generator-and-segmentation-tool

Lists all parts of a PDF document. The code is robust and highly scalable.

analysis document-classification numpy opencv-python pdf2image python

# Document Layout Generation

Note: Install the required libraries (opencv-python, pdf2image, img2pdf, numpy, etc.) before running the tool.

pdf2image also needs Poppler. Windows builds can be downloaded from https://github.com/oschwartz10612/poppler-windows/releases/
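
Before running the tool, it can help to confirm that the dependencies are in place. The snippet below is a minimal sketch and not part of the project: `missing_modules` and `poppler_on_path` are hypothetical helper names, and the check assumes Poppler's `pdftoppm` executable (which pdf2image calls) is reachable through PATH.

```python
import importlib.util
import shutil

# Import names for the pip packages listed in the note above
# (opencv-python is imported as cv2).
REQUIRED_MODULES = ["cv2", "pdf2image", "img2pdf", "numpy"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def poppler_on_path():
    """Return True if Poppler's pdftoppm executable is found on PATH."""
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    if not poppler_on_path():
        print("Poppler (pdftoppm) was not found on PATH")
```

If Poppler is installed but not on PATH, pdf2image's `convert_from_path` also accepts a `poppler_path` argument that points to the folder containing the Poppler binaries.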

## Description

This tool takes a PDF document and produces a segmented version of it as a new PDF.

All intermediate results are saved in the input and output image folders.

## How to Use?

1. Download the ZIP file of the project and open its source folder.

2. Navigate to layout.py and run it, either directly or from a terminal:

```shell
python layout.py
```

3. To use your own document, add it to the folder and change the path in layout.py:

```python
pdf_file = '...\\(PDF_name).pdf'
```

4. View the results in the Processed Files folder.
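
Windows paths contain backslashes, which Python treats as escape characters inside ordinary string literals. That is why the path in step 3 uses doubled backslashes. The sketch below shows equivalent ways to write such a path. The folder and file names (`docs`, `sample.pdf`) are hypothetical placeholders, not files shipped with the project.

```python
from pathlib import PureWindowsPath

# Three equivalent spellings of the same (hypothetical) Windows path.
escaped = 'docs\\sample.pdf'   # doubled backslashes
raw = r'docs\sample.pdf'       # raw string, no escaping needed
joined = str(PureWindowsPath('docs') / 'sample.pdf')  # built from parts

pdf_file = raw
```

All three variables hold the same string, so any of them can be assigned to `pdf_file`.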

## Glossary

1. input_img: holds the page images extracted from the PDF, in OpenCV format.
2. output_img: holds the output images after resizing and object detection.
3. output_pdf: holds the resulting PDFs, built by merging the output images.
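
To make the relationship between the three folders concrete, the sketch below maps one input PDF to the files a run might produce. The file naming scheme (`<name>_page_<n>.png`, `<name>_segmented.pdf`) and the helper `artifact_paths` are assumptions for illustration. The actual names used by layout.py may differ.

```python
from pathlib import Path


def artifact_paths(pdf_file, page_count, root="."):
    """Return the per-stage file paths for one PDF with `page_count` pages."""
    stem = Path(pdf_file).stem
    root = Path(root)
    pages = range(1, page_count + 1)
    return {
        # Page images extracted from the PDF.
        "input_img": [root / "input_img" / f"{stem}_page_{n}.png" for n in pages],
        # The same pages after resizing and object detection.
        "output_img": [root / "output_img" / f"{stem}_page_{n}.png" for n in pages],
        # One PDF merged from the output images.
        "output_pdf": root / "output_pdf" / f"{stem}_segmented.pdf",
    }
```

For example, `artifact_paths("report.pdf", 2)` lists two images under input_img, two under output_img, and a single merged file under output_pdf.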

## Screenshots

![arxiv1](working/output_1.png)

![arxiv1](working/output_2.png)

# Created by Sagardeep Das