https://github.com/dotx12/reformatpdf
Extracting data from a PDF table and converting it to JSON for further work.
- Host: GitHub
- URL: https://github.com/dotx12/reformatpdf
- Owner: dotX12
- Created: 2021-03-15T18:09:51.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-03-16T17:26:28.000Z (over 5 years ago)
- Last Synced: 2025-01-29T04:41:26.553Z (over 1 year ago)
- Topics: pdf, python, python3, reformat, tabula, tabula-py
- Language: Python
- Homepage:
- Size: 7.4 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ReformatPDF
A program that allows you to convert large PDF tables to JSON for further work.
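The repository topics mention tabula-py, so the extraction step might look roughly like the sketch below. This is an assumption about the workflow, not the project's actual code: `iter_rows`, `extract_rows`, and `rows_to_json` are hypothetical helper names, and the only library call used is tabula-py's `read_pdf` with `output_format="json"`, which requires Java to be installed.

```python
import json


def iter_rows(tables):
    """Yield every row (a list of cell dicts) from tabula-style JSON output."""
    for table in tables:
        # Each table dict keeps its rows under the "data" key.
        for row in table.get("data", []):
            yield row


def extract_rows(pdf_path):
    """Read all tables from a PDF and return their rows as lists of cell dicts."""
    # Imported lazily so the pure helpers work without tabula-py and Java.
    import tabula

    tables = tabula.read_pdf(pdf_path, pages="all", output_format="json")
    return list(iter_rows(tables))


def rows_to_json(rows):
    """Serialize only the cell texts, keeping non-ASCII characters readable."""
    texts = [[cell["text"] for cell in row] for row in rows]
    return json.dumps(texts, ensure_ascii=False)
```

Calling `rows_to_json(extract_rows("table.pdf"))` (with a hypothetical file name) would then give a JSON string of raw cell texts, ready for further reshaping.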
Example: what the `reformat` function does
```text
def reformat(self) -> dict:
Formats the data according to the required template, depending on the list of dictionaries received.
[{'top': 241.32117, 'left': 114.458305, 'width': 148.84747314453125, 'height': 23.783233642578125,
'text': 'Шулаева Евгения Юрьевна\r(Бакалавриат)'},
{'top': 241.32117, 'left': 263.3058, 'width': 58.32025146484375, 'height': 23.783233642578125, 'text': ''},
{'top': 241.32117, 'left': 321.62604, 'width': 53.400787353515625, 'height': 23.783233642578125, 'text': ''},
{'top': 241.32117, 'left': 375.02682, 'width': 95.03976440429688, 'height': 23.783233642578125, 'text': 'Копия'},
{'top': 241.32117, 'left': 470.0666, 'width': 94.81597900390625, 'height': 23.783233642578125, 'text': 'V'},
{'top': 0.0, 'left': 0.0, 'width': 0.0, 'height': 0.0, 'text': ''}]
Returns:
{'name': 'Шулаева Евгения Юрьевна', 'form_of_education': 'Бакалавриат', 'document_type': 'Копия', 'consent': True}
[{'top': 264.92746, 'left': 64.614006, 'width': 49.84033966064453, 'height': 12.490386962890625, 'text': '48.03.01'},
{'top': 264.92746, 'left': 114.454346, 'width': 148.8521728515625, 'height': 12.490386962890625, 'text': 'Теология'},
{'top': 264.92746, 'left': 263.30652, 'width': 58.32073974609375, 'height': 12.490386962890625, 'text': '9'},
{'top': 264.92746, 'left': 321.62726, 'width': 53.399078369140625, 'height': 12.490386962890625, 'text': '32'},
{'top': 264.92746, 'left': 375.02634, 'width': 95.03768920898438, 'height': 12.490386962890625, 'text': ''},
{'top': 264.92746, 'left': 470.06403, 'width': 94.8138427734375, 'height': 12.490386962890625, 'text': '9'}]
Returns:
{'code': '48.03.01', 'name': 'Теология', 'number_of_seats': '9', 'applications': '32', 'consent': '9'}
```
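The behaviour described above can be approximated with a minimal sketch. It is not the repository's implementation: `reformat_row` is a hypothetical standalone function, and it assumes that a program row is recognised by a code of the form `NN.NN.NN` in the first cell, that the form of education is the text in parentheses after the name, and that a `'V'` in the fifth cell means consent was given.

```python
import re

# Assumption: program codes look like "48.03.01".
CODE_RE = re.compile(r"^\d{2}\.\d{2}\.\d{2}$")
# Assumption: applicant cells look like "Full Name\r(Form of education)".
NAME_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<form>[^)]*)\)\s*$", re.DOTALL)


def reformat_row(cells):
    """Map one row of cell dicts to a program dict or an applicant dict."""
    texts = [cell.get("text", "").strip() for cell in cells]
    texts += [""] * (6 - len(texts))  # pad short rows to six columns

    if CODE_RE.match(texts[0]):
        # Program row: code, name, seats, applications, (empty), consents.
        return {
            "code": texts[0],
            "name": texts[1],
            "number_of_seats": texts[2],
            "applications": texts[3],
            "consent": texts[5],
        }

    # Applicant row: split "Name\r(Form)" into its two parts.
    match = NAME_RE.match(texts[0])
    name, form = (match["name"], match["form"]) if match else (texts[0], "")
    return {
        "name": name.strip(),
        "form_of_education": form,
        "document_type": texts[3],
        "consent": texts[4] == "V",
    }


# Geometry keys ('top', 'left', ...) are omitted here; only 'text' is used.
applicant = [{"text": t} for t in
             ["Шулаева Евгения Юрьевна\r(Бакалавриат)", "", "", "Копия", "V", ""]]
print(reformat_row(applicant))
```

Only the `text` field of each cell is read, so the positional fields from the extractor can be passed through unchanged or dropped beforehand.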