Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dvelkow/pdf_data_scraper
Scrapes all the data from a pdf and turns it into a relational table in a csv format, uses chatgpt API to do so
- Host: GitHub
- URL: https://github.com/dvelkow/pdf_data_scraper
- Owner: dvelkow
- Created: 2024-08-06T20:54:20.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-06T22:53:35.000Z (6 months ago)
- Last Synced: 2025-01-11T04:07:05.346Z (19 days ago)
- Language: Python
- Size: 2.93 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Overview
This project implements a smart PDF data extractor that uses OpenAI's GPT model to read PDF documents and extract specific information. The extracted data is then saved to a CSV file, making it easy to analyze and process.

## Features
- Extract user-specified data fields from PDF documents
- Output extracted data to a CSV file for easy analysis

## How To Run
First, clone the repository:
`` git clone https://github.com/dvelkow/pdf_data_scraper ``
Then move into the cloned directory and run the main.py file:
`` cd pdf_data_scraper ``
`` python3 main.py ``
#### IMPORTANT: The script prompts you to enter the data fields you want to extract as a comma-separated list.
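The README does not show how the extraction works internally. As a rough illustration of the flow it describes (comma-separated field names in, CSV out), here is a minimal Python sketch. It is not the repository's code. It assumes the PDF text has already been extracted and that the model is asked to reply with a JSON array of objects. The model call is passed in as a hypothetical `ask_model` function, so the sketch runs without an API key or network access. `parse_fields`, `build_prompt`, and `extract_to_csv` are likewise hypothetical names.

```python
import csv
import io
import json
from typing import Callable

# Hypothetical hook (not from the repository): takes a prompt string and
# returns the model's raw text reply. A real version would wrap an OpenAI
# chat-completion request.
AskModel = Callable[[str], str]


def parse_fields(raw: str) -> list[str]:
    """Turn the user's comma-separated answer into clean field names."""
    return [field.strip() for field in raw.split(",") if field.strip()]


def build_prompt(pdf_text: str, fields: list[str]) -> str:
    """Ask the model for a JSON array of records keyed by the requested fields."""
    return (
        "Extract these fields from the document below: "
        + ", ".join(fields)
        + ". Reply only with a JSON array of objects that use exactly these keys."
        + "\n\n"
        + pdf_text
    )


def extract_to_csv(pdf_text: str, fields: list[str], ask_model: AskModel) -> str:
    """Return CSV text: a header row of field names, then one row per record."""
    # Assumes the reply is valid JSON; a real tool should handle failures here.
    records = json.loads(ask_model(build_prompt(pdf_text, fields)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for record in records:
        # Keep only the requested columns; missing values become empty cells.
        writer.writerow({field: record.get(field, "") for field in fields})
    return out.getvalue()


if __name__ == "__main__":
    # Stand-ins for the interactive prompt and for a real model reply.
    fields = parse_fields("name, invoice_date ,total")

    def fake_model(prompt: str) -> str:
        return '[{"name": "ACME", "invoice_date": "2024-08-01", "total": "99.50"}]'

    # Prints a header row followed by one data row.
    print(extract_to_csv("(text extracted from the PDF)", fields, fake_model), end="")
```

Passing the model call in as a function keeps the CSV logic testable offline. Restricting each row to the requested field names means that extra keys in the model's reply never create unexpected columns.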