https://github.com/chetanxpro/document-ai
An app to extract structured data from a PDF document
- Host: GitHub
- URL: https://github.com/chetanxpro/document-ai
- Owner: ChetanXpro
- Created: 2023-11-12T06:29:01.000Z (over 2 years ago)
- Default Branch: feat/typescript
- Last Pushed: 2023-11-30T18:56:14.000Z (over 2 years ago)
- Last Synced: 2025-03-05T04:44:18.102Z (about 1 year ago)
- Topics: extract-data
- Language: Python
- Homepage:
- Size: 1000 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Extract Data from PDF
This script extracts structured data from PDF files.
## Prerequisites
Before you run the script, make sure you have:
- `Node.js` installed on your machine
- An API key for `gpt-4`.
## Setup Instructions
- First, define all the possible document types in the `types/documentType.ts` file.
- Then, specify all the document schemas in the `constants/schema.ts` file.
- To configure the script to work with your `gpt-4` API key and any other settings, you'll need to set up environment variables. Rename the provided `.example.env` file to `.env` in the root directory of the project:

  ```sh
  mv .example.env .env
  ```
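The README does not show what `types/documentType.ts` and `constants/schema.ts` contain. Here is a hedged sketch of how the two files might fit together. It assumes document types are a string-literal union and each schema is a simple map of field descriptions. The type names `invoice` and `bankStatement`, the `FieldSchema` shape, and the `fieldsFor` helper are all hypothetical. Both files are shown in one block, without `export`s, so it runs standalone:

```typescript
// --- types/documentType.ts (hypothetical contents) ---
// One string literal per supported document type.
type DocumentType = "invoice" | "bankStatement";

// --- constants/schema.ts (hypothetical contents) ---
// A minimal field description; the project's real schema format may differ.
interface FieldSchema {
  type: "string" | "number" | "date";
  description: string;
}

type DocumentSchema = Record<string, FieldSchema>;

// Keyed by DocumentType, so tsc reports any document type that lacks a schema.
const schemas: Record<DocumentType, DocumentSchema> = {
  invoice: {
    invoiceNumber: { type: "string", description: "Invoice identifier" },
    issueDate: { type: "date", description: "Date the invoice was issued" },
    totalAmount: { type: "number", description: "Total amount due" },
  },
  bankStatement: {
    accountNumber: { type: "string", description: "Account number" },
    closingBalance: { type: "number", description: "Balance at the end of the period" },
  },
};

// Hypothetical helper: the field names to extract for a given document type,
// in the order they were declared in the schema.
function fieldsFor(docType: DocumentType): string[] {
  return Object.keys(schemas[docType]);
}
```

Typing `schemas` as `Record<DocumentType, DocumentSchema>` means that adding a member to `DocumentType` without a matching schema entry is a compile-time error, which keeps the two files in step.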
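The README does not list the variables inside `.example.env`. As an illustration only, the sketch below assumes the common `KEY=VALUE` `.env` format and a hypothetical variable named `OPENAI_API_KEY`. It parses such a file and fails early when the key is missing. In practice a library such as `dotenv` usually loads the file into `process.env`; whether this project does so is not stated, and `parseEnv` and `requireApiKey` are hypothetical names:

```typescript
// Minimal parser for .env-style text. Real loaders such as dotenv also handle
// escapes, multiline values, and inline comments; this sketch does not.
function parseEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and full-line comments.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    // Ignore lines with no "=" or with an empty key.
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Hypothetical variable name: check .example.env for the one the script expects.
function requireApiKey(vars: Record<string, string>, name = "OPENAI_API_KEY"): string {
  const key = vars[name];
  if (!key) throw new Error(`${name} is missing from .env`);
  return key;
}
```

For example, `parseEnv('OPENAI_API_KEY="sk-example"')` yields `{ OPENAI_API_KEY: "sk-example" }`, and `requireApiKey` on an empty object throws before any request to `gpt-4` is attempted.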