https://github.com/onify/blueprint-aws-textract-pdf-to-form
Onify Blueprint: Amazon AWS Textract - PDF to form example
- Host: GitHub
- URL: https://github.com/onify/blueprint-aws-textract-pdf-to-form
- Owner: onify
- License: mit
- Created: 2021-04-12T19:50:20.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-27T05:27:00.000Z (over 3 years ago)
- Last Synced: 2025-04-11T23:16:34.820Z (6 months ago)
- Topics: agent, ai, amazon, amazon-textract, blueprint, bpmn, flow, form, machine-learning, nodejs, ocr, onify, onify-blueprint, onify-blueprints, pdf, rpa, s3, s3-bucket, textract
- Language: JavaScript
- Homepage: https://onify.co
- Size: 788 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

[Project Status: WIP](https://www.repostatus.org/#wip)
# Onify Blueprint: Amazon AWS Textract - PDF to form example
An example of how to 1) upload files to AWS S3, 2) process the PDF file via AWS Textract, and 3) send a link to a form where the data extracted from the PDF can be validated. What you need to do is decide where the data from the form should go. But that is a different story and a different Blueprint :-)
**Amazon Textract**
Amazon Textract is a machine learning service that automatically extracts text, handwriting and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Read more: https://aws.amazon.com/textract/.
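To make "extract data from forms" more concrete, here is a minimal sketch of how key/value pairs could be read from the `Blocks` array that a Textract form analysis returns. It is not taken from this Blueprint's scripts: the helper names `textOf` and `extractFormFields` and the sample data are hypothetical, and the sketch ignores checkboxes (`SELECTION_ELEMENT` blocks) and result pagination.

```javascript
// Sketch only: helper names and sample data are hypothetical, not part of this Blueprint.
// `blocks` mimics the `Blocks` array of a Textract analysis result (FORMS feature).

// Join the text of all WORD children of a block.
function textOf(block, byId) {
  const words = [];
  for (const rel of block.Relationships || []) {
    if (rel.Type !== 'CHILD') continue;
    for (const id of rel.Ids) {
      const child = byId.get(id);
      if (child && child.BlockType === 'WORD') words.push(child.Text);
    }
  }
  return words.join(' ');
}

// Map every KEY block to the text of the VALUE block it points to.
function extractFormFields(blocks) {
  const byId = new Map(blocks.map((b) => [b.Id, b]));
  const fields = {};
  for (const block of blocks) {
    const isKey =
      block.BlockType === 'KEY_VALUE_SET' &&
      (block.EntityTypes || []).includes('KEY');
    if (!isKey) continue;
    const valueRel = (block.Relationships || []).find((r) => r.Type === 'VALUE');
    const valueBlock = valueRel ? byId.get(valueRel.Ids[0]) : undefined;
    fields[textOf(block, byId)] = valueBlock ? textOf(valueBlock, byId) : '';
  }
  return fields;
}

// Hand-written sample resembling one form field ("Name: Jane Doe").
const sampleBlocks = [
  { Id: 'k1', BlockType: 'KEY_VALUE_SET', EntityTypes: ['KEY'],
    Relationships: [{ Type: 'VALUE', Ids: ['v1'] }, { Type: 'CHILD', Ids: ['w1'] }] },
  { Id: 'v1', BlockType: 'KEY_VALUE_SET', EntityTypes: ['VALUE'],
    Relationships: [{ Type: 'CHILD', Ids: ['w2', 'w3'] }] },
  { Id: 'w1', BlockType: 'WORD', Text: 'Name:' },
  { Id: 'w2', BlockType: 'WORD', Text: 'Jane' },
  { Id: 'w3', BlockType: 'WORD', Text: 'Doe' },
];

console.log(extractFormFields(sampleBlocks)); // { 'Name:': 'Jane Doe' }
```

A flat object like this is one convenient shape for pre-filling a validation form, though the scripts in this Blueprint may structure the data differently.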
## Screenshots
Here is an example of the PDF files that are in the "inbox".

### AWS Textract
Here is how Amazon Textract sees the PDF as a form.

### Form
Here is how the data ends up in the form in Onify.

### Onify Flow

## Requirements
* Onify Hub API 2.3.0 or later
* Mail configured in Onify Hub
* Onify Agent (tagged `agent`)
* Onify Flow license
* Node.js installed (on agent)
* Camunda Modeler 4.4 or later
* Amazon AWS services: S3 Bucket, SNS and SQS
## Included
* 1 x Flow
* 3 x Scripts (nodejs)
## Setup
### Amazon AWS
In order for this to work you need the following setup:
1. Amazon S3 Bucket
2. AWS user with [permissions](https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html)
3. Access key ID (`accessKeyId`) and secret access key (`secretAccessKey`) for the AWS user

> NOTE: For more information, please read [Configuring Amazon Textract for Asynchronous Operations](https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html)

> NOTE: Amazon Textract is not available in all regions. Also make sure the S3 bucket and Textract are in the same region.
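As a rough illustration of how these pieces fit together in an asynchronous Textract job, the sketch below builds the request parameters for the `StartDocumentAnalysis` operation and parses the completion notification that arrives on SQS via SNS. The function names and all example values (bucket, file name, ARNs, job ID) are hypothetical and not taken from this Blueprint's scripts. The parsing assumes the SQS subscription does not use raw message delivery, so the queue message body is an SNS envelope whose `Message` field is a JSON string.

```javascript
// Sketch only: function names and example values are hypothetical.
// Parameter names follow the Textract StartDocumentAnalysis request shape.
function buildStartAnalysisParams({ bucket, key, roleArn, snsTopicArn }) {
  return {
    // The PDF must already be uploaded to S3 (same region as Textract).
    DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
    // FORMS asks Textract to return key/value pairs.
    FeatureTypes: ['FORMS'],
    // Textract assumes this role to publish the job status to the SNS topic.
    NotificationChannel: { RoleArn: roleArn, SNSTopicArn: snsTopicArn },
  };
}

// Parse the body of an SQS message delivered through the SNS subscription.
// Assumption: raw message delivery is disabled, so the body is an SNS envelope.
function parseCompletionNotification(sqsBody) {
  const envelope = JSON.parse(sqsBody);
  const message = JSON.parse(envelope.Message);
  return {
    jobId: message.JobId,
    status: message.Status,
    succeeded: message.Status === 'SUCCEEDED',
  };
}

// Example with placeholder values.
const params = buildStartAnalysisParams({
  bucket: 'my-example-bucket',
  key: 'inbox/example.pdf',
  roleArn: 'arn:aws:iam::123456789012:role/ExampleTextractRole',
  snsTopicArn: 'arn:aws:sns:eu-west-1:123456789012:ExampleTextractTopic',
});
console.log(JSON.stringify(params, null, 2));

const sampleBody = JSON.stringify({
  Type: 'Notification',
  Message: JSON.stringify({ JobId: 'example-job-id', Status: 'SUCCEEDED' }),
});
console.log(parseCompletionNotification(sampleBody));
```

Keeping both helpers free of SDK calls makes them easy to test; the resulting `params` object would be passed to whichever AWS SDK client the agent scripts use.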
### Onify Agent
* Copy the files from `.\resources\agent\scripts` to the `.\scripts` folder on the Onify Agent.
* Run `npm install` from the `.\scripts` folder.
* Update `aws_config.json` with AWS credentials and region.
### Onify Flow
Update the flow (`aws-textract-pdf-to-form.bpmn`) with your own variables:
* `inboxPath` - Path to the PDF files
* `bucket` - S3 bucket to upload files
* `mailTo` - Where to send the link to the form
* `onifyUrl` - URL of the Onify app (default is http://localhost:3000)
* `roleArn` - The Amazon Resource Name (ARN) of an IAM role that gives Amazon Textract publishing permissions to the Amazon SNS topic
* `snsTopicArn` - The Amazon SNS topic that Amazon Textract posts the completion status to
* `sqsQueueUrl` - URL of the Amazon SQS queue that is subscribed to the SNS topic
## Run
1. Open `aws-textract-pdf-to-form.bpmn` in Camunda Modeler
2. Click `Start current diagram`
## Support
* Community/forum: https://support.onify.co/discuss
* Documentation: https://support.onify.co/docs
* Support and SLA: https://support.onify.co/docs/get-support
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.