https://github.com/marcusmonteirodesouza/google-cloud-document-ai-rest-api-demo
Create an Identity Auto-Filler API with Google Cloud Document AI
- Host: GitHub
- URL: https://github.com/marcusmonteirodesouza/google-cloud-document-ai-rest-api-demo
- Owner: marcusmonteirodesouza
- License: other
- Created: 2023-08-08T12:05:28.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-13T19:01:41.000Z (almost 3 years ago)
- Last Synced: 2024-01-28T21:38:25.385Z (over 2 years ago)
- Topics: document-ai, express, google-cloud, google-cloud-platform, nextjs, nodejs, terraform
- Language: TypeScript
- Homepage:
- Size: 76.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Google Cloud Document AI REST API demo
## API Endpoints
### Parse US ID Driver License
`/v1/documents/countries/us/ids/driver-licenses/parse`
Input: US driver license file.
Example output:
```json
{
"address": "123 MAIN STREET\nAPT. 1\nHARRISBURG, PA 17101-0000",
"dateOfBirth": "01/07/1973",
"documentId": "99 999 999",
"expirationDate": "01/08/2026",
"familyName": "SAMPLE",
"givenNames": "ANDREW JASON",
"issueDate": "01/07/2022",
"portraitImage": "base64 encoded portrait image"
}
```
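A client consuming this response might normalize a few fields before filling a form. The sketch below is illustrative only: the interface mirrors the field names in the sample output, while `usDateToIso` and `addressLines` are hypothetical helpers that are not part of this repository. It also assumes the dates are month-first (`MM/DD/YYYY`), which the README does not state.

```typescript
// Field names copied from the example output above.
// `address` is allowed to be null because the passport example returns null for it.
interface DriverLicenseParseResult {
  address: string | null;
  dateOfBirth: string;
  documentId: string;
  expirationDate: string;
  familyName: string;
  givenNames: string;
  issueDate: string;
  portraitImage: string;
}

// Hypothetical helper: converts "MM/DD/YYYY" to "YYYY-MM-DD".
// Assumption: US month-first ordering. Returns null for anything else.
function usDateToIso(value: string): string | null {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(value.trim());
  if (match === null) {
    return null;
  }
  const [, month, day, year] = match;
  const monthNumber = Number(month);
  const dayNumber = Number(day);
  // Coarse range check only; it does not validate days per month.
  if (monthNumber < 1 || monthNumber > 12 || dayNumber < 1 || dayNumber > 31) {
    return null;
  }
  return `${year}-${month}-${day}`;
}

// Hypothetical helper: splits the newline-separated address into its lines.
function addressLines(result: Pick<DriverLicenseParseResult, "address">): string[] {
  return (result.address ?? "").split("\n").filter((line) => line.length > 0);
}
```

With the sample above, `usDateToIso("01/07/1973")` would give `"1973-01-07"`, and `addressLines` would return the three address lines separately.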
### US ID Proof
`/v1/documents/countries/us/ids/id-proof`
Input: US ID document.
Example output:
```json
{
"fraudSignalsIsIdentityDocument": "PASS",
"fraudSignalsSuspiciousWords": "SUSPICIOUS_WORDS_FOUND",
"evidenceSuspiciousWord": ["SPECIMEN"],
"evidenceInconclusiveSuspiciousWord": [],
"fraudSignalsImageManipulation": "PASS",
"fraudSignalsOnlineDuplicate": "POSSIBLE_ONLINE_DUPLICATE",
"evidenceHostname": ["theforumnewsgroup.com"],
"evidenceThumbnailUrl": [
"https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSSYKZslJGQPVhC8_IAz3wgo1gA2Hv7hO531VyxuP_J0Kgka_o7"
]
}
```
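One way a caller might act on this response is to collect the signals that did not pass. This is a minimal sketch, assuming that any value other than `"PASS"` deserves review; the README does not list the possible values, and `nonPassingSignals` is a hypothetical helper rather than part of the API.

```typescript
// Field names copied from the example output above.
interface UsIdProofResult {
  fraudSignalsIsIdentityDocument: string;
  fraudSignalsSuspiciousWords: string;
  evidenceSuspiciousWord: string[];
  evidenceInconclusiveSuspiciousWord: string[];
  fraudSignalsImageManipulation: string;
  fraudSignalsOnlineDuplicate: string;
  evidenceHostname: string[];
  evidenceThumbnailUrl: string[];
}

const FRAUD_SIGNAL_KEYS = [
  "fraudSignalsIsIdentityDocument",
  "fraudSignalsSuspiciousWords",
  "fraudSignalsImageManipulation",
  "fraudSignalsOnlineDuplicate",
] as const;

type FraudSignalKey = (typeof FRAUD_SIGNAL_KEYS)[number];

// Hypothetical helper: returns every fraud signal whose value is not "PASS".
// Assumption: "PASS" is the only value that needs no follow-up.
function nonPassingSignals(
  result: UsIdProofResult
): Partial<Record<FraudSignalKey, string>> {
  const flagged: Partial<Record<FraudSignalKey, string>> = {};
  for (const key of FRAUD_SIGNAL_KEYS) {
    if (result[key] !== "PASS") {
      flagged[key] = result[key];
    }
  }
  return flagged;
}
```

For the sample output, this would flag `fraudSignalsSuspiciousWords` and `fraudSignalsOnlineDuplicate`, and the matching `evidence*` arrays can then be shown to a reviewer.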
### Parse US Passport
`/v1/documents/countries/us/ids/passports/parse`
Input: US passport file.
Example output:
```json
{
"address": null,
"dateOfBirth": "05 FEB 1965",
"documentId": "E00009349",
"expirationDate": "09 JUL 2030",
"familyName": "TRAVELER",
"givenNames": "HAPPY",
"issueDate": "10 JUL 2020",
"mrzCode": "P…"
…
}
```
1. Go to Cloud Build -> Triggers -> click the "Run" button on the `apps` trigger row -> click the "Run Trigger" button.
1. Go to Cloud Build -> History, and follow the build's progress.
1. Go to Load Balancing -> `api-url-map`, and copy the IP address. Follow [this guide](https://cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless#update_dns) to point your domain's DNS at that IP address so that the SSL certificate can be provisioned and HTTPS enabled.
### Train US Patent Parser Custom Document Extractor
The US Patent Parser is a Document AI [Custom Document Extractor](https://cloud.google.com/document-ai/docs/workbench/build-custom-processor). To train it, follow the steps below:
1. Go to Key Management -> take note of the location of the `doc-ai-key`.
1. Go to Cloud Storage -> click "Create" to [create a GCS bucket](https://cloud.google.com/storage/docs/creating-buckets) -> You can name it `-us-patent-parser-v1-0-0-initial-data-import` -> For the location, select the same region as the `doc-ai-key` -> Click "Continue" until the "Choose how to protect object data" section -> open the "Data encryption" accordion, click "Customer-managed encryption key (CMEK)" and select the `doc-ai-key` as the encryption key -> click "Create". Now click "Upload Folder" and upload the [US patents labeled data folder](./data/documents/us/patents/labeled/).
1. Now go to Document AI -> My Processors -> Click the `us-patent-parser` processor -> Train.
1. Click "Show Advanced Options" -> click "I'll specify my own location" -> select the `-us-patent-parser-dataset` bucket. Wait for the dataset configuration to finish.
1. Click the "Import Documents" button -> click "Browse" -> select the bucket you imported the US patents labeled data to and select the `labeled` folder -> In the "Data split" dropdown on the right, select `Auto-split` -> click "Import". Wait for the import to finish.
1. Click "Edit Schema", enable all the labels, set the labels according to the table below, and then click "Save":
| Name | Data type | Occurrence |
| ------------------- | ---------- | ------------- |
| applicant_line_1 | Plain Text | Required once |
| application_number | Number | Required once |
| class_international | Plain Text | Required once |
| class_us | Plain Text | Required once |
| filing_date | Datetime | Required once |
| inventor_line_1     | Plain Text | Required once |
| issuer              | Plain Text | Required once |
| patent_number       | Number     | Required once |
| publication_date    | Datetime   | Required once |
| title_line_1        | Plain Text | Required once |
1. Go back to the "Train" tab, and click "Train New Version". You can name the version `v1-0-0`, and then click "Start Training". Wait for the training to finish; it can take more than 1 hour.
1. Check the processor's [F1 score](https://cloud.google.com/document-ai/docs/workbench/evaluate#all-labels): it should show more than `0.9` for all labels.
1. Go to the "Manage Versions" tab -> click the three dots on the right of the model version -> click "Deploy version", and wait for the deployment to finish; it can take more than 10 minutes.
1. Click the three dots again and click "Set as default".
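The schema table and the F1 check from the steps above can also be written down as data, which may help when reviewing a trained version. This is only a sketch: `US_PATENT_SCHEMA` restates the table, and `labelsNeedingAttention` is a hypothetical helper that expects F1 scores copied by hand from the evaluation page. It does not call Document AI.

```typescript
type DataType = "Plain Text" | "Number" | "Datetime";

// Labels and data types copied from the schema table above.
// Every label has the occurrence type "Required once".
const US_PATENT_SCHEMA: Record<string, DataType> = {
  applicant_line_1: "Plain Text",
  application_number: "Number",
  class_international: "Plain Text",
  class_us: "Plain Text",
  filing_date: "Datetime",
  inventor_line_1: "Plain Text",
  issuer: "Plain Text",
  patent_number: "Number",
  publication_date: "Datetime",
  title_line_1: "Plain Text",
};

// Hypothetical helper: lists schema labels whose F1 score is missing
// or not strictly above the threshold (0.9 in the steps above).
function labelsNeedingAttention(
  f1ByLabel: Record<string, number>,
  threshold = 0.9
): string[] {
  return Object.keys(US_PATENT_SCHEMA).filter((label) => {
    const score = f1ByLabel[label];
    return score === undefined || score <= threshold;
  });
}
```

An empty result means every label in the schema scored above the threshold; any label returned is a candidate for more labeled data or another training run.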
### Now your API should be ready to use!