https://github.com/bukalapak/ktpextractor
This is a service that takes a KTP image as input and extracts the data on the KTP as output. It is part of an open source project by the Data Scientists of Bukalapak.
- Host: GitHub
- URL: https://github.com/bukalapak/ktpextractor
- Owner: bukalapak
- Created: 2020-01-22T10:36:35.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-01-23T06:50:20.000Z (about 6 years ago)
- Last Synced: 2025-07-05T00:11:33.663Z (7 months ago)
- Topics: data, datascience
- Language: Python
- Homepage:
- Size: 1.49 MB
- Stars: 96
- Watchers: 216
- Forks: 28
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# KTPextractor
This is a service to extract data from a KTP image. It is part of an open source project by the Data Scientists of Bukalapak. Other open source projects: https://github.com/bukalapak?q=data
### Config File
Please fill in the configuration in the file `kyc_config.py`:
`gcv_api_key_path`: path to the GCV (Google Cloud Vision) API key. To get an API key, see https://cloud.google.com/vision/docs/setup
`json_loc`: path where the OCR output from GCV is saved
`output_loc`: path where the extracted KTP data is saved
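As a rough illustration, a filled-in `kyc_config.py` might look like the sketch below. Only the three setting names come from this README; the values are placeholder paths, and the assumption that the file holds plain module-level string variables is not confirmed by the source.

```python
# kyc_config.py (illustrative sketch; all paths below are made-up placeholders)

# Path to the Google Cloud Vision API key file (hypothetical location).
gcv_api_key_path = "credentials/gcv_api_key.json"

# Where the OCR output returned by GCV is saved (hypothetical location).
json_loc = "output/ocr_json/"

# Where the extracted KTP data is saved in CSV format (hypothetical location).
output_loc = "output/ktp_data/"
```

Replace each value with a path that exists on your machine before running the scripts.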
### OCR Text Extractor
To extract text from an image (OCR), use the following command:
```
python ocr_text_extractor.py
```
The OCR output file will be saved to the location set in `json_loc` (see the config file).
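To inspect the saved OCR output, something like the following sketch may help. It assumes the file holds a standard GCV text-detection response, in which the first entry of `textAnnotations` carries the full detected text. The exact layout written by `ocr_text_extractor.py` is not documented here, and the helper name `full_text_from_gcv` and the sample payload are hypothetical.

```python
import json


def full_text_from_gcv(response):
    """Return the full detected text from a GCV text-detection response dict."""
    # The REST API wraps results in a "responses" list; accept either shape.
    if "responses" in response:
        response = response["responses"][0]
    annotations = response.get("textAnnotations", [])
    # By GCV convention, the first annotation contains the whole text block.
    return annotations[0]["description"] if annotations else ""


# Made-up sample payload standing in for a file saved under `json_loc`.
sample = json.loads(
    '{"responses": [{"textAnnotations": ['
    '{"description": "NIK : 3171234567890123\\nNama : BUDI"},'
    '{"description": "NIK"}]}]}'
)
ocr_text = full_text_from_gcv(sample)
print(ocr_text)
```

For a real file, load it with `json.load(open(path))` and pass the resulting dict to the helper.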
### KTP Entity Extractor
To extract attributes from the KTP based on the OCR output, use the following command:
```
python ktp_entity_extractor.py
```
The extracted KTP data will be saved in CSV format to the location set in `output_loc` (see the config file).
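The general idea of turning OCR text into CSV rows can be sketched as below. This is not the method used by `ktp_entity_extractor.py`, which this README does not describe; it is a minimal regex-based illustration that pulls out only the NIK (the 16-digit identity number printed on a KTP). The function names and the sample text are made up.

```python
import csv
import io
import re

# A NIK is a 16-digit number; match exactly 16 consecutive digits.
NIK_PATTERN = re.compile(r"\b\d{16}\b")


def extract_nik(ocr_text):
    """Return the first 16-digit number found in the OCR text, or None."""
    match = NIK_PATTERN.search(ocr_text)
    return match.group(0) if match else None


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up OCR text for illustration only.
sample_text = "PROVINSI DKI JAKARTA\nNIK : 3171234567890123\nNama : BUDI SANTOSO"
record = {"nik": extract_nik(sample_text)}
csv_text = to_csv([record], ["nik"])
print(csv_text)
```

A real extractor would need to handle OCR noise (for example, letters misread as digits) and the remaining KTP fields, which this sketch ignores.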
### KTP Data Extractor
To extract KTP data directly from a KTP image, use the following command:
```
python KTPextractor_main.py
```
The extracted KTP data will be saved in CSV format to the location set in `output_loc` (see the config file).