https://github.com/jacastanon01/elmwood-records
Automating data entry of burial records into Crypt Keeper Online DBMS
- Host: GitHub
- URL: https://github.com/jacastanon01/elmwood-records
- Owner: jacastanon01
- Created: 2024-06-27T15:05:35.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-09T15:58:02.000Z (5 months ago)
- Last Synced: 2024-08-09T17:14:37.128Z (5 months ago)
- Language: Python
- Homepage:
- Size: 34.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project Overview
[Elmwood Cemetery](https://elmwoodcemeterykc.org/) has been a burial ground in Kansas City, Missouri, since 1872 and is listed on the National Register of Historic Places. The cemetery is undertaking an archival project and needs help digitizing its burial records into a searchable database using the [CryptKeeper Cemetery Software](https://ckonline.tbgtom.com/).
This project is an effort to create a digital archive of all of Elmwood's burial records, utilizing Tesseract OCR to extract text from scanned documents.
## Process 📝
### Authentication and Google Drive Integration
Used the google-auth, google-auth-oauthlib, and google-api-python-client libraries to authenticate with Google and access files stored in Google Drive.
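
A minimal sketch of how this step might look with those three libraries. It is not taken from the repository: the file names `credentials.json` and `token.json`, the read-only scope, and the helper names `build_pdf_query`, `get_drive_service`, and `list_pdfs` are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: read-only access is enough to list and download the scans.
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_pdf_query(folder_id: str) -> str:
    """Build a Drive search query for non-trashed PDFs inside one folder."""
    return (
        f"'{folder_id}' in parents "
        "and mimeType = 'application/pdf' "
        "and trashed = false"
    )


def get_drive_service(client_secrets: str = "credentials.json",
                      token_file: str = "token.json"):
    """Run the OAuth flow (or reuse a saved token) and return a Drive v3 service."""
    # Imported here so build_pdf_query works without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    creds = None
    if Path(token_file).exists():
        creds = Credentials.from_authorized_user_file(token_file, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Refresh silently when a refresh token is available.
            creds.refresh(Request())
        else:
            # Otherwise open a browser window for user consent.
            flow = InstalledAppFlow.from_client_secrets_file(client_secrets, SCOPES)
            creds = flow.run_local_server(port=0)
        # Cache the token (hypothetical file name) for later runs.
        Path(token_file).write_text(creds.to_json())
    return build("drive", "v3", credentials=creds)


def list_pdfs(service, folder_id: str) -> list[dict]:
    """Return id/name pairs for PDFs in the folder (first page of results only)."""
    response = service.files().list(
        q=build_pdf_query(folder_id),
        fields="files(id, name)",
    ).execute()
    return response.get("files", [])
```

Calling `list_pdfs(get_drive_service(), folder_id)` with the ID of the Drive folder that holds the scans would return a list of `{"id": ..., "name": ...}` dictionaries. Folders with many files would also need pagination via `nextPageToken`, which this sketch omits.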
### PDF to Image Conversion
Employed the pdf2image library to convert PDF pages into image files, facilitating OCR processing.
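
As a rough illustration of the conversion step, the sketch below renders each page of a PDF with `pdf2image.convert_from_path` and saves it as a PNG. The 300 DPI setting, the output naming scheme, and the helper names `page_image_path` and `pdf_to_images` are assumptions, not details from the repository. Note that pdf2image relies on the Poppler utilities being installed on the system.

```python
from pathlib import Path


def page_image_path(pdf_path: str, page_number: int, out_dir: str) -> Path:
    """Name a page image after its source PDF, e.g. ledger_page_003.png."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number:03d}.png"


def pdf_to_images(pdf_path: str, out_dir: str, dpi: int = 300) -> list[Path]:
    """Render every page of a PDF to a PNG file and return the saved paths."""
    # Imported here so page_image_path works without pdf2image installed.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # convert_from_path returns one PIL image per page.
    pages = convert_from_path(pdf_path, dpi=dpi)

    saved = []
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "PNG")
        saved.append(target)
    return saved
```

A higher DPI generally gives OCR more detail to work with at the cost of larger images and slower processing, so the value is worth tuning against a sample of the actual scans.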
### Text Extraction with Tesseract OCR
Applied Tesseract OCR through the pytesseract library to extract text from the converted image files, allowing for the digitization of scanned documents.
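
The OCR step could be sketched as follows, assuming the Tesseract binary is installed alongside `pytesseract` and Pillow. The page segmentation mode (`--psm 6`, which treats the image as a single uniform block of text), the English language pack, and the helper names `clean_ocr_text` and `extract_text` are illustrative assumptions rather than the project's actual settings.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines from OCR output."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one page image and return tidied text."""
    # Imported here so clean_ocr_text works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        # Assumption: --psm 6 suits the record layout; other modes may work better.
        raw = pytesseract.image_to_string(image, lang=lang, config="--psm 6")
    return clean_ocr_text(raw)
```

For example, `clean_ocr_text("  Name   Lot \n\n\tSection  12\n")` returns `"Name Lot\nSection 12"`. Handwritten or faded records may need image preprocessing (grayscale conversion, thresholding) before OCR gives usable results.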