{"id":18778938,"url":"https://github.com/ramsailopal/doc-scan","last_synced_at":"2026-05-09T14:07:34.276Z","repository":{"id":127741930,"uuid":"540357044","full_name":"RamSailopal/Doc-Scan","owner":"RamSailopal","description":"A demonstration of document QR Code/text scanning using Tesseract and opencv","archived":false,"fork":false,"pushed_at":"2022-09-23T09:44:07.000Z","size":2346,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-29T10:27:39.907Z","etag":null,"topics":["opencv","python3","qr-code","qr-generator","tesseract-ocr"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RamSailopal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-23T08:56:13.000Z","updated_at":"2024-09-12T01:37:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa09312a-85ba-4e48-aee1-73e40194bdc6","html_url":"https://github.com/RamSailopal/Doc-Scan","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamSailopal%2FDoc-Scan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamSailopal%2FDoc-Scan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamSailopal%2FDoc-Scan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RamSailopal%2FDoc-Scan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/RamSailopal","download_url":"https://codeload.github.com/RamSailopal/Doc-Scan/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239690074,"owners_count":19681035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["opencv","python3","qr-code","qr-generator","tesseract-ocr"],"created_at":"2024-11-07T20:17:51.547Z","updated_at":"2025-12-17T19:30:17.140Z","avatar_url":"https://github.com/RamSailopal.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Doc-Scan\n\nA demonstration of document QR Code/text scanning using Tesseract and OpenCV\n\nThis demonstration first builds a jpg version of the following pdf document:\n\nhttps://www.va.gov/vaforms/va/pdf/VA0730b.pdf\n\nThis jpg version is then used to display a web form along with a QR code for a unique document reference.\n\nOnce the web form is printed, filled out by hand and scanned as a jpg, it is processed to extract the printed/handwritten text along with the text associated with the QR code.\n\n# Deployment\n\n    git clone https://github.com/RamSailopal/Doc-Scan.git\n    \n    cd Doc-Scan\n    \n    docker-compose up\n    \n\nThe script **pdfconvert/convert.py** is used to generate the initial jpeg.\n\nThe web form can then be viewed in a browser by navigating to:\n\nhttp://dockerserveraddress:8080?ref=testref\n\nwhere **testref** is the reference to be translated into a QR code.\n\nOnce the document is filled and scanned, the resulting jpeg is then used to output text using the script 
**pdfscan/scan.py**\n\n# Demonstration\n\nThis demonstration takes the following jpg:\n\nhttps://github.com/RamSailopal/Doc-Scan/blob/main/pdfscan/FilledOut.jpg\n\nIt then processes the file to generate the following text:\n\nhttps://github.com/RamSailopal/Doc-Scan/blob/main/pdfscan/pdfscanout.txt\n\n# Findings\n\nThe initial web form had to be scaled down to display on one page, and this affected the quality of the jpeg and subsequently the OCR results. Printed text was fine, but handwritten text proved difficult to process accurately. QR codes were not processed at all.\n\nAs a comparison, a \"screen grab\" of part of the web form was taken and the mouse was then used to add text (as if it were a pen). The resulting image can be viewed here:\n\nhttps://github.com/RamSailopal/Doc-Scan/blob/main/pdfscan/doc-out1.png\n\nThe processed output can be seen here:\n\nhttps://github.com/RamSailopal/Doc-Scan/blob/main/pdfscan/pdfscanout1.txt\n\nWith original scaling and no loss of quality, the QR code is processed correctly, as is the printed text. The mouse-written text is again \"patchy\".\n\n# Running your own examples\n\nOnce a form is printed, filled and scanned, add it to the pdfscan folder. 
Once this has been done, run:\n\n    docker exec -it pdfscan /bin/bash -c 'cd /home/pdfscan \u0026\u0026 python3 scan1.py \u003cimagefilename\u003e \u003e \u003cnameofoutputfile\u003e'\n    \ne.g.\n\n    docker exec -it pdfscan /bin/bash -c 'cd /home/pdfscan \u0026\u0026 python3 scan1.py scannedimage.jpg \u003e outputtext.txt'\n    \nThe output data will then be available to view in the file **pdfscan/outputtext.txt**\n\n# Improvements\n\nFor handwritten text, Tesseract's results can be improved with \"training\" - https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html\n\n# References\n\nTesseract - https://tesseract-ocr.github.io/\n\nPython Tesseract - https://pypi.org/project/pytesseract/\n\nOpenCV QR Code detection - https://docs.opencv.org/4.x/de/dc3/classcv_1_1QRCodeDetector.html\n\nWeb QR Code generator - https://github.com/kazuhikoarase/qrcode-generator\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framsailopal%2Fdoc-scan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Framsailopal%2Fdoc-scan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Framsailopal%2Fdoc-scan/lists"}