{"id":25817328,"url":"https://github.com/pilarcode/receipt-ocr","last_synced_at":"2025-02-28T06:34:10.136Z","repository":{"id":172864423,"uuid":"375042332","full_name":"pilarcode/receipt-ocr","owner":"pilarcode","description":"Named entity recognition (NER). Extraction of features from images of receipts with different formats. #NER #OCR 🛒🏷️","archived":false,"fork":false,"pushed_at":"2024-02-03T12:30:56.000Z","size":8000,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-02-03T13:31:52.231Z","etag":null,"topics":["flask-api","ocr-python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pilarcode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-06-08T14:38:13.000Z","updated_at":"2024-02-03T13:31:52.256Z","dependencies_parsed_at":null,"dependency_job_id":"4a34f1e4-224a-4d46-ba3a-db6d381db7b0","html_url":"https://github.com/pilarcode/receipt-ocr","commit_stats":null,"previous_names":["pilarcode/receipt-ocr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Freceipt-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Freceipt-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Freceipt-ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pilarcode%2Freceipt-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/pilarcode","download_url":"https://codeload.github.com/pilarcode/receipt-ocr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241112475,"owners_count":19911694,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flask-api","ocr-python"],"created_at":"2025-02-28T06:34:09.711Z","updated_at":"2025-02-28T06:34:10.126Z","avatar_url":"https://github.com/pilarcode.png","language":"Jupyter Notebook","readme":"# Optical Character Recognition (OCR) with pytesseract\n\n## Overview\n- This [notebook](https://github.com/pilarcode/notebooks/blob/dev/ocr_recibos_pytesseract.ipynb) contains an experiment on the functionality offered by [Pytesseract](https://pypi.org/project/pytesseract/), an open-source library for optical character recognition.\n\n- It also contains a web service built with [Flask](https://flask.palletsprojects.com/en/2.2.x/) that receives a base64-encoded image and extracts features from the receipt image (item price, item description, total). 
This service calls AWS services for entity extraction and uses regular expressions in preprocessing.\n\n## Notes\nClassifying purchases lets us predict a customer's future spending or make purchases automatically.\n\n- [x] Implement a web service for text recognition in images as an alternative to the paid one already available on the AWS platform.\n- [x] Explore the invoice and receipt datasets available in the state of the art for use in our use case.\n- [x] Data extraction task. Given an image of a receipt or purchase ticket, obtain the name of the establishment where the purchase was made, the purchase date, and the list of products (establishment, product name, product price) in text format.\n- [ ] Storage task: Save the information from the txt file in a NoSQL database (for example, Amazon DynamoDB) to categorize purchases.\n\n\n\u003cimg src=\"https://github.com/pilarcode/demo-receipt-ocr/blob/main/portada_readme.png\" name=\"ejemplo recibo con las entidades extraidas con Pytesseract\" width=\"400\"/\u003e\n\n## Resources\n### Open-source OCR\n- Tesseract https://tesseract-ocr.github.io/ https://opensource.google/projects/tesseract\n- Free OCR API https://ocr.space/ocrapi\n- Top 5 https://rapidapi.com/blog/top-5-ocr-apis/\n\n### Datasets\n- SROIE2019 https://drive.google.com/drive/folders/1ShItNWXyiY1tFDM5W02bceHuJjyeeJl2\n- FUNSD https://guillaumejaume.github.io/FUNSD/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpilarcode%2Freceipt-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpilarcode%2Freceipt-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpilarcode%2Freceipt-ocr/lists"}