{"id":22645644,"url":"https://github.com/ht0710/receipt-information-extraction","last_synced_at":"2025-07-29T09:39:14.622Z","repository":{"id":264166541,"uuid":"606203239","full_name":"HT0710/Receipt-Information-Extraction","owner":"HT0710","description":"Receipt-Information-Extraction","archived":false,"fork":false,"pushed_at":"2025-04-26T13:02:11.000Z","size":5842752,"stargazers_count":15,"open_issues_count":1,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-26T14:19:18.142Z","etag":null,"topics":["deep-learning","information-extraction","machine-learning","mc-ocr","ocr-recognition","receipt"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HT0710.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-02-24T20:51:10.000Z","updated_at":"2025-04-26T13:02:14.000Z","dependencies_parsed_at":"2024-11-23T06:46:31.471Z","dependency_job_id":null,"html_url":"https://github.com/HT0710/Receipt-Information-Extraction","commit_stats":null,"previous_names":["ht0710/receipt-information-extraction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HT0710/Receipt-Information-Extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HT0710%2FReceipt-Information-Extraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HT0710%2FReceipt-Information-Extraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HT0710%2FReceipt-Info
rmation-Extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HT0710%2FReceipt-Information-Extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HT0710","download_url":"https://codeload.github.com/HT0710/Receipt-Information-Extraction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HT0710%2FReceipt-Information-Extraction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267664903,"owners_count":24124416,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","information-extraction","machine-learning","mc-ocr","ocr-recognition","receipt"],"created_at":"2024-12-09T06:06:44.076Z","updated_at":"2025-07-29T09:39:14.615Z","avatar_url":"https://github.com/HT0710.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cp align=\"center\"\u003eReceipt Information Extraction (RIE)\u003c/p\u003e\n\n**Project completed on 21/03/2023**\n\nThis project is inspired by MC-OCR. More information about the competition is available here: [Link](https://www.rivf2021-mc-ocr.vietnlp.com/)\n\n![ex_2](example/ex_2.png)\n\n---\n## Table of Contents\n- [Receipt Information Extraction 
(RIE)](#receipt-information-extraction-rie)\n  - [Table of Contents](#table-of-contents)\n  - [Introduction](#introduction)\n  - [Install](#install)\n    - [Version](#version)\n    - [Requirements](#requirements)\n  - [Usage](#usage)\n    - [One-time run](#one-time-run)\n    - [Each step run](#each-step-run)\n      - [1. Remove background](#1-remove-background)\n      - [2. Rotate](#2-rotate)\n      - [3. Extract information](#3-extract-information)\n  - [Results](#results)\n  - [Citations](#citations)\n  - [License](#license)\n  - [References](#references)\n  - [Contact](#contact)\n\n\n## Introduction\nReceipt Information Extraction (RIE) is a task that involves extracting structured data from unstructured receipts. The goal is to identify and extract key information such as date, time, total amount, tax amount, items purchased, etc. from receipts in various formats and languages. This task can be useful for applications such as expense management, accounting, fraud detection and analytics.\n\n![pipeline](example/pipeline.png)\n\n\n## Install\nDownload the project\n1. Click here: [RIE](https://github.com/HT0710/Receipt-Information-Extraction/archive/refs/heads/main.zip)\n2. Unzip the `main.zip` file\n3. 
Navigate into the project folder:\n```bash\ncd path-to-the-project-folder\n```\n\n\nClone with git: **(NOT RECOMMENDED)**\n```bash\n# very slow download speed\ngit clone https://github.com/HT0710/Receipt-Information-Extraction\ncd Receipt-Information-Extraction\n```\n\nUsing `wget` on Linux:\n```bash\n# faster download speed\nwget https://github.com/HT0710/Receipt-Information-Extraction/archive/refs/heads/main.zip\nunzip main.zip\ncd Receipt-Information-Extraction-main\n```\n\n### Version\n- **`Python 3.8`**\n\nUsing Conda:\n```bash\n# Installation: https://docs.conda.io/en/latest/miniconda.html\nconda create -n rie python=3.8\nconda activate rie\n```\n\n### Requirements\n- rembg = 2.0.30\n- torch = 1.11\n- torchvision = 0.12\n- opencv-python = 4.7.0.72\n- scikit-learn = 1.2.1\n- scikit-image = 0.19.3\n- scipy = 1.9.3\n- imutils = 0.5.4\n- PyYAML = 6.0\n- einops = 0.6.0\n- gdown = 4.6.4\n```bash\npip install -r requirements.txt\n```\nNote: CUDA is required if you want to use a GPU. You can follow my instructions [here](https://gist.github.com/HT0710/639ec6ad96f9c46e0d209ba2e50ee168).\n\n## Usage\n### One-time run\nModify the configuration in `config.yaml`, then run:\n```bash\npython run.py\n```\nWith CLI:\n```bash\npython run.py -h\n```\n- -i: Image path or folder path\n- -o: Output folder path\n- -g: Which GPU to run on | 0 for CPU | -1 for all (Default: -1)\n- -mp: Maximum amount of CPU to use | -1 for 80% of your CPU (Default: -1)\n\nCaution: Using 100% of your CPU may crash your system!\n\n**Example:**\n```bash\npython run.py -i data/test/test_1.jpg -o output -g 0 -mp 10\n```\nMore configuration options are available in `config.yaml`.\n\n### Each step run\n#### 1. Remove background\n- Remove the image background\n- The input and output folders can be modified in `background_remove.py`\n- Note: accepts folder input only\n\nExecute:\n```bash\npython background_remove.py\n```\n\n#### 2. 
Rotate\n- Rotate to horizontal, invert, and straighten\n- The input and output folders can be modified in `rotate.py`\n- Note: accepts folder input only\n\nExecute:\n```bash\npython rotate.py\n```\n\n#### 3. Extract information\n- Extract the receipt information\n- The input and output can be modified in `extract_info.py`\n- Note: accepts a single image only; to extract from a folder, use `run.py`\n\nExecute:\n```bash\npython extract_info.py\n```\n\n## Results\n![ex_1](example/ex_1.png)\n\n## Citations\nYou can find the paper here: [RIE](https://github.com/HT0710/Receipt-Information-Extraction/tree/main/example/RIE.pdf)\n\n## License\nThis project is licensed under the MIT License. See [LICENSE](https://github.com/HT0710/Receipt-Information-Extraction/blob/main/LICENSE) for more details.\n\n## References\n- [Rembg](https://github.com/danielgatis/rembg) - [danielgatis](https://github.com/danielgatis)\n- [CRAFT-pytorch](https://github.com/clovaai/CRAFT-pytorch) - [clovaai](https://github.com/clovaai)\n- [VietOCR](https://github.com/pbcquoc/vietocr) - [pbcquoc](https://github.com/pbcquoc)\n- [PICK-pytorch](https://github.com/wenwenyu/PICK-pytorch) - [wenwenyu](https://github.com/wenwenyu)\n  \n## Contact\nOpen an issue: [New issue](https://github.com/HT0710/Receipt-Information-Extraction/issues/new)\n\nMail: pthung7102002@gmail.com\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fht0710%2Freceipt-information-extraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fht0710%2Freceipt-information-extraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fht0710%2Freceipt-information-extraction/lists"}