{"id":30762535,"url":"https://github.com/klippa-app/receipt-ocr","last_synced_at":"2025-10-22T10:55:47.479Z","repository":{"id":313034879,"uuid":"1033694467","full_name":"klippa-app/receipt-ocr","owner":"klippa-app","description":"Receipt OCR engine to extract receipt information.","archived":false,"fork":false,"pushed_at":"2025-09-15T16:48:27.000Z","size":608,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-10T20:41:44.192Z","etag":null,"topics":["extraction","information","ocr","parser","python","receipt"],"latest_commit_sha":null,"homepage":"https://dochorizon.klippa.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/klippa-app.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-07T07:54:00.000Z","updated_at":"2025-09-15T16:48:30.000Z","dependencies_parsed_at":"2025-09-03T15:22:55.594Z","dependency_job_id":"7a80ccba-0700-405b-91bf-c4f4aa6b1221","html_url":"https://github.com/klippa-app/receipt-ocr","commit_stats":null,"previous_names":["klippa-app/receipt-ocr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/klippa-app/receipt-ocr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klippa-app%2Freceipt-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klippa-app%2Freceipt-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/klippa-app%2Freceipt-ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klippa-app%2Freceipt-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/klippa-app","download_url":"https://codeload.github.com/klippa-app/receipt-ocr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klippa-app%2Freceipt-ocr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280424213,"owners_count":26328462,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extraction","information","ocr","parser","python","receipt"],"created_at":"2025-09-04T15:06:22.069Z","updated_at":"2025-10-22T10:55:47.433Z","avatar_url":"https://github.com/klippa-app.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Receipt OCR for Receipt Processing \u0026 Information Extraction\nThis repository can be used to integrate Receipt OCR software to extract receipt information.\nThis receipt parser uses the DocHorizon OCR API to extract information.\n\n### How Klippa Receipt OCR Works:\n\n- Image Upload: You first need to upload the image of the receipt you want to process.\n- Data Extraction: Klippa uses machine learning models and image processing 
techniques to analyze the image, identify key data points, and extract relevant information.\n- Data Formatting: The extracted data is then formatted into a structured format (like JSON), which can easily be consumed by applications or compared with other datasets.\n- Response: The service returns the structured data to your application, allowing you to process it as needed.\n\n## Things you need\n- A DocHorizon API key and/or license\n- A receipt image\n- A Python 3.6+ environment\n- [_link to swagger docs_](https://dochorizon.klippa.com/api/swagger#/)\n\n## How to Connect to Klippa Receipt OCR with Python\nTo use the Klippa Receipt OCR API in your Python script, you can follow these steps:\n\n**Step 1: Set Up Your Environment** \u003cbr/\u003e\nYou need to make sure you have Python installed on your system. \nInstall all of the required libraries using the requirements.txt file by running:\n\n```pip install -r ./requirements.txt```\n\n**Step 2: Obtain Credentials** \u003cbr/\u003e\nSign up for Klippa’s document extraction services and get your API key. This key will be required to authenticate your requests.\n\u003eSee how to get your API key in the section below; [here](#license--api-key)\n\n**Step 3: Run Python Script** \u003cbr/\u003e\nIn this repository you will find a sample Python script `main.py` which demonstrates how to upload an image to Klippa and fetch the OCR results.\n\n\n### Explanation:\n- API_URL: Change this to the actual endpoint you need to use. 
Check [Klippa’s documentation](https://dochorizon.klippa.com/api/swagger#/) for the correct API URLs.\n- API_KEY: Place your Klippa API key here.\n- image_path: Change this path to the actual path of the image you want to upload.\n- request.components: You can enable or disable the different components in the request.\n- At the end of the code, there is a usage example that you can run to test the code.\n\n### Important Notes:\nEnsure you use secure methods to store and manage your API keys.\nAlways refer to the official [DocHorizon API documentation](https://dochorizon.klippa.com/docs) for the most up-to-date information, including any changes in endpoints or request formats.\nThe API may have rate limits or require specific image formats; consult the documentation for these details.\n\n\u003ePlease ensure that the receipt is the only object in the image and that its edges are clearly visible.\n\n\u003cimg src=\"/images/receipt-example-github.jpg\" alt=\"receipt-example\" width=\"400\" height=\"400\"\u003e\n\n### Example\nAn example of a POST request using cURL:\n\nThe request uses the following endpoint from the [Swagger docs](https://dochorizon.klippa.com/api/swagger#/Components%20Capturing%20API/document-capturing-components).\n\n\u003cdetails\u003e\n\u003csummary\u003eClick here to see the full cURL command\u003c/summary\u003e\n\n```\ncurl -X POST \\\n  -H \"x-api-key: {your-api-key}\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n  \"components\": {\n    \"barcode\": {\n      \"barcode_types\": [\n        \"string\"\n      ],\n      \"enabled\": false\n    },\n    \"fraud\": {\n      \"enabled\": false,\n      \"metadata\": {\n        \"date\": false,\n        \"editor\": false\n      },\n      \"visual\": {\n        \"copy_move\": false,\n        \"splicing\": false\n      }\n    },\n    \"ocr\": {\n      \"enabled\": false\n    }\n  },\n  \"documents\": [\n    {\n      \"content_type\": \"string\",\n      \"data\": \"string\",\n      \"file_id\": 
\"string\",\n      \"filename\": \"string\",\n      \"page_ranges\": \"string\",\n      \"password\": \"string\",\n      \"url\": \"string\"\n    }\n  ]\n}' \\\n  https://dochorizon.klippa.com/api/services/document_capturing/v1/components\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eThe expected JSON schema of a 200 OK response\u003c/summary\u003e\n\n```\n{\n  \"components\": {\n    \"barcode\": {\n      \"barcodes\": [\n        {\n          \"type\": \"string\",\n          \"value\": \"string\"\n        }\n      ],\n      \"candidates\": [\n        {\n          \"confidence\": 0,\n          \"coordinates\": [\n            {\n              \"file\": 0,\n              \"page\": 0,\n              \"vertices\": [\n                [\n                  0\n                ]\n              ]\n            }\n          ],\n          \"type\": \"string\",\n          \"value\": \"string\"\n        }\n      ]\n    },\n    \"fraud\": {\n      \"metadata\": {\n        \"date\": {\n          \"confidence\": 0,\n          \"digitized\": \"string\",\n          \"modified\": \"string\",\n          \"original\": \"string\"\n        },\n        \"editor\": {\n          \"confidence\": 0,\n          \"found\": [\n            \"string\"\n          ],\n          \"fraudulent\": [\n            \"string\"\n          ]\n        }\n      },\n      \"summary\": {\n        \"confidence\": 0\n      },\n      \"visual\": {\n        \"copy_move\": {\n          \"confidence\": 0,\n          \"coordinates\": [\n            {\n              \"file\": 0,\n              \"page\": 0,\n              \"vertices\": [\n                [\n                  0\n                ]\n              ]\n            }\n          ]\n        },\n        \"splicing\": {\n          \"confidence\": 0,\n          \"coordinates\": [\n            {\n              \"file\": 0,\n              \"page\": 0,\n              \"vertices\": [\n                [\n                  0\n                ]\n  
            ]\n            }\n          ]\n        }\n      }\n    },\n    \"ocr\": {\n      \"documents\": [\n        {\n          \"document_index\": 0,\n          \"pages\": [\n            {\n              \"height\": 0,\n              \"lines\": [\n                {\n                  \"coordinates\": [\n                    {\n                      \"file\": 0,\n                      \"page\": 0,\n                      \"vertices\": [\n                        [\n                          0\n                        ]\n                      ]\n                    }\n                  ],\n                  \"text\": \"string\",\n                  \"words\": [\n                    {\n                      \"coordinates\": [\n                        {\n                          \"file\": 0,\n                          \"page\": 0,\n                          \"vertices\": [\n                            [\n                              0\n                            ]\n                          ]\n                        }\n                      ],\n                      \"text\": \"string\"\n                    }\n                  ]\n                }\n              ],\n              \"page_index\": 0,\n              \"text\": \"string\",\n              \"width\": 0\n            }\n          ]\n        }\n      ]\n    }\n  },\n  \"version\": \"string\"\n}\n```\n\n\u003c/details\u003e\n\n## License \u0026 API KEY\nFor this project and usage of the DocHorizon OCR , you would need to create an account and retrieve an API key.\nFollow these steps to get your API key:\n* Sign up via the [signup page](https://dochorizon.klippa.com/public/signup)\n* Finish setting up the organization and create a first project\n* Enable the service you would like to use (Document Capturing - Components) under Project settings \u003e Services\n* Create a credential by going to the Project settings \u003e Credentials page (screenshot 1 \u0026 2)\n* Create a credential, give it a name and add 
additional security settings like IP whitelisting\n* Make sure the right service (Document Capturing - Components) is toggled on in the 'Access' tab (screenshot 3 \u0026 4)\n* Go to the 'API Keys' tab and copy the API key\n  * _Optional:_ Here you can also create additional API keys if you want separate keys for different use cases\n* See the [credentials documentation](https://dochorizon.klippa.com/docs/platform/credentials) for further information\n\n\u003e Image of the Access page within an existing credential\n\u003cimg src=\"/images/access_credentials_page_ReceiptOCR.png\" alt=\"screenshot API key\" width=\"1000\" height=\"400\"\u003e\n\n\u003e In the API keys tab you will find the API key\n\n## Background \u0026 support\nKlippa has 10 years of experience in OCR and Document Processing and has built a robust and scalable solution for many customers.\nReceipt OCR is one of the most popular services. Since receipt information extraction is a complex task, we have developed a robust and scalable solution that can be used by anyone.\nThanks to the engine and receipt parser we use, you can extract information from any kind of receipt.\nTo learn more about the Receipt OCR software we use, visit this [page](https://www.klippa.com/en/ocr/financial-documents/receipts/).\n\nIf you have any questions or need support, please [contact](mailto:dochorizon-support@klippa.com) us,\nor visit the general [website](https://klippa.com/).\n\n## Other supported languages\nThis repository contains a sample Python script that demonstrates how to upload an image to Klippa and fetch the OCR results.\nYou can also integrate the API from other languages and tools, such as:\n- cURL\n- Node.js\n- PHP\n- Go\n- C#/.NET\n- Java\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklippa-app%2Freceipt-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fklippa-app%2Freceipt-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklippa-app%2Freceipt-ocr/lists"}