https://github.com/klippa-app/receipt-ocr
Receipt OCR engine to extract receipt information.
extraction information ocr parser python receipt
- Host: GitHub
- URL: https://github.com/klippa-app/receipt-ocr
- Owner: klippa-app
- Created: 2025-08-07T07:54:00.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-09-15T16:48:27.000Z (9 months ago)
- Last Synced: 2025-10-10T20:41:44.192Z (9 months ago)
- Topics: extraction, information, ocr, parser, python, receipt
- Language: Python
- Homepage: https://dochorizon.klippa.com
- Size: 594 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Receipt OCR for Receipt Processing & Information Extraction
This repository shows how to integrate Klippa's Receipt OCR software to extract information from receipts.
The receipt parser uses the DocHorizon OCR API to perform the extraction.
### How Klippa Receipt OCR Works:
- Image Upload: You first need to upload the image of the receipt you want to process.
- Data Extraction: Klippa uses machine learning models and image processing techniques to analyze the image, identify key data points, and extract relevant information.
- Data Formatting: The extracted data is then formatted into a structured format (like JSON), which can easily be consumed by applications or compared with other datasets.
- Response: The service returns the structured data to your application, allowing you to process it as needed.
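The upload step can be sketched in Python as follows. This is a minimal sketch and not the repository's `main.py`: the helper name `build_document` is hypothetical, and it assumes the file content is sent as a base64 string in the `data` field of a `documents` entry (check the Swagger docs to confirm the expected encoding).

```python
import base64
import mimetypes
import os


def build_document(image_path):
    """Read an image from disk and wrap it as one `documents` entry.

    Assumption: the API accepts the file content as a base64 string in
    the `data` field.
    """
    with open(image_path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")

    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type = mimetypes.guess_type(image_path)[0] or "application/octet-stream"

    return {
        "content_type": content_type,
        "data": encoded,
        "filename": os.path.basename(image_path),
    }
```

The returned dictionary would go into the `documents` list of the request body.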
## Things you need
- A DocHorizon API key and/or license
- A receipt image
- A Python 3.6+ environment
- The [Swagger docs](https://dochorizon.klippa.com/api/swagger#/) for reference
## How to Connect to Klippa Receipt OCR with Python
To use the Klippa Receipt OCR API in your Python script, you can follow these steps:
**Step 1: Set Up Your Environment**
You need to make sure you have Python installed on your system.
Install all of the required libraries using the requirements.txt file by running:
```pip install -r ./requirements.txt```
**Step 2: Obtain Credentials**
Sign up for Klippa’s document extraction services and get your API key. This key will be required to authenticate your requests.
>See the [License & API KEY](#license--api-key) section below for how to get your API key.
**Step 3: Run Python Script**
In this repository you will find a sample Python script `main.py` which demonstrates how to upload an image to Klippa and fetch the OCR results.
### Explanation:
- API_URL: Change this to the actual endpoint you need to use. Check [Klippa’s documentation](https://dochorizon.klippa.com/api/swagger#/) for the correct API URLs.
- API_KEY: Place your Klippa API key here.
- image_path: Change this path to the actual path of the image you want to upload.
- request.components: You can enable or disable the individual components in the request.
- At the end of the script there is an example usage snippet that you can use to test the code.
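As an illustration of toggling components, the sketch below builds a `components` object with the same field names as the request schema shown in the Example section. The helper `build_components` is hypothetical and is not taken from `main.py`. It also assumes that `barcode_types` may be omitted and that the fraud sub-checks can simply follow the main fraud switch.

```python
def build_components(ocr=True, barcode=False, fraud=False):
    """Return a `components` object with each component switched on or off."""
    return {
        "ocr": {"enabled": ocr},
        # Assumption: `barcode_types` is optional and can be left out.
        "barcode": {"enabled": barcode},
        "fraud": {
            "enabled": fraud,
            # In this sketch the sub-checks follow the main fraud switch.
            "metadata": {"date": fraud, "editor": fraud},
            "visual": {"copy_move": fraud, "splicing": fraud},
        },
    }
```

For plain text extraction, `build_components()` enables only OCR; `build_components(fraud=True)` additionally turns on all fraud checks.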
### Important Notes:
Ensure you use secure methods to store and manage your API keys.
Always refer to the official [DocHorizon API documentation](https://dochorizon.klippa.com/docs) for the most up-to-date information, including any changes in endpoints or request formats.
The API may have rate limits or require specific image formats—consult the documentation for these details.
>Please ensure that the receipt is the only object in the image and that its edges are clearly visible.
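One common way to keep the API key out of source code is to read it from an environment variable. The sketch below assumes a variable named `DOCHORIZON_API_KEY`; that name and the helper `load_api_key` are illustrative choices and not something the API or this repository prescribes.

```python
import os


def load_api_key(env_var="DOCHORIZON_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(
            "Set the {} environment variable to your API key.".format(env_var)
        )
    return api_key
```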

### Example
The example below sends a POST request with cURL to the components endpoint described in the [Swagger docs](https://dochorizon.klippa.com/api/swagger#/Components%20Capturing%20API/document-capturing-components).
```
curl -X POST \
  -H "x-api-key: {your-api-key}" \
  -H "Content-Type: application/json" \
  -d '{
    "components": {
      "barcode": {
        "barcode_types": [
          "string"
        ],
        "enabled": false
      },
      "fraud": {
        "enabled": false,
        "metadata": {
          "date": false,
          "editor": false
        },
        "visual": {
          "copy_move": false,
          "splicing": false
        }
      },
      "ocr": {
        "enabled": false
      }
    },
    "documents": [
      {
        "content_type": "string",
        "data": "string",
        "file_id": "string",
        "filename": "string",
        "page_ranges": "string",
        "password": "string",
        "url": "string"
      }
    ]
  }' \
  https://dochorizon.klippa.com/api/services/document_capturing/v1/components
```
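A rough Python equivalent of the cURL command, using only the standard library, might look like the sketch below. It is not the repository's `main.py`, and the function names `build_request` and `send_request` are hypothetical. The endpoint URL and the `x-api-key` and `Content-Type` headers are taken from the cURL example above.

```python
import json
import urllib.request

API_URL = "https://dochorizon.klippa.com/api/services/document_capturing/v1/components"


def build_request(api_key, body):
    """Build (but do not send) the POST request from the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_request(request, timeout=30):
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call is made.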
The expected JSON schema of a 200 OK response:
```
{
  "components": {
    "barcode": {
      "barcodes": [
        {
          "type": "string",
          "value": "string"
        }
      ],
      "candidates": [
        {
          "confidence": 0,
          "coordinates": [
            {
              "file": 0,
              "page": 0,
              "vertices": [
                [
                  0
                ]
              ]
            }
          ],
          "type": "string",
          "value": "string"
        }
      ]
    },
    "fraud": {
      "metadata": {
        "date": {
          "confidence": 0,
          "digitized": "string",
          "modified": "string",
          "original": "string"
        },
        "editor": {
          "confidence": 0,
          "found": [
            "string"
          ],
          "fraudulent": [
            "string"
          ]
        }
      },
      "summary": {
        "confidence": 0
      },
      "visual": {
        "copy_move": {
          "confidence": 0,
          "coordinates": [
            {
              "file": 0,
              "page": 0,
              "vertices": [
                [
                  0
                ]
              ]
            }
          ]
        },
        "splicing": {
          "confidence": 0,
          "coordinates": [
            {
              "file": 0,
              "page": 0,
              "vertices": [
                [
                  0
                ]
              ]
            }
          ]
        }
      }
    },
    "ocr": {
      "documents": [
        {
          "document_index": 0,
          "pages": [
            {
              "height": 0,
              "lines": [
                {
                  "coordinates": [
                    {
                      "file": 0,
                      "page": 0,
                      "vertices": [
                        [
                          0
                        ]
                      ]
                    }
                  ],
                  "text": "string",
                  "words": [
                    {
                      "coordinates": [
                        {
                          "file": 0,
                          "page": 0,
                          "vertices": [
                            [
                              0
                            ]
                          ]
                        }
                      ],
                      "text": "string"
                    }
                  ]
                }
              ],
              "page_index": 0,
              "text": "string",
              "width": 0
            }
          ]
        }
      ]
    }
  },
  "version": "string"
}
```
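Once the response has been parsed into a Python dictionary, the recognized text can be read from the `ocr` component. The sketch below follows the field names in the schema above (`components.ocr.documents[].pages[].text`); the helper name `extract_page_texts` is hypothetical.

```python
def extract_page_texts(response):
    """Collect the `text` of every OCR page, in document and page order."""
    ocr = (response.get("components") or {}).get("ocr") or {}
    texts = []
    for document in ocr.get("documents") or []:
        for page in document.get("pages") or []:
            texts.append(page.get("text", ""))
    return texts
```

If the OCR component was disabled in the request, or the response contains no OCR documents, the function returns an empty list.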
## License & API KEY
To use this project and the DocHorizon OCR API, you need to create an account and retrieve an API key.
Follow these steps to get your API key:
* Sign up via the [signup page](https://dochorizon.klippa.com/public/signup)
* Finish setting up the organization and create a first project
* Enable the service you would like to use (Document Capturing - Components) under Project settings > Services
* Go to the Project settings > Credentials page (screenshot 1 & 2)
* Create a credential, give it a name, and optionally add security settings such as IP whitelisting
* Make sure the right service (Document Capturing - Components) is toggled on in the 'Access' tab (screenshot 3 & 4)
* Go to the 'API Keys' tab and copy the API key
* _Optional:_ Here you can also create a new API key if you want to have new keys for different use cases
* [Link to documentation](https://dochorizon.klippa.com/docs/platform/credentials) for further information
> Image of the Access page within an existing credential

> In the API keys tab you will find the API key
## Background & support
Klippa has 10 years of experience in OCR and Document Processing and has built a robust and scalable solution for many customers.
Receipt OCR is one of our most popular services. Because extracting information from receipts is a complex task, we have developed a robust and scalable solution that anyone can use.
Thanks to the engine and receipt parser we use, you can extract information from any kind of receipt.
To learn more about the Receipt OCR software we use, visit this [page](https://www.klippa.com/en/ocr/financial-documents/receipts/).
If you have any questions or need support, please [contact us](mailto:dochorizon-support@klippa.com), or visit the general [website](https://klippa.com/).
## Other supported languages
This repository contains a sample Python script that demonstrates how to upload an image to Klippa and fetch the OCR results.
The API can also be integrated from other languages and tools, such as:
- cURL
- Node.js
- PHP
- Go
- C#/.NET
- Java