https://github.com/y1d1r/pyfacture
PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) using Tesseract and Llama3.2-vision to extract relevant information from a photo of a receipt.
- Host: GitHub
- URL: https://github.com/y1d1r/pyfacture
- Owner: Y1D1R
- Created: 2024-12-29T14:05:12.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-17T20:47:59.000Z (about 1 year ago)
- Last Synced: 2025-01-17T21:29:30.288Z (about 1 year ago)
- Topics: image-procesing, lamma, ocr-python, opencv, python, tesseract
- Language: Python
- Homepage:
- Size: 813 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PyFacture
PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) to extract relevant information from a photo of a receipt, such as purchased products, their prices, and the date of purchase.
## Features
- **Image Processing:** Enhances receipt images for better OCR accuracy.
- **Optical Character Recognition (OCR):** Extracts text from receipt images using Tesseract or Llama 3.2-Vision (run locally through Ollama).
- **Data Extraction:** Analyzes OCR text to identify products, prices, and dates.
- **Excel File Management:** Creates and updates Excel files to store extracted data.
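The README does not say which enhancement steps PyFacture applies. One common step before OCR is binarization with Otsu's method, which OpenCV exposes as `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The dependency-free sketch below computes the Otsu threshold for a grayscale image given as rows of 0-255 integers, then binarizes it. The function names are hypothetical and are not part of PyFacture.

```python
from typing import List


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the threshold t that maximises between-class variance.

    Pixels <= t form the dark class (ink) and pixels > t the light class (paper).
    """
    # Build a 256-bin histogram of pixel intensities.
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels with intensity <= t
    sum_dark = 0     # sum of the intensities of those pixels
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]], t: int) -> List[List[int]]:
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]
```

On a tiny image such as `[[10, 12, 200], [11, 210, 205]]`, the threshold separates the three dark pixels from the three light ones. In practice the OpenCV call above does the same on a NumPy array; the pure-Python version only makes the computation visible.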
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/Y1D1R/PyFacture.git
cd PyFacture
```
### 2. Install Dependencies
Install the required Python packages using pip:
```bash
pip install -r requirements.txt
```
### 3. Install Tesseract OCR and Ollama
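PyFacture relies on Tesseract OCR for text extraction and on Ollama to run the Llama 3.2-Vision model locally.
Install both for your operating system: Tesseract is packaged as `tesseract-ocr` on Debian/Ubuntu (`sudo apt install tesseract-ocr`) and as `tesseract` in Homebrew on macOS (`brew install tesseract`), and Ollama installers for macOS, Linux, and Windows are available at https://ollama.com.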
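Once Tesseract is installed, you can check that it is reachable by calling its command-line interface directly. This is a minimal sketch, not PyFacture's own code: it assumes the `tesseract` binary is on your `PATH`, and the helper names and the `--psm 6` choice (treat the image as a single uniform block of text) are illustrative assumptions. Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a file.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "eng", psm: int = 6) -> List[str]:
    """Build a Tesseract CLI call that prints the recognised text to stdout."""
    # "stdout" as the output base tells Tesseract to write text to standard output.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path: str) -> str:
    """Run Tesseract on one image and return the extracted text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install it first")
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

The `pytesseract` package wraps the same binary; `pytesseract.image_to_string(Image.open(path))` is the equivalent call from Python.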
Once Ollama is installed, download the Llama 3.2-Vision model (6 GB). `ollama run` pulls the model on first use and then opens an interactive prompt:
```bash
ollama run llama3.2-vision
```
For more details, see: https://sebastian-petrus.medium.com/build-a-local-ollama-ocr-application-using-llama-3-2-vision-bfc3014e3ad6
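After the model is pulled, Ollama serves it over a local HTTP API (by default at `http://localhost:11434`). The sketch below shows one way to send a receipt image to Llama 3.2-Vision using only the standard library. It is an assumption about how such a call can look, not a copy of PyFacture's code, and the prompt text and helper names are hypothetical. Images go base64-encoded in the `images` field of `/api/generate`, and `"stream": False` returns a single JSON object whose `response` field holds the model's text.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt; adjust it to the fields you want extracted.
PROMPT = "Extract the products, prices and purchase date from this receipt."


def build_payload(image_bytes: bytes, model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_with_llama(image_path: str) -> str:
    """Send one receipt image to the local Ollama server and return its reply."""
    payload = build_payload(Path(image_path).read_bytes())
    request = urllib.request.Request(  # a Request with data is sent as POST
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Vision models are much slower than Tesseract, so expect each call to take noticeably longer, especially without a GPU.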
### 4. Usage
#### 4.1. Prepare Your Data
Place your receipt images in the `data/input/` directory.
For the best OCR results, use clear, well-lit photos that are free from distortion.
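A processing run starts by collecting the images from `data/input/`. This sketch assumes only JPEG and PNG files are accepted; the extension set and the `find_receipts` name are illustrative, not taken from PyFacture.

```python
from pathlib import Path
from typing import List

# Assumed set of accepted image types; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_receipts(input_dir: str = "data/input") -> List[Path]:
    """Return the receipt images in input_dir, sorted by path."""
    folder = Path(input_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"input directory not found: {folder}")
    return sorted(
        path
        for path in folder.iterdir()
        # Skip sub-directories and non-image files; compare extensions case-insensitively.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Matching on `suffix.lower()` means photos exported as `IMG_0001.JPG` are picked up as well.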
#### 4.2. Run the Application
Run the main script, then choose an OCR method (Tesseract or Llama) from the menu to process the receipts and extract their data:
```bash
python pyfacture/main.py
```

#### 4.3. View the Results
##### 4.3.1 Tesseract OCR
The extracted data is saved as Excel files in the `data/output/` directory.
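Before anything is written to Excel, the raw OCR text has to be turned into rows. The README does not document PyFacture's parsing rules, so the sketch below is a hypothetical heuristic: a line ending in a two-decimal price such as `2.50` or `2,50` (optionally followed by an assumed euro sign) is treated as a product, lines starting with `TOTAL` are skipped, and the first day-first `DD/MM/YYYY` or `DD-MM-YYYY` date is taken as the purchase date. Real receipts vary widely, so treat these patterns as a starting point.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# A product line: a label, whitespace, then a price with two decimals using
# "." or "," as separator, optionally followed by a euro sign (assumed currency).
ITEM_RE = re.compile(r"^(?P<name>.*?\S)\s+(?P<price>\d+[.,]\d{2})\s*(?:€|EUR)?\s*$")
# A day-first date such as 17/01/2025 or 17-01-2025.
DATE_RE = re.compile(r"\b(\d{2})[/-](\d{2})[/-](\d{4})\b")


def parse_receipt(text: str) -> Tuple[Optional[date], List[Tuple[str, float]]]:
    """Return (purchase_date, [(product, price), ...]) from raw OCR text."""
    purchase_date = None
    items = []
    for line in text.splitlines():
        line = line.strip()
        if purchase_date is None:
            found = DATE_RE.search(line)
            if found:
                day, month, year = map(int, found.groups())
                try:
                    purchase_date = date(year, month, day)
                except ValueError:
                    pass  # e.g. 31/02/2025 is not a real date
                else:
                    continue  # the date line is not a product line
        item = ITEM_RE.match(line)
        if item and not item.group("name").upper().startswith("TOTAL"):
            price = float(item.group("price").replace(",", "."))
            items.append((item.group("name"), price))
    return purchase_date, items
```

For OCR text containing `17/01/2025 14:32`, `Milk 1L   1,29 €`, `Bread 2.50` and `TOTAL 3,79`, this yields the date 2025-01-17 and the rows `("Milk 1L", 1.29)` and `("Bread", 2.5)`, each of which maps onto one spreadsheet row.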

##### 4.3.2 Llama OCR