https://github.com/y1d1r/pyfacture
PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) using Tesseract and Llama3.2-vision to extract relevant information from a photo of a receipt.
- Host: GitHub
- URL: https://github.com/y1d1r/pyfacture
- Owner: Y1D1R
- Created: 2024-12-29T14:05:12.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-17T20:47:59.000Z (4 months ago)
- Last Synced: 2025-01-17T21:29:30.288Z (4 months ago)
- Topics: image-procesing, lamma, ocr-python, opencv, python, tesseract
- Language: Python
- Homepage:
- Size: 813 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PyFacture
PyFacture is a Python project designed to automate expense management from receipts. The application utilizes image processing techniques and Optical Character Recognition (OCR) to extract relevant information from a photo of a receipt, such as purchased products, their prices, and the date of purchase.
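As a rough illustration of the data-extraction idea, the sketch below parses already-recognized receipt text into a date and a list of (product, price) pairs. It is not PyFacture's actual code: the function name `parse_receipt`, the `DD/MM/YYYY` date format, and the "name followed by price" line layout are all assumptions made for the example.

```python
import re
from datetime import date

# Assumed line layout: "<product name> <price>", e.g. "Baguette 1,20".
ITEM_RE = re.compile(r"^(?P<name>.+?)\s+(?P<price>\d+[.,]\d{2})\s*(?:€|EUR)?$")
# Assumed date format: DD/MM/YYYY.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def parse_receipt(text: str) -> dict:
    """Extract the purchase date and (product, price) pairs from OCR text."""
    purchase_date = None
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Take the first valid date found as the purchase date.
        match = DATE_RE.search(line)
        if match and purchase_date is None:
            day, month, year = map(int, match.groups())
            try:
                purchase_date = date(year, month, day)
                continue
            except ValueError:
                pass  # OCR noise produced an impossible date; keep looking.
        match = ITEM_RE.match(line)
        if match and not match.group("name").upper().startswith("TOTAL"):
            # Accept both "1,20" and "1.20" as decimal notation.
            price = float(match.group("price").replace(",", "."))
            items.append((match.group("name"), price))
    return {"date": purchase_date, "items": items}


# Hypothetical OCR output used only to exercise the parser.
sample = """SUPERMARCHE EXEMPLE
12/01/2025 14:32
Baguette 1,20
Lait 1L 0.99
TOTAL 2,19
"""
result = parse_receipt(sample)
```

Real receipts vary widely in layout, so the regular expressions would need tuning per store or per language.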
## Features
- **Image Processing:** Enhances receipt images for better OCR accuracy.
- **Optical Character Recognition (OCR):** Extracts text from receipt images using Tesseract or Llama.
- **Data Extraction:** Analyzes OCR text to identify products, prices, and dates.
- **Excel File Management:** Creates and updates Excel files to store extracted data.
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/Y1D1R/PyFacture.git
cd PyFacture
```
### 2. Install Dependencies
Install the required Python packages using pip:
```bash
pip install -r requirements.txt
```
### 3. Install Tesseract OCR and Ollama
PyFacture relies on Tesseract OCR for text extraction. Install it by following the instructions for your operating system.

Once you have Ollama installed, install the Llama 3.2-Vision model (6 GB):
```bash
ollama run llama3.2-vision
```
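For readers curious how a script might talk to the model once it is installed, here is a minimal sketch that sends a receipt image to a local Ollama server through its `/api/generate` REST endpoint using only the standard library. It is an illustration under assumptions, not PyFacture's implementation: the helper names `build_ocr_request` and `ask_ollama`, the prompt text, and the example file name are hypothetical, and the URL assumes Ollama is listening on its default port.

```python
import base64
import json
import urllib.request

# Assumes Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_request(
    image_bytes: bytes,
    prompt: str = "List every product, its price, and the purchase date on this receipt.",
) -> dict:
    """Build the JSON payload for a single, non-streaming vision request."""
    return {
        "model": "llama3.2-vision",
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_ollama(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Placeholder bytes stand in for a real image so the payload can be inspected offline.
payload = build_ocr_request(b"\x89PNG placeholder bytes")

# With a running server and a real file (hypothetical path):
# with open("data/input/receipt.jpg", "rb") as f:
#     print(ask_ollama(build_ocr_request(f.read())))
```

Keeping payload construction separate from the network call makes the request easy to inspect or test without a running server.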
More information: https://sebastian-petrus.medium.com/build-a-local-ollama-ocr-application-using-llama-3-2-vision-bfc3014e3ad6
### 4. Usage
#### 4.1. Prepare Your Data
Place your receipt images in the "data/input/" directory.
Ensure that the images are clear, well-lit, and free from distortions for optimal OCR results.
#### 4.2. Run the Application
Execute the main script, then choose the method from the menu to process the receipts and extract data:
```bash
python pyfacture/main.py
```
#### 4.3. View the Results
##### 4.3.1 Tesseract OCR
[Screenshots: Tesseract OCR results]

The extracted data will be saved as Excel files in the "data/output/" directory.
##### 4.3.2 Llama OCR
[Screenshot: Llama OCR results]