https://github.com/mittalbhavya/invoicedetailsextractor
The Invoice Extraction Application is a Python-based tool built with Streamlit that extracts and processes invoice details from PDFs and images. It uses PaddleOCR for OCR and Google's Gemini API for generative AI to produce structured data, including customer details, product information, and total amounts.
- Host: GitHub
- URL: https://github.com/mittalbhavya/invoicedetailsextractor
- Owner: MITTALBHAVYA
- Created: 2024-08-10T06:55:53.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-29T10:03:13.000Z (4 months ago)
- Last Synced: 2024-08-29T11:27:25.659Z (4 months ago)
- Topics: ai-automation, data-extraction, generative-ai, google-gemini, image-processing, invoice-extraction, ocr, paddleocr, python, streamlit
- Language: Python
- Homepage:
- Size: 1.15 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Invoice Extraction Application
## Overview
The Invoice Extraction Application is a tool built with Python and Streamlit that extracts and processes invoice details from PDF and image files. It combines Optical Character Recognition (OCR) with Generative AI to turn invoices into structured data.
## Features
- **File Upload**: Supports uploading of PDF and image files.
- **Extraction Methods**: Offers two methods of extraction:
  - **Direct Extraction (Image-based)**: Processes images directly for extraction.
  - **Text Extraction (Text-based)**: Extracts text from PDFs and then processes the text.
- **Structured Output**: Provides extracted invoice details in a well-organized JSON format.
## Requirements
Ensure you have Python 3.7+ installed. Create a virtual environment and install the necessary packages listed in `requirements.txt`:
```bash
pip install -r requirements.txt
```
## Setup
1. **Clone the Repository**
```bash
git clone https://github.com/MITTALBHAVYA/InvoiceDetailsExtractor
cd InvoiceDetailsExtractor
```
2. **Set Up Environment Variables**
Create a `.env` file in the project root directory with the following content:
```env
GEMINI_API_KEY=your_api_key_here
```
Replace `your_api_key_here` with your actual API key.
3. **Run the Application**
Start the Streamlit app:
```bash
streamlit run app.py
```
This will launch the application in your default web browser.
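This README does not show how `app.py` reads the key from step 2, so the following is only a minimal sketch of one way to do it with the standard library. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical, and the project itself may rely on a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path=".env"):
    """Return GEMINI_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment.")
    return key
```

Failing early with a clear message when the placeholder value is still in place avoids a less obvious authentication error later in the extraction step.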
## Usage
1. **Upload a File**
Use the file uploader to choose an invoice file. Supported formats include PDF and common image formats (PNG, JPG, JPEG, GIF, BMP, TIFF).
2. **Select Extraction Method**
Choose between:
   - **Direct Extraction (Image-based)**: Suitable for image files.
   - **Text Extraction (Text-based)**: Suitable for PDF files.
3. **Process the File**
Click the "Process" button to start the extraction. The application will process the file and display the extracted details.
4. **View Results**
The extracted details will be displayed in a structured format. You can see customer details, product information, and the total amount extracted from the invoice.
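To make the structured result from step 4 more concrete, here is a minimal sketch of how a JSON reply from the model could be parsed and checked. The field names in `EXPECTED_KEYS` and the helper names are assumptions for illustration; this README only states that the output covers customer details, product information, and the total amount, not the exact schema.

```python
import json

# Hypothetical field names; the real schema is not documented in this README.
EXPECTED_KEYS = ("customer_details", "products", "total_amount")

# Built programmatically so this snippet can sit inside a Markdown code fence.
FENCE = "`" * 3


def parse_invoice_reply(reply_text):
    """Parse a model reply into a dict, tolerating a surrounding Markdown code fence."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()
        # Drop the opening fence line and, if present, the closing fence line.
        lines = lines[1:-1] if lines[-1].strip() == FENCE else lines[1:]
        text = "\n".join(lines)
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def missing_fields(invoice):
    """Return the expected keys that are absent from the parsed invoice."""
    return [key for key in EXPECTED_KEYS if key not in invoice]


if __name__ == "__main__":
    sample = '{"customer_details": {"name": "Example Ltd"}, "products": [], "total_amount": 10.5}'
    invoice = parse_invoice_reply(sample)
    print(json.dumps(invoice, indent=2))
    print("missing:", missing_fields(invoice))
```

Checking for missing keys before displaying the result makes it easier to tell an incomplete extraction apart from an invoice that simply has no products listed.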
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your changes. Ensure your code adheres to the project's coding standards and includes relevant tests.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contact
For any questions or issues, please reach out to:
- **Author**: Bhavya Mittal
- **Email**: [email protected]
- **GitHub**: [INVOICE_DETAILS_EXTRACTOR](https://github.com/MITTALBHAVYA/InvoiceDetailsExtractor)
## Acknowledgments
- **PaddleOCR**: For Optical Character Recognition.
- **PyMuPDF**: For PDF text extraction.
- **Streamlit**: For creating the web application interface.
- **Google Generative AI**: For AI-powered text extraction.
![Workflow diagram](workflow.jpg)
![Application front page](front.png)