https://github.com/aznironman/invoice-pdf-to-excel-batch-converter
This Python script batch-converts PDF files to Excel spreadsheets. Developed specifically for SW Firefighting Foam & Equipment, LLC, but easily adaptable.
custom-invoice excel invoice invoice-pdf invoices pdf pdf-conveter pdf-to-excel
- Host: GitHub
- URL: https://github.com/aznironman/invoice-pdf-to-excel-batch-converter
- Owner: AznIronMan
- License: mit
- Created: 2023-10-17T17:07:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-17T17:11:24.000Z (over 1 year ago)
- Last Synced: 2023-10-17T18:32:39.888Z (over 1 year ago)
- Topics: custom-invoice, excel, invoice, invoice-pdf, invoices, pdf, pdf-conveter, pdf-to-excel
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PDF to Excel Batch Converter
- **Version**: v1.10
- **Date**: 10.17.2023 @ 10:00 PST
- **Developed for**: C. Ryan of SW Firefighting Foam & Equipment, LLC

## Overview
This Python script batch-converts PDF files to Excel spreadsheets. The code is modular, with each task encapsulated in its own function. It was written specifically for the structured PDFs provided by SW Firefighting Foam & Equipment, LLC, but can be adapted for other PDFs.

## Modules
### filer.py
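As a rough illustration of the existence-check helpers in this module, here is a minimal sketch. The function names come from this README; the signatures and the `is_dir` and `create` parameters are assumptions made for the example and may not match the actual code.

```python
import os


def check_function(path: str, is_dir: bool = False, create: bool = False) -> bool:
    """Return True if `path` exists as the requested kind (file or directory).

    The `is_dir` and `create` parameters are hypothetical.
    """
    if is_dir:
        if os.path.isdir(path):
            return True
        if create:
            # Create the directory (and any missing parents) on request.
            os.makedirs(path, exist_ok=True)
            return True
        return False
    return os.path.isfile(path)


def dir_check(path: str, create: bool = False) -> bool:
    """Directory-only wrapper around check_function."""
    return check_function(path, is_dir=True, create=create)


def file_exists(path: str) -> bool:
    """File-only wrapper around check_function."""
    return check_function(path, is_dir=False)
```

Keeping the two wrappers thin means callers never pass the `is_dir` flag themselves, which is presumably why the module exposes them separately.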
- **check_function**: Checks if a file or directory exists. Can create directories.
- **dir_check**: Wrapper around check_function for directories.
- **file_exists**: Wrapper around check_function for files.
- **is_gui_available**: Checks if a GUI environment is available.
- **path_to_module**: Converts a file path to a Python module path.
- **select_folder**: Opens a file dialog for folder selection or takes user input.

### logger.py
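A logging helper with an optional console echo, in the spirit of `zlog`, might look like the sketch below. The `log_dir` and `console` parameters, the one-file-per-day naming, and the timestamp format are all assumptions for illustration, not the module's confirmed behavior.

```python
import os
from datetime import datetime


def now() -> str:
    """Current time as a formatted string (format is an assumption)."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def zlog(message: str, log_dir: str = "logs", console: bool = False) -> str:
    """Append a timestamped line to today's log file and return its path."""
    os.makedirs(log_dir, exist_ok=True)
    # Hypothetical naming scheme: one log file per calendar day.
    log_path = os.path.join(log_dir, f"{datetime.now():%Y-%m-%d}.log")
    line = f"[{now()}] {message}"
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    if console:
        # Optionally echo the same line to the console.
        print(line)
    return log_path
```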
- **check_function**: Checks if a file or directory exists. Can create directories. (Duplicated from filer.py)
- **dir_check**: Wrapper around check_function for directories. (Duplicated from filer.py)
- **fix_datetime**: Formats datetime objects or timestamps.
- **format_log_date**: Formats date for log files.
- **log**: Writes log messages to a file.
- **now**: Gets the current time in a formatted string.
- **today**: Gets the current date in a formatted string.
- **zlog**: Enhanced logging function with console output option.

### process.py
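To give a feel for the text-cleaning helpers in this module, the sketch below shows one way `clean_currency` and `find_and_parse_date` could work using only the standard library. The signatures, the accounting-style handling of parentheses, and the US-style `M/D/YYYY` date pattern are assumptions; the actual script may use `dateutil` and support other formats.

```python
import re
from datetime import date
from typing import Optional

# Assumed date layout: month/day/four-digit year, as is common on US invoices.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def clean_currency(value: str) -> float:
    """Turn a string such as "$1,234.50" or "(75.00)" into a float."""
    text = value.strip()
    # Treat a value wrapped in parentheses as negative (accounting style).
    negative = text.startswith("(") and text.endswith(")")
    digits = re.sub(r"[^0-9.\-]", "", text)
    amount = float(digits) if digits else 0.0
    return -amount if negative else amount


def find_and_parse_date(text: str) -> Optional[date]:
    """Return the first valid M/D/YYYY date found in `text`, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but do not form a real date.
        return None
```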
- **batch_convert**: Batch converts PDFs to Excel in a target directory.
- **clean_currency**: Cleans currency strings.
- **find_and_parse_date**: Finds and parses dates in text.
- **format_excel**: Formats the generated Excel file.
- **get_years_to_search**: Returns a list of years to search for in text.
- **map_text_to_excel_columns**: Maps extracted text to Excel columns.
- **pdf_to_excel**: Core function that manages the PDF to Excel conversion.
- **Various `parse_` functions**: Extract specific information from text.

## Requirements
- Python 3.8+
- openpyxl
- pdfplumber
- pandas
- tkinter (for GUI dialogs)
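- re (regular expressions; part of the Python standard library)
- datetime (part of the Python standard library)
- dateutil (installed from PyPI as `python-dateutil`)
- dotenv (for environment variables; installed from PyPI as `python-dotenv`)

## Usage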
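The output layout used by the script, where each PDF's workbook goes into a 'processed' folder next to the PDF, can be pictured as a simple path mapping. The sketch below is only an illustration: `planned_outputs` is a hypothetical helper, and reusing the PDF's base name with an `.xlsx` extension is an assumption.

```python
from pathlib import Path
from typing import Iterator, Tuple


def planned_outputs(root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (pdf_path, excel_path) pairs for every PDF under `root`.

    Each Excel file is placed in a 'processed' directory inside the
    directory that holds the PDF, including for subdirectories.
    """
    # Note: this pattern matches a lowercase ".pdf" extension; on
    # case-sensitive file systems, ".PDF" files would be skipped.
    for pdf_path in root.rglob("*.pdf"):
        yield pdf_path, pdf_path.parent / "processed" / (pdf_path.stem + ".xlsx")
```

For a folder containing `a.pdf` and `2023/b.pdf`, this mapping gives `processed/a.xlsx` and `2023/processed/b.xlsx` respectively.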
- Install the dependencies with `pip install -r requirements.txt`.
- Run `python .` from the root of the project, where `__main__.py` is located.
- If a GUI environment is available, a file dialog will open for folder selection. Otherwise, the user will be prompted to enter a folder path.
- Each processed PDF is written as an Excel file to a new 'processed' directory inside the directory that contains the PDF.
- PDFs in subdirectories are converted the same way: their Excel files go into a new 'processed' directory inside that same subdirectory.

## Areas for Improvement
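One possible direction for more specific error reporting is to catch narrow exception types per file and record a readable reason, so one bad PDF does not stop the batch. This is a sketch under assumptions: `convert_all` and its `convert` callback are hypothetical names, and the exception types the real conversion raises are not known from this README.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def convert_all(
    pdf_paths: Iterable[Path], convert: Callable[[Path], None]
) -> List[Tuple[Path, str]]:
    """Run `convert` on each path and collect (path, reason) for failures."""
    failures: List[Tuple[Path, str]] = []
    for pdf_path in pdf_paths:
        try:
            convert(pdf_path)
        except FileNotFoundError:
            failures.append((pdf_path, "file not found"))
        except PermissionError:
            failures.append((pdf_path, "permission denied (is the file open elsewhere?)"))
        except ValueError as exc:
            # Assumed here to signal that the PDF text did not parse as expected.
            failures.append((pdf_path, f"unexpected content: {exc}"))
    return failures
```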
- **Exception Handling**: Could be improved to give more specific error messages.

## Author Information
- **Author**: [Geoff Clark of ClarkTribeGames, LLC](https://clarktribegames.com)
- **Email**: [[email protected]](mailto:[email protected])
- **Socials**:
  - [GitHub @aznironman](https://github.com/aznironman)
  - [IG: @aznironman](https://instagram.com/aznironman)
  - [X: @aznironman](https://www.twitter.com/aznironman)