https://github.com/mathieubuisson/bill-ingestion
- Host: GitHub
- URL: https://github.com/mathieubuisson/bill-ingestion
- Owner: MathieuBuisson
- Created: 2026-04-14T21:40:21.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-23T21:02:47.000Z (2 months ago)
- Last Synced: 2026-04-23T22:16:31.755Z (2 months ago)
- Language: Python
- Size: 57.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Electricity Bill Ingestion
Python-based automation for downloading, processing, and organizing electricity bills.
## Overview
This project automates the ingestion of electricity bills by:
1. Downloading electricity bills from Bord Gáis Energy
2. Converting the bill PDF to Markdown
3. Uploading the bill PDF file to a specific Google Drive folder (`Finance/Taxes//Income Tax/Electricity Receipts`)
4. Copying the Markdown file to a personal knowledge base
5. Sending a notification email with a link to the uploaded bill and other details
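The five steps run in a fixed order, each consuming the previous step's output. Below is a minimal sketch of that orchestration. It assumes each step is a plain callable. The `BillArtifacts` type, the `run_ingestion` function and the step signatures are hypothetical and are not taken from `main.py`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class BillArtifacts:
    """Hypothetical container for what one workflow run produces."""
    pdf_path: Path
    markdown_path: Path  # location of the Markdown copy in the knowledge base
    drive_link: str


def run_ingestion(
    download: Callable[[], Path],
    convert: Callable[[Path], Path],
    upload: Callable[[Path], str],
    copy_to_kb: Callable[[Path], Path],
    notify: Callable[[str, Path], None],
) -> BillArtifacts:
    """Run the five workflow steps in order; any exception stops the run."""
    pdf_path = download()                # 1. download the bill PDF
    markdown_path = convert(pdf_path)    # 2. PDF -> Markdown
    drive_link = upload(pdf_path)        # 3. upload the PDF to Google Drive
    kb_path = copy_to_kb(markdown_path)  # 4. copy Markdown to the knowledge base
    notify(drive_link, pdf_path)         # 5. email the link and bill details
    return BillArtifacts(pdf_path, kb_path, drive_link)
```

Passing the steps in as callables keeps the orchestrator testable with stubs. No real Bord Gáis or Google account is needed to check that the steps run in the right order.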
## Architecture
The project follows a modular architecture with clear separation of concerns:
- **Downloaders**: Handle bill retrieval from utility providers
- **Converters**: Transform file formats (PDF → Markdown)
- **Cloud Services**: Manage Google Drive and Gmail operations
- **Configuration**: Centralized settings and environment management
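One way to express these boundaries in code is with `typing.Protocol` interfaces. With that approach, a new utility provider or converter can be swapped in without touching the orchestrator. This is a sketch under that assumption. The protocol, class and method names are hypothetical, not the project's actual classes.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class BillDownloader(Protocol):
    """Anything that can fetch the latest bill and return the local PDF path."""
    def download_latest(self, target_dir: Path) -> Path: ...


@runtime_checkable
class DocumentConverter(Protocol):
    """Anything that converts a file into another format."""
    def convert(self, source: Path) -> Path: ...


class FakeDownloader:
    """Test double: satisfies BillDownloader structurally, without inheriting from it."""
    def download_latest(self, target_dir: Path) -> Path:
        return target_dir / "bill.pdf"
```

`runtime_checkable` only checks that the required methods exist, not their signatures. Static checkers such as mypy enforce the full signatures.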
## Prerequisites
- Python 3.13 or higher
- Google Drive and Gmail accounts with OAuth2 credentials
- Bord Gáis online account credentials
## Setup Instructions
### 1. Clone the Repository
```bash
git clone https://github.com/MathieuBuisson/bill-ingestion.git
cd bill-ingestion
```
### 2. Create a Virtual Environment
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
python -m playwright install chromium
```
### 4. Configure Environment Variables
Create a `.env` file in the project root:
```bash
# Bord Gáis credentials
BORDGAIS_EMAIL=your-email@example.com
BORDGAIS_PASSWORD=your-password
BORDGAIS_ACCOUNT_ID=your-account-id
# Google credentials
GOOGLE_CREDENTIALS_FILE=credentials.json
NOTIFICATION_EMAIL=your-email@gmail.com
# Paths
MARKDOWN_DESTINATION_FOLDER=/path/to/your/knowledge/base
# Logging
LOG_LEVEL=INFO
```
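The `.env` file consists of plain `KEY=value` lines with `#` comments. Such files are commonly loaded with the `python-dotenv` package, but whether this project does so is an assumption. The stdlib-only sketch below only illustrates how the file maps to settings. Unlike `python-dotenv`, it does not handle quoting, inline comments or variable expansion.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed line (no '='): {raw_line!r}")
        values[key.strip()] = value.strip()
    return values
```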
### 5. Set Up Google OAuth2 Credentials
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project
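3. Enable the Google Drive API and the Gmail API
4. Configure the OAuth consent screen, then create an OAuth 2.0 client ID (application type: Desktop app)
5. Download the client secrets file and save it as `credentials.json` in the project root (or set `GOOGLE_CREDENTIALS_FILE` to its path)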
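Desktop-application credentials are typically used with the `InstalledAppFlow` class from `google-auth-oauthlib`. On the first run it opens a browser for consent, and it caches a token for later runs. The sketch below follows that pattern, but several details are assumptions not confirmed from the project's code: the scope choices, the `get_credentials` name and the token location. A natural token location is a file under `temp/`, since `temp/` holds OAuth tokens.

```python
from pathlib import Path

# Assumed scopes. drive.file only covers files the app creates or is given
# access to; finding an existing folder by path may need the broader
# "https://www.googleapis.com/auth/drive" scope instead.
SCOPES = [
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/gmail.send",
]


def get_credentials(credentials_file: Path, token_file: Path):
    """Return Google credentials, reusing or refreshing a cached token when possible."""
    # Imported lazily so this module can be imported without the Google packages.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if token_file.exists():
        creds = Credentials.from_authorized_user_file(str(token_file), SCOPES)
    if creds and creds.valid:
        return creds
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # refresh an expired access token without a browser
    else:
        # First run or unusable token: opens a browser for user consent.
        flow = InstalledAppFlow.from_client_secrets_file(str(credentials_file), SCOPES)
        creds = flow.run_local_server(port=0)
    token_file.parent.mkdir(parents=True, exist_ok=True)
    token_file.write_text(creds.to_json())  # cache the token for the next run
    return creds
```

A call might look like `get_credentials(Path("credentials.json"), Path("temp/token.json"))`, where the token file name is hypothetical.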
### 6. Run the Workflow
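From the project root, with the virtual environment activated, run the bill ingestion workflow. The package lives under `src/`, so it must be importable. If it is not already installed in the virtual environment (for example with `pip install -e .`), add `src` to `PYTHONPATH` first: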
```bash
python -m bill_ingestion.main
```
## Project Structure
```
bill-ingestion/
├── .github/
│   └── workflows/
│       └── ci.yml                  # GitHub Actions CI pipeline
├── .env                            # Environment variables (add to .gitignore)
├── .gitignore
├── README.md
├── pyproject.toml                  # Tool configurations (pytest, black, mypy, etc.)
├── requirements.txt
├── setup.py
│
├── data/                           # Downloaded PDFs (runtime generated)
├── logs/                           # Application logs (runtime generated)
├── temp/                           # OAuth tokens and temporary files (runtime generated)
│
├── src/
│   └── bill_ingestion/
│       ├── __init__.py
│       ├── main.py                 # Entry point / orchestrator
│       ├── config.py               # Configuration & credentials
│       │
│       ├── downloaders/
│       │   ├── __init__.py
│       │   └── bordgais.py         # Bord Gáis bill download logic
│       │
│       ├── converters/
│       │   ├── __init__.py
│       │   └── pdf_to_markdown.py  # PDF → Markdown conversion
│       │
│       ├── cloud/
│       │   ├── __init__.py
│       │   ├── google_drive.py     # Google Drive operations
│       │   └── gmail_service.py    # Email notification service
│       │
│       └── utils/
│           ├── __init__.py
│           ├── logger.py           # Logging configuration
│           └── exceptions.py       # Custom exceptions
│
└── tests/                          # Unit tests for the application
```
## Usage Examples
### Manual Workflow Execution
```python
from bill_ingestion.main import ingest_bill_workflow
ingest_bill_workflow()
```
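For unattended runs (for example from cron or Windows Task Scheduler), a wrapper can turn failures into a non-zero exit code so the scheduler registers them. The sketch below assumes that `ingest_bill_workflow()` raises on failure. `BillIngestionError` is a hypothetical stand-in for whatever `utils/exceptions.py` actually defines.

```python
import logging
from typing import Callable

logger = logging.getLogger("bill_ingestion")


class BillIngestionError(Exception):
    """Hypothetical base class for the project's custom exceptions."""


def run_and_report(workflow: Callable[[], object]) -> int:
    """Run the workflow and map its outcome to a process exit code."""
    try:
        workflow()
    except BillIngestionError:
        logger.exception("Bill ingestion failed")  # anticipated, domain-level failure
        return 1
    except Exception:
        logger.exception("Unexpected error during bill ingestion")
        return 2
    return 0
```

A scheduled entry point would then call `sys.exit(run_and_report(ingest_bill_workflow))`.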
## Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `BORDGAIS_EMAIL` | Bord Gáis account email | Yes |
| `BORDGAIS_PASSWORD` | Bord Gáis account password | Yes |
| `BORDGAIS_ACCOUNT_ID` | Bord Gáis account ID | Yes |
| `GOOGLE_CREDENTIALS_FILE` | Path to Google OAuth2 credentials | Yes |
| `NOTIFICATION_EMAIL` | Email to receive bill notifications | Yes |
| `MARKDOWN_DESTINATION_FOLDER` | Destination folder for converted Markdown files | Yes |
| `LOG_LEVEL` | Logging level (INFO, DEBUG, etc.) | No |
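The table above can be enforced at startup so that a missing variable fails fast, before any browser or API call. Below is a sketch of what `config.py` might do. The `Settings` and `load_settings` names are hypothetical, and the `INFO` default for `LOG_LEVEL` is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from pathlib import Path

REQUIRED_VARS = (
    "BORDGAIS_EMAIL",
    "BORDGAIS_PASSWORD",
    "BORDGAIS_ACCOUNT_ID",
    "GOOGLE_CREDENTIALS_FILE",
    "NOTIFICATION_EMAIL",
    "MARKDOWN_DESTINATION_FOLDER",
)


@dataclass(frozen=True)
class Settings:
    bordgais_email: str
    bordgais_password: str = field(repr=False)  # keep the secret out of logs and tracebacks
    bordgais_account_id: str
    google_credentials_file: Path
    notification_email: str
    markdown_destination_folder: Path
    log_level: str = "INFO"  # assumed default for the optional variable


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, reporting every missing variable at once."""
    # Empty strings count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        bordgais_email=env["BORDGAIS_EMAIL"],
        bordgais_password=env["BORDGAIS_PASSWORD"],
        bordgais_account_id=env["BORDGAIS_ACCOUNT_ID"],
        google_credentials_file=Path(env["GOOGLE_CREDENTIALS_FILE"]),
        notification_email=env["NOTIFICATION_EMAIL"],
        markdown_destination_folder=Path(env["MARKDOWN_DESTINATION_FOLDER"]),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```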
## Security Notes
- Never commit the `.env` file, `credentials.json`, or the cached OAuth tokens in `temp/` to version control
- Use environment variables for all sensitive data
- Rotate Google OAuth2 tokens regularly
- Consider using a secrets manager for production deployments
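A quick sanity check can confirm that `.gitignore` lists the secret-bearing paths named in this README. The sketch below is deliberately naive. It matches entries line by line and does not evaluate glob patterns or negations, so `git check-ignore -v <path>` remains the authoritative test. The function name is hypothetical.

```python
# Secret-bearing paths taken from this README; adjust to your layout.
SECRET_PATHS = (".env", "credentials.json", "temp/")


def missing_gitignore_entries(
    gitignore_text: str, required: tuple[str, ...] = SECRET_PATHS
) -> list[str]:
    """Return required entries that do not appear as literal lines in .gitignore."""
    entries = {
        line.strip().lstrip("/")  # treat "/.env" and ".env" alike
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return [path for path in required if path not in entries]
```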
## Troubleshooting
### Playwright Browser Issues
If Playwright fails to install or launch Chromium on Windows, reinstall the browser together with its dependencies:
```bash
python -m playwright install --with-deps chromium
```
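To check the browser installation independently of the Bord Gáis downloader, you can run a minimal smoke test with Playwright's synchronous API. If the test fails, the problem lies with the Playwright setup rather than the project code. Below is a sketch; the function name is hypothetical.

```python
def check_chromium(url: str = "about:blank") -> str:
    """Launch headless Chromium via Playwright, open a page and return its title."""
    # Imported lazily so this module can be imported even if Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()  # always release the browser process
```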
### Google Authentication Issues
Ensure your Google Cloud project has:
- Google Drive API enabled
- Gmail API enabled
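- The OAuth2 scopes the application requests granted on the consent screen. Scopes are requested by the application at sign-in, not stored in `credentials.json`. After changing them, delete the cached token in `temp/` to re-authorize.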
### Bord Gáis Login Issues
The Bord Gáis website may change its structure. If the downloader fails:
1. Review the error logs
2. Inspect the website HTML structure
3. Update selectors in `src/bill_ingestion/downloaders/bordgais.py`
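Keeping the selectors in a single mapping confines step 3 to one place. A small diagnostic can then report which selectors no longer match. The sketch below assumes Playwright's `page.locator(selector).count()`. The selector names and CSS values are hypothetical placeholders, not the ones used in `bordgais.py`.

```python
from collections.abc import Mapping
from typing import Protocol


class _Locator(Protocol):
    def count(self) -> int: ...


class _Page(Protocol):
    def locator(self, selector: str) -> _Locator: ...


# Hypothetical selectors for illustration only.
LOGIN_SELECTORS = {
    "email_input": "input[type='email']",
    "password_input": "input[type='password']",
    "submit_button": "button[type='submit']",
}


def find_missing_selectors(page: _Page, selectors: Mapping[str, str]) -> list[str]:
    """Return the names of selectors that match no element on the current page."""
    return [name for name, css in selectors.items() if page.locator(css).count() == 0]
```

A Playwright `Page` satisfies the `_Page` protocol, so the function can be called from a debugging session right after the login page loads.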
## Contributing
When making changes:
1. Follow PEP 8 style guidelines
2. Add type hints to new functions
3. Add or update tests for new functionality
4. Update documentation as needed
5. Run tests locally. In PowerShell: `$env:PYTHONPATH="src"; python -m pytest`; in Bash: `PYTHONPATH=src python -m pytest`
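As a cross-platform alternative to setting `PYTHONPATH` by hand, a `tests/conftest.py` can put `src/` on `sys.path` before the tests import the package. pytest 7+ also offers a `pythonpath` ini option for the same purpose. The sketch below assumes `tests/` sits directly under the project root, and the helper name is hypothetical.

```python
import sys
from pathlib import Path


def add_src_to_path(tests_dir: Path) -> Path:
    """Prepend <project root>/src to sys.path (idempotent) and return that path."""
    src_dir = tests_dir.resolve().parent / "src"
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir
```

In `conftest.py`, pytest loads the file before importing the test modules. Call `add_src_to_path(Path(__file__).parent)` at module level there.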