https://github.com/mathieubuisson/bill-ingestion



# Electricity Bill Ingestion

Python-based automation for downloading, processing, and organizing electricity bills.

## Overview

This project automates the ingestion of electricity bills by:

1. Downloading electricity bills from Bord Gáis Energy
2. Converting the bill PDF to Markdown
3. Uploading the bill PDF file to a specific Google Drive folder (`Finance/Taxes//Income Tax/Electricity Receipts`)
4. Copying the Markdown file to a personal knowledge base
5. Sending a notification email with the link and other details
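
The five steps above can be sketched as a small orchestrator. This is a minimal illustration, not the project's actual `main.py`: the `IngestionSteps` class, its field names, and `run_ingestion` are hypothetical, and each step is passed in as a plain callable so the ordering can be exercised without touching any external service.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class IngestionSteps:
    """Hypothetical bundle of the five steps; each field is a plain callable."""

    download: Callable[[], Path]          # step 1: returns the downloaded PDF path
    convert: Callable[[Path], Path]       # step 2: PDF path -> Markdown path
    upload: Callable[[Path], str]         # step 3: returns a link to the uploaded PDF
    copy_to_kb: Callable[[Path], Path]    # step 4: returns the copied Markdown path
    notify: Callable[[str, Path], None]   # step 5: sends the notification email


def run_ingestion(steps: IngestionSteps) -> str:
    """Run the steps in order and return the link of the uploaded PDF."""
    pdf_path = steps.download()
    markdown_path = steps.convert(pdf_path)
    drive_link = steps.upload(pdf_path)
    steps.copy_to_kb(markdown_path)
    steps.notify(drive_link, pdf_path)
    return drive_link
```

Passing the steps in as callables keeps the orchestration logic independent of Playwright and the Google APIs, which makes it straightforward to unit test with stand-in functions.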

## Architecture

The project follows a modular architecture with clear separation of concerns:

- **Downloaders**: Handle bill retrieval from utility providers
- **Converters**: Transform file formats (PDF → Markdown)
- **Cloud Services**: Manage Google Drive and Gmail operations
- **Configuration**: Centralized settings and environment management
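
As one illustration of what the Cloud Services component might do for Gmail, the sketch below assembles a notification message using only the standard library. The function name `build_notification` and the subject and body wording are assumptions. The one fixed point is that the Gmail API's `users.messages.send` method takes the full message base64url-encoded under a `raw` key.

```python
import base64
from email.message import EmailMessage


def build_notification(recipient: str, bill_name: str, drive_link: str) -> dict:
    """Return a request body in the shape the Gmail API send method expects."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"New electricity bill: {bill_name}"
    message.set_content(f"The bill was uploaded to Google Drive:\n{drive_link}\n")
    # Gmail expects the complete RFC 2822 message, base64url-encoded, under "raw".
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": raw}
```

Keeping message construction separate from the API call means the email content can be checked in tests without any network access.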

## Prerequisites

- Python 3.13 or higher
- Google Drive and Gmail accounts with OAuth2 credentials
- Bord Gáis online account credentials

## Setup Instructions

### 1. Clone the Repository

```bash
git clone https://github.com/MathieuBuisson/bill-ingestion.git
cd bill-ingestion
```

### 2. Create a Virtual Environment

```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
python -m playwright install chromium
```

### 4. Configure Environment Variables

Create a `.env` file in the project root:

```bash
# Bord Gáis credentials
BORDGAIS_EMAIL=your-email@example.com
BORDGAIS_PASSWORD=your-password
BORDGAIS_ACCOUNT_ID=your-account-id

# Google credentials
GOOGLE_CREDENTIALS_FILE=credentials.json
NOTIFICATION_EMAIL=your-email@gmail.com

# Paths
MARKDOWN_DESTINATION_FOLDER=/path/to/your/knowledge/base

# Logging
LOG_LEVEL=INFO
```
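
The README does not say how the `.env` file is read. Many projects use a library such as python-dotenv for this, and that may be the case here. Purely to show how the file format above maps to key/value pairs, here is a standard-library sketch with a hypothetical `parse_env_text` helper:

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

This sketch deliberately ignores features a full parser would handle, such as inline comments, `export` prefixes, and variable interpolation.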

### 5. Set Up Google OAuth2 Credentials

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project
3. Enable Google Drive API and Gmail API
4. Create OAuth2 credentials (Desktop application)
5. Download and save as `credentials.json` in the project root
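
Once `credentials.json` is in place, a desktop application typically exchanges it for a user token on first run. The sketch below follows the pattern used in Google's Python quickstarts and is not taken from this project. The scope list, the function name `load_google_credentials`, and the token file location are assumptions. It requires the `google-auth` and `google-auth-oauthlib` packages.

```python
from pathlib import Path

# Assumed scopes: upload files to Drive and send mail. The project may use others.
SCOPES = [
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/gmail.send",
]


def load_google_credentials(credentials_file: str, token_file: str):
    """Return valid credentials, running the browser consent flow if needed."""
    # Imported lazily so this module can be imported without the Google packages.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    token_path = Path(token_file)
    creds = None
    if token_path.exists():
        creds = Credentials.from_authorized_user_file(str(token_path), SCOPES)
    if creds is None or not creds.valid:
        if creds is not None and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silent refresh, no browser needed
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_file, SCOPES)
            creds = flow.run_local_server(port=0)  # opens a browser for consent
        # Cache the token so later runs can skip the consent screen.
        token_path.parent.mkdir(parents=True, exist_ok=True)
        token_path.write_text(creds.to_json())
    return creds
```

A call such as `load_google_credentials("credentials.json", "temp/token.json")` would open a browser window on the first run and reuse the cached token afterwards. The `temp/token.json` path is only an example.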

### 6. Run the Workflow

Run the bill ingestion workflow:

```bash
python -m bill_ingestion.main
```

## Project Structure

```
bill-ingestion/
├── .github/
│   └── workflows/
│       └── ci.yml                  # GitHub Actions CI pipeline
├── .env                            # Environment variables (add to .gitignore)
├── .gitignore
├── README.md
├── pyproject.toml                  # Tool configurations (pytest, black, mypy, etc.)
├── requirements.txt
├── setup.py
│
├── data/                           # Downloaded PDFs (runtime generated)
├── logs/                           # Application logs (runtime generated)
├── temp/                           # OAuth tokens and temporary files (runtime generated)
│
├── src/
│   └── bill_ingestion/
│       ├── __init__.py
│       ├── main.py                 # Entry point / orchestrator
│       ├── config.py               # Configuration & credentials
│       │
│       ├── downloaders/
│       │   ├── __init__.py
│       │   └── bordgais.py         # Bord Gáis bill download logic
│       │
│       ├── converters/
│       │   ├── __init__.py
│       │   └── pdf_to_markdown.py  # PDF → Markdown conversion
│       │
│       ├── cloud/
│       │   ├── __init__.py
│       │   ├── google_drive.py     # Google Drive operations
│       │   └── gmail_service.py    # Email notification service
│       │
│       └── utils/
│           ├── __init__.py
│           ├── logger.py           # Logging configuration
│           └── exceptions.py       # Custom exceptions
│
└── tests/                          # Unit tests for the application
```

## Usage Examples

### Manual Workflow Execution

```python
from bill_ingestion.main import ingest_bill_workflow

ingest_bill_workflow()
```

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `BORDGAIS_EMAIL` | Bord Gáis account email | Yes |
| `BORDGAIS_PASSWORD` | Bord Gáis account password | Yes |
| `BORDGAIS_ACCOUNT_ID` | Bord Gáis account ID | Yes |
| `GOOGLE_CREDENTIALS_FILE` | Path to Google OAuth2 credentials | Yes |
| `NOTIFICATION_EMAIL` | Email to receive bill notifications | Yes |
| `MARKDOWN_DESTINATION_FOLDER` | Destination folder for converted Markdown files | Yes |
| `LOG_LEVEL` | Logging level (INFO, DEBUG, etc.) | No |
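
The required/optional split in the table can be enforced at startup. The following is a minimal sketch, not the project's actual `config.py`: `REQUIRED_VARIABLES`, `MissingSettingsError`, and `load_settings` are hypothetical names, and the `INFO` default for `LOG_LEVEL` is an assumption.

```python
import os
from typing import Mapping

# Variables marked "Yes" in the Required column of the table above.
REQUIRED_VARIABLES = (
    "BORDGAIS_EMAIL",
    "BORDGAIS_PASSWORD",
    "BORDGAIS_ACCOUNT_ID",
    "GOOGLE_CREDENTIALS_FILE",
    "NOTIFICATION_EMAIL",
    "MARKDOWN_DESTINATION_FOLDER",
)


class MissingSettingsError(ValueError):
    """Raised when one or more required variables are unset or empty."""


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect settings from env, failing early with the full list of gaps."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise MissingSettingsError("Missing required variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARIABLES}
    settings["LOG_LEVEL"] = env.get("LOG_LEVEL", "INFO")  # assumed default
    return settings
```

Reporting every missing variable in one error, rather than stopping at the first, saves repeated runs when setting up a new machine.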

## Security Notes

- Never commit the `.env` file or `credentials.json` to version control
- Use environment variables for all sensitive data
- Rotate Google OAuth2 tokens regularly
- Consider using a secrets manager for production deployments

## Troubleshooting

### Playwright Browser Issues

If you encounter Playwright installation issues on Windows:

```bash
python -m playwright install --with-deps chromium
```

### Google Authentication Issues

Ensure your Google Cloud project has:
- Google Drive API enabled
- Gmail API enabled
- Correct OAuth2 scopes in credentials

### Bord Gáis Login Issues

The Bord Gáis website may change its structure. If the downloader fails:
1. Review the error logs
2. Inspect the website HTML structure
3. Update selectors in `src/bill_ingestion/downloaders/bordgais.py`

## Contributing

When making changes:
1. Follow PEP 8 style guidelines
2. Add type hints to new functions
3. Update tests for new functionality
4. Update documentation as needed
5. Run the tests locally. In PowerShell: `$env:PYTHONPATH="src"; python -m pytest`. In Bash or a similar shell: `PYTHONPATH=src python -m pytest`