# 🧾 Finance Parser

Extract and analyze **bank or payment transaction data** from PDF statements with a single, unified CLI tool.
The **Finance Parser** reads PDFs (GPay, Canara Bank, etc.), extracts structured transaction details, and exports them to **CSV or JSON** for easy analysis or integration.

## 🚀 Features

- ⚙️ **Multi-bank support** (GPay, Canara, and extendable to others)
- 📄 **Smart PDF parsing** using Camelot / pdfplumber
- 🧩 **CLI tool** for easy automation
- 🧹 **Data normalization & cleaning**
- 📊 **Exports to CSV and JSON**
- 🔒 Fully offline: no external APIs required

## 🏗️ Project Structure

```plaintext
finance-parser/
├── src/
│   └── finance_parser/
│       ├── __init__.py
│       ├── __main__.py        # CLI entry point
│       ├── main.py            # Core logic
│       ├── canara_parser.py   # Bank-specific parsers
│       ├── gpay_parser.py
│       └── utils/             # Shared helpers
├── media/
│   └── sample_statement.pdf   # Example input
├── output/
│   ├── transactions.csv
│   └── transactions.json
├── pyproject.toml             # Build system & CLI entry config
├── requirements.txt
└── README.md
```
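Since each bank has its own module (`canara_parser.py`, `gpay_parser.py`), supporting another bank mostly means adding one more parser and routing `--type` to it. A minimal sketch of such a registry, assuming each parser is a function that takes extracted statement text and returns a list of row dicts; `PARSERS`, `register`, and `get_parser` are hypothetical names, not the project's actual code:

```python
from typing import Callable

# Hypothetical registry mapping a --type value to its parser function.
Parser = Callable[[str], list[dict]]
PARSERS: dict[str, Parser] = {}


def register(statement_type: str) -> Callable[[Parser], Parser]:
    """Decorator that records a parser under a statement type such as 'gpay'."""
    def decorator(func: Parser) -> Parser:
        PARSERS[statement_type] = func
        return func
    return decorator


@register("gpay")
def parse_gpay(text: str) -> list[dict]:
    # Placeholder: a real parser would extract GPay transactions from the text.
    return []


@register("canara")
def parse_canara(text: str) -> list[dict]:
    # Placeholder: a real parser would extract Canara Bank transactions.
    return []


def get_parser(statement_type: str) -> Parser:
    """Look up the parser for a statement type, failing loudly on unknown types."""
    try:
        return PARSERS[statement_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(PARSERS))
        raise ValueError(
            f"Unsupported statement type {statement_type!r}; expected one of: {supported}"
        ) from None
```

With this layout, a new bank is one more decorated function; whether the project dispatches exactly this way is an assumption.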

## ⚙️ Setup

### 1️⃣ Clone the repo
```bash
git clone https://github.com/ibnu-umer/finance-parser.git
cd finance-parser
```

### 2️⃣ Install dependencies
```bash
pip install -r requirements.txt
```

### 3️⃣ Add your statement PDF
Place your bank statement (e.g., GPay, Canara) inside the `media/` folder.

## 🧩 Usage

### Basic Command
```bash
python -m finance_parser --file "media/canara_statement.pdf" --type canara --format csv
```

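Because the package lives under `src/`, `python -m finance_parser` only works once the package is importable: install it, or put `src/` on the import path (e.g. `PYTHONPATH=src python -m finance_parser ...` on macOS/Linux). After installing it as a package (for example with `pip install .` from the repository root), the `finance-parser` command is available directly: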
```bash
finance-parser --file "media/canara_statement.pdf" --type canara --format csv
```

## ⚙️ CLI Options

| Flag | Description | Example |
|------|--------------|---------|
| `-f`, `--file` | Path to PDF file | `--file media/canara_statement.pdf` |
| `-t`, `--type` | Bank/statement type (`gpay`, `canara`, etc.) | `--type canara` |
| `-o`, `--output` | Output folder | `--output output/` |
| `--format` | Output format (`csv`, `json`, or `both`) | `--format both` |
| `-p`, `--privacy` | Processing mode (`raw`, `clean`, or `masked`) | `--privacy clean` |

Example:
```bash
finance-parser --file media/canara_statement.pdf --type canara --format both --privacy masked
```
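The flags above map directly onto `argparse`, which the project lists as its CLI library. This sketch mirrors the documented flags and choices; making `--file` and `--type` required, and the defaults for `--output`, `--format`, and `--privacy`, are assumptions rather than values read from the project's `__main__.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented CLI flags; required flags and defaults are assumptions."""
    parser = argparse.ArgumentParser(
        prog="finance-parser",
        description="Extract transactions from bank or payment PDF statements.",
    )
    parser.add_argument("-f", "--file", required=True, help="Path to the PDF statement")
    parser.add_argument("-t", "--type", required=True, help="Statement type, e.g. gpay or canara")
    parser.add_argument("-o", "--output", default="output/", help="Output folder (assumed default)")
    parser.add_argument(
        "--format", choices=["csv", "json", "both"], default="csv",
        help="Output format (assumed default: csv)",
    )
    parser.add_argument(
        "-p", "--privacy", choices=["raw", "clean", "masked"], default="raw",
        help="Processing mode (assumed default: raw)",
    )
    return parser
```

For example, `build_parser().parse_args(["-f", "statement.pdf", "-t", "gpay"])` returns a namespace whose `format` is `"csv"` under these assumed defaults, and an unsupported `--format` value is rejected by `choices`.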

## 🧠 How It Works

1. Detects and reads statement text using Camelot or pdfplumber.
2. Chooses the correct parser based on `--type`.
3. Extracts structured transaction data (date, description, debit/credit, balance, etc.).
4. Applies normalization, masking, or cleaning if requested.
5. Writes the results as CSV, JSON, or both, depending on `--format`.
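Step 4 is where raw strings from the PDF become consistent values. A minimal sketch of that normalization, assuming amounts appear as text such as `₹1,250.00` or `Rs. 1,250` and dates as `01 Jan, 2024`; these input formats and the helper names are hypothetical, so adjust them to what your statements actually contain:

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation


def normalize_amount(raw: str) -> Decimal:
    """Turn text such as '₹1,250.00' or 'Rs. 1,250' into a Decimal (assumed formats)."""
    cleaned = raw.replace("₹", "").replace("Rs.", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"Unrecognized amount: {raw!r}") from None


def normalize_date(raw: str, fmt: str = "%d %b %Y") -> str:
    """Convert a date such as '01 Jan, 2024' to ISO 8601 ('2024-01-01')."""
    # Drop commas and collapse whitespace before parsing with the assumed format.
    compact = " ".join(raw.replace(",", " ").split())
    return datetime.strptime(compact, fmt).date().isoformat()


def normalize_text(raw: str) -> str:
    """Collapse line breaks and repeated spaces left over from PDF extraction."""
    return " ".join(raw.split())
```

Using `Decimal` rather than `float` avoids binary rounding surprises when amounts are summed later.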

## 🧰 Dependencies

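- camelot-py / pdfplumber – PDF parsing
- pandas – Data manipulation
- argparse – Command-line interface (Python standard library)
- re – Regex-based parsing (Python standard library)

Install the third-party packages manually if needed (`argparse` and `re` ship with Python):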
```bash
pip install camelot-py pdfplumber pandas
```

## 🧼 Output

### GPay

- `date` – Transaction date
- `time` – Transaction time
- `type` – Credit/Debit
- `payee` – Counterparty / Payee name
- `txn_id` – UPI Transaction ID
- `account` – Account
- `amount` – Transaction amount
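To illustrate the CSV and JSON exports, the GPay columns can be modeled as a small record type and written with the standard library alone. The `GPayTransaction` class, the `to_csv`/`to_json` helpers, and the sample values in the test are hypothetical; the project lists pandas for data manipulation and may export through it instead:

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class GPayTransaction:
    """One row of GPay output, using the column names documented above."""
    date: str
    time: str
    type: str      # "Credit" or "Debit"
    payee: str
    txn_id: str    # UPI transaction ID (sensitive)
    account: str
    amount: str    # kept as text so CSV and JSON show the same value


def to_csv(rows: list[GPayTransaction]) -> str:
    """Serialize rows to CSV text with a header in the documented column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(GPayTransaction)])
    writer.writeheader()
    writer.writerows(asdict(row) for row in rows)
    return buffer.getvalue()


def to_json(rows: list[GPayTransaction]) -> str:
    """Serialize rows to a JSON array of objects."""
    return json.dumps([asdict(row) for row in rows], indent=2)
```

Keeping the dataclass field order identical to the documented column list means the CSV header matches the list above.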

### Canara

- `date` – Transaction date
- `time` – Transaction time
- `txn_type` – Credit/Debit
- `mode` – UPI, NEFT, IMPS, etc.
- `txn_id` – Transaction ID (for UPI/IMPS)
- `bank_code` – 4-letter bank code
- `payee` – Counterparty / Payee name
- `upi_id` – UPI ID if available
- `amount` – Transaction amount
- `balance` – Account balance after transaction
- `cheque_no` – Cheque number if present
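Columns such as `mode`, `txn_id`, `bank_code`, and `upi_id` typically come from splitting a transaction's narration text. The sketch below assumes a slash-separated UPI narration of the form `UPI/DR/<txn_id>/<payee>/<bank_code>/<upi_id>/<remarks>`; that layout, the `parse_narration` name, and the sample values in the test are assumptions, so verify them against a real Canara statement:

```python
import re

# Hypothetical UPI narration layout: UPI/<DR|CR>/<txn_id>/<payee>/<bank_code>/<upi_id>[/remarks]
UPI_NARRATION = re.compile(
    r"^UPI/(?P<direction>DR|CR)/(?P<txn_id>\d+)/(?P<payee>[^/]+)/"
    r"(?P<bank_code>[A-Z]{4})/(?P<upi_id>[^/]+)"
)


def parse_narration(narration: str) -> dict:
    """Map one narration string onto Canara-style columns (a sketch, not the project's parser)."""
    text = narration.strip()
    match = UPI_NARRATION.match(text)
    if match is None:
        # Unknown layout: keep the raw text as the payee and leave the other fields empty.
        return {"mode": None, "txn_type": None, "txn_id": None,
                "payee": text, "bank_code": None, "upi_id": None}
    return {
        "mode": "UPI",
        "txn_type": "Debit" if match["direction"] == "DR" else "Credit",
        "txn_id": match["txn_id"],
        "payee": match["payee"].strip(),
        "bank_code": match["bank_code"],
        "upi_id": match["upi_id"],
    }
```

Falling back to the raw text keeps unfamiliar narrations visible in the output instead of silently dropping them.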

## 🥧 Sensitive Fields

Some transaction fields contain sensitive information. These are handled differently depending on the output mode.

### Sensitive Fields by Source

- **Canara Bank**
  - `upi_id`
  - `txn_id`
  - `cheque_no`

- **GPay**
  - `txn_id`

### Output Modes

1. **Raw**
   - All columns are included.
   - Sensitive fields are **not masked**.

2. **Masked**
   - All columns are included.
   - Sensitive fields are **masked** (partial hiding of UPI IDs, txn IDs, cheque numbers).

3. **Clean**
   - All sensitive fields are **dropped** from the output.
   - Only non-sensitive columns remain.

Select the mode with `-p`/`--privacy`: `masked` and `clean` keep identifiers out of files you share, while `raw` preserves every value for private analysis.
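The three modes can be expressed as one function over parsed rows. A minimal sketch, assuming rows are plain dicts keyed by the column names above; the masking rule (keep the last four characters, or only the handle of a UPI ID) and the names `SENSITIVE_FIELDS`, `mask_value`, and `apply_privacy` are illustrative assumptions, not the project's exact behavior:

```python
# Sensitive columns per statement type, as listed above.
SENSITIVE_FIELDS = {
    "canara": {"upi_id", "txn_id", "cheque_no"},
    "gpay": {"txn_id"},
}


def mask_value(value: str) -> str:
    """Hide most of a value: keep a UPI ID's handle, otherwise the last 4 characters."""
    if "@" in value:
        _, _, handle = value.partition("@")
        return "****@" + handle
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def apply_privacy(rows: list[dict], statement_type: str, mode: str) -> list[dict]:
    """Return copies of rows processed for 'raw', 'masked', or 'clean' output."""
    sensitive = SENSITIVE_FIELDS[statement_type]
    if mode == "raw":
        return [dict(row) for row in rows]
    if mode == "masked":
        # Empty or missing sensitive values (e.g. no cheque number) are left as-is.
        return [
            {k: mask_value(str(v)) if k in sensitive and v else v for k, v in row.items()}
            for row in rows
        ]
    if mode == "clean":
        return [{k: v for k, v in row.items() if k not in sensitive} for row in rows]
    raise ValueError(f"Unknown privacy mode: {mode!r}")
```

Returning copies in every mode keeps the parsed data untouched, so the same rows can be exported more than once with different modes.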