https://github.com/ibnu-umer/finance-parser
Extract and analyze bank or payment transaction data from PDF statements, all in one unified CLI tool.
- Host: GitHub
- URL: https://github.com/ibnu-umer/finance-parser
- Owner: ibnu-umer
- Created: 2025-10-18T13:51:30.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-10-29T12:16:55.000Z (8 months ago)
- Last Synced: 2025-10-29T14:15:36.607Z (8 months ago)
- Topics: bank, bank-state, bank-statement-parser, banking, canara-bank, cli, command-line-interface, command-line-tool, csv, google, google-pay, gpay, json, pdf-parser, python
- Language: Python
- Homepage:
- Size: 28.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Finance Parser
Extract and analyze **bank or payment transaction data** from PDF statements with an all-in-one CLI tool.
The **Finance Parser** reads PDFs (GPay, Canara Bank, etc.), extracts structured transaction details, and exports them to **CSV or JSON** for easy analysis or integration.
## Features
- **Multi-bank support** (GPay, Canara, and extendable to others)
- **Smart PDF parsing** using Camelot / pdfplumber
- **CLI tool** for easy automation
- **Data normalization & cleaning**
- **Exports to CSV and JSON**
- **Fully offline**: no external APIs required
## Project Structure
```plaintext
finance-parser/
├── src/
│   └── finance_parser/
│       ├── __init__.py
│       ├── __main__.py          # CLI entry point
│       ├── main.py              # Core logic
│       ├── canara_parser.py     # Bank-specific parsers
│       ├── gpay_parser.py
│       └── utils/               # Shared helpers
├── media/
│   └── sample_statement.pdf     # Example input
├── output/
│   ├── transactions.csv
│   └── transactions.json
├── pyproject.toml               # Build system & CLI entry config
├── requirements.txt
└── README.md
```
## Setup
### 1. Clone the repo
```bash
git clone https://github.com/ibnu-umer/finance-parser.git
cd finance-parser
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
### 3. Add your statement PDF
Place your bank statement (e.g., GPay, Canara) inside the `media/` folder.
## Usage
### Basic Command
```bash
python -m finance_parser --file "media/canara_statement.pdf" --type canara --format csv
```
Or, if installed as a package:
```bash
finance-parser --file "media/canara_statement.pdf" --type canara --format csv
```
## CLI Options
| Flag | Description | Example |
|------|--------------|---------|
| `-f`, `--file` | Path to PDF file | `--file media/canara_statement.pdf` |
| `-t`, `--type` | Bank/statement type (`gpay`, `canara`, etc.) | `--type canara` |
| `-o`, `--output` | Output folder | `--output output/` |
| `--format` | Output format (`csv`, `json`, or `both`) | `--format both` |
| `-p`, `--privacy` | Processing mode (`raw`, `clean`, or `masked`) | `--privacy clean` |
Example:
```bash
finance-parser --file media/canara_statement.pdf --type canara --format both --privacy masked
```
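The flags in the table above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the project's actual code: the function name `build_parser`, the default values, the `required=True` settings, and the restriction of `--type` to `gpay` and `canara` are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="finance-parser",
        description="Extract transaction data from PDF statements.",
    )
    parser.add_argument("-f", "--file", required=True, help="Path to PDF file")
    # Assumption: only the two statement types named in this README are accepted.
    parser.add_argument("-t", "--type", required=True,
                        choices=["gpay", "canara"], help="Bank/statement type")
    # Assumption: the defaults below are illustrative, not taken from the project.
    parser.add_argument("-o", "--output", default="output/", help="Output folder")
    parser.add_argument("--format", choices=["csv", "json", "both"],
                        default="csv", help="Output format")
    parser.add_argument("-p", "--privacy", choices=["raw", "clean", "masked"],
                        default="raw", help="Processing mode")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--file", "media/canara_statement.pdf", "--type", "canara",
     "--format", "both", "--privacy", "masked"]
)
print(args.file, args.type, args.format, args.privacy)
```

Using `choices` lets `argparse` reject an unsupported value with a usage message before any PDF is opened.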
## How It Works
1. Detects and reads statement text using Camelot or pdfplumber.
2. Chooses the correct parser based on `--type`.
3. Extracts structured transaction data (date, description, debit/credit, balance, etc.).
4. Applies normalization, masking, or cleaning if requested.
5. Outputs the data in CSV or JSON formats.
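Step 2 (choosing a parser from `--type`) can be sketched as a lookup table that maps a type name to a parser function. Everything here is illustrative: the line layout matched by `LINE_RE`, the function names, and the `PARSERS` registry are assumptions, and the stand-in parsers work on already-extracted text rather than on a PDF.

```python
import re
from typing import Callable, Dict, List

Transaction = Dict[str, str]

# Hypothetical line layout, for illustration only: "DD-MM-YYYY <description> <amount>"
LINE_RE = re.compile(r"^(\d{2}-\d{2}-\d{4})\s+(.+?)\s+(\d+(?:\.\d{2})?)$")


def _parse_lines(text: str, source: str) -> List[Transaction]:
    """Turn each line that matches LINE_RE into a record; skip everything else."""
    rows: List[Transaction] = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            date, description, amount = match.groups()
            rows.append({"source": source, "date": date,
                         "description": description, "amount": amount})
    return rows


def parse_gpay(text: str) -> List[Transaction]:
    return _parse_lines(text, "gpay")


def parse_canara(text: str) -> List[Transaction]:
    return _parse_lines(text, "canara")


# Registry keyed by the value passed to --type.
PARSERS: Dict[str, Callable[[str], List[Transaction]]] = {
    "gpay": parse_gpay,
    "canara": parse_canara,
}


def parse_statement(text: str, statement_type: str) -> List[Transaction]:
    """Dispatch to the parser registered for statement_type."""
    try:
        parser = PARSERS[statement_type]
    except KeyError:
        raise ValueError(f"Unsupported statement type: {statement_type!r}") from None
    return parser(text)


sample = "Statement header\n01-10-2025 Grocery Store 250.00\n02-10-2025 Cafe 120.50\n"
for row in parse_statement(sample, "canara"):
    print(row)
```

With a registry like this, supporting another bank would mean adding one function and one dictionary entry, which fits the "extendable to others" claim in the feature list.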
## Dependencies
- camelot-py / pdfplumber: PDF parsing
- pandas: Data manipulation
- argparse: Command-line interface (Python standard library)
- re: Regex-based parsing (Python standard library)
Install the third-party packages manually if needed:
```bash
pip install camelot-py pdfplumber pandas
```
## Output
### GPay
- `date`: Transaction date
- `time`: Transaction time
- `type`: Credit/Debit
- `payee`: Counterparty / Payee name
- `txn_id`: UPI Transaction ID
- `account`: Account
- `amount`: Transaction amount
### Canara
- `date`: Transaction date
- `time`: Transaction time
- `txn_type`: Credit/Debit
- `mode`: UPI, NEFT, IMPS, etc.
- `txn_id`: Transaction ID (for UPI/IMPS)
- `bank_code`: 4-letter bank code
- `payee`: Counterparty / Payee name
- `upi_id`: UPI ID if available
- `amount`: Transaction amount
- `balance`: Account balance after transaction
- `cheque_no`: Cheque number if present
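As an example of downstream analysis, the sketch below totals amounts per transaction type from a CSV that uses the GPay column names listed above. The sample rows are made up, and the exact cell formats (ISO dates, `Credit`/`Debit` spelling, plain decimal amounts) are assumptions rather than documented output.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict, TextIO

# Made-up sample data using the GPay column names; not real tool output.
SAMPLE_CSV = """date,time,type,payee,txn_id,account,amount
2025-10-01,09:15,Debit,Grocery Store,TXN0001,XXXX1234,250.00
2025-10-02,18:40,Credit,A Friend,TXN0002,XXXX1234,1000.00
2025-10-03,12:05,Debit,Cafe,TXN0003,XXXX1234,120.50
"""


def totals_by_type(csv_file: TextIO) -> Dict[str, Decimal]:
    """Sum the `amount` column grouped by the `type` column."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        # Decimal avoids the rounding drift of binary floats for money values.
        totals[row["type"]] += Decimal(row["amount"])
    return dict(totals)


totals = totals_by_type(io.StringIO(SAMPLE_CSV))
for txn_type, total in totals.items():
    print(txn_type, total)
```

For a real export, the same function could be called with `open("output/transactions.csv", newline="")` in place of the in-memory sample, assuming the file uses these column names.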
## Sensitive Fields
Some transaction fields contain sensitive information. These are handled differently depending on the output mode.
### Sensitive Fields by Source
- **Canara Bank**
  - `upi_id`
  - `txn_id`
  - `cheque_no`
- **GPay**
  - `txn_id`
### Output Modes
The mode is selected with the `--privacy` flag (`raw`, `masked`, or `clean`).
1. **Raw**
   - All columns are included.
   - Sensitive fields are **not masked**.
2. **Masked**
   - All columns are included.
   - Sensitive fields are **masked** (partial hiding of UPI IDs, txn IDs, cheque numbers).
3. **Clean**
   - All sensitive fields are **dropped** from the output.
   - Only non-sensitive columns remain.

Together, these modes let you trade detail for privacy depending on how the output will be shared or analyzed.
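
The three modes can be illustrated with a short sketch. The sensitive-field lists come from this README; the function names and the masking scheme (keep the last four characters, replace the rest with `*`) are assumptions, since the README only says sensitive values are partially hidden.

```python
from typing import Dict, List

Row = Dict[str, str]

# Sensitive fields per source, as listed in this README.
SENSITIVE_FIELDS: Dict[str, List[str]] = {
    "canara": ["upi_id", "txn_id", "cheque_no"],
    "gpay": ["txn_id"],
}


def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters (assumed masking scheme)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_privacy(rows: List[Row], source: str, mode: str) -> List[Row]:
    """Return new rows with sensitive fields kept, masked, or dropped."""
    fields = SENSITIVE_FIELDS[source]
    if mode == "raw":
        return [dict(row) for row in rows]
    if mode == "masked":
        return [{key: mask_value(val) if key in fields else val
                 for key, val in row.items()} for row in rows]
    if mode == "clean":
        return [{key: val for key, val in row.items() if key not in fields}
                for row in rows]
    raise ValueError(f"Unknown privacy mode: {mode!r}")


# Made-up example row using some of the GPay column names.
example = [{"date": "2025-10-01", "payee": "Cafe",
            "txn_id": "123456789012", "amount": "120.50"}]
for mode in ("raw", "masked", "clean"):
    print(mode, apply_privacy(example, "gpay", mode))
```

Each mode returns new dictionaries instead of modifying the input, so the same parsed rows can be exported in more than one mode.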