https://github.com/fahadnasir13/financial_data-analyzer_tool

A Python-based framework for analyzing, cleaning, and reconciling financial data stored in Excel workbooks.

# 📊 Financial Data Analyzer & Reconciliation Tool

A Python-based framework for **analyzing, cleaning, and reconciling financial data** stored in Excel workbooks.

It not only parses complex financial formats (currencies, dates, shorthand like `1.5M`, etc.) but also provides advanced reconciliation features:

- ✅ **Direct Matching** – Identify one-to-one matches between transactions and targets
- 🔢 **Subset Sum Matching** – Detect combinations of transactions that add up to a target amount
- 🤖 **Optimization & Heuristics** – Dynamic programming, genetic algorithms, and fuzzy similarity scoring
- 📈 **Performance Benchmarking** – Compare brute force vs. optimized methods across dataset sizes
- 📝 **Excel Reporting** – Clean reports with matched transactions, targets, and differences

---

## 🔧 Features

### Data Parsing & Cleaning
- Handles multiple currency formats: `$1,234.56`, `(2,500.00)`, `€1.234,56`, `₹1,23,456.78`
- Understands shorthand notations: `1.5M`, `2B`, etc.
- Parses dates in multiple formats: `MM/DD/YYYY`, `DD/MM/YYYY`, `Q4 2023`, Excel serials
- Cleans data into **standardized floats** (amounts) and **ISO 8601 dates**
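
The cleaning rules listed above could be sketched roughly as follows. This is an illustration only, not the code in `parser.py`: `parse_amount` and `parse_date` are hypothetical helpers, and the `K` suffix, the `day_first` flag, and the choice to map a quarter to its first day are all assumptions.

```python
import re
from datetime import date, datetime, timedelta

# Shorthand multipliers. "K" is an assumption; the README only lists M and B.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_amount(text: str) -> float:
    """Convert a formatted amount string to a float (hypothetical helper)."""
    s = text.strip()
    # Accounting-style parentheses or a minus sign mark a negative amount.
    negative = (s.startswith("(") and s.endswith(")")) or "-" in s
    # Keep only digits, separators, and letters (for the shorthand suffix).
    s = re.sub(r"[^\d.,A-Za-z]", "", s)
    multiplier = 1.0
    if s and s[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[s[-1].upper()]
        s = s[:-1]
    if "," in s and "." in s and s.rfind(",") > s.rfind("."):
        # European style: "." groups thousands, "," is the decimal mark.
        s = s.replace(".", "").replace(",", ".")
    else:
        # Otherwise commas are assumed to be grouping separators.
        s = s.replace(",", "")
    value = float(s) * multiplier
    return -value if negative else value


def parse_date(value, day_first: bool = False) -> str:
    """Return an ISO 8601 date string (hypothetical helper)."""
    if isinstance(value, (int, float)):
        # Excel serial, 1900 date system; this epoch holds for dates after Feb 1900.
        return (date(1899, 12, 30) + timedelta(days=int(value))).isoformat()
    text = value.strip()
    quarter = re.fullmatch(r"Q([1-4])\s+(\d{4})", text)
    if quarter:
        # Assumption: a quarter is represented by its first day.
        q, year = int(quarter.group(1)), int(quarter.group(2))
        return date(year, 3 * (q - 1) + 1, 1).isoformat()
    # MM/DD/YYYY and DD/MM/YYYY are ambiguous, so the caller chooses.
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date().isoformat()


print(parse_amount("€1.234,56"))   # 1234.56
print(parse_amount("(2,500.00)"))  # -2500.0
print(parse_date("Q4 2023"))       # 2023-10-01
```

Strings that carry a currency code such as `USD 100` would raise `ValueError` in this sketch; a fuller parser would need an explicit rule for them.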

### Reconciliation Engine
- **Direct Matching**: Exact 1-to-1 matches
- **Subset Sum Analysis**:
  - *Brute Force*: Tests all combinations (small datasets only)
  - *Dynamic Programming*: Efficient exact matching for medium datasets
  - *Genetic Algorithm*: Heuristic search for large datasets
- **Fuzzy Matching**: String similarity + amount tolerance for approximate reconciliation
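
To make the strategies above concrete, here is a minimal sketch of brute-force subset search, an exact dynamic-programming search, and a fuzzy check. It is not the code in `recon.py`: all function names are hypothetical, amounts are converted to integer cents and assumed positive, `max_size=4` mirrors the `brute_max_subset_size` default listed under Configuration, the `0.8` and `0.01` thresholds are assumptions, and `difflib` stands in for whatever similarity measure the project actually uses.

```python
from difflib import SequenceMatcher
from itertools import combinations


def to_cents(amount: float) -> int:
    """Work in integer cents so sums compare exactly."""
    return round(amount * 100)


def brute_force_subset(amounts, target, max_size=4):
    """Try every combination of up to max_size items; return indices or None."""
    cents = [to_cents(a) for a in amounts]
    goal = to_cents(target)
    for size in range(1, max_size + 1):
        for combo in combinations(range(len(cents)), size):
            if sum(cents[i] for i in combo) == goal:
                return list(combo)
    return None


def dp_subset(amounts, target):
    """Exact subset sum over reachable totals; assumes positive amounts."""
    cents = [to_cents(a) for a in amounts]
    goal = to_cents(target)
    # reachable[total] = (previous total, index of the item added to reach it)
    reachable = {0: None}
    for i, c in enumerate(cents):
        # Iterate over a snapshot so each item is used at most once.
        for total in list(reachable):
            new_total = total + c
            if new_total <= goal and new_total not in reachable:
                reachable[new_total] = (total, i)
    if goal not in reachable:
        return None
    picked, total = [], goal
    while reachable[total] is not None:
        total, i = reachable[total]
        picked.append(i)
    return sorted(picked)


def fuzzy_match(desc_a, desc_b, amount_a, amount_b, min_ratio=0.8, tolerance=0.01):
    """Descriptions must be similar AND amounts within the tolerance."""
    ratio = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
    return ratio >= min_ratio and abs(amount_a - amount_b) <= tolerance


amounts = [150.00, 75.50, 150.00, 200.25, 250.50]
print(brute_force_subset(amounts, 300.00))  # [0, 2]
print(dp_subset(amounts, 450.75))           # [3, 4]
print(fuzzy_match("Invoice #001", "invoice 001", 150.00, 150.00))  # True
```

Working in cents is what makes the dynamic-programming table practical: floating-point totals would not compare reliably as dictionary keys.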

### Benchmarking
- Compare execution time across methods (brute force vs. DP vs. GA)
- Visualize scaling performance with runtime plots
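
A timing comparison along these lines could look like the sketch below. It is an assumption-laden illustration rather than the project's benchmark: the function names, the random data, and the dataset sizes are hypothetical, only brute force and DP are timed, and plotting is left out.

```python
import random
import time
from itertools import combinations


def brute_exists(cents, target):
    """Check every subset, smallest first (exponential in len(cents))."""
    return any(
        sum(combo) == target
        for size in range(1, len(cents) + 1)
        for combo in combinations(cents, size)
    )


def dp_exists(cents, target):
    """Grow the set of reachable totals one item at a time."""
    reachable = {0}
    for c in cents:
        reachable |= {s + c for s in reachable if s + c <= target}
    return target in reachable


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def benchmark(sizes, seed=0):
    """Time both methods on random integer-cent data of each size (size >= 3)."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        cents = [rng.randint(100, 100_000) for _ in range(n)]
        # Build the target from three items so a solution always exists.
        target = sum(rng.sample(cents, 3))
        found_bf, t_bf = time_call(brute_exists, cents, target)
        found_dp, t_dp = time_call(dp_exists, cents, target)
        rows.append({"n": n, "brute_s": t_bf, "dp_s": t_dp,
                     "agree": found_bf == found_dp})
    return rows


for row in benchmark([6, 10, 14]):
    print(row)
```

The resulting rows can be passed to any plotting library to draw runtime against dataset size. Because each target here is solvable with three items, brute force stops early; an unreachable target would expose its worst-case growth.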

### Reporting
- Generates Excel output with:
  - Cleaned transactions
  - Cleaned targets
  - Match reports (transactions, target IDs, match type, differences, etc.)
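
As a rough idea of what one match-report record might hold, the sketch below assembles a row using the column names from this README's example `Matches` table. `build_match_row` is a hypothetical helper, the IDs are made up, and the sign convention for `diff` (target minus matched sum) is an assumption.

```python
def build_match_row(target_id, ref_id, target, match_type, txns):
    """Assemble one report row; txns is a list of (txn_id, amount) pairs."""
    txn_ids = [txn_id for txn_id, _ in txns]
    txn_amounts = [amount for _, amount in txns]
    # Round to cents so float noise does not leak into the report.
    sum_amount = round(sum(txn_amounts), 2)
    return {
        "target_id": target_id,
        "ref_id": ref_id,
        "target": target,
        "match_type": match_type,
        "txn_ids": txn_ids,
        "txn_amounts": txn_amounts,
        "sum_amount": sum_amount,
        "diff": round(target - sum_amount, 2),  # assumed sign convention
    }


row = build_match_row(
    "TGT0002", "REF002", 300.00, "brute_subset",
    [("TXN0005", 150.00), ("TXN0008", 150.00)],
)
print(row["sum_amount"], row["diff"])  # 300.0 0.0
```

A list of such dictionaries maps directly onto rows of a spreadsheet sheet, whichever Excel library writes the file.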

---

## 📂 Project Structure

```
Financial-data-analyzer/
├── src/
│   ├── parser.py              # Data loading & cleaning
│   ├── recon.py               # Reconciliation engine (exact, brute, DP, GA, fuzzy)
│   └── main.py                # Main orchestrator script
├── examples/
│   └── sample_data.xlsx       # Example input workbook
├── output/
│   ├── recon_results.xlsx     # Reconciliation results
│   └── benchmark_plots/       # Performance graphs
├── tests/
│   ├── test_parser.py
│   └── test_recon.py
├── requirements.txt
└── README.md
```

---

## ⚙️ Installation

Clone the repository:

```bash
git clone https://github.com/fahadnasir13/financial_data-analyzer_tool.git
cd financial_data-analyzer_tool

# Create and activate a virtual environment
python -m venv venv
# On Linux/Mac
source venv/bin/activate
# On Windows
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

---

## 📑 Usage

### Prepare Input Excel

**Sheet1 (Transactions)**

- Column A: Transaction Amount (e.g., `150.00`)
- Column B: Description (e.g., `"Invoice #001"`)

**Sheet2 (Targets)**

- Column C: Target Amount (e.g., `225.50`)
- Column D: Reference ID (e.g., `"REF001"`)

### Run the Analyzer

```bash
python src/main.py --input examples/sample_data.xlsx --output output/recon_results.xlsx
```

### View Results

Open `output/recon_results.xlsx`, which includes these sheets:

- `Transactions_Clean` – standardized transactions
- `Targets_Clean` – standardized targets
- `Matches` – reconciliation results

Example `Matches` sheet:

| target_id | ref_id | target | match_type   | txn_ids           | txn_amounts     | sum_amount | diff |
| --------- | ------ | ------ | ------------ | ----------------- | --------------- | ---------- | ---- |
| TGT0001   | REF001 | 225.50 | exact_1to1   | [TXN0003]         | [225.50]        | 225.50     | 0.0  |
| TGT0002   | REF002 | 300.00 | brute_subset | [TXN0005,TXN0007] | [150.00,150.00] | 300.00     | 0.0  |
| TGT0003   | REF003 | 450.75 | ga_subset    | [TXN0010,...]     | [200.25,250.50] | 450.75     | 0.0  |

---

## 🔬 Methods Compared

| Method | Strengths | Weaknesses | Best Use Case |
| -------------- | -------------------------- | ------------------------ | ------------------------ |
| Exact Match | Fast & simple | Only 1:1 matches | Small exact checks |
| Brute Force | Guaranteed if feasible | Exponential time | Very small datasets |
| Dynamic Prog. | Efficient, exact | Works best with integers | Medium datasets |
| Genetic Algo | Scales, finds near-exact | Approximate, stochastic | Large datasets |
| Fuzzy Matching | Handles noisy descriptions | Approximate only | Name/desc reconciliation |

---

## 🛠 Configuration

Tunable parameters in `main.py`:

- `brute_max_subset_size`: maximum subset size for brute force (default: `4`)
- `brute_time_budget_s`: per-target time limit in seconds (default: `1.0`)
- `dp_candidate_limit`: candidate pool size for DP (default: `25`)
- `ga_candidate_limit`: candidate pool size for GA (default: `30`)

---

## ✅ Roadmap

- Add a web dashboard for interactive reconciliation
- Database integration (Postgres, SQL Server)
- ML classifier for predictive reconciliation
- Batch processing for enterprise-scale datasets

---

## 👨‍💻 Contributing

Contributions welcome!

- Fork the repo
- Create a feature branch
- Submit a PR with details

---

## 📜 License

MIT License – free to use, modify, and distribute.