https://github.com/fahadnasir13/financial_data-analyzer_tool
A Python-based framework for analyzing, cleaning, and reconciling financial data stored in Excel workbooks.
- Host: GitHub
- URL: https://github.com/fahadnasir13/financial_data-analyzer_tool
- Owner: fahadnasir13
- Created: 2025-08-18T17:11:17.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-18T17:24:12.000Z (10 months ago)
- Last Synced: 2025-08-18T19:23:06.602Z (10 months ago)
- Topics: data-analysis, excel, financial, python, store
- Language: Python
- Homepage:
- Size: 1.35 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📊 Financial Data Analyzer & Reconciliation Tool
A Python-based framework for **analyzing, cleaning, and reconciling financial data** stored in Excel workbooks.
It parses complex financial formats (currencies, dates, and shorthand like `1.5M`) and provides the following reconciliation features:
- ✅ **Direct Matching** – Identify one-to-one matches between transactions and targets
- 🔢 **Subset Sum Matching** – Detect combinations of transactions that add up to a target amount
- 🤖 **Optimization & Heuristics** – Dynamic programming, genetic algorithms, and fuzzy similarity scoring
- 📈 **Performance Benchmarking** – Compare brute force vs. optimized methods across dataset sizes
- 📝 **Excel Reporting** – Clean reports with matched transactions, targets, and differences
---
## 🔧 Features
### Data Parsing & Cleaning
- Handles multiple currency formats: `$1,234.56`, `(2,500.00)`, `€1.234,56`, `₹1,23,456.78`
- Understands shorthand notations: `1.5M`, `2B`, etc.
- Parses dates in multiple formats: `MM/DD/YYYY`, `DD/MM/YYYY`, `Q4 2023`, Excel serials
- Cleans data into **standardized floats** (amounts) and **ISO 8601 dates**
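
As a rough illustration of the amount cleaning described above, the following minimal sketch normalizes the listed formats to floats. `parse_amount`, the `K` suffix, and the separator rules are assumptions made for this example and are not taken from `src/parser.py`. In particular, the sketch guesses the decimal separator from whichever of `,` or `.` appears last.

```python
import re

# Multipliers for shorthand suffixes; K is an assumption beyond the M and B shown above.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_amount(text: str) -> float:
    """Convert a formatted amount string to a float (hypothetical helper)."""
    s = text.strip()
    # Accounting notation: surrounding parentheses mark a negative amount.
    negative = s.startswith("(") and s.endswith(")")
    # Keep only digits, separators, a minus sign, and suffix letters.
    s = re.sub(r"[^\d.,\-KMBkmb]", "", s)
    if s.startswith("-"):
        negative = True
        s = s[1:]
    multiplier = 1.0
    if s and s[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[s[-1].upper()]
        s = s[:-1]
    if "," in s and "." in s:
        if s.rfind(",") > s.rfind("."):
            # European style (1.234,56): dots group thousands, comma is the decimal.
            s = s.replace(".", "").replace(",", ".")
        else:
            # US or Indian style (1,234.56 or 1,23,456.78): commas only group digits.
            s = s.replace(",", "")
    elif "," in s:
        head, _, tail = s.rpartition(",")
        if "," not in head and len(tail) in (1, 2):
            # A single comma followed by one or two digits is read as a decimal.
            s = f"{head}.{tail}"
        else:
            s = s.replace(",", "")
    elif s.count(".") > 1:
        # Several dots and no comma: treat the dots as thousands separators.
        s = s.replace(".", "")
    value = float(s) * multiplier
    return -value if negative else value


for raw in ["$1,234.56", "(2,500.00)", "€1.234,56", "₹1,23,456.78", "1.5M", "2B"]:
    print(raw, "->", parse_amount(raw))
```

The single-comma rule is a judgment call: `1,234` is read as one thousand two hundred thirty-four, while `1,5` is read as one and a half. A real parser would likely need a locale hint to resolve such cases reliably.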
### Reconciliation Engine
- **Direct Matching**: Exact 1-to-1 matches
- **Subset Sum Analysis**:
  - *Brute Force*: Tests all combinations (small datasets only)
  - *Dynamic Programming*: Efficient exact matching for medium datasets
  - *Genetic Algorithm*: Heuristic search for large datasets
- **Fuzzy Matching**: String similarity + amount tolerance for approximate reconciliation
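
To make the dynamic programming option above more concrete, here is a minimal sketch of an exact subset-sum search. The function name `dp_subset_sum` and the choice to work in integer cents are assumptions for illustration, and the sketch only handles positive amounts. The project's own `recon.py` may be structured differently.

```python
from typing import Optional


def dp_subset_sum(amounts: list[float], target: float) -> Optional[list[int]]:
    """Return indices of one subset of `amounts` that sums to `target`, or None."""
    # Work in integer cents so floating-point error cannot hide an exact match.
    cents = [round(a * 100) for a in amounts]
    goal = round(target * 100)
    # reachable[s] holds the indices of one subset whose amounts sum to s cents.
    reachable: dict[int, list[int]] = {0: []}
    for i, c in enumerate(cents):
        if c <= 0 or c > goal:
            continue  # assumption: only positive amounts no larger than the target
        # Iterate over a snapshot so each transaction is used at most once.
        for s, idxs in list(reachable.items()):
            new_sum = s + c
            if new_sum <= goal and new_sum not in reachable:
                reachable[new_sum] = idxs + [i]
        if goal in reachable:
            return reachable[goal]
    return None


print(dp_subset_sum([120.00, 200.25, 75.10, 250.50], 450.75))
```

The table of reachable sums grows with the size of the target in cents rather than with the number of possible subsets, which is why this approach suits medium-sized candidate pools.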
### Benchmarking
- Compare execution time across methods (brute force vs. DP vs. GA)
- Visualize scaling performance with runtime plots
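
A timing comparison of this kind could look roughly like the sketch below. The helper names (`brute_force_exists`, `dp_exists`, `benchmark`) and the random test data are assumptions for illustration. Plotting is left out, but the returned timings could be passed to any plotting library.

```python
import random
import time
from itertools import combinations


def brute_force_exists(cents, goal):
    """Check every non-empty subset; exponential in len(cents)."""
    return any(
        sum(combo) == goal
        for size in range(1, len(cents) + 1)
        for combo in combinations(cents, size)
    )


def dp_exists(cents, goal):
    """Track the set of reachable sums (assumes positive integers and a positive goal)."""
    reachable = {0}
    for c in cents:
        reachable |= {s + c for s in reachable if s + c <= goal}
    return goal in reachable


def benchmark(methods, sizes, seed=0):
    """Return {method_name: {n: seconds}} for inputs of each size n."""
    rng = random.Random(seed)
    results = {name: {} for name in methods}
    for n in sizes:
        cents = [rng.randint(1, 50) for _ in range(n)]
        goal = sum(cents) + 1  # unreachable, so each method runs its full search
        for name, method in methods.items():
            start = time.perf_counter()
            method(cents, goal)
            results[name][n] = time.perf_counter() - start
    return results


results = benchmark({"brute": brute_force_exists, "dp": dp_exists}, sizes=[8, 12, 16])
for name, timings in results.items():
    print(name, {n: round(t, 4) for n, t in timings.items()})
```

Using an unreachable target is a deliberate choice here: it prevents either method from stopping early, so the measured times reflect worst-case work.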
### Reporting
- Generates Excel output with:
  - Cleaned transactions
  - Cleaned targets
  - Match reports (transactions, target IDs, match type, differences, etc.)
---
## 📂 Project Structure
Financial-data-analyzer/
├── src/
│ ├── parser.py # Data loading & cleaning
│ ├── recon.py # Reconciliation engine (exact, brute, DP, GA, fuzzy)
│ └── main.py # Main orchestrator script
│
├── examples/
│ └── sample_data.xlsx # Example input workbook
│
├── output/
│ ├── recon_results.xlsx # Reconciliation results
│ └── benchmark_plots/ # Performance graphs
│
├── tests/
│ ├── test_parser.py
│ └── test_recon.py
│
├── requirements.txt
└── README.md
---
## ⚙️ Installation
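Clone the repository, create a virtual environment, and install the dependencies:
```bash
git clone https://github.com/fahadnasir13/financial_data-analyzer_tool.git
cd financial_data-analyzer_tool

# Create a virtual environment
python -m venv venv

# Activate it on Linux/Mac
source venv/bin/activate
# Activate it on Windows
venv\Scripts\activate

# Install the dependencies
pip install -r requirements.txt
```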
## 📑 Usage
### Prepare Input Excel
- **Sheet1 (Transactions)**
  - Column A: Transaction Amount (e.g., 150.00)
  - Column B: Description (e.g., "Invoice #001")
- **Sheet2 (Targets)**
  - Column C: Target Amount (e.g., 225.50)
  - Column D: Reference ID (e.g., "REF001")
### Run the Analyzer
`python src/main.py --input examples/sample_data.xlsx --output output/recon_results.xlsx`
### View Results
Open `output/recon_results.xlsx`, which includes these sheets:
- `Transactions_Clean` – standardized transactions
- `Targets_Clean` – standardized targets
- `Matches` – reconciliation results
Example:
| target\_id | ref\_id | target | match\_type | txn\_ids | txn\_amounts | sum\_amount | diff |
| ---------- | ------- | ------ | ------------- | --------------- | ---------------- | ----------- | ---- |
| TGT0001 | REF001 | 225.50 | exact\_1to1 | \[TXN0003] | \[225.50] | 225.50 | 0.0 |
| TGT0002 | REF002 | 300.00 | brute\_subset | \[TXN0005,TXN0007] | \[150.00,150.00] | 300.00 | 0.0 |
| TGT0003 | REF003 | 450.75 | ga\_subset | \[TXN0010,...] | \[200.25,250.50] | 450.75 | 0.0 |
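
For orientation, an `exact_1to1` row like the first one in the table could be produced by a direct-matching pass along the following lines. This is a sketch under assumptions: the function name `direct_match`, the tuple layouts, and the sign convention for `diff` (target minus matched sum) are illustrative and not confirmed by the project.

```python
def direct_match(transactions, targets):
    """Match each target to one unused transaction with the same amount.

    transactions: list of (txn_id, amount)
    targets: list of (target_id, ref_id, amount)
    """
    # Index transactions by amount in integer cents to avoid float comparison issues.
    by_cents = {}
    for txn_id, amount in transactions:
        by_cents.setdefault(round(amount * 100), []).append((txn_id, amount))

    rows = []
    for target_id, ref_id, target in targets:
        candidates = by_cents.get(round(target * 100))
        if not candidates:
            continue  # no 1-to-1 match; left for the subset-sum methods
        txn_id, amount = candidates.pop(0)  # consume it so it is matched only once
        rows.append({
            "target_id": target_id,
            "ref_id": ref_id,
            "target": target,
            "match_type": "exact_1to1",
            "txn_ids": [txn_id],
            "txn_amounts": [amount],
            "sum_amount": amount,
            "diff": round(target - amount, 2),
        })
    return rows


rows = direct_match(
    transactions=[("TXN0001", 100.00), ("TXN0003", 225.50)],
    targets=[("TGT0001", "REF001", 225.50), ("TGT0002", "REF002", 300.00)],
)
print(rows)
```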
## 🔬 Methods Compared
| Method | Strengths | Weaknesses | Best Use Case |
| -------------- | -------------------------- | ------------------------ | ------------------------ |
| Exact Match | Fast & simple | Only 1:1 matches | Small exact checks |
| Brute Force | Finds a match if one exists within its size and time limits | Exponential time | Very small datasets |
| Dynamic Prog. | Efficient, exact | Works best with integers | Medium datasets |
| Genetic Algo | Scales, finds near-exact | Approximate, stochastic | Large datasets |
| Fuzzy Matching | Handles noisy descriptions | Approximate only | Name/desc reconciliation |
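
The fuzzy matching row combines string similarity with an amount tolerance. A minimal sketch of that idea, using the standard library's `difflib.SequenceMatcher`, is shown below. The function name `fuzzy_match` and the default thresholds (`min_similarity=0.6`, `amount_tolerance=0.01`) are assumptions for illustration, and the project may use a different similarity measure.

```python
from difflib import SequenceMatcher


def fuzzy_match(text_a, amount_a, text_b, amount_b,
                min_similarity=0.6, amount_tolerance=0.01):
    """Return (is_match, similarity) for two described amounts."""
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    similarity = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
    # Amounts must agree within the tolerance, in currency units.
    amount_ok = abs(amount_a - amount_b) <= amount_tolerance
    return (similarity >= min_similarity and amount_ok), similarity


is_match, score = fuzzy_match("Invoice #001", 225.50, "invoice 001", 225.50)
print(is_match, round(score, 3))
```

Requiring both conditions keeps similar descriptions with different amounts from being paired, at the cost of missing matches where the amount was mistyped.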
## 🛠 Configuration
Tunable parameters in `main.py`:
- `brute_max_subset_size`: maximum subset size for brute force (default: 4)
- `brute_time_budget_s`: per-target time limit in seconds (default: 1.0)
- `dp_candidate_limit`: candidate pool size for DP (default: 25)
- `ga_candidate_limit`: candidate pool size for GA (default: 30)
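
To show how the two brute-force parameters might interact, here is a hedged sketch. The `ReconConfig` dataclass and `brute_force_with_budget` are hypothetical names introduced for this example. Only the parameter names and defaults come from the list above, and how `main.py` actually stores or applies them is not shown here.

```python
import time
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ReconConfig:
    """Hypothetical container for the tunable parameters and their defaults."""
    brute_max_subset_size: int = 4
    brute_time_budget_s: float = 1.0
    dp_candidate_limit: int = 25
    ga_candidate_limit: int = 30


def brute_force_with_budget(cents, goal, config):
    """Search subsets up to the size limit, giving up once the time budget is spent."""
    deadline = time.monotonic() + config.brute_time_budget_s
    for size in range(1, config.brute_max_subset_size + 1):
        for combo in combinations(range(len(cents)), size):
            if time.monotonic() > deadline:
                return None  # budget exhausted; a caller could fall back to DP or GA
            if sum(cents[i] for i in combo) == goal:
                return list(combo)
    return None


config = ReconConfig()
print(brute_force_with_budget([15000, 15000, 9999], 30000, config))
```

Capping the subset size bounds the number of combinations, while the time budget guards against large candidate pools where even small subsets are too numerous to enumerate.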
## ✅ Roadmap
- Add web dashboard for interactive reconciliation
- Database integration (Postgres, SQL Server)
- ML classifier for predictive reconciliation
- Batch processing for enterprise-scale datasets
## 👨‍💻 Contributing
Contributions welcome!
1. Fork the repo
2. Create a feature branch
3. Submit a PR with details
## 📜 License
MIT License – free to use, modify, and distribute.