https://github.com/srujayreddy/fetch-receipt-prediction
A machine learning solution for predicting monthly receipt scans in 2022 based on 2021 daily data.
docker-container machine-learning-algorithms neural-networks python tensorflow
- Host: GitHub
- URL: https://github.com/srujayreddy/fetch-receipt-prediction
- Owner: SrujayReddy
- Created: 2025-04-03T22:18:04.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-09-29T08:00:16.000Z (9 months ago)
- Last Synced: 2026-01-02T13:54:46.765Z (6 months ago)
- Topics: docker-container, machine-learning-algorithms, neural-networks, python, tensorflow
- Language: Python
- Homepage:
- Size: 374 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Fetch Receipt Prediction System

*Prediction dashboard showing 30% post-holiday decline in January 2022*
An end-to-end machine learning solution for predicting monthly scanned receipts in 2022 using daily 2021 data, developed for Fetch Rewards' Machine Learning Engineer take-home exercise.
## Quick Start (Docker)
```bash
# 1. Clone repository
git clone https://github.com/SrujayReddy/Fetch-Receipt-Prediction.git
cd Fetch-Receipt-Prediction
# 2. Build and run container
docker build -t fetch-app . && docker run -p 5001:5001 fetch-app
# 3. Access dashboard (`open` is macOS-specific; elsewhere, visit the URL in a browser)
open http://localhost:5001
```
## Project Overview
### Key Features
- **Custom Neural Network** built with TensorFlow
- **Temporal Feature Engineering** (lag features, rolling averages)
- **Interactive Web Dashboard** with comparative visualizations
- **Docker Containerization** for reproducible execution
- **Production-Grade Pipeline**:
  - Automated data validation
  - Model serialization/deserialization
  - Comprehensive error handling
### Technical Highlights
- **Validation MAE**: 0.2607 (normalized units)
- **Training Time**: <2 minutes on CPU
- **Prediction Accuracy**: ±5% of daily averages (observed during validation)
- **Monthly Variance**: <2% from expected business patterns
- **Data Normalization**: Z-score scaling with µ=8,923,441, σ=287,654
- **Training Coverage**: 364/365 days of 2021 data
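
To relate the normalized figures above to actual receipt counts, here is a minimal sketch of Z-score scaling using the µ and σ listed in the highlights. The function names are illustrative and do not come from the repository, and the conversion assumes the validation MAE was computed on z-scored targets.

```python
MEAN = 8_923_441.0  # µ from the Technical Highlights list
STD = 287_654.0     # σ from the Technical Highlights list


def normalize(count: float) -> float:
    """Map a raw daily receipt count to a z-score."""
    return (count - MEAN) / STD


def denormalize(z: float) -> float:
    """Map a z-score back to a raw daily receipt count."""
    return z * STD + MEAN


def mae_in_receipts(normalized_mae: float) -> float:
    """Convert an MAE in normalized units to receipts per day.

    An absolute error scales by σ only; the mean cancels out.
    """
    return normalized_mae * STD


print(round(mae_in_receipts(0.2607)))  # 74991
```

Under that assumption, a validation MAE of 0.2607 corresponds to roughly 75,000 receipts per day, which is under 1% of the mean daily count.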
## Repository Structure
```
.
├── data/                   # Input data
│   └── daily_receipts.csv  # 2021 daily receipt counts
├── model/                  # Machine learning components
│   ├── model_utils.py      # Core ML logic
│   ├── train.py            # Training pipeline
│   ├── predict.py          # Prediction script
│   └── *.npy               # Normalization parameters
├── app/                    # Web application
│   ├── app.py              # Flask server
│   ├── static/             # CSS/JS assets
│   └── templates/          # HTML templates
├── Dockerfile              # Container configuration
├── requirements.txt        # Python dependencies
└── README.md               # This documentation
```
## Machine Learning Implementation
### Model Architecture
```python
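import tensorflow as tf

model = tf.keras.Sequential([
    # Wide first hidden layer; He initialization pairs well with ReLU
    tf.keras.layers.Dense(256, activation='relu', kernel_initializer='he_normal'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),  # randomly drops 30% of units during training
    # Combined L1/L2 weight regularization on the second hidden layer
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l1_l2(0.01)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),  # single regression output
])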
```
### Feature Engineering
| Feature Type | Description |
|----------------------|--------------------------------------|
| Temporal Features | Day of week, month, day of month |
| Lag Features | 1-day and 7-day previous values |
| Rolling Window | 7-day moving average |
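
The features in the table above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's `model_utils.py`: the function name, the dictionary keys, and the choice to compute the moving average over the previous seven days only (so the target day never leaks into its own features) are all assumptions. It also assumes the input is sorted with one entry per consecutive day.

```python
from datetime import date, timedelta
from statistics import mean


def build_features(dates, counts):
    """Return one feature dict per day that has a full 7-day history."""
    rows = []
    for i in range(7, len(counts)):
        d = dates[i]
        rows.append({
            "date": d,
            # Temporal features
            "day_of_week": d.weekday(),  # Monday = 0
            "month": d.month,
            "day_of_month": d.day,
            # Lag features
            "lag_1": counts[i - 1],
            "lag_7": counts[i - 7],
            # 7-day moving average over the preceding days only
            "rolling_7": mean(counts[i - 7:i]),
            "target": counts[i],
        })
    return rows


# Made-up toy series for illustration: 10 days starting 2021-01-01
dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(10)]
counts = [100.0 + i for i in range(10)]
features = build_features(dates, counts)
print(features[0]["lag_7"], features[0]["rolling_7"])  # 100.0 103.0
```

In this sketch the first seven days are dropped because they lack a complete lag history; the repository may handle the start of the series differently.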
## Installation & Usage
### Docker Deployment (Recommended)
```bash
docker build -t fetch-app . # Build image
docker run -p 5001:5001 fetch-app # Start container
```
### Local Execution
```bash
# Create virtual environment
python -m venv venv && source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Train model and generate predictions
cd model && python train.py && python predict.py
# Start web server
cd ../app && python app.py
```
## Verification
After running the pipeline:
```bash
# Check generated predictions
head model/2022_predictions.csv
# Expected output (first rows):
# 2022-01-01,8949847.75
# 2022-01-02,8954872.98
# ...
```
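
Beyond eyeballing the first rows, a small script can check that the file covers the whole year. The sketch below assumes the format shown in the expected output above (two columns, ISO date and value, no header row); `check_predictions` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from datetime import date, timedelta


def check_predictions(csv_text: str, year: int = 2022) -> list[str]:
    """Return a list of problems; an empty list means the file looks sane."""
    problems = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        try:
            day_text, value_text = row  # raises ValueError on wrong column count
            day = date.fromisoformat(day_text)
            value = float(value_text)
        except ValueError:
            problems.append(f"unparseable row: {row}")
            continue
        if value < 0:
            problems.append(f"negative prediction on {day}")
        seen.add(day)

    # Every calendar day of the target year should appear at least once
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    expected = {date(year, 1, 1) + timedelta(days=i) for i in range(days_in_year)}
    missing = sorted(expected - seen)
    if missing:
        problems.append(f"{len(missing)} missing day(s), first: {missing[0]}")
    return problems
```

To use it on the generated file, read `model/2022_predictions.csv` into a string and pass it to `check_predictions`; an empty list means every day of 2022 has a parseable, non-negative prediction.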
## Web Interface
Access the dashboard at `http://localhost:5001` to view:
- Interactive comparison of 2021 vs 2022 data
- Monthly prediction tables
- Detailed trend visualizations
## Troubleshooting
| Issue | Solution |
|------------------------|---------------------------------------|
| Port 5001 occupied | Use `-p 5002:5001` in docker run |
| Docker build failures | Run `docker system prune -a` (deletes all unused images and build cache), then rebuild |
| Missing predictions | Verify CSV file in `data/` directory |
| Model loading errors | Check `model/*.npy` files exist |
## FAQ
**Q: How do I modify the prediction period?**
A: Edit `start_date` and `end_date` in `model/predict.py`
**Q: Where are the model parameters stored?**
A: `model/receipt_model.h5` (model weights) and `*.npy` (normalization)
**Q: How is monthly aggregation calculated?**
A: Simple sum of daily predictions for each month
**Q: Why do predictions decrease through January?**
A: The model learned a post-holiday pattern from the 2021 training data.
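
The monthly aggregation mentioned in the FAQ above (a simple sum of daily predictions per month) can be sketched like this. `monthly_totals` is a hypothetical helper, and the input is assumed to be headerless `date,value` rows as in the Verification section. The two January rows are taken from that section; the February row is a made-up value added only to show the grouping.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(csv_text: str) -> dict[str, float]:
    """Sum daily predictions per calendar month, keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        day, value = row
        totals[day[:7]] += float(value)  # "2022-01-15" -> "2022-01"
    return dict(totals)


sample = """2022-01-01,8949847.75
2022-01-02,8954872.98
2022-02-01,9000000.00
"""
totals = monthly_totals(sample)
for month, total in totals.items():
    print(month, f"{total:,.2f}")
```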
## Documentation
| Component | File Path |
|------------------------|----------------------------|
| Core Model Logic | [model/model_utils.py](model/model_utils.py) |
| Training Pipeline | [model/train.py](model/train.py) |
| Web Interface | [app/app.py](app/app.py) |
---
*Developed with ❤️ for Fetch Rewards Machine Learning Engineer Position*