https://github.com/code-str8/customer-frauds-detection
fraud detection challenge for STEG (Tunisian Company of Electricity and Gas) focused on identifying fraudulent meter manipulation through billing history data
api binaryclassification data-science docker machine-learning notebook-jupyter python streamlit-webapp
- Host: GitHub
- URL: https://github.com/code-str8/customer-frauds-detection
- Owner: Code-str8
- License: mit
- Created: 2025-02-06T13:26:02.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-21T12:54:13.000Z (10 months ago)
- Last Synced: 2025-03-21T13:54:59.008Z (10 months ago)
- Topics: api, binaryclassification, data-science, docker, machine-learning, notebook-jupyter, python, streamlit-webapp
- Language: Jupyter Notebook
- Homepage:
- Size: 33.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Customer Frauds Detection
This project addresses a fraud detection challenge for STEG (Tunisian Company of Electricity and Gas), focusing on identifying fraudulent meter manipulation through billing history data.
## Table of Contents
- [Introduction](#introduction)
- [Dataset](#dataset)
- [Installation](#installation)
- [Usage](#usage)
- [Streamlit App](#streamlit-app)
- [API Usage](#api-usage)
- [Model Training](#model-training)
- [API Documentation](#api-documentation)
- [Methodology](#methodology)
- [Results](#results)
- [Challenges & Trade-offs](#challenges--trade-offs)
- [Future Work](#future-work)
- [Contributing](#contributing)
- [License](#license)
## Introduction
The goal of this project is to develop a machine learning model that can accurately detect fraudulent activities in electricity and gas consumption. By analyzing historical billing data, the model aims to help STEG reduce losses due to fraud.
## Dataset
The dataset consists of historical billing data, including features such as client ID, invoice date, consumption levels, and counter types. The target variable indicates whether a client is fraudulent or not.
## Installation
To run this project, you need to have Python installed along with the required libraries. You can install the dependencies using the following command:
```bash
pip install -r requirements.txt
```
## Usage
### Streamlit App
We've developed an interactive Streamlit application that provides a user-friendly interface for fraud detection:
1. Start the Streamlit app:
```bash
streamlit run 1_Welcome.py
```
2. Login credentials:
- Username: admin
- Password: Admin01
The app includes several features:
#### Welcome Page



A welcoming interface introducing the fraud detection system.
#### Data Explorer




Explore and analyze the dataset with interactive visualizations.
#### Prediction Interface



Easy-to-use form for making fraud predictions:
- Input transaction details
- Choose between models
- Get instant predictions

Detailed prediction results with confidence scores.
#### History Tracking

Track and analyze prediction history:
- View all past predictions
- Analyze trends
- Export results
### API Usage
1. Clone the repository:
```bash
git clone https://github.com/code-str8/customer-frauds-detection.git
```
2. Navigate to the project directory:
```bash
cd customer-frauds-detection
```
3. Start the API server:
```bash
uvicorn api:app
```
4. Open your browser and navigate to:
```
http://127.0.0.1:8000/docs
```
The API provides two main endpoints:
- `/stacked/predict`: Uses the stacked ensemble model
- `/xgb/predict`: Uses the XGBoost model


Example of making predictions using the API:




### Model Training
To train or experiment with the models:
1. Run the Jupyter Notebook:
```bash
jupyter notebook fraud_detection.ipynb
```
## API Documentation
The API accepts the following input parameters:
```json
{
  "counter_number": int,
  "account_age_days": int,
  "new_index": int,
  "old_index": int,
  "consumption_level_1": float,
  "counter_coefficient": float,
  "client_catg": int,
  "invoice_year": int,
  "creation_year": int,
  "creation_month": int
}
```
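The schema above can be mirrored on the client side so that malformed payloads are caught before they are sent. The following is a minimal sketch using only the standard library. The class name `PredictionRequest` and its `to_payload` method are hypothetical helpers that are not part of this repository, and the API itself may validate input differently.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class PredictionRequest:
    """Client-side mirror of the documented input schema (hypothetical helper)."""
    counter_number: int
    account_age_days: int
    new_index: int
    old_index: int
    consumption_level_1: float
    counter_coefficient: float
    client_catg: int
    invoice_year: int
    creation_year: int
    creation_month: int

    def __post_init__(self):
        # Check each value against the type documented for its field.
        for f in fields(self):
            value = getattr(self, f.name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool):
                raise TypeError(f"{f.name} must be numeric, got bool")
            if f.type in (int, "int") and not isinstance(value, int):
                raise TypeError(f"{f.name} must be an int")
            if f.type in (float, "float") and not isinstance(value, (int, float)):
                raise TypeError(f"{f.name} must be a number")

    def to_payload(self) -> dict:
        """Return a plain dict ready to be serialized as the JSON request body."""
        return asdict(self)
```

For example, `PredictionRequest(0, 0, 0, 0, 0.0, 0.0, 0, 2015, 0, 6).to_payload()` yields a dictionary with the ten documented keys, while passing a string for any field raises `TypeError`.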
Response format:
```json
{
  "prediction": int,          // 0 or 1
  "probability": string,      // percentage
  "prediction_text": string   // "Fraudulent" or "Non-Fraudulent"
}
```
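Because `probability` is returned as a string, client code will usually want to convert the response into typed values. The sketch below shows one way to do that. `PredictionResult` and `parse_prediction` are hypothetical names, and the exact string format (assumed here to look like `"87.5%"` or `"87.5"`) is an assumption that should be checked against the running API.

```python
from typing import NamedTuple


class PredictionResult(NamedTuple):
    is_fraud: bool
    probability_pct: float
    label: str


def parse_prediction(body: dict) -> PredictionResult:
    """Convert the documented response fields into typed values."""
    # Assumption: "probability" looks like "87.5%" or "87.5"; adjust if the API differs.
    raw = str(body["probability"]).strip().rstrip("%").strip()
    return PredictionResult(
        is_fraud=body["prediction"] == 1,
        probability_pct=float(raw),
        label=body["prediction_text"],
    )
```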
Example curl request:
```bash
curl -X 'POST' \
  'http://127.0.0.1:8000/stacked/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "counter_number": 0,
    "account_age_days": 0,
    "new_index": 0,
    "old_index": 0,
    "consumption_level_1": 0,
    "counter_coefficient": 0,
    "client_catg": 0,
    "invoice_year": 2015,
    "creation_year": 0,
    "creation_month": 6
  }'
```
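The same request can be sent from Python. The following sketch uses only the standard library and reuses the URL, headers, and payload from the curl example. The function names `build_request` and `predict` are illustrative and do not come from the repository.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/stacked/predict"  # same endpoint as the curl example


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a prediction endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder values copied from the curl example above.
example_payload = {
    "counter_number": 0,
    "account_age_days": 0,
    "new_index": 0,
    "old_index": 0,
    "consumption_level_1": 0,
    "counter_coefficient": 0,
    "client_catg": 0,
    "invoice_year": 2015,
    "creation_year": 0,
    "creation_month": 6,
}
```

With the server started via `uvicorn api:app`, calling `predict(example_payload)` should return a dictionary in the response format described above. To query the XGBoost model instead, pass `url="http://127.0.0.1:8000/xgb/predict"`.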
## Methodology
The project follows these steps:
1. **Data Preprocessing**: Cleaning and transforming the data to make it suitable for modeling.
2. **Exploratory Data Analysis (EDA)**: Visualizing data distributions and relationships.
3. **Feature Engineering**: Creating new features and selecting the most relevant ones.
4. **Modeling**: Training various machine learning models, including ensemble methods.
5. **Evaluation**: Assessing model performance using cross-validation and ROC AUC scores.
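As an illustration of the feature engineering step, the API input includes date-derived fields such as `invoice_year`, `creation_year`, `creation_month`, and `account_age_days`. The sketch below shows how such fields could be derived from two dates. The function name is hypothetical, and the definition of `account_age_days` (days between account creation and the invoice date) is an assumption; the notebook may compute it differently.

```python
from datetime import date


def derive_date_features(creation_date: date, invoice_date: date) -> dict:
    """Derive date-based features from two dates (illustrative definitions)."""
    if invoice_date < creation_date:
        raise ValueError("invoice_date precedes creation_date")
    return {
        "invoice_year": invoice_date.year,
        "creation_year": creation_date.year,
        "creation_month": creation_date.month,
        # Assumption: account age is measured at the invoice date.
        "account_age_days": (invoice_date - creation_date).days,
    }
```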
## Results
The stacked model, combining XGBoost, Extra Trees, and Random Forest, achieved the best performance with a ROC AUC of 0.83. For reference, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of fraudulent and non-fraudulent clients.
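One way to read an AUC value is as the probability that a randomly chosen fraudulent client receives a higher score than a randomly chosen non-fraudulent one. The sketch below computes ROC AUC directly from that definition in plain Python. The labels and scores are toy values made up for illustration and are not results from this project.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The cost is O(n_pos * n_neg),
    which is fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; not taken from the project.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]
print(roc_auc(labels, scores))  # 8 of the 9 positive/negative pairs are ordered correctly
```

On real data, a library routine such as scikit-learn's `roc_auc_score` is the more practical choice; the pairwise version is shown only because it makes the interpretation explicit.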
## Challenges & Trade-offs
- **Hyperparameter Tuning**: Limited computational resources restricted the hyperparameter search; more extensive tuning could have improved model robustness.
- **High Variance**: Models performed well on the training data but worse on the test data, which indicates overfitting.
- **Class Imbalance**: Addressed through resampling techniques to balance the training data.
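The README does not say which resampling technique was used. As one possible illustration, the sketch below performs random oversampling of the minority class in plain Python. The function name and the choice of oversampling (rather than undersampling or synthetic methods such as SMOTE) are assumptions, not a description of the project's notebook.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    # The class with fewer rows is the minority; the other binary label is the majority.
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    if not by_class[minority]:
        raise ValueError("minority class has no samples")
    deficit = len(by_class[majority]) - len(by_class[minority])
    # Sample with replacement from the minority rows to fill the gap.
    extra = rng.choices(by_class[minority], k=deficit)
    out_rows = by_class[majority] + by_class[minority] + extra
    out_labels = ([majority] * len(by_class[majority])
                  + [minority] * (len(by_class[minority]) + deficit))
    return out_rows, out_labels
```

Resampling of this kind should be applied only to the training split, never to validation or test data; otherwise duplicated rows leak across splits and inflate the measured performance.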
### Deployment
- Streamlit: [Streamlit app](https://customer-frauds-detection.streamlit.app/)
### Article and Dashboard
- Medium article: [Article](https://medium.com/@Codestr8/building-a-fraud-detection-system-for-utility-companies-a-complete-guide-969d9cc0a151)
- Power BI: [PowerBI Dashboard](https://app.powerbi.com/view?r=eyJrIjoiODM4OWE0ZTMtN2ZkMC00YTJhLTg1ZTYtZmNjZjdhYWQwNjIwIiwidCI6IjQ0ODdiNTJmLWYxMTgtNDgzMC1iNDlkLTNjMjk4Y2I3MTA3NSJ9)
## Future Work
- **API Development** (completed): Implemented a FastAPI-based REST API for real-time fraud detection.
- **Model Updates**: Regular retraining with new data to maintain accuracy.
- **Performance Optimization**: Further API optimization for higher throughput.
- **Monitoring**: Add model performance monitoring and drift detection.
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.