An open API service indexing awesome lists of open source software.

https://github.com/code-str8/customer-frauds-detection

fraud detection challenge for STEG (Tunisian Company of Electricity and Gas) focused on identifying fraudulent meter manipulation through billing history data
https://github.com/code-str8/customer-frauds-detection

api binaryclassification data-science docker machine-learning notebook-jupyter python streamlit-webapp

Last synced: 3 months ago
JSON representation

fraud detection challenge for STEG (Tunisian Company of Electricity and Gas) focused on identifying fraudulent meter manipulation through billing history data

Awesome Lists containing this project

README

          

# Customer Frauds Detection ๐Ÿ”โšก

This project addresses a fraud detection challenge for STEG (Tunisian Company of Electricity and Gas), focusing on identifying fraudulent meter manipulation through billing history data.

## ๐Ÿ“‹ Table of Contents

- [Introduction](#introduction)
- [Dataset](#dataset)
- [Installation](#installation)
- [Usage](#usage)
- [Streamlit App](#streamlit-app)
- [API Usage](#api-usage)
- [Model Training](#model-training)
- [API Documentation](#api-documentation)
- [Methodology](#methodology)
- [Results](#results)
- [Challenges & Trade-offs](#challenges--trade-offs)
- [Future Work](#future-work)
- [Contributing](#contributing)
- [License](#license)

## ๐Ÿš€ Introduction

The goal of this project is to develop a machine learning model that can accurately detect fraudulent activities in electricity and gas consumption. By analyzing historical billing data, the model aims to help STEG reduce losses due to fraud.

## ๐Ÿ’พ Dataset

The dataset consists of historical billing data, including features such as client ID, invoice date, consumption levels, and counter types. The target variable indicates whether a client is fraudulent or not.

## ๐Ÿ› ๏ธ Installation

To run this project, you need to have Python installed along with the required libraries. You can install the dependencies using the following command:

```bash
pip install -r requirements.txt
```

## ๐Ÿš€ Usage

### ๐Ÿ’ซ Streamlit App

We've developed an interactive Streamlit application that provides a user-friendly interface for fraud detection:

1. Start the Streamlit app:
```bash
streamlit run 1_Welcome.py
```

2. Login credentials:
- Username: admin
- Password: Admin01

The app includes several features:

#### ๐Ÿ  Welcome Page
![Welcome Page](images/app%201.PNG)
![Welcome Page](images/app%202.PNG)
![Welcome Page](images/app%203.PNG)
A welcoming interface introducing the fraud detection system.

#### ๐Ÿ“š Data Explorer
![Data Explorer](images/app%204.PNG)
![Data Explorer](images/app%205.PNG)
![Data Explorer](images/app%206.PNG)
![Data Explorer](images/app%207.PNG)
Explore and analyze the dataset with interactive visualizations.

#### ๐Ÿ”ฎ Prediction Interface
![Prediction Form](images/app%208.PNG)
![Prediction Form](images/app%209.PNG)
![Prediction Form](images/app%2010.PNG)
Easy-to-use form for making fraud predictions:
- Input transaction details
- Choose between models
- Get instant predictions

![Prediction Results](images/app%2011.PNG)
Detailed prediction results with confidence scores.

#### โณ History Tracking

![Prediction History](images/app%2012.PNG)
Track and analyze prediction history:
- View all past predictions
- Analyze trends
- Export results
### ๐Ÿ”„ API Usage

1. Clone the repository:
```bash
git clone https://github.com/yourusername/customer-frauds-detection.git
```

2. Navigate to the project directory:
```bash
cd customer-frauds-detection
```

3. Start the API server:
```bash
uvicorn api:app
```

4. Open your browser and navigate to:
```
http://127.0.0.1:8000/docs
```

The API provides two main endpoints โœจ:
- `/stacked/predict`: Uses the stacked ensemble model
- `/xgb/predict`: Uses the XGBoost model

![API Documentation UI](images/api%201.PNG)
![API Documentation UI](images/api%202.PNG)

Example of making predictions using the API:

![API Prediction Example - Stacked Model](images/stacked%20api%203.PNG)
![API Prediction Example - Stacked Model](images/stacked%20api%204.PNG)
![API Prediction Example - XGBoost Model](images/xgb%20api%205.PNG)
![API Prediction Example - XGBoost Model](images/xgb%20api%206.PNG)

### ๐Ÿค– Model Training

To train or experiment with the models:

1. Run the Jupyter Notebook:
```bash
jupyter notebook fraud_detection.ipynb
```

## ๐Ÿ“ API Documentation

The API accepts the following input parameters:

```json
{
"counter_number": int,
"account_age_days": int,
"new_index": int,
"old_index": int,
"consumption_level_1": float,
"counter_coefficient": float,
"client_catg": int,
"invoice_year": int,
"creation_year": int,
"creation_month": int
}
```

Response format:
```json
{
"prediction": int, // 0 or 1
"probability": string, // percentage
"prediction_text": string // "Fraudulent" or "Non-Fraudulent"
}
```

Example curl request:
```bash
curl -X 'POST' \
'http://127.0.0.1:8000/stacked/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"counter_number": 0,
"account_age_days": 0,
"new_index": 0,
"old_index": 0,
"consumption_level_1": 0,
"counter_coefficient": 0,
"client_catg": 0,
"invoice_year": 2015,
"creation_year": 0,
"creation_month": 6
}'
```

## ๐Ÿ—๏ธ Methodology

The project follows these steps:
1. **Data Preprocessing** ๐Ÿ“Š: Cleaning and transforming the data to make it suitable for modeling.
2. **Exploratory Data Analysis (EDA)** ๐Ÿ“Š: Visualizing data distributions and relationships.
3. **Feature Engineering** โœจ: Creating new features and selecting the most relevant ones.
4. **Modeling** ๐Ÿค–: Training various machine learning models, including ensemble methods.
5. **Evaluation** ๐Ÿ“ˆ: Assessing model performance using cross-validation and ROC AUC scores.

## ๐Ÿ“ˆ Results

The stacked model, combining XGBoost, Extra Trees, and Random Forest, achieved the best performance with an AUC of 0.83. This indicates a strong ability to distinguish between fraudulent and non-fraudulent clients.

## โš ๏ธ Challenges & Trade-offs

- **Hyperparameter Tuning** โš™๏ธ: Limited by computational resources, which could have improved model robustness.
- **High Variance** ๐Ÿ“Š: Models performed well on training data but showed lower performance on testing data.
- **Class Imbalance** โš–๏ธ: Addressed through resampling techniques to ensure balanced training data.

### ๐ŸŽ‰Deployment

- Streamlit: [Streamlit app](https://customer-frauds-detection.streamlit.app/)

### Article

- Medium article: [Article](https://medium.com/@Codestr8/building-a-fraud-detection-system-for-utility-companies-a-complete-guide-969d9cc0a151)

- Power BI: [PowerBI Dashboard](https://app.powerbi.com/view?r=eyJrIjoiODM4OWE0ZTMtN2ZkMC00YTJhLTg1ZTYtZmNjZjdhYWQwNjIwIiwidCI6IjQ0ODdiNTJmLWYxMTgtNDgzMC1iNDlkLTNjMjk4Y2I3MTA3NSJ9)

## ๐Ÿ”ฎ Future Work

- โœ… **API Development**: Implemented a FastAPI-based REST API for real-time fraud detection.
- ๐Ÿ”„ **Model Updates**: Regular retraining with new data to maintain accuracy.
- โšก **Performance Optimization**: Further API optimization for higher throughput.
- ๐Ÿ“Š **Monitoring**: Add model performance monitoring and drift detection.

## ๐Ÿค Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any improvements or bug fixes.

## ๐Ÿ“œ License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.