https://github.com/anidipta/risk-sense
Risk Sense : Next-Gen Fraud Predictor
- Host: GitHub
- URL: https://github.com/anidipta/risk-sense
- Owner: Anidipta
- License: apache-2.0
- Created: 2024-06-30T09:42:53.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-12-20T06:07:30.000Z (15 days ago)
- Last Synced: 2024-12-20T06:25:36.706Z (15 days ago)
- Topics: convolutional-neural-networks, machine-learning, machine-learning-algorithms, numpy, python, streamlit
- Language: Jupyter Notebook
- Homepage: https://risksense.streamlit.app/
- Size: 2.84 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 📊 **Risk Sense : Next-Gen Fraud Predictor**
## **Dataset Overview**
This dataset is designed for developing and evaluating machine learning models for **fraud detection** in financial transactions. It contains **6.3 million** rows of simulated transaction data for building and testing models that detect fraudulent activity.

## **Columns in the Dataset:**
1. **⏳ `step`**: Represents a unit of time where 1 step equals **1 hour**.
2. **💳 `type`**: The **type** of transaction, including the following categories:
   - `CASH_IN`
   - `CASH_OUT`
   - `DEBIT`
   - `PAYMENT`
   - `TRANSFER`
3. **💰 `amount`**: The **monetary value** of the transaction.
4. **🏦 `oldbalanceOrg`**: The initial **balance of the origin account** before the transaction.
5. **🏦 `newbalanceOrig`**: The updated **balance of the origin account** after the transaction.
6. **🏦 `oldbalanceDest`**: The initial **balance of the destination account** before the transaction.
7. **🏦 `newbalanceDest`**: The updated **balance of the destination account** after the transaction.
8. **🔄 `changebalanceOrg`**: The **change in balance** for the origin account following the transaction.
9. **🚨 `isFraud`**: A **binary indicator** (0 or 1) where:
   - `1`: Fraudulent transaction
   - `0`: Non-fraudulent transaction

---
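The column list above can be turned into a quick sanity check before any modeling. The following is a minimal sketch, assuming the data has been loaded into a pandas DataFrame; the `validate_schema` helper and the three hand-made sample rows are hypothetical and only illustrate the expected shape. The negative sign used for `changebalanceOrg` in the sample is also an assumption, since the README does not state the sign convention.

```python
import pandas as pd

# Column names and transaction types exactly as listed in the README.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest", "changebalanceOrg", "isFraud",
]
TRANSACTION_TYPES = {"CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"}


def validate_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame matches the README."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        # Without all columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]
    problems = []
    unknown_types = set(df["type"].unique()) - TRANSACTION_TYPES
    if unknown_types:
        problems.append(f"unknown transaction types: {sorted(unknown_types)}")
    if not df["isFraud"].isin([0, 1]).all():
        problems.append("isFraud must contain only 0 or 1")
    return problems


# Hand-made illustrative rows (not taken from the real dataset).
sample = pd.DataFrame({
    "step": [1, 1, 2],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [100.0, 500.0, 500.0],
    "oldbalanceOrg": [1000.0, 500.0, 500.0],
    "newbalanceOrig": [900.0, 0.0, 0.0],
    "oldbalanceDest": [0.0, 0.0, 200.0],
    "newbalanceDest": [0.0, 500.0, 700.0],
    "changebalanceOrg": [-100.0, -500.0, -500.0],  # assumed: new minus old
    "isFraud": [0, 1, 1],
})

print(validate_schema(sample))  # prints []
```

Running a check like this right after loading the data catches renamed columns or unexpected transaction types early, before they surface as confusing model errors.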
## **Dataset Summary:**
- **📈 Total Rows**: 6,300,000
- **📊 Total Columns**: 9

### **Key Features:**
- **🔢 Transaction Types**: A variety of transaction types, making the dataset suitable for modeling different fraud scenarios.
- **💳 Balance Changes**: Tracks balance changes before and after transactions for both origin and destination accounts.
- **🚨 Fraud Indicator**: Essential for supervised learning, helping models identify fraudulent behavior.

---
## **Potential Use Cases:**
1. **🔍 Supervised Learning for Fraud Detection**:
   - Train models to predict the likelihood that a transaction is fraudulent.
   - Build **binary classification** models using techniques such as decision trees, XGBoost, or neural networks.
2. **📉 Pattern Analysis**:
   - Analyze transaction patterns that indicate potential fraud.
   - Identify signals such as **unusual amounts** or rapid **balance changes** that could suggest fraudulent activity.
3. **🛠️ Feature Engineering**:
   - Create new features to improve model performance, such as:
     - Transaction frequency over time.
     - Account balance changes relative to previous transactions.
     - Time-based behavior patterns for accounts.

---
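The feature engineering ideas above can be sketched with pandas. This is an illustrative example under stated assumptions, not the project's actual pipeline: the `add_features` helper, the new column names, and the sample rows are all hypothetical. Because the column list in this README contains no account identifier, the sketch uses per-step rather than per-account aggregates; true per-account features would need an account ID column.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a few illustrative engineered features."""
    out = df.copy()
    # Time-based pattern: 1 step = 1 hour, so step % 24 gives an hour-of-day
    # slot (assumes the step counter is aligned to a day boundary).
    out["hour_of_day"] = out["step"] % 24
    # Transaction frequency over time: how many transactions share this step.
    out["tx_in_step"] = out.groupby("step")["step"].transform("count")
    # Amount relative to the origin balance; NaN when the balance is zero
    # so that we never divide by zero.
    nonzero_balance = out["oldbalanceOrg"].where(out["oldbalanceOrg"] > 0)
    out["amount_to_balance"] = out["amount"] / nonzero_balance
    # Rapid balance change: the transaction empties the origin account.
    out["drains_origin"] = (
        (out["oldbalanceOrg"] > 0) & (out["newbalanceOrig"] == 0)
    ).astype(int)
    return out


# Hand-made illustrative rows (not taken from the real dataset).
rows = pd.DataFrame({
    "step": [1, 1, 2, 25],
    "amount": [100.0, 500.0, 50.0, 200.0],
    "oldbalanceOrg": [1000.0, 500.0, 0.0, 400.0],
    "newbalanceOrig": [900.0, 0.0, 0.0, 200.0],
})
feats = add_features(rows)
print(feats[["hour_of_day", "tx_in_step", "amount_to_balance", "drains_origin"]])
```

Keeping the transformation in a single function that returns a copy makes it easy to apply the same features to training data and to incoming transactions at prediction time.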
## **How to Run:**
Access our **final fraud detection model** [here](https://drive.google.com/file/d/1P2HRWjud5vZ3E5PRUhqvywu9UHo8xttO/view?usp=sharing).
### **Live Demo**:
No download is needed! **Try the fraud detection model live** at [Risk Sense](https://risksense.streamlit.app/).

---
## **Data Source & Collection Method:**
The dataset is a **simulated representation** of real-world financial transactions. Each row represents a transaction with attributes designed to mimic actual banking behavior. The simulation includes both **fraudulent** and **non-fraudulent transactions**, providing a diverse environment for model training and evaluation.
---
## **Challenges and Considerations:**
- **⚖️ Class Imbalance**:
  Fraudulent transactions are far fewer than non-fraudulent ones, which can **bias a model** towards predicting non-fraudulent transactions. This can be mitigated by:
  - **Resampling techniques** such as **SMOTE** or **ADASYN**.
  - Using **ensemble models** that can better handle imbalance, like **Random Forest** or **XGBoost**.
- **🔒 Data Privacy**:
  Although this dataset is simulated, it mimics the structure of real-world transactional data, so it can still be used to develop privacy-preserving algorithms for **real-time fraud detection** systems.

---
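To make the class imbalance point above concrete, here is a minimal sketch of two simple mitigations using only pandas. Note that it uses plain random oversampling and inverse-frequency class weights as stand-ins; it is not SMOTE or ADASYN, which generate synthetic samples and are provided by separate libraries. The helper names and the ten-row label column are hypothetical.

```python
import pandas as pd


def balanced_class_weights(labels: pd.Series) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = labels.value_counts()
    n_samples, n_classes = len(labels), len(counts)
    return {int(cls): n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def oversample_minority(df: pd.DataFrame, label: str = "isFraud", seed: int = 0) -> pd.DataFrame:
    """Randomly duplicate minority-class rows until both classes are the same size."""
    counts = df[label].value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()
    extra = df[df[label] == minority].sample(
        n=int(counts[majority] - counts[minority]),
        replace=True,          # sampling with replacement duplicates rows
        random_state=seed,     # fixed seed for reproducibility
    )
    return pd.concat([df, extra], ignore_index=True)


# Hand-made illustrative labels: 8 legitimate rows, 2 fraudulent rows.
data = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 900.0, 950.0],
    "isFraud": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})
weights = balanced_class_weights(data["isFraud"])
balanced = oversample_minority(data)
print(weights)
print(balanced["isFraud"].value_counts().to_dict())
```

Oversampling should be applied to the training split only; duplicating fraud rows before splitting would leak copies of the same transaction into the evaluation data and inflate the measured performance.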
## **Conclusion:**
This **Fraud Detection Dataset** is a resource for developing robust fraud detection models. Its transaction features and fraud indicator support **pattern recognition**, **anomaly detection**, and **predictive modeling**. It is suited to both **academic research** and **industry applications** that aim to improve financial security through automated fraud detection.
---
## **Dependencies:**
Ensure you have the following Python libraries installed:
```bash
# Install the libraries the project depends on
pip install catboost streamlit pandas numpy joblib streamlit_lottie scikit-learn
```

---
## **Demo Video**:
For an introduction to the fraud detection system, watch the demo video [here](https://youtu.be/qHkBchgEdTg?si=mCmb0Dm8TBo88reV).

---
## **Author Information**
| **Name** | **Year** | **Position** |
|:---:|:---:|:---:|
| **Anidipta Pal** | 1st | Data Engineer, Data Analyst, ML Engineer |
| **Sagnik Basak** | 1st | Full Stack Developer |