https://github.com/r-mahesh45/fraud-detection-and-sales-analysis-using-random-forest
This project uses Random Forest to classify fraud risk based on taxable income and analyze key factors driving high sales for a cloth manufacturing company.
https://github.com/r-mahesh45/fraud-detection-and-sales-analysis-using-random-forest
classification data-visualization extract-transform-load python3 random-forest
Last synced: 7 months ago
JSON representation
This project uses Random Forest to classify fraud risk based on taxable income and analyze key factors driving high sales for a cloth manufacturing company.
- Host: GitHub
- URL: https://github.com/r-mahesh45/fraud-detection-and-sales-analysis-using-random-forest
- Owner: R-Mahesh45
- Created: 2024-03-07T10:27:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-04T13:28:28.000Z (9 months ago)
- Last Synced: 2025-01-30T07:16:11.866Z (8 months ago)
- Topics: classification, data-visualization, extract-transform-load, python3, random-forest
- Language: Jupyter Notebook
- Homepage:
- Size: 301 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Fraud Detection and Sales Analysis using Random Forest
## Overview
This project focuses on building machine learning models using Random Forest and other ensemble methods for two key tasks:
1. **Fraud Detection**: Classify individuals as "Risky" or "Good" based on taxable income.
2. **Sales Analysis**: Analyze the attributes that contribute to high sales for a cloth manufacturing company.Random Forest is utilized as the primary algorithm, along with Logistic Regression, Decision Trees, and SVM for comparison.
---
## Table of Contents
- [About the Data](#about-the-data)
- [Project Features](#project-features)
- [Modeling Techniques](#modeling-techniques)
- [Results](#results)
- [How to Run](#how-to-run)
- [Technologies Used](#technologies-used)
- [Contributing](#contributing)
- [License](#license)---
## About the Data
### Fraud Detection Dataset:
- **Taxable Income**: Used to classify individuals into "Risky" (≤ 30000) or "Good".
- Additional attributes include: age, work class, education, occupation, and more.### Sales Dataset:
- **Target Variable**: `Sales` (categorized as High/Low based on threshold values).
- **Features**: Includes Competitor Price, Income, Advertising, Population, Price, Shelf Location, Age, Education, Urban, and US.---
## Project Features
- **Fraud Detection**:
- Classification using Random Forest.
- Ensemble methods including Bagging and Voting Classifiers.
- Evaluation using cross-validation.- **Sales Analysis**:
- Random Forest model to identify key factors contributing to sales.
- Exploratory Data Analysis to derive insights.---
## Modeling Techniques
1. **Fraud Detection**:
- Random Forest Classifier to label individuals as "Risky" or "Good".
- Ensemble Voting Classifier with Logistic Regression, Decision Trees, and SVM.
- Cross-validation to evaluate performance.2. **Sales Analysis**:
- Random Forest to analyze features impacting high sales.
- Categorized `Sales` into High and Low.---
## Results
### Fraud Detection:
- Cross-validation results:
```
array([1. , 0.98333333, 0.96666667, 0.93333333, 1. ,
1. , 0.98333333, 0.98333333, 1. , 1. ])
```
- Mean accuracy: **98.5%**### Sales Analysis:
- Cross-validation results:
```
array([0.125, 0.1 , 0.125, 0.225, 0.2 , 0.175, 0.225, 0.075, 0.125, 0.15 ])
```
- Mean accuracy: **15.25%**---
## How to Run
1. Clone the repository:
```bash
git clone https://github.com/R-Mahesh45/Fraud-Detection-and-Sales-Analysis-using-Random-Forest.git
cd Fraud-Detection-and-Sales-Analysis-using-Random-Forest
```2. Install dependencies:
```bash
pip install -r requirements.txt
```3. Run the notebook or script:
```bash
python fraud_detection.py
python sales_analysis.py
```---
## Technologies Used
- **Languages**: Python
- **Libraries**:
- pandas, numpy
- scikit-learn
- matplotlib, seaborn---
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request.---
## Contact
For any queries, please reach out to:
**Mahesh Anil Rathod**
**Email**: maheshrathod2236@gmail.com