https://github.com/senzmaki/nyakamwizi
A credit card fraud detection machine learning model
- Host: GitHub
- URL: https://github.com/senzmaki/nyakamwizi
- Owner: SenZmaKi
- Created: 2023-05-13T13:24:12.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-10-28T03:50:40.000Z (almost 2 years ago)
- Last Synced: 2025-03-10T14:11:24.089Z (7 months ago)
- Topics: data-science, data-science-projects, decision-tree, decision-tree-classifier, joblib, jupyter-notebook, machine-learning, numpy, pandas, python, scikit-learn
- Language: Jupyter Notebook
- Homepage: https://youtu.be/dQw4w9WgXcQ
- Size: 2.37 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Introduction
NyakaMwizi is a machine learning model built to detect potentially fraudulent transactions. The [dataset](https://www.kaggle.com/datasets/kartik2112/fraud-detection) used contains 1.3M instances and 23 features.
# Table of Contents
1. [How to test out the model](#how-to-test-out-the-model)
2. [Visual Insights](#visual-insights)
3. [Final Model Performance](#final-model-performance)
# How to test out the model
Ensure you have [Python 3.11](https://www.python.org/downloads/release/python-3111) and [Git](https://github.com/git-guides/install-git) installed.
Open a terminal and run the following commands.
1. **Set everything up.**
- Linux/Mac
```
git clone https://github.com/SenZmaKi/NyakaMwizi && cd NyakaMwizi && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
```
- Windows (Command Prompt)
```
git clone https://github.com/SenZmaKi/NyakaMwizi && cd NyakaMwizi && python -m venv .venv && .venv\Scripts\activate && pip install -r requirements.txt
```
2. **Test the model.**
```
python test_model.py
```
# Visual Insights
These are insights I gained while exploring the dataset with graphs and computations. They are listed in order of hierarchy.
## Time
- The time bracket in which the most fraudulent transactions occurred is between 10:00 PM and 4:00 AM
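
The observation above can be checked by counting frauds per hour of day. The following is a minimal sketch using only the standard library, not the notebook's actual code. It assumes each row is a `(timestamp, is_fraud)` pair with timestamps formatted like `2019-01-01 23:15:00`; the function names and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def fraud_counts_by_hour(rows):
    """Count fraudulent transactions per hour of day (0-23)."""
    counts = Counter()
    for timestamp, is_fraud in rows:
        if is_fraud:
            # Assumed timestamp format; adjust to match the real data.
            hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
            counts[hour] += 1
    return counts


def share_in_night_window(counts, start=22, end=4):
    """Fraction of frauds whose hour falls in [start, 24) or [0, end)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    night = sum(n for hour, n in counts.items() if hour >= start or hour < end)
    return night / total


# Hypothetical sample rows, for illustration only.
sample = [
    ("2019-01-01 23:15:00", 1),
    ("2019-01-02 02:40:00", 1),
    ("2019-01-02 13:05:00", 1),
    ("2019-01-02 14:30:00", 0),
]
counts = fraud_counts_by_hour(sample)
print(share_in_night_window(counts))
```

The window wraps past midnight, so the check is `hour >= start or hour < end` rather than a single range comparison.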
### Graph for frauds

### Graph for non frauds
## Amount
- Contrary to what you'd expect, most fraudulent transactions didn't involve exorbitant amounts of money
- Instead, they involved either reasonably large amounts (e.g. 30k) or average amounts
### Graph for frauds

### Graph for non frauds
## Categories
- Certain transaction categories appeared far more prone to fraud, specifically categories 4 and 11
### Graph for frauds

### Graph for non frauds
## Age
- The age bracket that involved the most fraudulent transactions is 30 to 70
- But the same can be said for non-fraudulent transactions, so this insight may be a misinterpretation
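
One way to separate a real signal from a group that simply has more transactions is to compare the fraud rate per bracket instead of the raw fraud count. The sketch below illustrates that idea with the standard library. It is an assumption about how one might check this, not the method used in the notebook; the `(age, is_fraud)` row shape, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict


def fraud_rate_by_bracket(rows, width=10):
    """Return {bracket_start: fraud_rate} for (age, is_fraud) rows."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for age, is_fraud in rows:
        bracket = (age // width) * width  # e.g. 34 -> 30
        totals[bracket] += 1
        if is_fraud:
            frauds[bracket] += 1
    return {b: frauds[b] / totals[b] for b in sorted(totals)}


# Hypothetical sample: the 30s bracket has more frauds in absolute terms,
# but the 70s bracket has the higher fraud rate.
sample = [
    (25, 0),
    (31, 1), (33, 1), (34, 0), (35, 0), (36, 0), (37, 0), (38, 0), (39, 0),
    (72, 1), (75, 0),
]
rates = fraud_rate_by_bracket(sample)
print(rates)
```

The same normalization applies to any other grouped feature where fraud and non-fraud counts follow a similar shape.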
### Graph for frauds

### Graph for non frauds
## Longitude and latitude
- Some areas on the scatter matrix seemed to experience more fraudulent transactions
### Scatter matrix for frauds

### Scatter matrix for non frauds
## Job
- Specific jobs experienced more fraudulent transactions, e.g. job 300
- But this behaviour is in line with what is observed for non-fraudulent transactions, so it may be another misinterpretation
### Graph for frauds

### Graph for non frauds
# Final Model Performance
- [Model](https://github.com/SenZmaKi/NyakaMwizi/blob/master/model.pkl): DecisionTreeClassifier
- Precision: 82.88%
- Recall: 17.12%
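
For reference, precision is the share of transactions flagged as fraud that really were fraud, and recall is the share of actual frauds that the model flagged. Assuming fraud is the positive class, the figures above would mean that most flagged transactions are genuine frauds, while most frauds go undetected. The sketch below shows how both metrics are computed from labels and predictions; the labels and predictions are hypothetical and do not reproduce the reported numbers.

```python
def precision_recall(y_true, y_pred):
    """Compute (precision, recall) with 1 as the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # frauds missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

In scikit-learn, `sklearn.metrics.precision_score` and `sklearn.metrics.recall_score` compute the same quantities.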