https://github.com/govinddixit/credit-card-fraud-detection
https://github.com/govinddixit/credit-card-fraud-detection
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/govinddixit/credit-card-fraud-detection
- Owner: GOVINDDIXIT
- License: mit
- Created: 2019-07-31T15:16:33.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-08-03T18:27:19.000Z (almost 6 years ago)
- Last Synced: 2025-01-28T18:37:38.968Z (4 months ago)
- Language: Jupyter Notebook
- Size: 85.9 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Credit Card Fraud Detection
*In this project, I have used some machine learning algorithms to detect Credit Card fraudlent transactions. Using a dataset of nearly 285000 credit card transactions and multiple unsupervised anomaly detection algorithms, I am able to identify transactions with a high probability of being credit card fraud.*
***Dataset*** can be found [here](https://www.kaggle.com/mlg-ulb/creditcardfraud)
**I have used the following two machine learning algorithms:**
**1. Local Outlier Factor (LOF)**
The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood.**2. Isolation Forest Algorithm**
The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.
This path length, averaged over a forest of such random trees, is a measure of normality and our decision function.
Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produce shorter path lengths for particular samples, they are highly likely to be anomalies.
Furthermore, using metrics suchs as precision, recall, and F1-scores,I have investigated the classification accuracy for these algorithms.
### Plots
- *Plot of histograms of each parameter of the dataset*
---------------------
- *Plot of Correlation matrix*
---------------------