Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kennethleungty/Anomaly-Detection-Pipeline-Kedro
Anomaly Detection Pipeline with Isolation Forest model and Kedro framework
anomaly anomaly-detection credit-card credit-card-fraud data-science data-science-pipeline financial financial-data fraud fraud-detection kedro machine-learning machine-learning-pipeline ml mlops pipelines quantumblack
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/kennethleungty/Anomaly-Detection-Pipeline-Kedro
- Owner: kennethleungty
- Created: 2022-03-11T15:37:27.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-27T02:43:58.000Z (almost 2 years ago)
- Last Synced: 2024-08-01T10:19:10.876Z (5 months ago)
- Topics: anomaly, anomaly-detection, credit-card, credit-card-fraud, data-science, data-science-pipeline, financial, financial-data, fraud, fraud-detection, kedro, machine-learning, machine-learning-pipeline, ml, mlops, pipelines, quantumblack
- Language: Python
- Homepage: https://neptune.ai/blog/data-science-pipelines-with-kedro
- Size: 239 KB
- Stars: 22
- Watchers: 2
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-kedro - Anomaly Detection Pipeline with Kedro
README
# Building and Managing an Isolation Forest Anomaly Detection Pipeline with Kedro
## Overview
Anomaly (fraud) detection pipeline on credit card transaction data, built with the Isolation Forest machine learning model and the Kedro framework.

Link to article: https://neptune.ai/blog/data-science-pipelines-with-kedro
## Objective
Develop a data science pipeline to detect anomalous (fraudulent) credit card transactions using:
- **Isolation Forest** machine learning model - For unsupervised anomaly detection
- **Kedro** - An open-source Python framework for creating reproducible, maintainable, and modular data science code. It helps accelerate data pipelining, enhance data science prototyping, and promote pipeline reproducibility.
## Motivation
- Explore how unsupervised anomaly detection works, and better understand the concept and implementation of isolation forest
- Leverage the Kedro framework to optimally structure data science pipeline projects
## Data
The [credit card transaction data](https://github.com/Fraud-Detection-Handbook/simulated-data-transformed) comes from a collaboration between Worldline and the Machine Learning Group of Université Libre de Bruxelles (ULB). It is a realistic simulation of real-world credit card transactions, designed to include complicated fraud detection issues.
## General Pipeline Structure
![Data science pipeline overview](/docs/images/01_DS_Pipeline_Overview.png?raw=true)
## Anomaly Detection Pipeline Structure
![Anomaly detection pipeline blueprint](/docs/images/05_Anomaly_Detection_Pipeline_Blueprint.png?raw=true)
## Steps
1. Change to the project directory on the command line - `cd C:/Anomaly-Detection-Pipeline-Kedro`
2. Activate the Conda virtual environment (create one first if you have not already) - `conda activate env_kedro`
3. Execute a pipeline run with `kedro run`

Please see the [walkthrough article](https://neptune.ai/blog/data-science-pipelines-with-kedro) for details.
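
The isolation forest idea named under Objective and Motivation can be illustrated without any framework. The sketch below is a simplified version of the algorithm that uses only the standard library. It is not the code used in this repository. The two-feature `transactions` data (amount, hour of day) and all function names are made up for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split points on a random feature at a random threshold until isolated or too deep."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:  # cannot split on a constant feature
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, tree, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["threshold"] else "right"
    return path_length(point, tree[branch], depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(point, trees, sample_size):
    """Score in (0, 1); shorter average paths (easier isolation) give higher scores."""
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


# Hypothetical data: 200 ordinary (amount, hour) pairs plus one extreme transaction.
data_rng = random.Random(42)
transactions = [(data_rng.gauss(50, 10), data_rng.gauss(12, 3)) for _ in range(200)]
transactions.append((5000.0, 3.0))

trees, sample_size = fit_forest(transactions)
scores = [anomaly_score(t, trees, sample_size) for t in transactions]
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print("most anomalous index:", most_anomalous)
```

Points that random splits isolate quickly receive scores closer to 1, which is why no fraud labels are needed. For real work, a maintained implementation such as scikit-learn's `IsolationForest` is the more sensible choice.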