https://github.com/kennethleungty/anomaly-detection-pipeline-kedro
Anomaly Detection Pipeline with Isolation Forest model and Kedro framework
- Host: GitHub
- URL: https://github.com/kennethleungty/anomaly-detection-pipeline-kedro
- Owner: kennethleungty
- Created: 2022-03-11T15:37:27.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-27T02:43:58.000Z (about 2 years ago)
- Last Synced: 2024-10-29T05:34:50.288Z (4 months ago)
- Topics: anomaly, anomaly-detection, credit-card, credit-card-fraud, data-science, data-science-pipeline, financial, financial-data, fraud, fraud-detection, kedro, machine-learning, machine-learning-pipeline, ml, mlops, pipelines, quantumblack
- Language: Python
- Homepage: https://neptune.ai/blog/data-science-pipelines-with-kedro
- Size: 239 KB
- Stars: 23
- Watchers: 2
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Building and Managing an Isolation Forest Anomaly Detection Pipeline with Kedro
## Overview
Anomaly (fraud) detection pipeline for credit card transaction data, built with an Isolation Forest machine learning model and the Kedro framework.

Link to article: https://neptune.ai/blog/data-science-pipelines-with-kedro
## Objective
Develop a data science pipeline to detect anomalous (fraudulent) credit card transactions using:
- **Isolation Forest** machine learning model - For unsupervised anomaly detection
- **Kedro** - An open-source Python framework for creating reproducible, maintainable, and modular data science code. This framework helps to accelerate data pipelining, enhance data science prototyping, and promote pipeline reproducibility.

## Motivation
- Explore how unsupervised anomaly detection works, and better understand the concept and implementation of isolation forest
- Leverage the Kedro framework to structure data science pipeline projects effectively

## Data
The [credit card transaction data](https://github.com/Fraud-Detection-Handbook/simulated-data-transformed) comes from a collaboration between Worldline and the Machine Learning Group. It is a realistic simulation of real-world credit card transactions, designed to include complicated fraud detection issues.

## General Pipeline Structure
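A Kedro pipeline is assembled from nodes, each wrapping a plain Python function with named inputs and outputs. The sketch below is a minimal illustration of that structure, not this project's actual pipeline: the function names, the dataset names (`raw_transactions`, `clean_transactions`, `feature_matrix`), and the field names (`amount`, `seconds_since_start`) are all hypothetical. The Kedro import sits inside `create_pipeline` only so that the node functions can be tried without Kedro installed.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per transaction.
Transaction = Dict[str, Optional[float]]


def drop_incomplete_rows(transactions: List[Transaction]) -> List[Transaction]:
    """Keep only transactions in which every field has a value."""
    return [tx for tx in transactions if all(v is not None for v in tx.values())]


def extract_features(transactions: List[Transaction]) -> List[List[float]]:
    """Turn each transaction into a numeric feature vector (illustrative fields)."""
    return [[tx["amount"], tx["seconds_since_start"]] for tx in transactions]


def create_pipeline(**kwargs):
    """Wire the functions above into a two-node Kedro pipeline."""
    # Imported here (an illustrative choice) so the functions above
    # remain usable on their own when Kedro is not installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(
                func=drop_incomplete_rows,
                inputs="raw_transactions",  # hypothetical Data Catalog entry
                outputs="clean_transactions",
                name="drop_incomplete_rows_node",
            ),
            node(
                func=extract_features,
                inputs="clean_transactions",
                outputs="feature_matrix",
                name="extract_features_node",
            ),
        ]
    )
```

Kedro resolves the execution order from the dataset names: here `extract_features` runs after `drop_incomplete_rows` because it consumes `clean_transactions`.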
*(Figure: general pipeline structure diagram)*

## Anomaly Detection Pipeline Structure
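Isolation Forest rests on the idea that anomalies are easier to isolate than normal points: random axis-aligned splits separate an outlier from the rest of the data in fewer steps, so a short average path length across many random trees suggests an anomaly. The from-scratch sketch below illustrates that idea using only the standard library. It is a simplified teaching aid, not the code used in this project; the function names, the synthetic 2D data, and the parameter defaults are assumptions. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature cannot be split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, tree, depth=0):
    """Depth at which the point lands in a leaf, adjusted for unsplit leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(point, tree[branch], depth + 1)


def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(point, trees, sample_size):
    """Score in (0, 1]; shorter average paths give higher scores."""
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: a Gaussian cluster plus one obvious outlier.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    data.append((8.0, 8.0))
    trees, n = fit_forest(data)
    print("outlier score:", round(anomaly_score((8.0, 8.0), trees, n), 3))
    print("inlier score: ", round(anomaly_score((0.0, 0.0), trees, n), 3))
```

Scores closer to 1 indicate points that were isolated quickly, while scores well below 0.5 indicate typical points. No labels are used at any stage, which is what makes the approach unsupervised.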
*(Figure: anomaly detection pipeline structure diagram)*

## Steps
1. Change to the project directory on the command line - `cd C:/Anomaly-Detection-Pipeline-Kedro`
2. Activate the Conda virtual environment (create one first if you have not already done so) - `conda activate env_kedro`
3. Execute a pipeline run with `kedro run`

Please see the [walkthrough article](https://neptune.ai/blog/data-science-pipelines-with-kedro) for details.