https://github.com/logicalclocks/real-time-fraud-demo
Real Time fraud detection demo with Hopsworks
- Host: GitHub
- URL: https://github.com/logicalclocks/real-time-fraud-demo
- Owner: logicalclocks
- License: apache-2.0
- Created: 2024-10-17T17:18:40.000Z
- Default Branch: main
- Last Pushed: 2024-10-18T06:56:11.000Z
- Last Synced: 2025-02-26T23:37:38.256Z
- Language: Jupyter Notebook
- Size: 85.9 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Hopsworks real-time credit card fraud detection tutorial
## Introduction
In this guide, we demonstrate how to build a real-time machine learning system that detects credit card fraud.
The guide is divided into the following sections:
- **Data Generation**: Generate a sample of credit card user profiles and transactions
- **Feature Pipeline**: Compute the features and create the feature groups
- **Training Pipeline**: Assemble the training dataset, train and validate the model
- **Inference Pipeline**: Deploy the model and make requests
### Data Generation
Run the `data_generation/generate_historical_data.ipynb` notebook to generate two CSV files:
- `historical_transactions.csv`: contains the historical transactions
- `kyc.csv`: contains information about the different accounts
You can store these CSV files in Hopsworks or in S3.
Note that you will have to adjust the location of the CSV files in the feature pipeline notebooks accordingly.
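To get a feel for what the generated files might contain without running the notebook, here is a minimal, standard-library-only sketch of a synthetic data generator. The column names (`cc_num`, `birthdate`, `country`, `tid`, `datetime`, `amount`, `latitude`, `longitude`, `fraud_label`), the value ranges and the roughly 5% fraud rate are assumptions made for illustration. They are not taken from `generate_historical_data.ipynb`.

```python
import csv
import random
from datetime import datetime, timedelta
from pathlib import Path


def generate_sample_data(out_dir, n_accounts=5, n_transactions=50, seed=42):
    """Write a hypothetical kyc.csv and historical_transactions.csv to out_dir.

    Column names and value ranges are illustrative assumptions only.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    accounts = [f"acc_{i:04d}" for i in range(n_accounts)]

    # One KYC profile row per account.
    kyc_path = out / "kyc.csv"
    with open(kyc_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cc_num", "birthdate", "country"])
        for acc in accounts:
            birth = datetime(1950, 1, 1) + timedelta(days=rng.randint(0, 365 * 50))
            writer.writerow([acc, birth.date().isoformat(), rng.choice(["SE", "DE", "US"])])

    # Transactions spread over 30 days, each tied to one of the accounts above.
    tx_path = out / "historical_transactions.csv"
    start = datetime(2024, 1, 1)
    with open(tx_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["tid", "cc_num", "datetime", "amount", "latitude", "longitude", "fraud_label"])
        for tid in range(n_transactions):
            ts = start + timedelta(minutes=rng.randint(0, 60 * 24 * 30))
            writer.writerow([
                tid,
                rng.choice(accounts),
                ts.isoformat(),
                round(rng.uniform(1, 500), 2),
                round(rng.uniform(-90, 90), 4),
                round(rng.uniform(-180, 180), 4),
                int(rng.random() < 0.05),  # roughly 5% labelled as fraud
            ])
    return kyc_path, tx_path
```

Calling `generate_sample_data("data")` writes both files into a local `data/` directory, from where they could be uploaded to Hopsworks or S3.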
### Feature Pipeline
The feature pipeline directory contains the code to compute the different feature groups and populate the feature store.
While the first run uses static historical data, you can use the data generator above to produce new data and update the feature groups in real time.
We suggest running the notebooks in the following order:
#### Profiles
Uses the data in the `kyc.csv` file to create the necessary profile features.
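As an illustration of a profile feature, the sketch below derives each account's age from its date of birth. It assumes hypothetical `kyc.csv` columns `cc_num`, `birthdate` (ISO date) and `country`. The actual notebook may compute different features. In the demo, rows like these would then be written to a Hopsworks feature group, presumably keyed by account.

```python
import csv
from datetime import date


def age_in_years(birthdate: date, reference: date) -> int:
    """Whole years between birthdate and reference date."""
    had_birthday = (reference.month, reference.day) >= (birthdate.month, birthdate.day)
    return reference.year - birthdate.year - (0 if had_birthday else 1)


def compute_profile_features(kyc_path, reference: date):
    """Return one feature row per account (hypothetical schema)."""
    features = []
    with open(kyc_path, newline="") as f:
        for row in csv.DictReader(f):
            birth = date.fromisoformat(row["birthdate"])
            features.append({
                "cc_num": row["cc_num"],
                "age": age_in_years(birth, reference),
                "country": row["country"],
            })
    return features
```

Passing the reference date explicitly, rather than calling `date.today()` inside the function, keeps the feature reproducible when the pipeline is re-run on historical data.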
#### Transactions
Uses the data in the `historical_transactions.csv` file to create the necessary transaction features.
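A common kind of transaction feature is a per-account aggregate over a trailing time window. The sketch below computes, for every transaction, how many earlier transactions the same account made within the previous 24 hours and their total amount. The 24-hour window, the input keys and the feature names `tx_count_window` and `tx_amount_sum_window` are hypothetical choices, not necessarily what the notebook computes.

```python
from collections import defaultdict, deque
from datetime import timedelta


def trailing_window_features(transactions, window=timedelta(hours=24)):
    """For each transaction, count and sum the same account's earlier
    transactions within `window` (inclusive). The current transaction is excluded.

    `transactions` is an iterable of dicts with hypothetical keys
    "tid", "cc_num", "datetime" (a datetime) and "amount" (a float).
    """
    history = defaultdict(deque)  # cc_num -> deque of (timestamp, amount), oldest first
    features = []
    for tx in sorted(transactions, key=lambda t: t["datetime"]):
        past = history[tx["cc_num"]]
        # Drop earlier transactions that are older than the window.
        while past and tx["datetime"] - past[0][0] > window:
            past.popleft()
        features.append({
            "tid": tx["tid"],
            "tx_count_window": len(past),
            "tx_amount_sum_window": sum(amount for _, amount in past),
        })
        # Only now record the current transaction, so it never counts itself.
        past.append((tx["datetime"], tx["amount"]))
    return features
```

Processing transactions in time order with a per-account deque means each transaction is added and evicted at most once.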
#### Profiles Last Transactions
Uses the data in the `historical_transactions.csv` file to compute the time and location of the last transaction for each account.
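Conceptually, this feature group holds one row per account with the timestamp and coordinates of its most recent transaction. Below is a minimal sketch. It assumes hypothetical keys `cc_num`, `datetime`, `latitude` and `longitude` on each transaction and hypothetical output names prefixed with `last_tx_`.

```python
def last_transaction_per_account(transactions):
    """Map cc_num -> time and location of that account's most recent transaction.

    Input order does not matter. A later timestamp always wins.
    """
    last = {}
    for tx in transactions:
        acc = tx["cc_num"]
        if acc not in last or tx["datetime"] > last[acc]["last_tx_datetime"]:
            last[acc] = {
                "cc_num": acc,
                "last_tx_datetime": tx["datetime"],
                "last_tx_latitude": tx["latitude"],
                "last_tx_longitude": tx["longitude"],
            }
    return last
```

The same logic applies incrementally when new transactions arrive in real time: a newer transaction simply replaces the account's row.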
### Training Pipeline
The training pipeline contains the code to join the different features together, generate a training dataset, train a model, and register the feature view with the feature store and the model with the model registry.
- `fraud_model_fv.ipynb`: This notebook joins the necessary features together and registers the feature view. It also generates a training dataset split by time.
- `model_training.ipynb`: This notebook trains the model on the training dataset and registers the model with the Hopsworks model registry.
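The feature view's role of joining features and producing a time-based split can be illustrated in plain Python. This is a simplified sketch, not the Hopsworks feature view API. It left-joins hypothetical profile rows onto transactions by `cc_num`. It then puts every row before a cut-off timestamp into the training set and every row from the cut-off onward into the test set. As a result, the model is evaluated only on transactions that happened after everything it was trained on.

```python
def join_on_account(transactions, profiles, key="cc_num"):
    """Left-join each transaction with its account's profile features.

    Transactions without a matching profile are kept unchanged. On a column
    name clash, the transaction's value wins.
    """
    by_key = {p[key]: p for p in profiles}
    return [{**by_key.get(tx[key], {}), **tx} for tx in transactions]


def time_based_split(rows, split_time, time_key="datetime"):
    """Rows strictly before split_time go to train, the rest go to test."""
    train = [r for r in rows if r[time_key] < split_time]
    test = [r for r in rows if r[time_key] >= split_time]
    return train, test
```

Splitting by time rather than at random avoids leaking information from the future into training, which would otherwise make offline metrics look better than production performance.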
### Inference Pipeline
This pipeline creates a KServe-based deployment on Hopsworks. It deploys the model and provides an endpoint to which you can send prediction requests. If you run your own KServe deployment instead, you can reuse the code that fetches the data from the online feature store and computes the on-demand features.
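On-demand features depend on the incoming transaction, so they cannot be precomputed. The sketch below assumes two such features: the haversine distance in kilometres and the elapsed time between the incoming transaction and the account's last transaction, as stored by the Profiles Last Transactions feature group. The feature names and the dict layout of `last_tx` are hypothetical. The sketch also wraps a feature vector in the KServe V1 request body (`{"instances": [...]}`), assuming the deployment uses the V1 `:predict` protocol.

```python
import json
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def on_demand_features(tx, last_tx):
    """Compute request-time features from the incoming transaction and the
    account's last transaction (in the demo, fetched from the online store)."""
    distance = haversine_km(
        last_tx["last_tx_latitude"], last_tx["last_tx_longitude"],
        tx["latitude"], tx["longitude"],
    )
    elapsed = (tx["datetime"] - last_tx["last_tx_datetime"]).total_seconds()
    return {"loc_delta_km": distance, "time_delta_s": elapsed}


def kserve_v1_payload(feature_vector):
    """Wrap a single feature vector in a KServe V1 predict request body."""
    return json.dumps({"instances": [feature_vector]})
```

The order of values in `feature_vector` has to match the feature order the model was trained on. Otherwise the endpoint will return predictions for the wrong inputs without raising an error.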