https://github.com/paulescu/build-and-deploy-real-time-feature-pipeline
Develop and deploy a real-time feature pipeline in Python, using Bytewax 🐝 and Hopsworks Feature Store.
bytewax feature-engineering hopsworks ml mlops python realtime streamlit
- Host: GitHub
- URL: https://github.com/paulescu/build-and-deploy-real-time-feature-pipeline
- Owner: Paulescu
- License: mit
- Created: 2023-06-29T11:38:56.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-04T09:17:05.000Z (almost 2 years ago)
- Last Synced: 2025-03-27T20:21:54.049Z (2 months ago)
- Topics: bytewax, feature-engineering, hopsworks, ml, mlops, python, realtime, streamlit
- Language: Python
- Homepage: https://www.realworldml.net/subscribe
- Size: 350 KB
- Stars: 134
- Watchers: 2
- Forks: 18
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Hands-on MLOps
Build and Deploy a Real-Time Feature Pipeline
with Python 🐍⚡
#### Table of contents
* [What is a real-time feature pipeline?](#what-is-a-real-time-feature-pipeline)
* [Cool, but how can I implement one?](#cool-but-how-can-i-implement-one)
* [What is this repo about?](#what-is-this-repo-about)
* [Run the whole thing in 10 minutes](#run-the-whole-thing-in-10-minutes)
* [Wanna learn more real-time ML?](#wanna-learn-more-real-time-ml)

## What is a real-time feature pipeline?
Machine Learning models are only as good as the input features you feed them at training and inference time.
And for many real-world applications, like financial trading, these features must be generated and served **as fast as possible**, so the ML system produces the best predictions possible.
Generating and serving features fast is what a **real-time feature pipeline** does.
## Cool, but how can I implement one?
Python alone is **not** a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines were usually written with JVM-based tools like Apache Spark or Apache Flink.

However, things are changing fast with the emergence of Rust 🦀 and libraries like **[Bytewax 🐝](https://github.com/bytewax/bytewax?utm_source=pau&utm_medium=partner&utm_content=github)** that expose a pure Python API on top of a highly efficient language like Rust.

So you get the best of both worlds:
- Rust's speed and performance, plus
- Python's rich ecosystem of libraries.

This means you can develop highly performant and scalable real-time pipelines, leveraging top-notch Python libraries.
🦀 + 🐝 + 🐍 = ⚡
## What is this repo about?
In this repository you will learn how to develop and deploy a real-time feature pipeline in 100% Python that
* **fetches** real-time trade data (aka raw data) from the [Coinbase Websocket API](https://help.coinbase.com/en/cloud/websocket-feeds/exchange)
* **transforms** trade data into OHLC data (aka features) in real-time using **[Bytewax](https://github.com/bytewax/bytewax?utm_source=pau&utm_medium=partner&utm_content=github)**, and
* **stores** these features in the Hopsworks Feature Store

You will also build a dashboard using Bokeh and Streamlit to visualize the final features in real time.
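
To make the **transform** step concrete, here is a minimal sketch in plain Python of how trades can be grouped into fixed (tumbling) time windows and reduced to OHLC candles. It is not the Bytewax dataflow from this repo: the names `Candle`, `parse_trade`, `trades_to_ohlc` and `SAMPLE_MESSAGES` are hypothetical, the message fields `time`, `price` and `size` are an assumption modeled on Coinbase trade messages, and the 30-second window is only illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Candle:
    """One OHLC bar for a fixed time window."""
    window_start: int  # Unix seconds at which the window opens
    open: float
    high: float
    low: float
    close: float
    volume: float


def parse_trade(message):
    """Turn a raw trade message into (unix_seconds, price, size).

    Assumes ISO-8601 UTC timestamps and numeric fields sent as strings.
    """
    ts = datetime.fromisoformat(message["time"].replace("Z", "+00:00"))
    return ts.timestamp(), float(message["price"]), float(message["size"])


def trades_to_ohlc(trades, window_seconds=30):
    """Aggregate (unix_seconds, price, size) trades into tumbling-window candles.

    Trades must arrive in time order, otherwise open and close are wrong.
    """
    candles = {}
    for ts, price, size in trades:
        # Round the timestamp down to the start of its window.
        start = int(ts // window_seconds) * window_seconds
        candle = candles.get(start)
        if candle is None:
            # The first trade in a window sets all four prices.
            candles[start] = Candle(start, price, price, price, price, size)
        else:
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price
            candle.volume += size
    return [candles[start] for start in sorted(candles)]


# Made-up sample messages, for illustration only.
SAMPLE_MESSAGES = [
    {"time": "2023-07-04T09:17:05.000000Z", "price": "1950.10", "size": "0.5"},
    {"time": "2023-07-04T09:17:20.000000Z", "price": "1951.00", "size": "0.25"},
    {"time": "2023-07-04T09:17:29.000000Z", "price": "1949.50", "size": "1.0"},
    {"time": "2023-07-04T09:17:31.000000Z", "price": "1950.00", "size": "0.25"},
]

for candle in trades_to_ohlc([parse_trade(m) for m in SAMPLE_MESSAGES]):
    print(candle)
```

A streaming engine does the same grouping incrementally, emitting each candle once its window closes instead of waiting for the whole list of trades.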
## Run the whole thing in 10 minutes
1. Create a Python virtual environment with the project dependencies with
```
$ make init
```

2. Set your Hopsworks API key and project name in `set_environment_variables_template.sh`, rename the file to `set_environment_variables.sh`, and run it (sign up for free at [hopsworks.ai](https://app.hopsworks.ai/?utm_source=pau&utm_medium=pau&utm_content=github) to get these 2 values)
```
$ . ./set_environment_variables.sh
```

3. To run the feature pipeline locally
```
$ make run
```

4. To spin up a Streamlit dashboard to visualize the data in real time
```
$ make frontend
```

5. To run the feature pipeline on an AWS EC2 instance, you first need an AWS account and the `aws-cli` tool installed on your local system. Then run the following command to deploy your feature pipeline onto an EC2 instance
```
$ make deploy
```

6. Feature pipeline logs are sent to AWS CloudWatch. Run the following command to get the URL where you can see the logs.
```
$ make info
```

7. To shut down the feature pipeline on AWS and free resources, run
```
$ make undeploy
```
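
If you skip step 2, the pipeline has no Hopsworks credentials to work with. Here is a minimal sketch of a fail-fast check you could run before `make run`. The variable names `HOPSWORKS_PROJECT_NAME` and `HOPSWORKS_API_KEY` are assumptions, so check `set_environment_variables_template.sh` for the names the repo actually uses. The helper `missing_env_vars` is hypothetical too.

```python
import os

# Assumed names; replace them with the ones defined in the template script.
REQUIRED_VARS = ("HOPSWORKS_PROJECT_NAME", "HOPSWORKS_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(env={"HOPSWORKS_API_KEY": "dummy-key"}))
```

Calling `missing_env_vars()` with no arguments checks the current shell environment, so an empty list means both values were exported.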
## Wanna learn more Real-Time ML?
I am preparing a new hands-on tutorial where you will learn to build a complete real-time ML system, from A to Z.
**[➡️ Subscribe to The Real-World ML Newsletter](https://paulabartabajo.substack.com/)** to be notified when the tutorial is out.