https://github.com/danielstern/demo-data-pipeline
- Host: GitHub
- URL: https://github.com/danielstern/demo-data-pipeline
- Owner: danielstern
- Created: 2024-06-23T13:40:42.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-23T13:45:22.000Z (almost 2 years ago)
- Last Synced: 2024-10-31T00:08:08.013Z (over 1 year ago)
- Language: Python
- Size: 1.95 KB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Pipeline Simulator
## Description
A simple application simulating a real-time data pipeline.
## Running the App
1. Start the server with `python3 server.py`
2. Start simulating data with `python3 simulator.py`. A new file will be created every 10 seconds until the simulator is stopped.
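
For orientation, the "one new file every 10 seconds" behaviour can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual `simulator.py`: the record fields, the file naming scheme, and the helper names `make_record`, `write_record`, and `run` are all assumptions.

```python
import json
import random
import time
import uuid
from pathlib import Path


def make_record() -> dict:
    """Build one hypothetical reading; the real simulator's fields may differ."""
    return {"timestamp": time.time(), "value": random.random()}


def write_record(out_dir: Path, record: dict) -> Path:
    """Write the record to its own JSON file and return the file's path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # A random UUID keeps file names unique across restarts (assumed naming scheme).
    path = out_dir / f"record-{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(record))
    return path


def run(out_dir: Path = Path("tmp"), interval: float = 10.0, max_files=None) -> None:
    """Write a new file every `interval` seconds until interrupted or max_files is reached."""
    written = 0
    try:
        while max_files is None or written < max_files:
            write_record(out_dir, make_record())
            written += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        # Ctrl+C stops the loop cleanly.
        pass
```

Calling `run()` with the defaults writes into `tmp/` every 10 seconds until interrupted; the `max_files` argument exists only so the loop can be bounded when experimenting.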
## Challenge Activities
1. Can you change the storage medium from the `tmp/` folder to a Postgres database running locally?
2. Can you update the simulator to add a randomly generated first and last name, and then update the data cleaner to mask or bucket that information?
3. Can you containerize the server and get it running in the cloud? (Beware cloud expenses.)
4. Can you change the storage medium again, from a local database to a cloud-native database running remotely (e.g., AWS DynamoDB)? Once again, be aware of any expenses you will incur, and delete the cloud resources once you are done with this exercise.
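
As a possible starting point for the second challenge, the sketch below shows one way to attach a random name to a record and then mask or bucket it. Everything here is an assumption made for illustration: the name lists, the `first_name` and `last_name` field names, and the helpers `add_random_name`, `mask_name`, `bucket_name`, and `clean_record` are hypothetical and are not taken from the repository's simulator or data cleaner.

```python
import random

# Hypothetical sample names; swap in any list or name generator you prefer.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Torvalds"]


def add_random_name(record: dict, rng=random) -> dict:
    """Return a copy of the record with randomly chosen name fields added."""
    return {
        **record,
        "first_name": rng.choice(FIRST_NAMES),
        "last_name": rng.choice(LAST_NAMES),
    }


def mask_name(name: str) -> str:
    """Masking: keep the first character and replace the rest with asterisks."""
    return name[:1] + "*" * (len(name) - 1) if name else name


def bucket_name(name: str) -> str:
    """Bucketing: reduce a name to a coarse alphabetical range."""
    first = name[:1].upper()
    if "A" <= first <= "M":
        return "A-M"
    if "N" <= first <= "Z":
        return "N-Z"
    return "other"


def clean_record(record: dict) -> dict:
    """Return a copy with the first name masked and the last name bucketed."""
    cleaned = dict(record)
    if "first_name" in cleaned:
        cleaned["first_name"] = mask_name(cleaned["first_name"])
    if "last_name" in cleaned:
        cleaned["last_name"] = bucket_name(cleaned["last_name"])
    return cleaned
```

Masking keeps a hint of the original value, while bucketing discards it in favour of a coarse group; which one is appropriate depends on how much detail downstream consumers are allowed to see.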