https://github.com/notner/fast-app
Async ELT Pipeline
asyncio fastapi iceberg kafka pandas pydantic python3 spark
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/notner/fast-app
- Owner: notner
- Created: 2025-02-17T08:18:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-20T14:08:26.000Z (over 1 year ago)
- Last Synced: 2025-02-20T15:24:07.181Z (over 1 year ago)
- Topics: asyncio, fastapi, iceberg, kafka, pandas, pydantic, python3, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 649 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Async ELT Pipeline
An ELT pipeline that ingests raw data, processes it with scalable microservice apps (via Kafka), and provides clean data downstream.
```code
Tech Stack:
* Python3
  - FastAPI
  - Pydantic
  - Pandas
  - PyIceberg
  - PySpark
* Data Lake
  - Iceberg
* Databases:
  - Redis
  - PSQL
```
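To make the ingest, process, and serve flow described above concrete, here is a minimal sketch using only `asyncio`. It is not this project's code: an in-process `asyncio.Queue` stands in for a Kafka topic, and the names `extract`, `transform`, `run_pipeline`, and the `title`/`year` fields are all hypothetical.

```python
import asyncio

# Hypothetical marker telling the consumer that no more rows are coming.
SENTINEL = None


async def extract(raw_rows, topic):
    """Publish raw rows unchanged; cleaning happens later (ELT, not ETL)."""
    for row in raw_rows:
        await topic.put(row)
    await topic.put(SENTINEL)


async def transform(topic, clean_store):
    """Consume raw rows, clean them, and keep the result for downstream use."""
    while True:
        row = await topic.get()
        if row is SENTINEL:
            break
        title = row["title"].strip()
        if title:  # drop rows with an empty title
            clean_store[title.lower()] = {"title": title, "year": int(row["year"])}


async def run_pipeline(raw_rows):
    # Assumption: a bounded in-process queue stands in for a Kafka topic.
    topic = asyncio.Queue(maxsize=100)
    clean_store = {}
    # Producer and consumer run concurrently on the same event loop.
    await asyncio.gather(extract(raw_rows, topic), transform(topic, clean_store))
    return clean_store


if __name__ == "__main__":
    rows = [{"title": " Batman ", "year": "1989"}, {"title": "", "year": "2000"}]
    print(asyncio.run(run_pipeline(rows)))
```

The bounded queue makes the producer wait when the consumer falls behind, which roughly mirrors why a broker sits between ingestion and processing.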
Warning
**THIS PROJECT IS CURRENTLY IN ALPHA:**
- This is a work in progress, so some features might be broken.
- Backwards compatibility isn't guaranteed.
## Design

## Developer
To set up for development, first run `make dev-setup`.
Next, download the .tsv data from IMDb (https://datasets.imdbws.com/), place the files in the `data` directory, and then populate the database:
```bash
## Get source data
# Supported IMDb file name and location:
#   data/title.basics.tsv

## Populate Redis
# Start the fixtures (launches docker/docker-compose.yml)
make start-fixtures
# Start the FastAPI web process
make run-web
# Populate the Redis database
curl http://127.0.0.1:8000/movies/populate
# Check the data
curl http://127.0.0.1:8000/movie/Batman
```
```
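For a sense of what the source file looks like before it is loaded, the sketch below parses rows in the `title.basics.tsv` layout with the standard library. It is an illustration under assumptions, not the project's loader: `parse_title_basics` and `index_movies_by_title` are hypothetical helpers, and the sample rows are made up. The column names and the `\N` null marker follow IMDb's published dataset description.

```python
import csv
import io

# IMDb marks missing values with the literal two-character string \N.
IMDB_NULL = r"\N"


def parse_title_basics(tsv_file):
    """Yield one dict per row, converting IMDb nulls to None."""
    # QUOTE_NONE: the fields are not quoted, and titles may contain '"'.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == IMDB_NULL else value) for key, value in row.items()}


def index_movies_by_title(rows):
    """Hypothetical lookup table: primaryTitle -> row, keeping movies only."""
    return {row["primaryTitle"]: row for row in rows if row["titleType"] == "movie"}


# Tiny in-memory sample in the same layout (values invented for illustration).
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tmovie\tExample Movie\tExample Movie\t0\t1999\t\\N\t120\tAction,Drama\n"
    "tt0000002\tshort\tExample Short\tExample Short\t0\t2001\t\\N\t\\N\tComedy\n"
)

if __name__ == "__main__":
    movies = index_movies_by_title(parse_title_basics(io.StringIO(SAMPLE)))
    print(movies["Example Movie"]["startYear"], movies["Example Movie"]["endYear"])
```

To try it on the real file, pass `open("data/title.basics.tsv", encoding="utf-8", newline="")` instead of the `io.StringIO` sample. IMDb serves the datasets gzip-compressed, so the download likely needs decompressing first.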
Run tests:
```bash
make run-tests
```