https://github.com/notner/fast-app

Async ELT Pipeline

asyncio fastapi iceberg kafka pandas pydantic python3 spark

Host: GitHub
URL: https://github.com/notner/fast-app
Owner: notner
Created: 2025-02-17T08:18:47.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-20T14:08:26.000Z (over 1 year ago)
Last Synced: 2025-02-20T15:24:07.181Z (over 1 year ago)
Topics: asyncio, fastapi, iceberg, kafka, pandas, pydantic, python3, spark
Language: Jupyter Notebook
Homepage:
Size: 649 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Async ELT Pipeline

An ELT pipeline that ingests raw data, processes it in scalable microservice apps (via Kafka), and provides clean data downstream.

```code
Tech Stack:
* Python3
  - FastAPI
  - Pydantic
  - Pandas
  - Pyiceberg
  - Pyspark
* DataLake
  - Iceberg
* Databases:
  - Redis
  - PSQL
```

Warning
**THIS PROJECT IS CURRENTLY IN ALPHA:**

- This is a work in progress, so some features might be broken.
- Backwards compatibility isn't guaranteed.

## Design

![](docs/architecture.svg)
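
To make the flow concrete, here is a minimal sketch of the ingest, process, and deliver stages. It is not the project's code: an in-memory `asyncio.Queue` stands in for Kafka, and the names `ingest`, `process`, `run_pipeline`, and the sample records are hypothetical.

```python
import asyncio

# Hypothetical raw records; a real run would read these from the source files.
RAW = [" Batman ", "", "  Alien"]


async def ingest(queue: asyncio.Queue, records: list[str]) -> None:
    """Load step: push raw records onto the queue untouched."""
    for record in records:
        await queue.put(record)
    await queue.put(None)  # sentinel: no more records


async def process(queue: asyncio.Queue) -> list[str]:
    """Transform step: consume raw records and collect cleaned ones."""
    clean = []
    while (record := await queue.get()) is not None:
        title = record.strip()
        if title:  # drop rows that are empty after trimming
            clean.append(title)
    return clean


async def run_pipeline(records: list[str]) -> list[str]:
    # The queue is a stand-in for a Kafka topic (assumption for this sketch).
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    _, clean = await asyncio.gather(ingest(queue, records), process(queue))
    return clean


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(RAW)))
```

The bounded queue gives simple backpressure: the producer waits whenever the consumer falls behind, which is the role a broker plays between services in a larger deployment.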

## Developer

To set up for development, first run `make dev-setup`.

Next, download the .tsv data from IMDB (https://datasets.imdbws.com/), place the files in the `data` directory, and then populate the database:
```bash
## Get Source Data
# Supported IMDB file name and location:
#   data/title.basics.tsv

## Populate Redis
# Start fixtures will launch docker/docker-compose.yml
make start-fixtures
# Start the FastAPI web process
make run-web
# Populate redis-database
curl http://127.0.0.1:8000/movies/populate
# check data
curl http://127.0.0.1:8000/movie/Batman
```
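
For orientation, the sketch below shows one way the IMDB file could be read with the standard library before loading it anywhere. It is an illustration only: the project may well use Pandas instead, `read_title_basics` is a hypothetical helper, and the sample row is invented and uses only a subset of the file's columns. IMDB marks missing values with `\N`, which the helper maps to `None`.

```python
import csv
import io

MISSING = r"\N"  # placeholder IMDB uses for missing values


def read_title_basics(handle):
    """Yield one dict per row, with the missing-value placeholder mapped to None."""
    # QUOTE_NONE keeps quote characters inside titles as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == MISSING else value) for key, value in row.items()}


# Invented sample in the same tab-separated layout (subset of columns).
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\n"
    "tt0000000\tmovie\tExample Movie\t1989\t\\N\n"
)

if __name__ == "__main__":
    for row in read_title_basics(io.StringIO(SAMPLE)):
        print(row["primaryTitle"], row["startYear"], row["endYear"])
```

For the real file, the same function can be fed `open("data/title.basics.tsv", encoding="utf-8", newline="")` in place of the in-memory sample.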
Run tests:
```bash
make run-tests
```
