https://github.com/banickn/dagster-iceberg



# Dagster-Iceberg project

This project investigates how to set up a modern data stack with Dagster, Apache Iceberg, Azure, and either DuckDB or Daft.

## Get started

- Create a **.env** file in fab-data/ like this:
```
AZURE_CONNECTION_STRING = ""
AZURE_BRONZE_CONTAINER_NAME = ""
AZURE_SILVER_CONTAINER_NAME = ""
AZURE_GOLD_CONTAINER_NAME = ""
AZURE_STORAGE_ACCOUNT_NAME = ""
AZURE_STORAGE_ACCOUNT_KEY = ""
```
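A small check like the following can catch an incomplete **.env** before any asset runs. It is only a sketch: it assumes the values have already been loaded into the process environment (for example by a .env loader), and `load_azure_settings` is a hypothetical helper name, not something taken from this repository.

```python
import os

# Variable names copied from the .env template above.
REQUIRED_VARS = (
    "AZURE_CONNECTION_STRING",
    "AZURE_BRONZE_CONTAINER_NAME",
    "AZURE_SILVER_CONTAINER_NAME",
    "AZURE_GOLD_CONTAINER_NAME",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
)


def load_azure_settings(env=None):
    """Return the required settings as a dict; fail fast if any is missing or empty.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_azure_settings()` at startup raises a `RuntimeError` that names every unset variable, which is usually easier to debug than an authentication failure later in a run.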

- Install the Python modules.
TODO.

- Start Dagster and run the **setup_silver** and **setup_gold** assets.
These assets create local SQLite Iceberg catalogs and the namespaces and tables in Azure.
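As a rough illustration of what such a setup step might involve, the sketch below uses PyIceberg's `SqlCatalog` with a local SQLite file and an Azure warehouse location. The function names, the `abfs://` warehouse path, the database file name, and the `adls.*` property keys are assumptions about a typical PyIceberg configuration, not code taken from this repository; older PyIceberg releases may expect different property names.

```python
def catalog_properties(account_name, account_key, container, db_path="silver_catalog.db"):
    """Build properties for a SQLite-backed Iceberg catalog with an Azure warehouse.

    Assumption: the warehouse lives at the root of the given container and
    the storage account is accessed with an account key.
    """
    return {
        "uri": f"sqlite:///{db_path}",
        "warehouse": f"abfs://{container}@{account_name}.dfs.core.windows.net/warehouse",
        "adls.account-name": account_name,
        "adls.account-key": account_key,
    }


def setup_layer(layer, properties, table_name, schema):
    """Create the namespace and table for one layer, reusing them if they exist."""
    # Imported lazily so catalog_properties() works without PyIceberg installed.
    from pyiceberg.catalog.sql import SqlCatalog
    from pyiceberg.exceptions import (
        NamespaceAlreadyExistsError,
        TableAlreadyExistsError,
    )

    catalog = SqlCatalog(layer, **properties)
    try:
        catalog.create_namespace(layer)
    except NamespaceAlreadyExistsError:
        pass  # Namespace was created by an earlier run.
    identifier = f"{layer}.{table_name}"
    try:
        return catalog.create_table(identifier, schema=schema)
    except TableAlreadyExistsError:
        return catalog.load_table(identifier)
```

Catching the "already exists" errors keeps the step safe to re-run, which matters when an orchestrator materializes the same asset more than once.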

- Run **fake_data.py** to create fake semiconductor manufacturing data.
If the sensor is activated in Dagster, these JSON files are loaded automatically into an Azure container as raw data.
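To give a sense of the kind of data involved, here is a minimal generator sketch using only the standard library. The field names (`wafer_id`, `tool_id`, `temperature_c`, `defect_count`), the tool names, and the value ranges are invented for illustration and may differ from what **fake_data.py** actually produces.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def make_fake_records(n, seed=None):
    """Generate n fake measurement records (hypothetical schema)."""
    rng = random.Random(seed)  # A seed makes the output reproducible.
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = []
    for i in range(n):
        records.append(
            {
                "wafer_id": f"W{i:05d}",
                "tool_id": rng.choice(["ETCH-01", "LITHO-02", "CVD-03"]),
                "timestamp": (start + timedelta(minutes=5 * i)).isoformat(),
                "temperature_c": round(rng.uniform(180.0, 220.0), 2),
                "defect_count": rng.randint(0, 12),
            }
        )
    return records


def write_json_file(records, path):
    """Write the records to a JSON file that a sensor could later pick up."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
```

For example, `write_json_file(make_fake_records(100, seed=42), "fab_batch.json")` would write one batch file; the file name here is arbitrary.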

- Running **write_silver_fabdata** and **write_gold_fabreport** loads the data into Iceberg tables and executes some basic aggregations for the gold layer.
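The aggregation step can be pictured with a plain-Python sketch like the one below. In the project itself this work would more likely run through DuckDB or Daft against the Iceberg tables; the version here swaps in standard-library code so the logic is easy to follow, and the column names (`tool_id`, `defect_count`) are hypothetical.

```python
from collections import defaultdict


def aggregate_by_tool(silver_rows):
    """Summarize cleaned rows into one gold-layer row per tool.

    Each input row is a dict with (hypothetical) keys `tool_id` and
    `defect_count`. The result is sorted by tool for stable output.
    """
    totals = defaultdict(lambda: {"wafers": 0, "defects": 0})
    for row in silver_rows:
        bucket = totals[row["tool_id"]]
        bucket["wafers"] += 1
        bucket["defects"] += row["defect_count"]
    return [
        {
            "tool_id": tool,
            "wafers": t["wafers"],
            "avg_defects": t["defects"] / t["wafers"],
        }
        for tool, t in sorted(totals.items())
    ]
```

The same grouping could be expressed as a `GROUP BY tool_id` query once the silver table is readable from a SQL engine.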

- Run **streamlit run fab_report.py** to start a simple Streamlit report dashboard that reads from the gold layer.

## Architecture

```mermaid
graph TD
subgraph Data Sources
Batch[Batch Sources]
Stream[Streaming Sources]
end
subgraph Orchestrator
direction TB
Dagster[Dagster]
end
subgraph Visualization
direction TB
Streamlit[Streamlit]
end
subgraph Data Lakehouse
direction LR
Bronze[**Bronze Layer**<br/>Raw data<br/>JSON]
Silver[**Silver Layer**<br/>Cleaned, Augmented Data<br/>Apache Iceberg]
Gold[**Gold Layer**<br/>Aggregates<br/>Apache Iceberg]
end

Batch --> Bronze
Stream --> Bronze
Bronze --> Silver
Silver --> Gold
Streamlit --> Gold
Dagster --> Bronze
Dagster --> Silver
Dagster --> Gold
style Bronze fill:#CE8946,stroke:#333,stroke-width:2px
style Silver fill:#C0C0C0,stroke:#333,stroke-width:2px
style Gold fill:#FFD700,stroke:#333,stroke-width:2px
style Dagster fill:#5eb1ef,stroke:#333,stroke-width:2px

```

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.