https://github.com/makism/datastack-playground
A datastack playground; includes Spark, Kafka, Airbyte, etc.
- Host: GitHub
- URL: https://github.com/makism/datastack-playground
- Owner: makism
- License: mit
- Created: 2022-12-19T17:44:50.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-04T19:27:22.000Z (about 2 years ago)
- Last Synced: 2025-03-08T19:05:59.032Z (7 months ago)
- Topics: apache-airflow, apache-spark, deltalake, minio
- Language: Jupyter Notebook
- Homepage:
- Size: 55.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# datastack-playground
## General
A playground for Apache Spark with MinIO support and Delta Lake integration. Additional tools include Airbyte and Lightdash.
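The Getting Started steps rely on `docker` and `docker-compose` being on the `PATH`. The following is a minimal preflight sketch, not part of the repository; the `missing_tools` helper is a hypothetical name introduced here for illustration.

```bash
#!/usr/bin/env bash
# Preflight sketch (an assumption, not shipped with this repo):
# report which of the tools used in Getting Started are not on PATH.

# Print the name of every given command that cannot be found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools docker docker-compose)"
if [ -n "$missing" ]; then
  # Word splitting is intentional: names are printed on one line.
  echo "Missing required tools:" $missing >&2
else
  echo "All required tools found."
fi
```

On installations where Compose ships only as the `docker compose` plugin, this check reports `docker-compose` as missing; in that case the commands in this README may need the plugin form instead.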
## Getting Started
### Building
Run the following command to build the Docker images:
```bash
./build.sh
```
### Running
Now, bring up the cluster with:
```bash
docker-compose up
```
## Extras
### Airbyte
Clone and run the Airbyte repo locally; follow the instructions at https://docs.airbyte.com/deploying-airbyte/local-deployment.
#### CLI
You may need to install the Airbyte CLI. See https://docs.airbyte.com/understanding-airbyte/airbyte-cli and https://github.com/airbytehq/airbyte/blob/master/octavia-cli/README.md
### Lightdash
Clone and run the Lightdash repo locally; follow the instructions at https://docs.lightdash.com/getting-started/quickstart.