Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by dimajix

A curated list of projects in awesome lists by dimajix.

https://github.com/dimajix/flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

apache-spark big-data bigdata data-engineering etl flowman hadoop scala spark sql

Last synced: 21 Dec 2024

https://github.com/dimajix/docker-jupyter-spark

Docker image for Jupyter notebooks with PySpark

docker hadoop jupyter pyspark python spark

Last synced: 09 Nov 2024

https://github.com/dimajix/terraform-emr-training

Terraform script for launching multiple EMR clusters for training purposes.

Last synced: 09 Nov 2024

https://github.com/dimajix/pyspark-advanced

Jupyter Notebooks for PySpark Advanced Workshop

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-spark

Repository for building Docker containers for Spark

cluster docker hadoop spark

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-alluxio

Docker image for Apache Alluxio

alluxio docker

Last synced: 09 Nov 2024

https://github.com/dimajix/pyspark-ml-taxis

Jupyter Notebooks for PySpark Workshop using NYC Taxi Trip data

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-hive

Docker container running the Hive Metastore

docker hadoop hive

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-miniconda

Miniconda base image

anaconda docker python

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-hadoop

Repository for building Docker containers for Hadoop

docker hadoop

Last synced: 09 Nov 2024

https://github.com/dimajix/docker-jupyterhub

Docker image with JupyterHub

Last synced: 09 Nov 2024

https://github.com/dimajix/flowman-tutorial

Tutorial for Flowman

Last synced: 09 Nov 2024

https://github.com/dimajix/flowman-example

Example project for Flowman

Last synced: 09 Nov 2024

https://github.com/dimajix/spark-data-engineering

Training notebooks for Data Engineering with Spark

Last synced: 09 Nov 2024

https://github.com/dimajix/flowman-maven

Maven plugin for streamlining the development workflow with Flowman

Last synced: 09 Nov 2024

https://github.com/dimajix/hadoop-training

Source Code for Hadoop Training

hadoop hadoop-training spark

Last synced: 09 Nov 2024