Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by dimajix
A curated list of projects in awesome lists by dimajix.
https://github.com/dimajix/flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
apache-spark big-data bigdata data-engineering etl flowman hadoop scala spark sql
Last synced: 21 Dec 2024
https://github.com/dimajix/spark-training
Repository used for Spark Trainings
hadoop hadoop-training hive pyspark python scala spark spark-ml spark-streaming spark-training sqoop
Last synced: 09 Nov 2024
https://github.com/dimajix/terraform-emr-training
Terraform script for launching multiple EMR clusters for training purposes.
Last synced: 09 Nov 2024
https://github.com/dimajix/pyspark-advanced
Jupyter Notebooks for PySpark Advanced Workshop
Last synced: 09 Nov 2024
https://github.com/dimajix/docker-spark
Repository for building Docker containers for Spark
Last synced: 09 Nov 2024
https://github.com/dimajix/pyspark-ml-taxis
Jupyter Notebooks for PySpark Workshop using NYC Taxi Trip data
Last synced: 09 Nov 2024
https://github.com/dimajix/docker-hive
Docker container running the Hive Metastore
Last synced: 09 Nov 2024
https://github.com/dimajix/docker-hadoop
Repository for building Docker containers for Hadoop
Last synced: 09 Nov 2024
https://github.com/dimajix/vagrant-druid
cluster vagrant vagrantfile virtual
Last synced: 30 Sep 2024
https://github.com/dimajix/spark-data-engineering
Training notebooks for Data Engineering with Spark
Last synced: 09 Nov 2024
https://github.com/dimajix/flowman-maven
Maven plugin for streamlining the development workflow with Flowman
Last synced: 09 Nov 2024