An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with spark-cluster

A curated list of projects in awesome lists tagged with spark-cluster.

https://github.com/mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

hadoop-hdfs spark spark-cluster spark-hadoop spark-hadoop-docker spark-yarn-docker

Last synced: 26 Apr 2025

https://github.com/aixhunter/spark-k8s-pod-template

Steps to deploy a Spark app to a Kubernetes cluster using spark-submit or a pod template

k8s kubernetes pod spark spark-cluster spark-submit

Last synced: 08 May 2025

https://github.com/longnguyen010203/spark-processing-aws

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow integration to automate workflows 🥊

apache-airflow apache-spark aws aws-ec2 aws-s3 aws-services cloud-computing data-pipeline emr-cluster iam pyspark redshift spark-cluster spark-master spark-worker terraform

Last synced: 11 Mar 2025

https://github.com/aimanamri/raspberry-pi4-hadoop-spark-cluster

Self-documented notes on learning distributed data storage, parallel processing, and Linux using Apache Hadoop, Apache Spark, and Raspbian OS. The project sets up a 3-node cluster of Raspberry Pi 4 boards, installs HDFS, and runs Spark processing jobs via YARN.

big-data distributed-storage hadoop-cluster hdfs parallel-processing pyspark raspberry-pi-4 spark-cluster spark-shell yarn

Last synced: 09 Apr 2025

https://github.com/kumarvna/terraform-azurerm-hdinsight

Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.

apache-hive-cluster azure azure-hdinsight hadoop-cluster hadoop-filesystem hadoop-hdfs hbase-cluster hdinsight-cluster hdinsight-hadoop-cluster hdinsight-hbase-cluster hdinsight-interactive-query-cluster hdinsight-kafka-cluster hdinsight-spark-cluster kafka-cluster spark-cluster spark-clusters terraform terraform-module

Last synced: 14 Apr 2025

https://github.com/turnipdo/spark-standalone-cluster-setup

To facilitate the initial setup of Apache Spark, this repository provides a beginner-friendly, step-by-step guide on setting up a master node and two worker nodes.

python spark spark-cluster

Last synced: 12 Apr 2025

https://github.com/flaviostutz/spark-submit-scala

spark-submit extension based on bde2020/spark-submit, for Scala with SBT

bigdata sbt scala spark spark-cluster spark-submit

Last synced: 31 Mar 2025

https://github.com/minsusun/deploy-spark-cluster

Configs for deploying Spark clusters on Docker and k8s!

docker docker-compose k8s spark-cluster

Last synced: 17 Mar 2025