An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with hadoop-cluster

A curated list of projects in awesome lists tagged with hadoop-cluster.

https://github.com/groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

apache-sedona apache-spark big-data bigdata bigtop docker gutenberg-ebooks hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce jupyter-notebook mapreduce mapreduce-bash mrjob pyspark spark spark-sql testdfsio

Last synced: 06 Apr 2025

https://github.com/impetus/jumbune

Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,

aiops apm cluster-monitoring data-analysis data-quality developer-tools devops-tools hadoop hadoop-cluster hadoop-monitor hadoop-monitoring monitoring-tool optimization-framework yarn yarn-hadoop-cluster

Last synced: 10 Apr 2025

https://github.com/hyeonsangjeon/dataplatform

Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem components.

hadoop hadoop-cluster hadoop-docker hadoop-ecosystem hadoop-mapreduce hive pyspark-notebook zeppelin-notebook

Last synced: 17 Nov 2024

https://github.com/manuparra/masterdegreecc_practice

Workshop for the UGR Professional Master's in Computer Science. Cloud Computing course.

cloudcomputing cluster docker docker-cluster docker-container hadoop hadoop-cluster hdfs opennebula practice virtual-machine

Last synced: 12 Apr 2025

https://github.com/hxndev/finding-average-temperature-of-each-year-using-hadoop-hdfs

The task is to calculate the average temperature for each year from a given dataset stored in Hadoop HDFS, using a custom MapReduce function.

average-calculator code hadoop hadoop-cluster hadoop-filesystem hadoop-hdfs hadoop-mapreduce java mapreduce mapreduce-java

Last synced: 31 Mar 2025

https://github.com/chriskery/hadoop-operator

Kubernetes operator for managing the lifecycle of Apache Hadoop Yarn Tasks on Kubernetes.

apache-hadoop hadoop hadoop-cluster k8s kubernetes kubernetes-operator

Last synced: 18 Mar 2025

https://github.com/mitre/clusterconf

Manage Hadoop cluster configurations

hadoop hadoop-cluster r r-package rstats

Last synced: 21 Apr 2025

https://github.com/mitre/webhdfs

Interface with WebHDFS Service in a Cluster-Neutral Way

hadoop-cluster r r-package rstats webhdfs

Last synced: 21 Apr 2025

https://github.com/conema/spark-terraform

This project creates a Hadoop and Spark cluster on Amazon AWS with Terraform.

aws cluster hadoop hadoop-cluster hcl spark spark-clusters terraform

Last synced: 20 Nov 2024

https://github.com/codito/hadoop-expt

Experiments with Hadoop cluster setups in Docker

docker docker-compose hadoop hadoop-cluster hadoop-docker

Last synced: 10 Nov 2024

https://github.com/elaaatif/jpeg-and-jpeg2000-compression-on-multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

cluster hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce image-compression spark

Last synced: 03 Apr 2025

https://github.com/seunggihong/hadoop-install-guide

Guide to installing Hadoop and Spark on an Oracle virtual machine.

hadoop hadoop-cluster pyspark spark virtualbox

Last synced: 06 Apr 2025

https://github.com/akshayavb99/ansible-examples

This repository contains the playbooks and other files used to work with different applications through Ansible.

ansible ansible-playbooks docker dynamic-inventory-aws explanation hadoop-cluster linux-scripting loadbalancer rhel8 webserver webserver-setup webservers yum

Last synced: 04 Mar 2025

https://github.com/aimanamri/raspberry-pi4-hadoop-spark-cluster

This is a self-documentation of learning distributed data storage, parallel processing, and Linux OS using Apache Hadoop, Apache Spark, and Raspbian OS. In this project, a 3-node cluster is set up using Raspberry Pi 4 boards, HDFS is installed, and Spark processing jobs are run via YARN.

big-data distributed-storage hadoop-cluster hdfs parallel-processing pyspark raspberry-pi-4 spark-cluster spark-shell yarn

Last synced: 09 Apr 2025

https://github.com/akaliutau/hadoop-cluster

Batch data processing on the dockerized Hadoop cluster

batch-processing hadoop-cluster hdf5 hdfs java mapreduce

Last synced: 28 Feb 2025

https://github.com/yjham2002/hadoop_clustering

Apache Hadoop Based Clustering Tutorial

hadoop hadoop-cluster mac-osx mapreduce

Last synced: 30 Mar 2025

https://github.com/jaynamm/docker-hadoop-cluster

Hadoop Cluster For Docker (-ing)

docker docker-compose hadoop hadoop-cluster

Last synced: 02 Apr 2025

https://github.com/muhamedhekal/hadoop-ha-cluster-on-docker

Hadoop3-HA-Docker is a production-ready, fault-tolerant Hadoop cluster deployed with Docker Compose. It automates the setup of a fully distributed Hadoop ecosystem with high availability (HA) features, designed for reliability, scalability, and real-world big data workloads.

docker docker-compose hadoop hadoop-cluster hdfs mapreduce yarn

Last synced: 15 Apr 2025

https://github.com/avojak/aws-hadoop-cluster

Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS

ansible aws aws-ec2 configuration-as-code hadoop hadoop-cluster infrastructure-as-code terraform

Last synced: 30 Mar 2025

https://github.com/mariam-iftikhar/bigdata

The repository showcases a series of exercises and projects focused on big data processing using Hadoop, HBase, Hive, and Spark with Python. Hosted on AWS EMR, these projects demonstrate efficient data handling and processing techniques, leveraging the power of cloud computing to tackle complex data challenges.

apache-spark awsec2 awsemr hadoop-cluster hadoop-mapreduce hbase hiveql

Last synced: 16 Nov 2024

https://github.com/xpcosmos/data-lake-prime

This project aims to simulate and configure a Distributed File System using Hadoop HDFS. For this project, 3 machines were created: 1 Master Node and 2 Worker Nodes.

hadoop hadoop-cluster hadoop-hdfs hdfs network

Last synced: 04 Mar 2025

https://github.com/mariam-iftikhar/bigdataprojects

The repository showcases a series of exercises and projects focused on big data processing using Hadoop, HBase, Hive, and Spark with Python. Hosted on AWS EMR, these projects demonstrate efficient data handling and processing techniques, leveraging the power of cloud computing to tackle complex data challenges.

apache-spark awsec2 awsemr hadoop-cluster hadoop-mapreduce hbase hiveql

Last synced: 06 Mar 2025

https://github.com/kumarvna/terraform-azurerm-hdinsight

Terraform module to create managed, full-spectrum, open-source analytics service Azure HDInsight. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.

apache-hive-cluster azure azure-hdinsight hadoop-cluster hadoop-filesystem hadoop-hdfs hbase-cluster hdinsight-cluster hdinsight-hadoop-cluster hdinsight-hbase-cluster hdinsight-interactive-query-cluster hdinsight-kafka-cluster hdinsight-spark-cluster kafka-cluster spark-cluster spark-clusters terraform terraform-module

Last synced: 14 Apr 2025