Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with hadoop-cluster
A curated list of projects in awesome lists tagged with hadoop-cluster.
https://github.com/big-data-europe/docker-hadoop
Apache Hadoop docker image
docker docker-hadoop hadoop hadoop-cluster hadoop-docker
Last synced: 20 Dec 2024
https://github.com/impetus/jumbune
Jumbune is an open source big data APM and data quality management platform for data clouds. An enterprise offering is available at http://jumbune.com. More details of the open source offering are at,
aiops apm cluster-monitoring data-analysis data-quality developer-tools devops-tools hadoop hadoop-cluster hadoop-monitor hadoop-monitoring monitoring-tool optimization-framework yarn yarn-hadoop-cluster
Last synced: 14 Nov 2024
https://github.com/groda/big_data
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
apache-sedona apache-spark big-data bigdata bigtop docker gutenberg-ebooks hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce jupyter-notebook mapreduce mapreduce-bash mrjob pyspark spark spark-sql testdfsio
Last synced: 17 Dec 2024
https://github.com/wittline/apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
apache-spark dataengineer dataengineering docker docker-compose hadoop-cluster hadoop-docker hdfs hive hive-metastore hue pyspark
Last synced: 14 Oct 2024
https://github.com/hyeonsangjeon/dataplatform
Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter with PySpark, Hive, and other ecosystem tools.
hadoop hadoop-cluster hadoop-docker hadoop-ecosystem hadoop-mapreduce hive pyspark-notebook zeppelin-notebook
Last synced: 17 Nov 2024
https://github.com/manuparra/masterdegreecc_practice
Workshop for the UGR Professional Master's in Computer Science. Cloud Computing course.
cloudcomputing cluster docker docker-cluster docker-container hadoop hadoop-cluster hdfs opennebula practice virtual-machine
Last synced: 07 Nov 2024
https://github.com/mikeroyal/apache-ignite-guide
Apache Ignite Guide
data-science database hadoop hadoop-cluster ignite nosql nosql-data-storage nosql-databases stream-processing streaming
Last synced: 12 Dec 2024
https://github.com/mitre/clusterconf
Manage Hadoop cluster configurations
hadoop hadoop-cluster r r-package rstats
Last synced: 09 Nov 2024
https://github.com/mitre/webhdfs
Interface with WebHDFS Service in a Cluster-Neutral Way
hadoop-cluster r r-package rstats webhdfs
Last synced: 09 Nov 2024
https://github.com/conema/spark-terraform
This project creates a Hadoop and Spark cluster on Amazon AWS with Terraform
aws cluster hadoop hadoop-cluster hcl spark spark-clusters terraform
Last synced: 20 Nov 2024
https://github.com/mikeroyal/apache-hadoop-guide
Apache Hadoop Guide
hadoop hadoop-cluster hadoop-filesystem hadoop-hdfs hadoop-mapreduce
Last synced: 12 Dec 2024
https://github.com/codito/hadoop-expt
Experiments with Hadoop cluster setups in Docker
docker docker-compose hadoop hadoop-cluster hadoop-docker
Last synced: 10 Nov 2024
https://github.com/yjham2002/hadoop_clustering
Apache Hadoop Based Clustering Tutorial
hadoop hadoop-cluster mac-osx mapreduce
Last synced: 12 Dec 2024
https://github.com/akaliutau/hadoop-cluster
Batch data processing on the dockerized Hadoop cluster
batch-processing hadoop-cluster hdf5 hdfs java mapreduce
Last synced: 12 Nov 2024
https://github.com/mariam-iftikhar/bigdata
The repository showcases a series of exercises and projects focused on big data processing using Hadoop, HBase, Hive, and Spark with Python. Hosted on AWS EMR, these projects demonstrate efficient data handling and processing techniques, leveraging the power of cloud computing to tackle complex data challenges.
apache-spark awsec2 awsemr hadoop-cluster hadoop-mapreduce hbase hiveql
Last synced: 16 Nov 2024
https://github.com/imdeepanshugpt/hadoop
Hadoop-Cluster
docker docker-compose docker-container docker-image hadoop hadoop-cluster hadoop-docker hadoop-filesystem hadoop-framework hadoop-mapreduce hadoop-streaming
Last synced: 23 Nov 2024
https://github.com/aimanamri/raspberry-pi4-hadoop-spark-cluster
This is a self-documented record of learning distributed data storage, parallel processing, and Linux using Apache Hadoop, Apache Spark, and Raspbian OS. In this project, a 3-node cluster is set up on Raspberry Pi 4 boards, HDFS is installed, and Spark processing jobs are run via YARN.
big-data distributed-storage hadoop-cluster hdfs parallel-processing pyspark raspberry-pi-4 spark-cluster spark-shell yarn
Last synced: 05 Nov 2024
https://github.com/kumarvna/terraform-azurerm-hdinsight
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
apache-hive-cluster azure azure-hdinsight hadoop-cluster hadoop-filesystem hadoop-hdfs hbase-cluster hdinsight-cluster hdinsight-hadoop-cluster hdinsight-hbase-cluster hdinsight-interactive-query-cluster hdinsight-kafka-cluster hdinsight-spark-cluster kafka-cluster spark-cluster spark-clusters terraform terraform-module
Last synced: 08 Nov 2024
https://github.com/spineo/hadoop-app
ansible ansible-inventory ansible-playbook hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce hdfs yarn
Last synced: 23 Nov 2024
https://github.com/xpcosmos/data-lake-prime
This project aims to simulate and configure a Distributed File System using Hadoop HDFS. For this project, 3 machines were created: 1 Master Node and 2 Worker Nodes.
hadoop hadoop-cluster hadoop-hdfs hdfs network
Last synced: 14 Nov 2024
https://github.com/akshayavb99/ansible-examples
The repository contains the Ansible playbooks and other files used to work with different applications.
ansible ansible-playbooks docker dynamic-inventory-aws explanation hadoop-cluster linux-scripting loadbalancer rhel8 webserver webserver-setup webservers yum
Last synced: 14 Nov 2024
https://github.com/avojak/aws-hadoop-cluster
Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS
ansible aws aws-ec2 configuration-as-code hadoop hadoop-cluster infrastructure-as-code terraform
Last synced: 12 Dec 2024
https://github.com/gmartinezramirez-old/practice-hadoop
[Study] Daily plan for practicing Hadoop.
big-data hadoop hadoop-cluster hadoop-mini-clusters hive java mapreduce
Last synced: 05 Dec 2024