Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with hadoop
A curated list of projects in awesome lists tagged with hadoop.
https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop
Covers the foundations of “Big Data” processing: introduces the Hadoop distributed computing architecture and provides an introductory tutorial on Big Data analysis using Hadoop, RHadoop, and the R libraries parallel, doParallel, foreach, and Rmpi.
big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix
Last synced: 04 Jan 2025
https://github.com/vicentebolea/md5-hadoop-cracker
Cracker of MD5 passwords using Hadoop
brute-force decryption hadoop md5 password-cracker
Last synced: 15 Jan 2025
https://github.com/sameetasadullah/finding-average-length-of-comments-using-mapreduce-hadoop
A Java program that finds the average length of comments in a large file using Hadoop MapReduce
hadoop hadoop-mapreduce java linux ubuntu
Last synced: 21 Jan 2025
https://github.com/daniellansun/hadoop-wordcount
Word count example for Hadoop 3.0 with Gradle
Last synced: 14 Jan 2025
https://github.com/divithraju/divith-raju-data-mining
This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.
algorthims analytics apache business client connector data dataarchitecture database dataengineering datamining datascience hadoop k-means-clustering mysql project project-repository pyspark python3 spark
Last synced: 17 Jan 2025
https://github.com/shirshadatta/hadoop-cheatsheet
Your go-to cheat sheet for learning Apache Hadoop.
dfs hadoop hadoop-cheatcheet hdfs-client hdfs-cluster jdk- masternode multitier-architecture redhat-enterprise-linux slave-nodes
Last synced: 20 Jan 2025
https://github.com/isaccanedo/ignite
☁️ Apache Ignite
big-data cache cloud data-management-platform database distributed-sql-database hadoop ignite in-memory-computing in-memory-database iot network-client network-server osgi sql
Last synced: 12 Jan 2025
https://github.com/jordicenzano/hadoop-tutorial
Initial experiments with Hadoop
bigdata docker docker-compose hadoop mapreduce
Last synced: 06 Jan 2025
https://github.com/hellomaxime/sparkstreaming-hbase
Spark Streaming to HBase
apache-spark hadoop hbase nosql spark-streaming
Last synced: 28 Dec 2024
https://github.com/abtinz/cloud-computing
cassandra cassandra-driver cloud-computing docker elasticsearch hadoop hdfs kubernetes redis spark
Last synced: 21 Jan 2025
https://github.com/angeligareta/spark-hadoop-hbase-overview
First lab for the Data-Intensive Computing course at KTH, where we are introduced to Apache Spark MLlib and Spark SQL, Hadoop, and HBase.
apache-spark data-intensive hadoop hbase hbase-table id2221 kth scala spark spark-mllib spark-sql
Last synced: 22 Jan 2025
https://github.com/daixinye/zjucst
Assignments & experiments
blockchain cpp hadoop iot object-oriented spark
Last synced: 20 Jan 2025
https://github.com/rootsongjc/hadoop-all-in-one
Build a hadoop-all-in-one docker image.
Last synced: 20 Dec 2024
https://github.com/razo7/nap-hadoop-2.9.1
Full Stack of Hadoop 2.9.1 for Compilation and Modification
hadoop hadoop-mapreduce-framework hadoop-mapreduce-programming heartbeat java mapreduce maven-plugin nap-hadoop reducers-location
Last synced: 28 Dec 2024
https://github.com/ralgond/bigdata-example
Examples, details, and caveats for Hadoop, Hive, and Spark
bigdata hadoop hdfs hive map-reduce spark
Last synced: 09 Jan 2025
https://github.com/martincastroalvarez/hadoop-hdfs-map-reduce-docker
Running Map Reduce in Hadoop using Docker
big-data bigdata hadoop hdfs map-reduce
Last synced: 22 Dec 2024
https://github.com/vitalibo/grapes
Six degrees of separation theory research
ansible dijkstra-algorithm graph hadoop mapreduce research six-degrees-of-separation
Last synced: 27 Dec 2024
https://github.com/dgroomes/hadoop-playground
📚 Learning and exploring core Apache Hadoop and its surrounding ecosystem
Last synced: 25 Jan 2025
https://github.com/manuparra/hadoop-statistics
Calculate statistical measures for a single column of big data datasets with this simple Hadoop application
avg bigdata hadoop java massive-datasets max min standardeviation
Last synced: 27 Dec 2024
https://github.com/elhanarinc/ceng495
Ceng 495 Cloud Computing Assignments
hadoop javascript jquery mapreduce nodejs semantic-ui
Last synced: 29 Jan 2025
https://github.com/manuparra/clustering-openstack
Make a dynamic and customizable cluster with OpenStack
cluster deployment hadoop openstack openstack-command script slave-nodes spark
Last synced: 27 Dec 2024
https://github.com/thdaraujo/cheat
A handful of cheatsheets and programming tips.
bash cheat-sheets cheatsheet dms hadoop postgresql spark sqoop
Last synced: 24 Jan 2025
https://github.com/hereismari/hadoop-job-time-prediction
Code used to perform Hadoop job prediction experiments using OpenStack Sahara.
hadoop hadoop-job prediction sahara
Last synced: 17 Dec 2024
https://github.com/liuhaozzu/big_data
nginx+flume+hadoop+hbase
flume-ng hadoop hbase mapreduce
Last synced: 29 Jan 2025
https://github.com/javiroman/dfsadmin-inotify
Simple Java example for testing the DFSAdmin API used in Apache NiFi GetHDFSEvents Processor
hadoop hdfs nifi nifi-processors
Last synced: 31 Dec 2024
https://github.com/krishnadey30/intro-to-hadoop-and-mapreduce
hadoop hadoop-mapreduce hadoop-streaming python-script
Last synced: 24 Jan 2025
https://github.com/aromoh/basic-sentiment-analysis-mrjob-twitter-
A dictionary-based sentiment analysis project implemented with MrJob using a map-reduce model. It can be run locally or in HDFS environments (such as Hadoop or AWS).
aws-ec2 hadoop hdfs-enviroments map-reduce mrjob sentiment-analysis twiiter
Last synced: 09 Dec 2024
https://github.com/chouaib-629/movierecommendation
A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.
big-data data-processing distributed-computing hadoop hadoop-hdfs hadoop-mapreduce hdfs java java-mapreduce mapreduce movielens sorting
Last synced: 05 Jan 2025
https://github.com/chen0040/java-hdfs-client
Java Hadoop client that provides a convenient API for file management and interaction with the Hadoop file system
hadoop hdfs hdfs-client java-client
Last synced: 16 Dec 2024
https://github.com/ishaansathaye/csc369-introdistributedcomputing
Cal Poly Fall 2024 CSC 369 Intro to Distributed Computing
distributed-computing hadoop java map-reduce scala spark
Last synced: 17 Dec 2024
https://github.com/anthonycalandra/wikipedia-tfidf
A Hadoop-powered search index of Wikipedia articles.
Last synced: 16 Dec 2024
https://github.com/darule0/yarndiff
A rudimentary command-line utility for comparing Apache YARN container logs.
diff difference diffing hadoop hadoop-mapreduce hive log4j mapreduce pig spark yarn yarn2
Last synced: 23 Dec 2024
https://github.com/alexp11223/hadoopmapreducematrixmult
hadoop hadoop-mapreduce mapreduce matrix-multiplication sparse-matrix
Last synced: 13 Dec 2024
https://github.com/yjham2002/hadoop_clustering
📖 Apache Hadoop Based Clustering Tutorial
hadoop hadoop-cluster mac-osx mapreduce
Last synced: 12 Dec 2024
https://github.com/mikeroyal/apache-storm-guide
Apache Storm Guide
batch-processing data-science dataprocessing hadoop real-time storm storm-topology
Last synced: 12 Dec 2024
https://github.com/mxagar/spark_big_data_guide
This repository contains my personal guide on Spark and topics related to Big Data.
big-data hadoop machine-learning spark
Last synced: 23 Dec 2024
https://github.com/nuttymoon/jumbo-hdp3
Jumbo bundle for HDP3 stack
ansible hadoop hortonworks-hdp jumbo
Last synced: 12 Dec 2024
https://github.com/adelin-info/tp_datacloud
Architecture and development of large-scale distributed systems
hadoop java map-reduce scala spark yarn zookeeper
Last synced: 30 Jan 2025
https://github.com/janheinrichmerker/hadoop-ktx
💾 Kotlin Extensions for Apache Hadoop (MapReduce).
hadoop hadoop-ktx hadoop-mapreduce kotlin kotlin-extensions kotlin-jvm kotlin-library
Last synced: 24 Dec 2024
https://github.com/alexcombessie/ensae_distributed-lasso-hadoop
Distributed Lasso regression with Hadoop Pig - Project for the "Practical tools for the analysis of Big Data" course by Xavier Dupre at ENSAE ParisTech
distributed-systems ensae-paristech hadoop
Last synced: 24 Dec 2024
https://github.com/zhaytam/pagerank
An implementation of the PageRank algorithm in Hadoop MapReduce
hadoop java pagerank-algorithm
Last synced: 19 Dec 2024
https://github.com/mikeacosta/san-francisco-crime
SF crime data analysis with Apache Spark
apache-hive apache-spark hadoop hdfs hortonworks
Last synced: 10 Jan 2025
https://github.com/zhulg/hadoopwordcount
Hadoop WordCount example with Maven and IntelliJ
Last synced: 11 Jan 2025
https://github.com/yadvi12/lariox-automation
A voice-control automated system.
ansible ansible-playbook apache aws docker hadoop kubernetes kubernetes-cluster python3 webserver
Last synced: 24 Jan 2025
https://github.com/vasugi2003/web-server-log-analysis-using-pyspark
Web server log analysis using PySpark
algorithms analysis big-data-analytics hadoop ml prediction pyspark python3
Last synced: 11 Jan 2025
https://github.com/isaccanedo/apache-accumulo
🔋 Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval.
accumulo apache big-data cluster distribued hackertoberfest hadoop hdfs key-value zookeper
Last synced: 12 Jan 2025
https://github.com/multivacplatform/multivac-elasticsearch
Demoing Spark 2.2 and Elasticsearch Hadoop connector
Last synced: 12 Jan 2025
https://github.com/multivacplatform/multivac-pubmed
Update PubMed articles daily on HDFS by using Spark Cluster
apache-spark dataframe hadoop hdfs pubmed pubmed-parser spark-sql yarn
Last synced: 12 Jan 2025
https://github.com/neshkeev/containers
A library of containers packaged by neshkeev
Last synced: 31 Jan 2025
https://github.com/adampaternostro/azure-hdi-distcp
Creates an HDInsight cluster, then runs distcp remotely to copy data between blob storage and/or Data Lake (ADLS)
azure azure-data-lake azure-storage distcp file-copy hadoop hdinsight
Last synced: 31 Jan 2025
https://github.com/vibhuti03/hadoop-administration-analysis
Setting up a cluster and analyzing the Aadhar dataset using Apache Hive
aadhar-dataset cluster hadoop hadoop-administration-analysis hadoop-hdfs hive nonhacluster performing-analysis
Last synced: 12 Jan 2025
https://github.com/shreyas-gopalakrishna/datacenter-scale-computing
big-data docker flask hadoop kubernetes rabbitmq redis spark
Last synced: 20 Jan 2025
https://github.com/dimajix/docker-hadoop
Repository for building Docker containers for Hadoop
Last synced: 05 Jan 2025
https://github.com/ltossian/bike-sales-data-metrics
Processing, storage, analysis, and visualization of a large CSV file and real-time bike sales data.
fastapi grafana hadoop kafka postgresql python spark
Last synced: 11 Oct 2024
https://github.com/onecricketeer/mapreduce-sandbox
Sandbox for Hadoop MapReduce
hadoop mapreduce sandbox-development
Last synced: 21 Jan 2025
https://github.com/dev88jerry/cs450
Bishop's University - CS450 Elements of Big Data
big-data data-science hadoop spark
Last synced: 08 Jan 2025
https://github.com/mikma03/databases
The main purpose of this repository is to collect general knowledge about databases.
cassandra graphql hadoop mongodb msql neo4j newsql nosql oracle-database postgresql redis sql
Last synced: 09 Jan 2025
https://github.com/martincastroalvarez/apache-hive-docker
Running Hive jobs using Docker
Last synced: 22 Dec 2024
https://github.com/martincastroalvarez/hadoop-hdfs-kafka-docker
Running Kafka using Docker
Last synced: 22 Dec 2024
https://github.com/martincastroalvarez/hadoop-hdfs-spark-docker
Running Spark jobs using Docker
Last synced: 22 Dec 2024
https://github.com/dhchenx/simplehadooptool
A tool to submit MapReduce jobs to Hadoop cluster.
client-server hadoop hadoop-api job mapreduce simple-hadoop-tool submit
Last synced: 29 Jan 2025
https://github.com/dhchenx/catla-hs
Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.
big-data catla-hs hadoop machine-learning mapreduce parameter-search performance-tuning self-tuning-system spark visualization
Last synced: 29 Jan 2025
https://github.com/dhilipsiva/intro-to-big-data
Introduction to Big Data with practical use-cases (Meetup Talk)
big-data demo hadoop meetup-talk presentation presentations talk talks
Last synced: 21 Dec 2024
https://github.com/jferrl/gutemberg-analysis
Gutemberg corpus analysis with Apache Hadoop
analysis gutemberg hadoop java
Last synced: 19 Jan 2025
https://github.com/ssanthosh010303/collection-data-training
A collection of challenges completed during a data training program.
airflow apache azure azure-data-factory azure-databricks azure-logic-apps bigdata data hadoop spark
Last synced: 17 Jan 2025
https://github.com/mikma03/data_streaming
All topics related to data streaming and real-time analysis
apache docker hadoop kafka kubernetes spark-streaming
Last synced: 09 Jan 2025
https://github.com/xunliu/submarine-installer
Hadoop Submarine runtime environment installation
deep-learning hadoop submarine
Last synced: 20 Jan 2025
https://github.com/aleskandro/r-hadoop-madreduce-examples
Many examples of using R with Hadoop for MapReduce, with and without libraries such as rhadoop/rhipe - [email protected] - Advanced Programming Languages
data-analysis hadoop mapreduce r
Last synced: 27 Dec 2024
https://github.com/billsioros/big-data
Large Scale Data Management Systems MSc. Project
Last synced: 24 Jan 2025
https://github.com/rupeshtr78/blog
Big Data: Spark, Hadoop, Kafka, Flink, Spark Streaming
aws bigdata cassandra elasticsearch emr-cluster flink hadoop hive hue kafka mapreduce mongodb oozie spark sparkstreaming yarn
Last synced: 12 Jan 2025
https://github.com/dominicluidold/ws21-introductiontobigdataprojects
A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)
avro bigdata hadoop java map-reduce
Last synced: 29 Jan 2025
https://github.com/liuhaozzu/data-mining-algorithms
Data mining algorithms based on Hadoop 2.7.3
data-mining hadoop hadoop-mapreduce java-8
Last synced: 29 Jan 2025
https://github.com/sandysanthosh/hadoop-basics
Hadoop basics, with Tableau reading data from MySQL
Last synced: 11 Jan 2025
https://github.com/myui/yarnkit
Yarnkit is a toolkit to write YARN applications
Last synced: 06 Dec 2024
https://github.com/srfrnk/spar-kube
Spark cluster deployment on a k8s cluster
hadoop k8s k8s-cluster kubernetes spar-kube spark zeppelin
Last synced: 29 Jan 2025
https://github.com/menxit/hadoop-3.0
Docker image of hadoop:3.0
bigdata docker hadoop sparkachetipassa
Last synced: 08 Jan 2025
https://github.com/yukta026/tokyo-olympics-2021-analytics
An end-to-end ETL pipeline for analyzing and visualizing Tokyo Olympics 2021 data using Azure tools and Power BI.
azure data-engineering etl hadoop powerbi python3 spark sql
Last synced: 11 Oct 2024
https://github.com/yuhexiong/deploy-hadoop-guide
apache-hadoop deployment hadoop hdfs
Last synced: 30 Jan 2025
https://github.com/manuparra/tallerh2s
HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada
hadoop hdfs java map-reduce python spark wordcount
Last synced: 07 Nov 2024
https://github.com/riccardorevalor/mapreduce
Collection of exercises on Hadoop and the MapReduce approach
hadoop hadoop-mapreduce mapreduce
Last synced: 14 Dec 2024