Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with hadoop

A curated list of projects in awesome lists tagged with hadoop.

https://github.com/tuancamtbtx/bigdata-sdk

Some data connectors for big data

elasticsearch hadoop spark

Last synced: 02 Jan 2025

https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop

Foundations of “Big Data” processing: introduces the Hadoop distributed computing architecture and provides an introductory-level tutorial on Big Data analysis using Hadoop, RHadoop, and the R libraries parallel, doParallel, foreach, and Rmpi.

big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix

Last synced: 04 Jan 2025

https://github.com/vicentebolea/md5-hadoop-cracker

An MD5 password cracker using Hadoop

brute-force decryption hadoop md5 password-cracker

Last synced: 15 Jan 2025

https://github.com/sameetasadullah/finding-average-length-of-comments-using-mapreduce-hadoop

A Java program that finds the average length of comments in a large file using Hadoop MapReduce

hadoop hadoop-mapreduce java linux ubuntu

Last synced: 21 Jan 2025

https://github.com/daniellansun/hadoop-wordcount

Word-count example for Hadoop 3.0 with Gradle

gradle groovy hadoop hadoop3

Last synced: 14 Jan 2025

https://github.com/divithraju/divith-raju-data-mining

This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.

algorthims analytics apache business client connector data dataarchitecture database dataengineering datamining datascience hadoop k-means-clustering mysql project project-repository pyspark python3 spark

Last synced: 17 Jan 2025

https://github.com/shahiransari/onlineretailanalysis

The project aims to analyse online retail data logs and find insights that could help evaluate and support the business.

analysis big-data cloudera hadoop pig pig-latin

Last synced: 26 Jan 2025

https://github.com/jordicenzano/hadoop-tutorial

Initial experiments with Hadoop

bigdata docker docker-compose hadoop mapreduce

Last synced: 06 Jan 2025

https://github.com/mjngxwnj/olympics_data_project

A personal project that builds an end-to-end data pipeline using the 2024 Olympics data.

airflow docker hadoop python snowflake spark superset

Last synced: 10 Oct 2024

https://github.com/ezeparziale/big-data-cluster

🐘 Big data cluster

big-data bigdata hadoop hdfs hive spark zookeeper

Last synced: 20 Jan 2025

https://github.com/flavienbwk/aws-terraform-ansible-kvm-hadoop

Install and configure Hadoop on KVM machines with Ansible, bootstrapped by Terraform on AWS.

ansible aws hadoop kvm scaleway terraform

Last synced: 28 Jan 2025

https://github.com/angeligareta/spark-hadoop-hbase-overview

First lab for the Data-Intensive Computing course at KTH, introducing Apache Spark MLlib, Spark SQL, Hadoop, and HBase.

apache-spark data-intensive hadoop hbase hbase-table id2221 kth scala spark spark-mllib spark-sql

Last synced: 22 Jan 2025

https://github.com/rootsongjc/hadoop-all-in-one

Build a hadoop-all-in-one docker image.

docker-image hadoop

Last synced: 20 Dec 2024

https://github.com/ralgond/bigdata-example

Examples, details, and caveats for Hadoop, Hive, and Spark

bigdata hadoop hdfs hive map-reduce spark

Last synced: 09 Jan 2025

https://github.com/martincastroalvarez/hadoop-hdfs-map-reduce-docker

Running MapReduce in Hadoop using Docker

big-data bigdata hadoop hdfs map-reduce

Last synced: 22 Dec 2024

https://github.com/zoltan-nz/hadoop

Tutorial: using Hadoop and Docker containers to analyse AOL search results.

aol docker hadoop search tutorial

Last synced: 22 Jan 2025

https://github.com/dgroomes/hadoop-playground

📚 Learning and exploring core Apache Hadoop and its surrounding ecosystem

hadoop

Last synced: 25 Jan 2025

https://github.com/manuparra/hadoop-statistics

Calculate statistical measures of one column in big data datasets with this simple Hadoop application

avg bigdata hadoop java massive-datasets max min standardeviation

Last synced: 27 Dec 2024

https://github.com/elhanarinc/ceng495

Ceng 495 Cloud Computing Assignments

hadoop javascript jquery mapreduce nodejs semantic-ui

Last synced: 29 Jan 2025

https://github.com/manuparra/clustering-openstack

Make a dynamic and customizable cluster with OpenStack

cluster deployment hadoop openstack openstack-command script slave-nodes spark

Last synced: 27 Dec 2024

https://github.com/thdaraujo/cheat

A handful of cheatsheets and programming tips.

bash cheat-sheets cheatsheet dms hadoop postgresql spark sqoop

Last synced: 24 Jan 2025

https://github.com/hereismari/hadoop-job-time-prediction

Code used to run Hadoop job prediction experiments using OpenStack Sahara.

hadoop hadoop-job prediction sahara

Last synced: 17 Dec 2024

https://github.com/liuhaozzu/big_data

nginx+flume+hadoop+hbase

flume-ng hadoop hbase mapreduce

Last synced: 29 Jan 2025

https://github.com/javiroman/dfsadmin-inotify

Simple Java example for testing the DFSAdmin API used in Apache NiFi GetHDFSEvents Processor

hadoop hdfs nifi nifi-processors

Last synced: 31 Dec 2024

https://github.com/aromoh/basic-sentiment-analysis-mrjob-twitter-

A dictionary-based sentiment analysis project implemented with MrJob using the MapReduce model. It can be run locally or in HDFS environments (such as Hadoop or AWS).

aws-ec2 hadoop hdfs-enviroments map-reduce mrjob sentiment-analysis twiiter

Last synced: 09 Dec 2024

https://github.com/chouaib-629/movierecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

big-data data-processing distributed-computing hadoop hadoop-hdfs hadoop-mapreduce hdfs java java-mapreduce mapreduce movielens sorting

Last synced: 05 Jan 2025

https://github.com/chen0040/java-hdfs-client

Java Hadoop client that provides a convenient API for file management and interaction with the Hadoop file system

hadoop hdfs hdfs-client java-client

Last synced: 16 Dec 2024

https://github.com/ishaansathaye/csc369-introdistributedcomputing

Cal Poly Fall 2024 CSC 369 Intro to Distributed Computing

distributed-computing hadoop java map-reduce scala spark

Last synced: 17 Dec 2024

https://github.com/gnaneshkunal/scala-hadoop

Hadoop programming using Scala

big-data bigdata hadoop scala spark sql

Last synced: 16 Dec 2024

https://github.com/anthonycalandra/wikipedia-tfidf

A Hadoop-powered search index of Wikipedia articles.

hadoop mapreduce tf-idf

Last synced: 16 Dec 2024

https://github.com/darule0/yarndiff

A rudimentary command-line utility for comparing Apache YARN container logs.

diff difference diffing hadoop hadoop-mapreduce hive log4j mapreduce pig spark yarn yarn2

Last synced: 23 Dec 2024

https://github.com/yjham2002/hadoop_clustering

📖 Apache Hadoop-based clustering tutorial

hadoop hadoop-cluster mac-osx mapreduce

Last synced: 12 Dec 2024

https://github.com/mxagar/spark_big_data_guide

This repository contains my personal guide on Spark and topics related to Big Data.

big-data hadoop machine-learning spark

Last synced: 23 Dec 2024

https://github.com/nuttymoon/jumbo-hdp3

Jumbo bundle for HDP3 stack

ansible hadoop hortonworks-hdp jumbo

Last synced: 12 Dec 2024

https://github.com/adelin-info/tp_datacloud

Architecture and development of large-scale distributed systems

hadoop java map-reduce scala spark yarn zookeeper

Last synced: 30 Jan 2025

https://github.com/janheinrichmerker/hadoop-ktx

💾 Kotlin Extensions for Apache Hadoop (MapReduce).

hadoop hadoop-ktx hadoop-mapreduce kotlin kotlin-extensions kotlin-jvm kotlin-library

Last synced: 24 Dec 2024

https://github.com/alokjani/bigdata-vagrant-devlab

Hadoop Software Development sandbox

centos flume hadoop hive pig sqoop zeppelin

Last synced: 18 Dec 2024

https://github.com/alexcombessie/ensae_distributed-lasso-hadoop

Distributed Lasso regression with Hadoop Pig - Project for the "Practical tools for the analysis of Big Data" course by Xavier Dupre at ENSAE ParisTech

distributed-systems ensae-paristech hadoop

Last synced: 24 Dec 2024

https://github.com/vubacktracking/hdfs-stream-processing

Streaming data processing using Hadoop HDFS, Spark, Kafka, Minio, Elasticsearch

airflow elastic hadoop hdfs kafka kibana minio spark

Last synced: 11 Oct 2024

https://github.com/zhaytam/pagerank

An implementation of the PageRank algorithm in Hadoop MapReduce

hadoop java pagerank-algorithm

Last synced: 19 Dec 2024

https://github.com/mikeacosta/san-francisco-crime

SF crime data analysis with Apache Spark

apache-hive apache-spark hadoop hdfs hortonworks

Last synced: 10 Jan 2025

https://github.com/zhulg/hadoopwordcount

Hadoop WordCount example with Maven and IntelliJ

hadoop intellij

Last synced: 11 Jan 2025

https://github.com/isaccanedo/apache-accumulo

🔋 Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval

accumulo apache big-data cluster distribued hackertoberfest hadoop hdfs key-value zookeper

Last synced: 12 Jan 2025

https://github.com/worst001/note_bigdata

A collection of materials, notes, and manuals related to big data

bigdata cdh datawarehouse development flink flume guide hadoop hbase hive learning markdown mkdocs note notebook spark

Last synced: 12 Jan 2025

https://github.com/multivacplatform/multivac-elasticsearch

A demo of Spark 2.2 with the Elasticsearch-Hadoop connector

elasticsearch hadoop spark

Last synced: 12 Jan 2025

https://github.com/multivacplatform/multivac-pubmed

Updates PubMed articles daily on HDFS using a Spark cluster

apache-spark dataframe hadoop hdfs pubmed pubmed-parser spark-sql yarn

Last synced: 12 Jan 2025

https://github.com/neshkeev/containers

A library of containers packaged by neshkeev

docker hadoop k8s kubernetes

Last synced: 31 Jan 2025

https://github.com/adampaternostro/azure-hdi-distcp

Creates an HDInsight cluster, then runs distcp remotely to copy data between Blob Storage and/or Data Lake Storage (ADLS)

azure azure-data-lake azure-storage distcp file-copy hadoop hdinsight

Last synced: 31 Jan 2025

https://github.com/vibhuti03/hadoop-administration-analysis

Setting up a cluster and analysing the Aadhaar dataset using Apache Hive

aadhar-dataset cluster hadoop hadoop-administration-analysis hadoop-hdfs hive nonhacluster performing-analysis

Last synced: 12 Jan 2025

https://github.com/dimajix/docker-hadoop

Repository for building Docker containers for Hadoop

docker hadoop

Last synced: 05 Jan 2025

https://github.com/ltossian/bike-sales-data-metrics

Processing, storage, analysis, and visualization of a large CSV file and of real-time bike sales data.

fastapi grafana hadoop kafka postgresql python spark

Last synced: 11 Oct 2024

https://github.com/onecricketeer/mapreduce-sandbox

Sandbox for Hadoop MapReduce

hadoop mapreduce sandbox-development

Last synced: 21 Jan 2025

https://github.com/dev88jerry/cs450

Bishop's University - CS450 Elements of Big Data

big-data data-science hadoop spark

Last synced: 08 Jan 2025

https://github.com/mikma03/databases

The main purpose of this repository is to build general knowledge about databases.

cassandra graphql hadoop mongodb msql neo4j newsql nosql oracle-database postgresql redis sql

Last synced: 09 Jan 2025

https://github.com/martincastroalvarez/apache-hive-docker

Running Hive jobs using Docker

hadoop hdfs hive

Last synced: 22 Dec 2024

https://github.com/martincastroalvarez/hadoop-hdfs-kafka-docker

Running Kafka using Docker

docker hadoop hdfs kafka

Last synced: 22 Dec 2024

https://github.com/martincastroalvarez/hadoop-hdfs-spark-docker

Running Spark jobs using Docker

docker hadoop spark

Last synced: 22 Dec 2024

https://github.com/dhchenx/simplehadooptool

A tool to submit MapReduce jobs to Hadoop cluster.

client-server hadoop hadoop-api job mapreduce simple-hadoop-tool submit

Last synced: 29 Jan 2025

https://github.com/dhchenx/catla-hs

Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.

big-data catla-hs hadoop machine-learning mapreduce parameter-search performance-tuning self-tuning-system spark visualization

Last synced: 29 Jan 2025

https://github.com/dhilipsiva/intro-to-big-data

Introduction to Big Data with practical use-cases (Meetup Talk)

big-data demo hadoop meetup-talk presentation presentations talk talks

Last synced: 21 Dec 2024

https://github.com/jferrl/gutemberg-analysis

Gutenberg corpus analysis with Apache Hadoop

analysis gutemberg hadoop java

Last synced: 19 Jan 2025

https://github.com/ssanthosh010303/collection-data-training

A collection of challenges completed during a data training program.

airflow apache azure azure-data-factory azure-databricks azure-logic-apps bigdata data hadoop spark

Last synced: 17 Jan 2025

https://github.com/mikma03/data_streaming

All topics related to data streaming and real-time analysis

apache docker hadoop kafka kubernetes spark-streaming

Last synced: 09 Jan 2025

https://github.com/xunliu/submarine-installer

Hadoop Submarine runtime environment installation

deep-learning hadoop submarine

Last synced: 20 Jan 2025

https://github.com/iamsushantk/zira

Zeppelin and Impala for Reporting and Analytics

analytics bigdata hadoop reporting zepplin

Last synced: 29 Jan 2025

https://github.com/aleskandro/r-hadoop-madreduce-examples

Many examples of using R with Hadoop for MapReduce, with and without libraries such as RHadoop/RHIPE - Advanced Programming Languages

data-analysis hadoop mapreduce r

Last synced: 27 Dec 2024

https://github.com/billsioros/big-data

Large Scale Data Management Systems MSc. Project

big-data hadoop hdfs pyspark

Last synced: 24 Jan 2025

https://github.com/bearddan2000/dev-java-cli-maven-hbase-client

A proof of concept (POC) for connecting to a Hadoop cluster using HBase.

cli client dev hadoop hbase java maven zookeeper

Last synced: 29 Jan 2025

https://github.com/davidkhala/data-warehouse

data warehouse index

databricks hadoop teradata

Last synced: 19 Dec 2024

https://github.com/dominicluidold/ws21-introductiontobigdataprojects

A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)

avro bigdata hadoop java map-reduce

Last synced: 29 Jan 2025

https://github.com/liuhaozzu/data-mining-algorithms

Data mining algorithms based on Hadoop 2.7.3

data-mining hadoop hadoop-mapreduce java-8

Last synced: 29 Jan 2025

https://github.com/sandysanthosh/hadoop-basics

Hadoop basics with Tableau, reading data from MySQL

hadoop tabluea

Last synced: 11 Jan 2025

https://github.com/lingumd/amazon_vine_analysis

Analysis to determine if there is any bias toward favorable reviews from Amazon Vine members in the Beauty products dataset.

analysis aws aws-rds aws-s3 etl google-colab hadoop os postgres pyspark sql

Last synced: 23 Jan 2025

https://github.com/myui/yarnkit

Yarnkit is a toolkit to write YARN applications

hadoop yarn

Last synced: 06 Dec 2024

https://github.com/srfrnk/spar-kube

Spark cluster deployment on a k8s cluster

hadoop k8s k8s-cluster kubernetes spar-kube spark zeppelin

Last synced: 29 Jan 2025

https://github.com/menxit/hadoop-3.0

Docker image of hadoop:3.0

bigdata docker hadoop sparkachetipassa

Last synced: 08 Jan 2025

https://github.com/yukta026/tokyo-olympics-2021-analytics

An end-to-end ETL pipeline for analyzing and visualizing Tokyo Olympics 2021 data using Azure tools and Power BI.

azure data-engineering etl hadoop powerbi python3 spark sql

Last synced: 11 Oct 2024

https://github.com/manuparra/tallerh2s

HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering, Universidad de Granada

hadoop hdfs java map-reduce python spark wordcount

Last synced: 07 Nov 2024

https://github.com/riccardorevalor/mapreduce

Collection of exercises regarding Hadoop and MapReduce approach

hadoop hadoop-mapreduce mapreduce

Last synced: 14 Dec 2024