Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with hadoop

A curated list of projects in awesome lists tagged with hadoop.

https://github.com/marcionicolau/mrappsamples

MapReduce sample applications using Java

bigdata hadoop hadoop-mapreduce

Last synced: 27 Dec 2024

https://github.com/giantcroc/big-data

big data homework

big-data hadoop mapreduce wordcount

Last synced: 08 Jan 2025

https://github.com/hailiang-wang/hadoop-getstarted

Get started with Apache Hadoop

big-data hadoop

Last synced: 07 Jan 2025

https://github.com/paucimi/big_data_arquitectura

Integrating ElasticSearch and Hadoop

bigdata elasticsearch hadoop hive kibana ubuntu

Last synced: 08 Jan 2025

https://github.com/offthetab/vkapi-ml-dataharvester

Pipeline to harvest data via VK API for ML analysis with hadoop and spark

hadoop hdfs hive linux mariadb python requests spark sqoop

Last synced: 30 Dec 2024

https://github.com/captainirs/hadoop-yarn-k8s

A sandbox for running a Hadoop-YARN cluster on Kubernetes

hadoop kubernetes spark yarn

Last synced: 11 Jan 2025

https://github.com/shuuji3/spark-ceph-connector

🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph

apache-hadoop apache-spark ceph hadoop spark

Last synced: 27 Jan 2025

https://github.com/viveksyngh/intro-to-hadoop-and-mapreduce

My First Hadoop Map Reduce Code and Projects

bigdata hadoop mapreduce

Last synced: 11 Jan 2025

https://github.com/dexterposh/azurehdinsight

Repository housing the artifacts to deploy the Hadoop clusters on Azure for my learning.

azure hadoop hdinsight-cluster learning-by-doing spark

Last synced: 04 Jan 2025

https://github.com/mng222n/cloudapp

The code developed in cloud application exercises from the University of Illinois at Urbana-Champaign

cloud-computing counter hadoop java python

Last synced: 08 Jan 2025

https://github.com/nubisub/data.eng

Data Engineering Technology Practicum

apache hadoop jupyter

Last synced: 24 Dec 2024

https://github.com/baptvit/big_data

My courses and activities in Big Data

big-data hadoop hbase hive kafka mapreduce oozie pig python3 scala spark zookeeper

Last synced: 15 Jan 2025

https://github.com/hrolive/patc-big-data-analytics-bsc

Introduction to the main concepts and technologies related to Big Data and Data Analytics and its applications to real projects.

analytics bias big-data data-analysis hadoop hpc machine-learning mapreduce nosql python spark spark-streaming visualization

Last synced: 04 Jan 2025

https://github.com/lakshya-gg/yadfs

Yet Another Distributed File System

hadoop python sql

Last synced: 03 Jan 2025

https://github.com/cevheri/hadoop-mr-example-currency

Hadoop MapReduce example that reads currency.txt, with driver, mapper, and reducer

hadoop hadoop-filesystem hadoop-hdfs hadoop-mapreduce java maven

Last synced: 05 Jan 2025

https://github.com/maurodelazeri/hive-config

Hive mysql config

hadoop hive

Last synced: 05 Jan 2025

https://github.com/aman-dutta/case-study-accidents

Spark analysis on the accidents-data

dataframe etl hadoop python spark

Last synced: 11 Oct 2024

https://github.com/yukta026/tokyo-olympics-2021-analytics

An end-to-end ETL pipeline for analyzing and visualizing Tokyo Olympics 2021 data using Azure tools and Power BI.

azure data-engineering etl hadoop powerbi python3 spark sql

Last synced: 11 Oct 2024

https://github.com/menxit/hadoop-3.0

Docker image of hadoop:3.0

bigdata docker hadoop sparkachetipassa

Last synced: 08 Jan 2025

https://github.com/sandysanthosh/hadoop-basics

Hadoop basics with Tableau, reading data from MySQL

hadoop tabluea

Last synced: 11 Jan 2025

https://github.com/mikma03/data_streaming

All topics related to data streaming and real-time analysis

apache docker hadoop kafka kubernetes spark-streaming

Last synced: 09 Jan 2025

https://github.com/ssanthosh010303/collection-data-training

A collection of challenges completed during a data training program.

airflow apache azure azure-data-factory azure-databricks azure-logic-apps bigdata data hadoop spark

Last synced: 17 Jan 2025

https://github.com/mikma03/databases

The main purpose of this repository is to build general knowledge about databases.

cassandra graphql hadoop mongodb msql neo4j newsql nosql oracle-database postgresql redis sql

Last synced: 09 Jan 2025

https://github.com/dev88jerry/cs450

Bishop's University - CS450 Elements of Big Data

big-data data-science hadoop spark

Last synced: 08 Jan 2025

https://github.com/billxsheng/mapreduce

MapReduce, Hadoop, and HDFS.

hadoop hdfs java mapreduce

Last synced: 17 Jan 2025

https://github.com/vladd12/big-data-practice

Introduction to Big Data with Hadoop

hadoop hadoop-hdfs hadoop-mapreduce pig-latin

Last synced: 28 Jan 2025

https://github.com/bishalpaudel/hadoopproductpurchaseprobability

Anticipatory customer order prediction after purchase of item(s).

cloudera-hadoop hadoop hadoop-mapreduce java

Last synced: 06 Jan 2025

https://github.com/shahiransari/clickstream-data

Analysis of various aspects of clickstream data

analytics clickstream-data hadoop pig pig-latin

Last synced: 26 Jan 2025

https://github.com/shahiransari/twitteranalysis

Use Hive to analyse data gathered from Twitter using Flume.

hadoop hdfs hive hiveql twitter twitter-sentiment-analysis

Last synced: 26 Jan 2025

https://github.com/shahiransari/sensor-data-

Finding the regions in which the room sensors are most needed and working properly

analysis analytics cloudera hadoop hive sensor-data

Last synced: 26 Jan 2025

https://github.com/vigneshss-07/bigdata_technologies

This repo contains all technical knowledge and implementation of big data technologies.

big-data hadoop hadoop-hdfs hbase hive hive-metastore kafka mapreduce-python pyspark spark sparksql

Last synced: 16 Jan 2025

https://github.com/alchemine/realtime-trend-pipeline

Data pipeline for collecting and analyzing real-time trending search terms

airflow docker hadoop hive kafka python selenium spark

Last synced: 16 Jan 2025

https://github.com/vasugi2003/big-data-analytics

Big Data Analytics - various operations and functions.

big-data data-science dataset googlecolab hadoop hdfs pyspark python3 sql

Last synced: 11 Jan 2025

https://github.com/ericlondon/docker-hadoop-grep

Docker Hadoop Grep

docker grep hadoop

Last synced: 12 Jan 2025

https://github.com/ansh-info/hadoop-pipeline

An end-to-end data engineering pipeline to collect, store, process, and analyze property and crime data using Hadoop, Docker, MySQL, Tailscale, and Selenium

docker docker-compose hadoop jupyter-notebook mapreduce python selenium sql tailscale

Last synced: 11 Oct 2024

https://github.com/adamatti/learnhdfs

Pet project to show how to list / create files on HDFS using java client (from outside the bigdata cluster)

big gradle groovy hadoop java jvm

Last synced: 19 Jan 2025

https://github.com/ccao-data/service-sqoop-iasworld

Service to continually import iasWorld backend data to Parquet using Apache Sqoop

docker hadoop service shell sqoop

Last synced: 14 Nov 2024

https://github.com/labex-labs/hadoop-practice-labs

[Hadoop Practice Labs] This repository collects 78 programming scenarios (labs and challenges) for Hadoop Practice Labs. This course contains many labs for Hadoop; each lab is a small Hadoop project with detailed guidance and solutions. You can practice your Hadoop skills by completing thes...

awesome awesome-list challenges course education hadoop hands-on labex labs programming

Last synced: 13 Nov 2024

https://github.com/labex-labs/hadoop-practice-challenges

[Hadoop Practice Challenges] This repository collects 12 programming scenarios (labs and challenges) for Hadoop Practice Challenges. This course contains many challenges for Hadoop; each challenge is a small Hadoop project with detailed instructions and solutions. You can practice your Hado...

awesome awesome-list challenges course education hadoop hands-on labex labs programming

Last synced: 13 Nov 2024

https://github.com/armahdavi/bigdata_pyspark_sales_analytics

A summary of my big data code in Python (PySpark) for analyzing retail and Walmart superstore sales data to draw sales insights

big-data bigquery clustering dataframe hadoop k-means machine-learning pyspark pyspark-ml python spark unsupervised-learning

Last synced: 28 Dec 2024

https://github.com/mirzaim/hadoop-twitter-analysis

Hadoop MapReduce analysis of US Election 2020 Tweets.

hadoop hdfs map-reduce tweet-analysis us-election-2020

Last synced: 09 Jan 2025

https://github.com/prakhar-ff13/hadoop

This repository contains Hadoop Ecosystem Files (Code, data, readme etc...)

flume-ng hadoop hadoop-filesystem hadoop-hdfs hadoop-mapreduce hive java mapreduce-java oozie-mapreduce pig yarn yarn-hadoop-cluster

Last synced: 28 Jan 2025

https://github.com/yiyun-liang/forum-posts-analysis

MapReduce scripts for forum data analysis.

hadoop mapreduce python

Last synced: 28 Jan 2025

https://github.com/yiyun-liang/geo-ip

A web interface that requests data from a search engine and displays results with AmMap.

elasticsearch hadoop jython pig

Last synced: 28 Jan 2025

https://github.com/bobergot/ott-movies-insights-to-recommendations

Analyze movie ratings and build a recommendation system using MapReduce. This project utilizes the Apriori algorithm, optimized for handling large datasets like the Netflix prize data, to provide personalized movie recommendations.

apriori-algorithm aws aws-s3 big-data cloud-computing data-mining hadoop java mapreduce movie-recommendation netflix-prize parallel-computing personalization

Last synced: 22 Jan 2025

https://github.com/bobergot/large-scale-data-processing-design-patterns

Explore essential MapReduce design patterns for big data processing! This repository includes practical implementations of patterns from the "MapReduce Design Patterns" book, complete with examples across summarization, filtering, organization, joins, and more.

bigdata bigdataanalytics cloudcomputing dataengineering dataprocessing datascience designpatterns distributedcomputing hadoop java mapreduce

Last synced: 22 Jan 2025

https://github.com/rmodi6/theory-of-database-systems

Homework files for CSE532 - Theory of Database Systems

database-queries hadoop ibm-db2 jdbc map-reduce spark spatial-database sql xpath xquery

Last synced: 11 Jan 2025

https://github.com/marco-gallegos/sqoopit

A python package that lets you sqoop into HDFS/Hive/HBase data from RDBMS using sqoop

hadoop hbase hdfs hive py python python3 sqoop sqoop-import

Last synced: 22 Jan 2025

https://github.com/kwonnayeon/hadoop-platform-and-application-framework

Practice exercises for Coursera assignments on Hadoop platform and application framework.

assignments coursera hadoop practice spark

Last synced: 13 Jan 2025

https://github.com/fahimahammed/hadoop-and-hdfs

This repository provides comprehensive documentation and a handy cheat sheet for managing Apache Hadoop 3.4.0 on Debian-based systems. Whether you're setting up a new Hadoop cluster, running MapReduce jobs, or handling HDFS operations, this repository aims to be your go-to resource for all things related to Hadoop.

ddbms dfs hadoop hdfs mapreduce

Last synced: 24 Jan 2025

https://github.com/mdaiyub/big-data-lab

This repository serves as a hub for students, researchers, and enthusiasts interested in diving deep into the realm of big data.

apache-spark hadoop openjdk

Last synced: 22 Jan 2025

https://github.com/cdarlint/hadoop-unittest

Learn unit testing on Hadoop via a mini DFS cluster

hadoop minicluster minidfscluster tutorial unittest wordcount

Last synced: 13 Jan 2025

https://github.com/cclient/mongo_hadoop_map-reduce

Basic example of using the MongoDB support package in Hadoop to analyze a MongoDB database with MapReduce. Now that Spark supports MongoDB, this approach is obsolete.

hadoop mongodb spark

Last synced: 16 Jan 2025

https://github.com/alchemine/hadoop-docker-cluster

Hadoop based Distributed System Docker Cluster

airflow docker hadoop hive kafka spark

Last synced: 16 Jan 2025

https://github.com/christian-konrad/mapreduce-invertedindexer-example

Simplified example of an Inverted Indexer for plain text documents built on Hadoop's MapReduce framework.

example hadoop hadoop-mapreduce inverted-index mapreduce

Last synced: 23 Jan 2025

https://github.com/matchy233/distributed-system-project

☁ Batch processing Word-Letter Count application with a custom k8s scheduler

distributed-systems hadoop java k8s python scheduler spark

Last synced: 09 Jan 2025

https://github.com/cleberzumba/hadoop-in-pseudodistributed-mode

Installation and Configuration of the Big Data Environment with Hadoop and Spark

hadoop spark

Last synced: 31 Dec 2024

https://github.com/jaini-bhavsar/big-data

This repository contains projects related to big data. All the projects use real-world data from real-world problems.

amazon-ec2 apache-maven hadoop java-8 oozie

Last synced: 19 Jan 2025

https://github.com/heracliteanflux/exercises-scala

Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.

apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark

Last synced: 19 Jan 2025

https://github.com/divinenaman/mapreduce-matrix-multipy

A python implementation of matrix multiplication using Hadoop streaming API

hadoop hadoop-hdfs hadoop-mapreduce python

Last synced: 17 Dec 2024

https://github.com/martincastroalvarez/apache-hive-docker

Running Hive jobs using Docker

hadoop hdfs hive

Last synced: 22 Dec 2024

https://github.com/martincastroalvarez/hadoop-hdfs-kafka-docker

Running Kafka using Docker

docker hadoop hdfs kafka

Last synced: 22 Dec 2024

https://github.com/martincastroalvarez/hadoop-hdfs-spark-docker

Running Spark jobs using Docker

docker hadoop spark

Last synced: 22 Dec 2024

https://github.com/dhchenx/simplehadooptool

A tool to submit MapReduce jobs to Hadoop cluster.

client-server hadoop hadoop-api job mapreduce simple-hadoop-tool submit

Last synced: 29 Jan 2025

https://github.com/dhchenx/catla-hs

Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.

big-data catla-hs hadoop machine-learning mapreduce parameter-search performance-tuning self-tuning-system spark visualization

Last synced: 29 Jan 2025

https://github.com/jferrl/gutemberg-analysis

Gutenberg corpus analysis with Apache Hadoop

analysis gutemberg hadoop java

Last synced: 19 Jan 2025

https://github.com/davidkhala/oci-datalake

Datalake & Lakehouse in OCI

big-data hadoop hive hue

Last synced: 19 Dec 2024

https://github.com/iamsushantk/zira

Zeppelin and Impala for Reporting and Analytics

analytics bigdata hadoop reporting zepplin

Last synced: 29 Jan 2025

https://github.com/billsioros/big-data

Large Scale Data Management Systems MSc. Project

big-data hadoop hdfs pyspark

Last synced: 24 Jan 2025

https://github.com/bearddan2000/dev-java-cli-maven-hbase-client

A POC for connecting to a hadoop cluster using hbase.

cli client dev hadoop hbase java maven zookeeper

Last synced: 29 Jan 2025

https://github.com/davidkhala/data-warehouse

data warehouse index

databricks hadoop teradata

Last synced: 19 Dec 2024

https://github.com/dominicluidold/ws21-introductiontobigdataprojects

A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)

avro bigdata hadoop java map-reduce

Last synced: 29 Jan 2025

https://github.com/liuhaozzu/data-mining-algorithms

Data mining algorithms based on Hadoop 2.7.3

data-mining hadoop hadoop-mapreduce java-8

Last synced: 29 Jan 2025

https://github.com/lingumd/amazon_vine_analysis

Analysis to determine if there is any bias toward favorable reviews from Amazon Vine members in the Beauty products dataset.

analysis aws aws-rds aws-s3 etl google-colab hadoop os postgres pyspark sql

Last synced: 23 Jan 2025

https://github.com/srfrnk/spar-kube

Spark cluster deployment on a k8s cluster

hadoop k8s k8s-cluster kubernetes spar-kube spark zeppelin

Last synced: 29 Jan 2025

https://github.com/spineo/accumulo-hdfs-zookeeper

Create a storage cluster running Accumulo on HDFS and Zookeeper for node management.

accumulo accumulo-hdfs-zookeeper ansible ansible-inventory ansible-playbooks cluster hadoop hadoop-hdfs hdfs zookeeper

Last synced: 23 Jan 2025

https://github.com/hindog/grid-executor

Library for remote JVM ExecutorService with only dependency being password-less SSH -- Run clustered Hadoop/Spark jobs from IDE -- IDE-pimped Spark shell with full auto-completion!

cloud grid hadoop ide jvm spark-shell

Last synced: 20 Jan 2025

https://github.com/telefonica/testing.hadoop

Automatic launcher for hadoop-unit from Python

cdco hadoop hadoop-unit python testing

Last synced: 25 Jan 2025

https://github.com/hereismari/relatorio-pibiti-funttel-2015

Classes and scripts used to run experiments during PIBITI in 2015 at LSD/UFCG. Work carried out by Marianne Linhares, supervised by Andrey Brito.

hadoop pibiti ufcg

Last synced: 17 Dec 2024

https://github.com/manuparra/tallerh2s

HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada

hadoop hdfs java map-reduce python spark wordcount

Last synced: 07 Nov 2024