Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with hadoop
A curated list of projects in awesome lists tagged with hadoop.
https://github.com/manuparra/tallerh2s
HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada
hadoop hdfs java map-reduce python spark wordcount
Last synced: 07 Nov 2024
https://github.com/ixgnoy/visualize_movie_with_rating
Movie rating visualization using Hadoop.
big-data big-data-analytics hadoop query
Last synced: 08 Dec 2024
https://github.com/mituskillologies/bigdata-ait-sep24
Programs from the Big Data Analytics training conducted at Army Institute of Technology, Pune, in September 2024.
apache-hadoop apache-spark big-data big-data-analytics hadoop spark
Last synced: 17 Jan 2025
https://github.com/gmartinezramirez-old/practice-hadoop
[Study] Daily plan for practicing Hadoop.
big-data hadoop hadoop-cluster hadoop-mini-clusters hive java mapreduce
Last synced: 01 Feb 2025
https://github.com/zmyzheng/browserassistant
Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment, youtube link:
angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot
Last synced: 11 Dec 2024
https://github.com/jhleeeme/fake-log-collector
Fake Log Collection & Visualization
cpp data-pipeline docker docker-compose grafana hadoop hdfs influxdb ipc java kafka kafka-streams message-queue python3 telegraf visualization
Last synced: 17 Jan 2025
https://github.com/ahmed-ahmed/casscasinghelloworld
A hello-world example for using Cascading.
Last synced: 16 Dec 2024
https://github.com/elek/ozone-flekszible
Apache Hadoop Ozone deployment definitions with flekszible
Last synced: 19 Dec 2024
https://github.com/tejanhu/map-reduce_twitter
Map Reduce
big-data bigdata hadoop hadoop-mapreduce java twitter
Last synced: 09 Dec 2024
https://github.com/wanjinyoo/advanced_database
Hadoop, Spark, MapReduce
advanced-database hadoop mapreduce sparkjava
Last synced: 17 Jan 2025
https://github.com/davidpissarra/ddbs-project
Tsinghua University | Distributed Database Systems | Final Project
distributed-database distributed-systems hadoop hdfs mongodb redis tkinter
Last synced: 16 Dec 2024
https://github.com/yosrak5/datapipeline_hdfs_kafka
aws data-engineering docker etl-pipeline hadoop hdfs json kafka python
Last synced: 10 Dec 2024
https://github.com/ndiplacide7/air_quality_monitor
Real-Time Air Quality Monitoring System using Django, Apache Hadoop, Apache Kafka, and AWS services.
apache-kafka aws css django hadoop html mysql python3
Last synced: 10 Dec 2024
https://github.com/raz-mon/dsp_ass2
Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In the assignment we build an application that calculates the probability of any word following a pair of words, for every pair of words in the Google n-gram corpus.
aws distributed-systems ec2 emr hadoop n-gram s3
Last synced: 16 Dec 2024
https://github.com/zmyzheng/stack_overflow_qa_assistant
Big Data Analysis project with recommendation, cluster analysis and graph database
big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system
Last synced: 11 Dec 2024
https://github.com/huwngnosleep/complete_lakehouse_techstack
This project implements an end-to-end techstack for a data platform, for local development.
bigdata data-lakehouse data-platform data-warehouse etl hadoop kafka lambda-architecture spark
Last synced: 11 Dec 2024
https://github.com/rlenferink/bdph-apache-flink
apache apache-storm hadoop storm twitter-stream
Last synced: 17 Dec 2024
https://github.com/avojak/aws-hadoop-cluster
Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS
ansible aws aws-ec2 configuration-as-code hadoop hadoop-cluster infrastructure-as-code terraform
Last synced: 12 Dec 2024
https://github.com/melinamoraiti/hadoop-text-analytics
📊 An implementation of document frequency (the number of files in which a term appears), maximum term frequency, and TF-IDF calculation using the Hadoop MapReduce framework.
hadoop inverted-index mapreduce term-frequency tf-idf
Last synced: 10 Jan 2025
https://github.com/spineo/hadoop-app
ansible ansible-inventory ansible-playbook hadoop hadoop-cluster hadoop-hdfs hadoop-mapreduce hdfs yarn
Last synced: 23 Jan 2025
https://github.com/waynejz/comp9313-19t2
COMP9313 Big Data Management 2019T2
big-data hadoop java mapreduce
Last synced: 18 Dec 2024
https://github.com/misterzurg/stepik_vk_hadoop
📓 Solutions to the Stepik course "Hadoop: A System for Processing Large Volumes of Data"
hadoop stepik vk vk-education vkteam
Last synced: 18 Dec 2024
https://github.com/brynlai/data-engineering-assignment-rdsy2s2
This repository contains a data engineering project aimed at processing and analyzing scraped data using PySpark, Redis, and Neo4j. The goal is to efficiently store, process, and analyze text data.
data-engineering gemini-ai google hadoop kafka neo4j pyspark redis
Last synced: 19 Dec 2024
https://github.com/oguzhanfatihkucuk/data-analytics-project-kafka-spark
The data in this project was collected in a database using Apache Kafka and processed with Apache Spark Streaming. The project aims to create a forecasting model and analyze sales forecasts per customer.
big-data data data-visualization hadoop kafka ml mlpipeline plt pyhton spark
Last synced: 25 Dec 2024
https://github.com/imdeepanshugpt/hadoop
Hadoop-Cluster
docker docker-compose docker-container docker-image hadoop hadoop-cluster hadoop-docker hadoop-filesystem hadoop-framework hadoop-mapreduce hadoop-streaming
Last synced: 23 Jan 2025
https://github.com/amirhnajafiz-university/s7cc03
Third project of Cloud Computing course.
big-data hadoop hadoop-hdfs mapreduce python python3 spark
Last synced: 26 Dec 2024
https://github.com/pawsanie/pyspark_universal_dq_report
The script reads the dataset at the given path and selects the columns passed as arguments for the specified dates. It then saves the report to the specified HDFS path.
data-quality data-quality-checks data-quality-monitoring dq hadoop hadoop-hdfs hdfs pyspark python python-3 python-script python3
Last synced: 02 Jan 2025
https://github.com/vagnerbellacosa/030_criandoumecossistemahadooptotalmentegerenciadocomgoogleclouddataproc
Your mission is to create a Big Data ecosystem using Google Cloud Platform (GCP). To do so, the expert will teach you how to configure Google Cloud Dataproc, a fully managed Hadoop service, using your free GCP credits.
digital-innovation-one dio gcp google-cloud-dataproc google-cloud-platform hadoop labs
Last synced: 03 Jan 2025
https://github.com/fblupi/master_informatica-ccsa
Repository for the course Cloud Computing: Servicios y Aplicaciones of the Master's in Computer Engineering at UGR
cloud-computing containers data-science docker hadoop mahout map-reduce mapreduce mongodb opennebula virtual-machine
Last synced: 30 Jan 2025
https://github.com/attomos/yarnlog
🧶 Download Apache Hadoop YARN logs to your local machine.
apache-hadoop-yarn command-line-tool hadoop resource-manager
Last synced: 23 Jan 2025
https://github.com/ramitsurana/emr-ml
AWS EMR Info including Hadoop, Map Reduce and Hive along with Machine Learning
Last synced: 03 Jan 2025
https://github.com/mobiletelesystems/hadoop-docker
Docker image with Hadoop cluster
docker-compose-template docker-image hadoop
Last synced: 17 Jan 2025
https://github.com/tomwhite/gvcf-hbase
Genomic variants in HBase
bioinformatics genomics hadoop hbase ngs
Last synced: 17 Jan 2025
https://github.com/rcarvalho16/hadoopbasketballpossession
A Hadoop MapReduce project for analyzing basketball game footage, extracting video frames, and determining ball possession times for teams and players using OpenCV and YOLO object detection.
Last synced: 23 Jan 2025
https://github.com/nagpritam/identification-of-trucks-and-potential-risky-driver-using-databricks-spark-api-
The project aims to identify trucks based on their model, fuel consumption, driving behaviors, and past records of violations/accidents
databricks hadoop hive powerbi python3 spark
Last synced: 12 Oct 2024
https://github.com/nihadguluzade/tweetanalyzer-mapreduce
Tweet analyzer using MapReduce.
hadoop hadoop-mapreduce javafx mapreduce tweet-analysis
Last synced: 04 Jan 2025
https://github.com/josericodata/mscdataanalyticssecondsemesterassignmentone
Summary of Assignment One from the Second semester of the MSc in Data Analytics program. This repository contains the CA1 assignment guidelines from the college and my submission. To see all original commits and progress, please visit the original repository using the link below.
advanced-data-analysis big-data big-data-storage-and-processing cct-college cnn-keras data-science dropout-layers dublin hadoop ireland jose-maria-rico-leal jose-rico jupyter-notebook machine-learning msc mysql neural-network rdbms spark ubuntu-linux
Last synced: 17 Jan 2025
https://github.com/kambojankit/hadoop-docker-cluster
A project that provides a complete Docker-based Hadoop environment
Last synced: 11 Jan 2025
https://github.com/ibrahimghali/hadoop_ha
This repository showcases a Hadoop cluster setup with High Availability (HA) using ZooKeeper for automatic failover between NameNodes. It ensures minimal downtime and enhanced fault tolerance, providing a reliable framework for large-scale data storage and processing. Configuration details for both Hadoop and ZooKeeper are included.
big-data hadoop highavailability zookeeper
Last synced: 11 Jan 2025
https://github.com/ansh-info/Hadoop-Pipeline
An end-to-end data engineering pipeline to collect, store, process, and analyze property and crime data using Hadoop, Docker, MySQL, Tailscale, and Selenium
docker docker-compose hadoop jupyter-notebook mapreduce python selenium sql tailscale
Last synced: 21 Jan 2025
https://github.com/vasugi2003/house-price-prediction-using-pyspark---big-data-analytics
House Price Prediction using Pyspark
algorithms big-data-analytics csv hadoop pyspark python
Last synced: 11 Jan 2025
https://github.com/aissam-en/hadoop-installation-on-windows-7-10
This guide will help you install Hadoop on Windows (7)
hadoop hadoop-installation hadoop-windows windows10 windows7 windows7-hadoop
Last synced: 11 Jan 2025
https://github.com/jmkim/hadoopscripts
Some useful scripts for Apache Hadoop Cluster Setup
Last synced: 26 Jan 2025
https://github.com/vinetos/giraph-lab
A starter project for Hadoop and Giraph with Maven and Docker
apache docker giraph hacktoberfest hadoop hadoop-yarn
Last synced: 24 Jan 2025
https://github.com/limdongjin/cse4100_sg
System Programming project
hadoop machine-learning matplotlib python
Last synced: 12 Jan 2025
https://github.com/ericlondon/map-reduce-20160121
Hadoop, Pig, Ruby, Map/Reduce, on OSX via Homebrew
Last synced: 12 Jan 2025
https://github.com/rupeshtr78/blog
Big Data Spark Hadoop Kafka Flink Spark Streaming
aws bigdata cassandra elasticsearch emr-cluster flink hadoop hive hue kafka mapreduce mongodb oozie spark sparkstreaming yarn
Last synced: 12 Jan 2025
https://github.com/aleskandro/r-hadoop-madreduce-examples
Many examples of using R with Hadoop for MapReduce, with and without libraries such as RHadoop/RHIPE - Advanced Programming Languages
data-analysis hadoop mapreduce r
Last synced: 27 Dec 2024
https://github.com/getblitzed/magnetize-recommendations
Recommendations and personalization service
docker hadoop magnetize personalization python3
Last synced: 26 Jan 2025
https://github.com/joshuawscott/hadoop-psuedodistributed-docker
Single Docker image for Hadoop pseudo-distributed mode
Last synced: 19 Jan 2025
https://github.com/iulianoroberto/mapreducebasicapplications
Basic MapReduce applications in Java.
hadoop hdfs java mapreduce mapreduce-java
Last synced: 24 Jan 2025
https://github.com/iulianoroberto/stormtopology
Java implementation of a simple Storm topology.
hadoop hdfs storm storm-topology stormworks topology
Last synced: 24 Jan 2025
https://github.com/grctest/grc-magnitude-mapreduce-hadoop
GRC Magnitude MapReduce Hadoop
boinc gridcoin hadoop hdfs magnitude mapreduce no-team-req statistics
Last synced: 31 Jan 2025
https://github.com/montybechir/redblacktreemapreduce
A Hadoop project that handles very large data sets and constructs a red-black tree. A script is available to automate iterative MapReduce jobs.
data-structures distributed-computing hadoop mapreduce python redblacktree scripting
Last synced: 19 Jan 2025
https://github.com/kriss024/hadoop
Hadoop and Hive fundamental commands
hadoop hadoop-filesystem hadoop-hdfs hive
Last synced: 25 Jan 2025
https://github.com/galaxy092/samsung-innovation-campus-big-data-capstone-project
Samsung Innovation Campus Big Data Capstone Project - Weather Prediction
hadoop jupyter-notebook pandas pyspark scikit-learn sparksql
Last synced: 01 Feb 2025
https://github.com/shortthirdman/apache-hadoop-nativelib
Apache Hadoop NativeLib Build for 64-bit (x86_64)
apache-hadoop hadoop hadoop-hdfs hadoop-mapreduce hadoop-nativelib
Last synced: 20 Jan 2025