Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with hadoop

A curated list of projects in awesome lists tagged with hadoop.

https://github.com/manuparra/tallerh2s

HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada

hadoop hdfs java map-reduce python spark wordcount

Last synced: 07 Nov 2024

https://github.com/ixgnoy/visualize_movie_with_rating

Visualization of movies with ratings, built using Hadoop.

big-data big-data-analytics hadoop query

Last synced: 08 Dec 2024

https://github.com/mituskillologies/bigdata-ait-sep24

Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.

apache-hadoop apache-spark big-data big-data-analytics hadoop spark

Last synced: 17 Jan 2025

https://github.com/zmyzheng/browserassistant

Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment, youtube link:

angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot

Last synced: 11 Dec 2024

https://github.com/konradmalik/spark

Dockerized spark with tools

docker hadoop scala spark

Last synced: 17 Jan 2025

https://github.com/ahmed-ahmed/casscasinghelloworld

A hello world example of using Cascading

cascading hadoop mapreduce

Last synced: 16 Dec 2024

https://github.com/elek/ozone-flekszible

Apache Hadoop Ozone deployment definitions with flekszible

flekszible hadoop ozone

Last synced: 19 Dec 2024

https://github.com/davidpissarra/ddbs-project

Tsinghua University | Distributed Database Systems | Final Project

distributed-database distributed-systems hadoop hdfs mongodb redis tkinter

Last synced: 16 Dec 2024

https://github.com/ndiplacide7/air_quality_monitor

Real-Time Air Quality Monitoring System using Django, Apache Hadoop, Apache Kafka, and AWS services.

apache-kafka aws css django hadoop html mysql python3

Last synced: 10 Dec 2024

https://github.com/raz-mon/dsp_ass2

Assignment 2 of the course 'Distributed Systems Programming' by Meni Adler. In the assignment we build an application that calculates the probabilities for any word to come after a couple of words, for ANY couple of words in the n-gram corpus (google).

aws distributed-systems ec2 emr hadoop n-gram s3

Last synced: 16 Dec 2024

https://github.com/alokjani/vagrant-hadoop-hive-spark

Single Node Hadoop, Hive and Spark project using Apache BigTop

bigtop hadoop hive spark vagrant

Last synced: 11 Dec 2024

https://github.com/zmyzheng/stack_overflow_qa_assistant

Big Data Analysis project with recommendation, cluster analysis and graph database

big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system

Last synced: 11 Dec 2024

https://github.com/huwngnosleep/complete_lakehouse_techstack

This project implements an end-to-end techstack for a data platform, for local development.

bigdata data-lakehouse data-platform data-warehouse etl hadoop kafka lambda-architecture spark

Last synced: 11 Dec 2024

https://github.com/jewertow/mapreduce-nyc-collisions

Implementation of data processing in the MapReduce model.

airflow avro composer dataproc gcp hadoop hive mapreduce scala terraform

Last synced: 23 Dec 2024

https://github.com/nossbigg/mini-data-pipeline

A quick way to deploy a mini data pipeline

hadoop kafka python spark zookeeper

Last synced: 12 Dec 2024

https://github.com/avojak/aws-hadoop-cluster

Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS

ansible aws aws-ec2 configuration-as-code hadoop hadoop-cluster infrastructure-as-code terraform

Last synced: 12 Dec 2024

https://github.com/iwasakiyuuki/ansible-hadoop-cluster

Construct on-premises Hadoop cluster using ansible

ansible hadoop hdfs mapreduce yarn

Last synced: 13 Dec 2024

https://github.com/melinamoraiti/hadoop-text-analytics

📊 An implementation of document frequency (the number of files a term appears in), maximum term frequency, and TF-IDF calculation using the Hadoop MapReduce framework.

hadoop inverted-index mapreduce term-frequency tf-idf

Last synced: 10 Jan 2025

https://github.com/rainbowatcher/notes

Study notes on computer science topics

cs flink hadoop java rust

Last synced: 23 Dec 2024

https://github.com/waynejz/comp9313-19t2

COMP9313 Big Data Management 2019T2

big-data hadoop java mapreduce

Last synced: 18 Dec 2024

https://github.com/misterzurg/stepik_vk_hadoop

📓 Solutions to the Stepik course "Hadoop. A System for Processing Large Volumes of Data"

hadoop stepik vk vk-education vkteam

Last synced: 18 Dec 2024

https://github.com/brynlai/data-engineering-assignment-rdsy2s2

This repository contains a data engineering project aimed at processing and analyzing scraped data using PySpark, Redis, and Neo4j. The goal is to efficiently store, process, and analyze text data.

data-engineering gemini-ai google hadoop kafka neo4j pyspark redis

Last synced: 19 Dec 2024

https://github.com/oguzhanfatihkucuk/data-analytics-project-kafka-spark

The data in this project was collected in a database using Apache Kafka and processed with Apache Spark Streaming. The project aims to create a forecasting model and analyze sales forecasts per customer.

big-data data data-visualization hadoop kafka ml mlpipeline plt pyhton spark

Last synced: 25 Dec 2024

https://github.com/amirhnajafiz-university/s7cc03

Third project of Cloud Computing course.

big-data hadoop hadoop-hdfs mapreduce python python3 spark

Last synced: 26 Dec 2024

https://github.com/iwasakiyuuki/data-analysis-platform-infra

Construct on-premises Hadoop cluster using ansible

ansible hadoop hdfs mapreduce yarn

Last synced: 26 Dec 2024

https://github.com/pawsanie/pyspark_universal_dq_report

The script reads the dataset at the given path, selects the columns passed as arguments for the specified dates, and saves the report to the specified HDFS path.

data-quality data-quality-checks data-quality-monitoring dq hadoop hadoop-hdfs hdfs pyspark python python-3 python-script python3

Last synced: 02 Jan 2025

https://github.com/vagnerbellacosa/030_criandoumecossistemahadooptotalmentegerenciadocomgoogleclouddataproc

Your mission is to create a Big Data ecosystem using Google Cloud Platform (GCP). To do so, the expert will teach you how to configure Google Cloud Dataproc, a fully managed Hadoop service, using your free GCP credits.

digital-innovation-one dio gcp google-cloud-dataproc google-cloud-platform hadoop labs

Last synced: 03 Jan 2025

https://github.com/fblupi/master_informatica-ccsa

Repository for the course Cloud Computing: Services and Applications, part of the Master's in Computer Engineering at UGR

cloud-computing containers data-science docker hadoop mahout map-reduce mapreduce mongodb opennebula virtual-machine

Last synced: 30 Jan 2025

https://github.com/rurumimic/apache

apache on k8s

apache hadoop hive kubernetes

Last synced: 03 Jan 2025

https://github.com/attomos/yarnlog

🧶 Download Apache Hadoop YARN log to your local machine.

apache-hadoop-yarn command-line-tool hadoop resource-manager

Last synced: 23 Jan 2025

https://github.com/ramitsurana/emr-ml

AWS EMR Info including Hadoop, Map Reduce and Hive along with Machine Learning

emr hadoop map-reduce

Last synced: 03 Jan 2025

https://github.com/mobiletelesystems/hadoop-docker

Docker image with Hadoop cluster

docker-compose-template docker-image hadoop

Last synced: 17 Jan 2025

https://github.com/tomwhite/gvcf-hbase

Genomic variants in HBase

bioinformatics genomics hadoop hbase ngs

Last synced: 17 Jan 2025

https://github.com/ekane3/MapReduce

A project displaying examples of MapReduce jobs, using the "Remarkable Trees of Paris" dataset (https://opendata.paris.fr/explore/dataset/arbresremarquablesparis/information/)

hadoop hdfs-dfs java mapreduce shell yarn

Last synced: 24 Oct 2024

https://github.com/rcarvalho16/hadoopbasketballpossession

A Hadoop MapReduce project for analyzing basketball game footage, extracting video frames, and determining ball possession times for teams and players using OpenCV and YOLO object detection.

hadoop mapreduce opencv yolo

Last synced: 23 Jan 2025

https://github.com/nagpritam/identification-of-trucks-and-potential-risky-driver-using-databricks-spark-api-

The project aims to identify trucks based on their model, fuel consumption, driving behaviors, and past records of violations and accidents

databricks hadoop hive powerbi python3 spark

Last synced: 12 Oct 2024

https://github.com/josericodata/mscdataanalyticssecondsemesterassignmentone

Summary of Assignment One from the Second semester of the MSc in Data Analytics program. This repository contains the CA1 assignment guidelines from the college and my submission. To see all original commits and progress, please visit the original repository using the link below.

advanced-data-analysis big-data big-data-storage-and-processing cct-college cnn-keras data-science dropout-layers dublin hadoop ireland jose-maria-rico-leal jose-rico jupyter-notebook machine-learning msc mysql neural-network rdbms spark ubuntu-linux

Last synced: 17 Jan 2025

https://github.com/kambojankit/hadoop-docker-cluster

A project providing a complete Docker-based Hadoop environment

cluster docker hadoop

Last synced: 11 Jan 2025

https://github.com/ibrahimghali/hadoop_ha

This repository showcases a Hadoop cluster setup with High Availability (HA) using ZooKeeper for automatic failover between NameNodes. It ensures minimal downtime and enhanced fault tolerance, providing a reliable framework for large-scale data storage and processing. Configuration details for both Hadoop and ZooKeeper are included.

big-data hadoop highavailability zookeeper

Last synced: 11 Jan 2025

https://github.com/ansh-info/Hadoop-Pipeline

An end-to-end data engineering pipeline to collect, store, process, and analyze property and crime data using Hadoop, Docker, MySQL, Tailscale, and Selenium

docker docker-compose hadoop jupyter-notebook mapreduce python selenium sql tailscale

Last synced: 21 Jan 2025

https://github.com/jmkim/10.1007-978-981-10-4154-9_60

Improving the B+-Tree Construction for Transaction Log Data in Bank System Using Hadoop

b-tree big-data bplustree hadoop mapreduce

Last synced: 26 Jan 2025

https://github.com/jmkim/hadoopscripts

Some useful scripts for Apache Hadoop Cluster Setup

bash-script hadoop

Last synced: 26 Jan 2025

https://github.com/vinetos/giraph-lab

A starter project for Hadoop and Giraph with Maven and Docker

apache docker giraph hacktoberfest hadoop hadoop-yarn

Last synced: 24 Jan 2025

https://github.com/limdongjin/cse4100_sg

System programming project

hadoop machine-learning matplotlib python

Last synced: 12 Jan 2025

https://github.com/ericlondon/map-reduce-20160121

Hadoop, Pig, Ruby, Map/Reduce, on OSX via Homebrew

hadoop mapreduce osx ruby

Last synced: 12 Jan 2025

https://github.com/kemalcanbora/cloudera_documents_turkish

This documentation provides Turkish explanations of some big data tools.

cloudera hadoop hbase hdfs hive turkish

Last synced: 24 Jan 2025

https://github.com/aleskandro/r-hadoop-madreduce-examples

Many examples of using R with Hadoop for MapReduce, with and without libraries such as rhadoop/rhipe - Advanced Programming Languages

data-analysis hadoop mapreduce r

Last synced: 27 Dec 2024

https://github.com/reemadutta/bigdata_project_airlines_data_analysis

Analysis of airlines data using MapReduce on Hadoop, Pig, and Hive

big-data hadoop hive java-8 mapreduce pig-latin

Last synced: 19 Jan 2025

https://github.com/getblitzed/magnetize-recommendations

Recommendations and personalization service

docker hadoop magnetize personalization python3

Last synced: 26 Jan 2025

https://github.com/joshuawscott/hadoop-psuedodistributed-docker

Single Docker image for Hadoop pseudo-distributed mode

docker hadoop

Last synced: 19 Jan 2025

https://github.com/iulianoroberto/mapreducebasicapplications

Basic MapReduce applications in Java.

hadoop hdfs java mapreduce mapreduce-java

Last synced: 24 Jan 2025

https://github.com/iulianoroberto/stormtopology

Java implementation of a simple Storm topology.

hadoop hdfs storm storm-topology stormworks topology

Last synced: 24 Jan 2025

https://github.com/getblitzed/magnetize-emr

AWS Elastic Map Reduce

aws hadoop infra magnetize terraform

Last synced: 26 Jan 2025

https://github.com/montybechir/redblacktreemapreduce

A Hadoop project that handles very large data sets and constructs a red-black tree. A script is available to automate iterative MapReduce jobs.

data-structures distributed-computing hadoop mapreduce python redblacktree scripting

Last synced: 19 Jan 2025

https://github.com/dynamicheart/DTSS

Distributed Transaction Settlement System

hadoop kafka se347 spark zookeeper

Last synced: 08 Nov 2024

https://github.com/kriss024/hadoop

Hadoop and Hive fundamental commands

hadoop hadoop-filesystem hadoop-hdfs hive

Last synced: 25 Jan 2025

https://github.com/dimajix/hadoop-training

Source Code for Hadoop Training

hadoop hadoop-training spark

Last synced: 05 Jan 2025

https://github.com/galaxy092/samsung-innovation-campus-big-data-capstone-project

Samsung Innovation Campus Big Data Capstone Project - Weather Prediction

hadoop jupyter-notebook pandas pyspark scikit-learn sparksql

Last synced: 01 Feb 2025

https://github.com/shortthirdman/apache-hadoop-nativelib

Apache Hadoop NativeLib Build for 64-bit (x86_64)

apache-hadoop hadoop hadoop-hdfs hadoop-mapreduce hadoop-nativelib

Last synced: 20 Jan 2025