Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with hadoop
A curated list of projects in awesome lists tagged with hadoop.
https://github.com/marcionicolau/mrappsamples
MapReduce sample applications using Java
bigdata hadoop hadoop-mapreduce
Last synced: 27 Dec 2024
https://github.com/giantcroc/big-data
big data homework
big-data hadoop mapreduce wordcount
Last synced: 08 Jan 2025
https://github.com/hailiang-wang/hadoop-getstarted
Get started with Apache Hadoop
Last synced: 07 Jan 2025
https://github.com/paucimi/big_data_arquitectura
Integrating ElasticSearch and Hadoop
bigdata elasticsearch hadoop hive kibana ubuntu
Last synced: 08 Jan 2025
https://github.com/captainirs/hadoop-yarn-k8s
A sandbox for running a Hadoop-YARN cluster on Kubernetes
Last synced: 11 Jan 2025
https://github.com/shuuji3/spark-ceph-connector
🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph
apache-hadoop apache-spark ceph hadoop spark
Last synced: 27 Jan 2025
https://github.com/viveksyngh/intro-to-hadoop-and-mapreduce
My first Hadoop MapReduce code and projects
Last synced: 11 Jan 2025
https://github.com/dexterposh/azurehdinsight
Repository housing the artifacts to deploy the Hadoop clusters on Azure for my learning.
azure hadoop hdinsight-cluster learning-by-doing spark
Last synced: 04 Jan 2025
https://github.com/mng222n/cloudapp
The code developed in cloud application exercises from the University of Illinois at Urbana-Champaign
cloud-computing counter hadoop java python
Last synced: 08 Jan 2025
https://github.com/hrolive/patc-big-data-analytics-bsc
Introduction to the main concepts and technologies related to Big Data and Data Analytics and their applications to real projects.
analytics bias big-data data-analysis hadoop hpc machine-learning mapreduce nosql python spark spark-streaming visualization
Last synced: 04 Jan 2025
https://github.com/hishidama/embulk-parser-hadoop-seqfile
Hadoop SequenceFile parser plugin for Embulk
embulk-parser-plugin embulk-plugin hadoop java-8 sequencefile
Last synced: 07 Jan 2025
https://github.com/cevheri/hadoop.3-config
My Apache Hadoop 3 config files.
hadoop hadoop-conf hadoop-core hadoop-filesystem hadoop-hdfs hadoop-mapreduce linux-bash pom-xml
Last synced: 05 Jan 2025
https://github.com/cevheri/hadoop-mr-example-currency
Hadoop MapReduce example that reads currency.txt, with a driver, mapper, and reducer
hadoop hadoop-filesystem hadoop-hdfs hadoop-mapreduce java maven
Last synced: 05 Jan 2025
https://github.com/ckongala/sparkpythonbigdata
Big-Data with Apache Spark and Python.
apache apache-spark big-data data-frames graphx hadoop hadoop-yarn mapreduce mllib pyspark spark-streaming structured-streaming
Last synced: 12 Oct 2024
https://github.com/yukta026/tokyo-olympics-2021-analytics
An end-to-end ETL pipeline for analyzing and visualizing Tokyo Olympics 2021 data using Azure tools and Power BI.
azure data-engineering etl hadoop powerbi python3 spark sql
Last synced: 11 Oct 2024
https://github.com/menxit/hadoop-3.0
Docker image of hadoop:3.0
bigdata docker hadoop sparkachetipassa
Last synced: 08 Jan 2025
https://github.com/sandysanthosh/hadoop-basics
Hadoop basics with Tableau, reading data from MySQL
Last synced: 11 Jan 2025
https://github.com/mikma03/data_streaming
All topics related to data streaming and real-time analysis
apache docker hadoop kafka kubernetes spark-streaming
Last synced: 09 Jan 2025
https://github.com/ssanthosh010303/collection-data-training
A collection of challenges practiced during a data training program.
airflow apache azure azure-data-factory azure-databricks azure-logic-apps bigdata data hadoop spark
Last synced: 17 Jan 2025
https://github.com/mikma03/databases
The main purpose of this repository is to build general knowledge about databases.
cassandra graphql hadoop mongodb msql neo4j newsql nosql oracle-database postgresql redis sql
Last synced: 09 Jan 2025
https://github.com/dev88jerry/cs450
Bishop's University - CS450 Elements of Big Data
big-data data-science hadoop spark
Last synced: 08 Jan 2025
https://github.com/jinsyin/flink-handbook
Flink Study Guide
apache-flink bigdata flink flink-handbook hadoop
Last synced: 16 Jan 2025
https://github.com/vladd12/big-data-practice
Introduction to Big Data with Hadoop
hadoop hadoop-hdfs hadoop-mapreduce pig-latin
Last synced: 28 Jan 2025
https://github.com/piotr-kalanski/big-data-dev-environment
Big Data development environment
ansible big-data elasticsearch flink hadoop kafka kibana mysql spark virtual-machine
Last synced: 15 Dec 2024
https://github.com/bishalpaudel/hadoopproductpurchaseprobability
Anticipatory customer order prediction after purchase of item(s).
cloudera-hadoop hadoop hadoop-mapreduce java
Last synced: 06 Jan 2025
https://github.com/shahiransari/clickstream-data
Analysis of various aspects of clickstream data
analytics clickstream-data hadoop pig pig-latin
Last synced: 26 Jan 2025
https://github.com/shahiransari/twitteranalysis
Use Hive to analyse data gathered from Twitter using Flume.
hadoop hdfs hive hiveql twitter twitter-sentiment-analysis
Last synced: 26 Jan 2025
https://github.com/shahiransari/sensor-data-
Finding the regions in which the room sensors are most needed and working properly
analysis analytics cloudera hadoop hive sensor-data
Last synced: 26 Jan 2025
https://github.com/pathak-ashutosh/sentiment-analysis-yelp-reviews
Perform sentiment analysis on Yelp dataset with Apache Spark
apache-spark big-data data-engineering data-pipeline data-visualization hadoop hdfs natural-language-processing pyspark sentiment-analysis spark-mllib spark-nlp spark-sql
Last synced: 17 Jan 2025
https://github.com/vigneshss-07/bigdata_technologies
This repo contains all technical knowledge and implementation of big data technologies.
big-data hadoop hadoop-hdfs hbase hive hive-metastore kafka mapreduce-python pyspark spark sparksql
Last synced: 16 Jan 2025
https://github.com/vasugi2003/big-data-analytics
Big Data Analytics - various operations and functions.
big-data data-science dataset googlecolab hadoop hdfs pyspark python3 sql
Last synced: 11 Jan 2025
https://github.com/ericlondon/docker-hadoop-streaming-scala
Docker Hadoop Streaming Scala
docker hadoop hdfs map-reduce scala streaming
Last synced: 12 Jan 2025
https://github.com/ericlondon/spark-csv-to-elasticsearch
Spark CSV to Elasticsearch
apache csv docker elasticsearch export hadoop spark
Last synced: 12 Jan 2025
https://github.com/ansh-info/hadoop-pipeline
An end-to-end data engineering pipeline to collect, store, process, and analyze property and crime data using Hadoop, Docker, MySQL, Tailscale, and Selenium
docker docker-compose hadoop jupyter-notebook mapreduce python selenium sql tailscale
Last synced: 11 Oct 2024
https://github.com/labex-labs/hadoop-practice-labs
[Hadoop Practice Labs] This repository collects 78 programming scenarios (labs and challenges) for Hadoop Practice Labs. This course contains lots of labs for Hadoop; each lab is a small Hadoop project with detailed guidance and solutions. You can practice your Hadoop skills by completing thes...
awesome awesome-list challenges course education hadoop hands-on labex labs programming
Last synced: 13 Nov 2024
https://github.com/labex-labs/hadoop-practice-challenges
[Hadoop Practice Challenges] This repository collects 12 programming scenarios (labs and challenges) for Hadoop Practice Challenges. This course contains lots of challenges for Hadoop; each challenge is a small Hadoop project with detailed instructions and solutions. You can practice your Hado...
awesome awesome-list challenges course education hadoop hands-on labex labs programming
Last synced: 13 Nov 2024
https://github.com/nikitaeverywhere/hadoop-network-of-keywords
Keywords network builder based on TF-IDF with the use of Hadoop platform
cloudera cloudera-hadoop document-frequency hadoop hadoop-platform keywords-builder mapreduce term-frequency tf-idf
Last synced: 28 Jan 2025
https://github.com/armahdavi/bigdata_pyspark_sales_analytics
A summary of my big data code in Python (PySpark) for analyzing retail and Walmart superstore sales data to draw sales insights
big-data bigquery clustering dataframe hadoop k-means machine-learning pyspark pyspark-ml python spark unsupervised-learning
Last synced: 28 Dec 2024
https://github.com/mirzaim/hadoop-twitter-analysis
Hadoop MapReduce analysis of US Election 2020 Tweets.
hadoop hdfs map-reduce tweet-analysis us-election-2020
Last synced: 09 Jan 2025
https://github.com/prakhar-ff13/hadoop
This repository contains Hadoop ecosystem files (code, data, readme, etc.)
flume-ng hadoop hadoop-filesystem hadoop-hdfs hadoop-mapreduce hive java mapreduce-java oozie-mapreduce pig yarn yarn-hadoop-cluster
Last synced: 28 Jan 2025
https://github.com/yiyun-liang/forum-posts-analysis
MapReduce scripts for forum data analysis.
Last synced: 28 Jan 2025
https://github.com/yiyun-liang/geo-ip
A web interface that requests data from a search engine and displays results with AmMap.
elasticsearch hadoop jython pig
Last synced: 28 Jan 2025
https://github.com/bobergot/ott-movies-insights-to-recommendations
Analyze movie ratings and build a recommendation system using MapReduce. This project utilizes the Apriori algorithm, optimized for handling large datasets like the Netflix prize data, to provide personalized movie recommendations.
apriori-algorithm aws aws-s3 big-data cloud-computing data-mining hadoop java mapreduce movie-recommendation netflix-prize parallel-computing personalization
Last synced: 22 Jan 2025
https://github.com/bobergot/large-scale-data-processing-design-patterns
Explore essential MapReduce design patterns for big data processing! This repository includes practical implementations of patterns from the "MapReduce Design Patterns" book, complete with examples across summarization, filtering, organization, joins, and more.
bigdata bigdataanalytics cloudcomputing dataengineering dataprocessing datascience designpatterns distributedcomputing hadoop java mapreduce
Last synced: 22 Jan 2025
https://github.com/rmodi6/theory-of-database-systems
Homework files for CSE532 - Theory of Database Systems
database-queries hadoop ibm-db2 jdbc map-reduce spark spatial-database sql xpath xquery
Last synced: 11 Jan 2025
https://github.com/marco-gallegos/sqoopit
A Python package that lets you import data from an RDBMS into HDFS/Hive/HBase using Sqoop
hadoop hbase hdfs hive py python python3 sqoop sqoop-import
Last synced: 22 Jan 2025
https://github.com/kwonnayeon/hadoop-platform-and-application-framework
Practice exercises for Coursera assignments on Hadoop platform and application framework.
assignments coursera hadoop practice spark
Last synced: 13 Jan 2025
https://github.com/fahimahammed/hadoop-and-hdfs
This repository provides comprehensive documentation and a handy cheat sheet for managing Apache Hadoop 3.4.0 on Debian-based systems. Whether you're setting up a new Hadoop cluster, running MapReduce jobs, or handling HDFS operations, this repository aims to be your go-to resource for all things related to Hadoop.
ddbms dfs hadoop hdfs mapreduce
Last synced: 24 Jan 2025
https://github.com/mdaiyub/big-data-lab
This repository serves as a hub for students, researchers, and enthusiasts interested in diving deep into the realm of big data.
Last synced: 22 Jan 2025
https://github.com/cdarlint/hadoop-unittest
Learn unit testing on Hadoop via a mini DFS cluster
hadoop minicluster minidfscluster tutorial unittest wordcount
Last synced: 13 Jan 2025
https://github.com/cclient/mongo_hadoop_map-reduce
A basic example of using the MongoDB support package in Hadoop to analyze a MongoDB database with MapReduce. Now that Spark supports MongoDB, this approach is obsolete.
Last synced: 16 Jan 2025
https://github.com/hackolade/avro
Hackolade (https://hackolade.com) plugin for Avro
avro avro-schema confluent confluent-cloud confluent-kafka data-modeling data-models entity-relationship-diagram er-diagram hadoop kafka nosql nosql-database schema-design schema-registry
Last synced: 17 Nov 2024
https://github.com/hackolade/parquet
Hackolade plugin for Apache Parquet schema
columnar-storage data-modeling data-models entity-relationship-diagram er-diagram hadoop nosql parquet parquet-schema schema-design
Last synced: 17 Nov 2024
https://github.com/hackolade/hive
Hackolade plugin for Apache Hive
data-modeling data-models entity-relationship-diagram er-diagram hadoop hive nosql nosql-databases schema-design
Last synced: 17 Nov 2024
https://github.com/christian-konrad/mapreduce-invertedindexer-example
Simplified example of an Inverted Indexer for plain text documents built on Hadoop's MapReduce framework.
example hadoop hadoop-mapreduce inverted-index mapreduce
Last synced: 23 Jan 2025
https://github.com/matchy233/distributed-system-project
☁ Batch processing Word-Letter Count application with a custom k8s scheduler
distributed-systems hadoop java k8s python scheduler spark
Last synced: 09 Jan 2025
https://github.com/cleberzumba/hadoop-in-pseudodistributed-mode
Installation and Configuration of the Big Data Environment with Hadoop and Spark
Last synced: 31 Dec 2024
https://github.com/jaini-bhavsar/big-data
This repository contains projects related to big data. All the projects use real-world data from real-world problems.
amazon-ec2 apache-maven hadoop java-8 oozie
Last synced: 19 Jan 2025
https://github.com/heracliteanflux/exercises-scala
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark
Last synced: 19 Jan 2025
https://github.com/divinenaman/mapreduce-matrix-multipy
A Python implementation of matrix multiplication using the Hadoop Streaming API
hadoop hadoop-hdfs hadoop-mapreduce python
Last synced: 17 Dec 2024
https://github.com/martincastroalvarez/apache-hive-docker
Running Hive jobs using Docker
Last synced: 22 Dec 2024
https://github.com/martincastroalvarez/hadoop-hdfs-kafka-docker
Running Kafka using Docker
Last synced: 22 Dec 2024
https://github.com/martincastroalvarez/hadoop-hdfs-spark-docker
Running Spark jobs using Docker
Last synced: 22 Dec 2024
https://github.com/dhchenx/simplehadooptool
A tool to submit MapReduce jobs to a Hadoop cluster.
client-server hadoop hadoop-api job mapreduce simple-hadoop-tool submit
Last synced: 29 Jan 2025
https://github.com/dhchenx/catla-hs
Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.
big-data catla-hs hadoop machine-learning mapreduce parameter-search performance-tuning self-tuning-system spark visualization
Last synced: 29 Jan 2025
https://github.com/jferrl/gutemberg-analysis
Gutenberg corpus analysis with Apache Hadoop
analysis gutemberg hadoop java
Last synced: 19 Jan 2025
https://github.com/billsioros/big-data
Large Scale Data Management Systems MSc. Project
Last synced: 24 Jan 2025
https://github.com/dominicluidold/ws21-introductiontobigdataprojects
A collection of mandatory exercises in "Introduction to Big Data Projects" - 1st semester master @ Vorarlberg University of Applied Sciences (FHV)
avro bigdata hadoop java map-reduce
Last synced: 29 Jan 2025
https://github.com/liuhaozzu/data-mining-algorithms
Data mining algorithms based on Hadoop 2.7.3
data-mining hadoop hadoop-mapreduce java-8
Last synced: 29 Jan 2025
https://github.com/srfrnk/spar-kube
Spark cluster deployment on a k8s cluster
hadoop k8s k8s-cluster kubernetes spar-kube spark zeppelin
Last synced: 29 Jan 2025
https://github.com/yuhexiong/deploy-hadoop-guide
apache-hadoop deployment hadoop hdfs
Last synced: 30 Jan 2025
https://github.com/rishikesh-jadhav/machine-learning-practice-projects
This repository contains my end-to-end ML practice projects as part of the ML zero to mastery course by Daniel Bourke and Andrei Neagoie
classification data-science data-visualization dataanalysis deep-learning hadoop jupyter-notebooks machine-learning neural-network numpy pandas pyspark python pytorch regression scikit-learn supervised-learning tensorflow time-series transfer-learning
Last synced: 22 Jan 2025
https://github.com/spineo/accumulo-hdfs-zookeeper
Create a storage cluster running Accumulo on HDFS and Zookeeper for node management.
accumulo accumulo-hdfs-zookeeper ansible ansible-inventory ansible-playbooks cluster hadoop hadoop-hdfs hdfs zookeeper
Last synced: 23 Jan 2025
https://github.com/hindog/grid-executor
Library for remote JVM ExecutorService with only dependency being password-less SSH -- Run clustered Hadoop/Spark jobs from IDE -- IDE-pimped Spark shell with full auto-completion!
cloud grid hadoop ide jvm spark-shell
Last synced: 20 Jan 2025
https://github.com/telefonica/testing.hadoop
Automatic launcher for hadoop-unit from Python
cdco hadoop hadoop-unit python testing
Last synced: 25 Jan 2025
https://github.com/hereismari/relatorio-pibiti-funttel-2015
Classes and scripts used to run experiments during PIBITI in 2015 at LSD/UFCG. Work carried out by Marianne Linhares, supervised by Andrey Brito.
Last synced: 17 Dec 2024
https://github.com/manuparra/tallerh2s
HDFS, Hadoop, and Spark workshop for the Professional Master's in Computer Engineering - Universidad de Granada
hadoop hdfs java map-reduce python spark wordcount
Last synced: 07 Nov 2024