Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with apache-hadoop
A curated list of projects in awesome lists tagged with apache-hadoop.
https://github.com/mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
apache-hadoop apache-spark data-algorithms design-patterns distributed-algorithms distributed-computing hadoop-mapreduce java machine-learning mappers mapreduce partitioning pyspark python reducers scala
Last synced: 19 Dec 2024
https://github.com/mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
algorithms apache-hadoop apache-spark big-data data-algorithms data-analysis data-engineering data-partition data-transformation glossary mapreduce mapreduce-algorithm mapreduce-python monoid partitioning-algorithms pyspark pyspark-algorithms-book santa-clara-university spark-dataframes spark-rdd
Last synced: 16 Dec 2024
https://github.com/tencentyun/hadoop-cos
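hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written in the same way as HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.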
alluxio apache-hadoop hadoop-compatible-filsystem tencent-cloud-cos
Last synced: 15 Dec 2024
https://github.com/s911415/apache-hadoop-3.1.0-winutils
Hadoop 3.1.0 winutils
apache-hadoop hadoop native winutils
Last synced: 30 Oct 2024
https://github.com/pbwebmedia/yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in Prometheus format
apache apache-hadoop exporter hadoop metrics prometheus resource-manager yarn yarn-hadoop-cluster
Last synced: 19 Dec 2024
https://github.com/guru107/hadoop-small-files-merger
A Spark application to merge small files on Hadoop
apache-hadoop apache-spark avro parquet scala text
Last synced: 10 Nov 2024
https://github.com/yingzhuo/logback-flume-appender
Logback appender for Apache Flume
apache-flume apache-hadoop apache-hive flume logback logback-appender logback-flume-appender slf4j
Last synced: 22 Nov 2024
https://github.com/abdelhakim-gh/bigdata_project
This project aims to establish a data streaming pipeline with storage, processing, and visualization
apache-flink apache-hadoop apache-kafka elasticsearch github-api kibana python
Last synced: 03 Dec 2024
https://github.com/narius2030/sakila-datawarehouse-analysis
Implements a Hive data warehouse to store meaningful data and applies machine learning techniques such as clustering and regression to business problems
apache-hadoop apache-hive data-analysis etl-pipeline hiveql machine-learning statistics
Last synced: 14 Dec 2024
https://github.com/chabane/spark-custom-datasource
apache-arrow apache-hadoop apache-spark inputformat pyspark
Last synced: 15 Nov 2024
https://github.com/bayunova28/spotify_lyrics
This repository contains my personal project implementing MapReduce with Apache Hadoop
apache-derby apache-hadoop apache-hive hadoop-mapreduce mapreduce-python spotify
Last synced: 18 Dec 2024
https://github.com/shortthirdman/apache-hadoop-nativelib
Apache Hadoop NativeLib Build for 64-bit (x86_64)
apache-hadoop hadoop hadoop-hdfs hadoop-mapreduce hadoop-nativelib
Last synced: 19 Nov 2024
https://github.com/shuuji3/spark-ceph-connector
🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph
apache-hadoop apache-spark ceph hadoop spark
Last synced: 29 Nov 2024
https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles
Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.
apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles
Last synced: 09 Nov 2024
https://github.com/vikentiosvitalis/advanced_topics_in_database_systems
Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua
apache-hadoop apache-spark big-data data-science pyspark python
Last synced: 23 Nov 2024
https://github.com/mituskillologies/bigdata-ait-sep24
Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.
apache-hadoop apache-spark big-data big-data-analytics hadoop spark
Last synced: 16 Nov 2024
https://github.com/dmarks84/coursework_capstone_full_data_engineering
Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification
apache-airflow apache-hadoop apache-kafka apache-spark api beautifulsoup cassandra dags etl mongodb nosql pandas plotly postgresql python scipy seaborn sql
Last synced: 19 Nov 2024
https://github.com/heracliteanflux/exercises-scala
Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.
apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark
Last synced: 18 Nov 2024
https://github.com/yuhexiong/deploy-hadoop-guide
apache-hadoop deployment hadoop hdfs
Last synced: 02 Dec 2024