Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with apache-hadoop

A curated list of projects in awesome lists tagged with apache-hadoop.

https://github.com/tencentyun/hadoop-cos

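hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written just like data on HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.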

alluxio apache-hadoop hadoop-compatible-filsystem tencent-cloud-cos

Last synced: 15 Dec 2024

https://github.com/pbwebmedia/yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in Prometheus format

apache apache-hadoop exporter hadoop metrics prometheus resource-manager yarn yarn-hadoop-cluster

Last synced: 19 Dec 2024

https://github.com/guru107/hadoop-small-files-merger

A Spark application to merge small files on Hadoop

apache-hadoop apache-spark avro parquet scala text

Last synced: 10 Nov 2024

https://github.com/abdelhakim-gh/bigdata_project

This project aims to establish a data streaming pipeline with storage, processing, and visualization

apache-flink apache-hadoop apache-kafka elasticsearch github-api kibana python

Last synced: 03 Dec 2024

https://github.com/narius2030/sakila-datawarehouse-analysis

Implement a Hive data warehouse to store meaningful data, and apply machine learning techniques such as clustering or regression to business problems

apache-hadoop apache-hive data-analysis etl-pipeline hiveql machine-learning statistics

Last synced: 14 Dec 2024

https://github.com/bayunova28/spotify_lyrics

This repository contains my personal project to build MapReduce jobs using Apache Hadoop

apache-derby apache-hadoop apache-hive hadoop-mapreduce mapreduce-python spotify

Last synced: 18 Dec 2024

https://github.com/shortthirdman/apache-hadoop-nativelib

Apache Hadoop NativeLib Build for 64-bit (x86_64)

apache-hadoop hadoop hadoop-hdfs hadoop-mapreduce hadoop-nativelib

Last synced: 19 Nov 2024

https://github.com/shuuji3/spark-ceph-connector

🌟Spark Ceph Connector: Implementation of Hadoop Filesystem API for Ceph

apache-hadoop apache-spark ceph hadoop spark

Last synced: 29 Nov 2024

https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles

Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.

apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles

Last synced: 09 Nov 2024

https://github.com/vikentiosvitalis/advanced_topics_in_database_systems

Data Science Project - for 'Advanced Topics in Database Systems' M.Sc. Course ECE @ntua

apache-hadoop apache-spark big-data data-science pyspark python

Last synced: 23 Nov 2024

https://github.com/mituskillologies/bigdata-ait-sep24

Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.

apache-hadoop apache-spark big-data big-data-analytics hadoop spark

Last synced: 16 Nov 2024

https://github.com/dmarks84/coursework_capstone_full_data_engineering

Final Project for IBM Data Engineering & Python Professional Certificate -- Applied all skills and methods utilized in the series of courses for this certification

apache-airflow apache-hadoop apache-kafka apache-spark api beautifulsoup cassandra dags etl mongodb nosql pandas plotly postgresql python scipy seaborn sql

Last synced: 19 Nov 2024

https://github.com/heracliteanflux/exercises-scala

Exercises in the Scala programming language with an emphasis on big data programming and applications in Apache Hadoop and Apache Spark.

apache-hadoop apache-maven apache-spark distributed-computing distributed-file-system distributed-systems hadoop map-reduce mrjob scala spark

Last synced: 18 Nov 2024