Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with hadoop

A curated list of projects in awesome lists tagged with hadoop.

https://github.com/tspannhw/phoenix

Apache Phoenix / HBase Spring Boot Microservices

hadoop hbase hortonworks java-8 phoenix spring spring-boot

Last synced: 11 Dec 2024

https://github.com/zongxr/bigdata-competition

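Third-prize solution in the National Big Data Competition and second-prize solution in the provincial round. One-click installation scripts for a big data environment that automatically deploy a cluster, including ZooKeeper, Hadoop, MySQL, Hive, Spark, and some base components. Tested on real servers with very good results; only minimal manual input, such as entering passwords, is required, freeing up the effort otherwise spent on installation, deployment, and configuration. Also includes several Scala examples that use Spark for data preparation.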

bigdata hadoop hdfs hive mysql scala shell spark wordcount zookeeper

Last synced: 15 Nov 2024

https://github.com/dimajix/docker-jupyter-spark

Docker image for Jupyter notebooks with PySpark

docker hadoop jupyter pyspark python spark

Last synced: 09 Nov 2024

https://github.com/san089/cloudera_material

Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.

big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session

Last synced: 12 Oct 2024

https://github.com/ibm-cloud/biginsights-on-apache-hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

ambari biginsights bigsql hadoop hbase hive ibm-bluemix knox oozie spark spark-streaming webhdfs zeppelin

Last synced: 17 Nov 2024

https://github.com/neoremind/app-on-yarn-demo

Demo of a service-oriented application hosted on a Hadoop YARN cluster for HA and scheduling

hadoop service yarn

Last synced: 28 Oct 2024

https://github.com/hoangsonww/moodify-emotion-music-app

🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile platforms.

artificial-intelligence django django-rest-framework emotion fullstack-development hadoop kubernetes machine-learning mobile-development mongodb music python pytorch react-native reactjs redis restful-api spark tensorflow torch

Last synced: 01 Nov 2024

https://github.com/tomwhite/hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

hadoop visualization

Last synced: 12 Oct 2024

https://github.com/longshilin/hadoop-mapreduce

Application examples based on MapReduce :ear_of_rice:

hadoop mapreduce mr

Last synced: 10 Nov 2024

https://github.com/odpi/egeria-connector-hadoop-ecosystem

Hadoop ecosystem connectors for Egeria: repository proxy connector for Apache Atlas.

apache-atlas connector egeria hadoop metadata proxy

Last synced: 09 Nov 2024

https://github.com/oracle-quickstart/oci-cloudera

Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)

cdh cdp cloud cloudera dsw edh hadoop oci oracle partner-led spark terraform

Last synced: 07 Nov 2024

https://github.com/googleclouddataproc/hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

apache bigquery gcp google hadoop hive

Last synced: 05 Nov 2024

https://github.com/snowplow/dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR

amazon-emr flink golang-application hadoop spark

Last synced: 09 Nov 2024

https://github.com/ihor/phadoop

Map/reduce jobs for Hadoop in PHP

hadoop map-reduce php

Last synced: 07 Nov 2024

https://github.com/romans-weapon/spear-framework

Rapid development of ETL/ELT connectors and pipelines on top of Apache Spark

docker-compose hadoop kafka scala shell-script spark

Last synced: 10 Oct 2024

https://github.com/aphp/py-hdfs-mount

Mount HDFS with FUSE; works with Kerberos!

fuse hadoop hdfs kerberos mount mount-hdfs

Last synced: 25 Nov 2024

https://github.com/singgel/bigdata-skilltree

Demos for Spark, Flink, HBase, Hive, and Flume, plus some demos of Hadoop's native APIs (currently just two: HDFS and MapReduce); also tests some failure scenarios.

hadoop hbase hdfs hive kylin mapreduce scala spark

Last synced: 14 Oct 2024

https://github.com/hammerlab/spark-util

low-level helpers for Apache Spark libraries and tests

hadoop kryo scala spark

Last synced: 12 Oct 2024

https://github.com/melin/flink-jobserver

REST job server for Apache Flink

flink hadoop hive java kerberos kubernetes yarn

Last synced: 05 Nov 2024

https://github.com/tomwhite/docker-impala

Run Impala in a Docker container.

docker hadoop impala

Last synced: 12 Oct 2024

https://github.com/jishanshaikh4/hadoop-programs

Hadoop Programs for Hadoop/CUDA Lab at MANIT, Bhopal

hadoop java programs

Last synced: 10 Nov 2024

https://github.com/sylvainhalle/mrsim

A simple MapReduce framework in Java

hadoop java mapreduce tuples

Last synced: 11 Oct 2024

https://github.com/cclient/kubernetes-hadoop

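k8s Hadoop: quickly set up a Hadoop/HBase/Hive environment on k8s. An old project originally built for personal use, picked up again as the basis of a practice environment (with Kafka/Flink added) for Tencent TBDS training.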

docker hadoop k8s kubernetes

Last synced: 16 Nov 2024

https://github.com/dayyass/pydfs

Distributed File System written in Python

distributed-systems filesystem hadoop hdfs mapreduce python

Last synced: 14 Oct 2024

https://github.com/jishanshaikh4/cuda-programs

CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal

c cuda hadoop

Last synced: 10 Nov 2024

https://github.com/zenoyang/web-click-flow

Offline log analysis of website clickstream data

etl flume hadoop hive mapreduce sqoop

Last synced: 16 Nov 2024

https://github.com/allegro/camus-compressor

Camus Compressor merges files created by Camus and saves them in a compressed format.

avro etl hadoop kafka spark

Last synced: 06 Nov 2024

https://github.com/collabh/reasearch-bigdata

Notes from reading books, source code, and third-party learning videos

flink hadoop hive spark

Last synced: 28 Oct 2024

https://github.com/cgivre/drillbook

The Official Source Repository for Learning Apache Drill (O'Reilly, 2018)

apache-drill hadoop hbase hive java kafka python python3 sql

Last synced: 22 Dec 2024

https://github.com/manuparra/masterdatcom_bdcc_practice

Practice and workshop on big data and cloud computing using Docker containers and OpenNebula: HDFS, Hadoop, and Spark + R

bigdata cloudcomputing containers docker hadoop hdfs linux opennebula practices spark sparkr

Last synced: 07 Nov 2024

https://github.com/xd-deng/diy-a-cluster

How to build a DIY cluster for Spark & Hadoop

cluster-computing hadoop spark

Last synced: 16 Oct 2024

https://github.com/apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 07 Oct 2024

https://github.com/chaokunyang/bigdata-examples

Big data examples for Spark and Flink

bigdata flink hadoop monitor python samples spark spark-sql sparkml

Last synced: 19 Nov 2024

https://github.com/hyeonsangjeon/dataplatform

Hadoop 3.2 in single-node/cluster mode with the GoTTY web terminal, Spark, Jupyter (PySpark), Hive, and other ecosystem components

hadoop hadoop-cluster hadoop-docker hadoop-ecosystem hadoop-mapreduce hive pyspark-notebook zeppelin-notebook

Last synced: 17 Nov 2024

https://github.com/pasqualesalza/elephant56

A Genetic Algorithms framework for Hadoop MapReduce.

genetic-algorithm hadoop hadoop-mapreduce parallel

Last synced: 18 Dec 2024

https://github.com/manuparra/masterdegreecc_practice

Workshop for the Professional Master's in Computer Science at UGR. Cloud Computing course.

cloudcomputing cluster docker docker-cluster docker-container hadoop hadoop-cluster hdfs opennebula practice virtual-machine

Last synced: 07 Nov 2024

https://github.com/spirals-team/hadoop-benchmark

Docker containers to build a Hadoop infrastructure and experiment with feedback control loops on top of it.

docker evaluation hadoop

Last synced: 01 Jan 2025

https://github.com/ibmstreams/streamsx.hdfs

This toolkit provides operators and functions for interacting with Hadoop File System.

hadoop hdfs ibm-streams java stream-processing toolkit

Last synced: 23 Nov 2024

https://github.com/isislab-unisa/sof

Simulation Optimization and exploration Framework on the cloud: SOF

agent-based-simulation hadoop java mapreduce optimization-process simulation-model simulation-optimization sof

Last synced: 15 Nov 2024

https://github.com/x4ax/lxss-install-zeppelin

Step by step guide on how to install Zeppelin 0.7.3 on Linux subsystem (WSL) for Windows 10

hadoop linux-subsystem lxss spark wsl zeppelin

Last synced: 04 Dec 2024

https://github.com/mjstealey/hadoop

Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8

centos7 docker hadoop java8

Last synced: 11 Oct 2024

https://github.com/risdenk/webhdfs-dotnet

WebHDFS API for .NET

csharp dotnet hadoop hdfs knox webhdfs

Last synced: 16 Oct 2024

https://github.com/lucasbotang/coursera_big_data_for_data_engineers

Assignments for Big Data for Data Engineers specialization on Coursera by Yandex.

hadoop hive spark spark-sql

Last synced: 25 Nov 2024

https://github.com/perfectlysoft/perfect-hadoop

Perfect Hadoop: WebHDFS, MapReduce & Yarn.

hadoop mapreduce perfect server-side-swift swift webhdfs yarn

Last synced: 13 Nov 2024

https://github.com/saket-sk/semester6-sppu-data-analysis-lab

I installed Hadoop on a virtual machine, and all assignments are performed on Ubuntu. Refer to this repo to complete the Hadoop assignments. A stable internet connection is recommended while working through them.

charts data-visualization hadoop hadoop-assignments hadoop-bigdata-assignments hadoop-framework hadoop-mapreduce plot r tableau

Last synced: 10 Nov 2024

https://github.com/hexnn/stark

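An easy-to-use, high-performance big data governance engine built on Spark + Debezium, suited to data integration and analytics that unify batch and streaming. Supports real-time CDC data capture, large-scale data synchronization, data modeling, and OLAP analysis.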

cdc datax debezium flink hadoop seatunnel spark

Last synced: 11 Oct 2024

https://github.com/bigconnect/bigconnect

A multi-model Big Data graph store supporting graph, document, key/value, and object models

accumulo bigdata elasticsearch graph-database hadoop

Last synced: 10 Oct 2024

https://github.com/brunocampos01/programacao-paralela-e-distribuida

Lectures and exercises for the courses Parallel and Distributed Programming (INE5645) and Distributed Computing (INE5625).

ditributed hadoop ine ine5625 ine5645 java openmp producer-consumer programacao-paralela socket thread threads ufsc

Last synced: 16 Nov 2024

https://github.com/tritondatacenter/hadoop-manta

Hadoop Filesystem Driver for Manta

drill hadoop hadoop-filesystem joyent manta sqoop triton

Last synced: 05 Nov 2024

https://github.com/axsaucedo/hadoop-overview

Hands-on Hadoop: services and installation

ambari hadoop hdfs hive mapreduce mesos notes pig spark yarn

Last synced: 06 Nov 2024

https://github.com/nikoshet/monitoring-spark-on-docker

Spark Monitoring With Prometheus And Grafana Using Docker

docker docker-compose grafana hadoop hdfs monitoring node-exporter prometheus spark

Last synced: 09 Nov 2024

https://github.com/rdblue/brotli-codec

Hadoop Codec for Brotli

brotli compression hadoop

Last synced: 06 Nov 2024

https://github.com/prabaprakash/hadoop-map-reduce-code

Apache Hadoop for Windows

hadoop java

Last synced: 14 Nov 2024

https://github.com/piotr-kalanski/big-data-dev-environment-docker

Big Data Development environment based on Docker

big-data docker elasticsearch hadoop kafka kibana spark

Last synced: 27 Oct 2024

https://github.com/prabaprakash/youtube-channel

Configuration files for my YouTube tutorials

docker hadoop

Last synced: 14 Nov 2024

https://github.com/apache/calcite-site

Apache Calcite Website

big-data calcite geospatial hadoop java sql

Last synced: 07 Oct 2024

https://github.com/gtkcyber/drillworkshop

Learn how to quickly explore your data with Apache Drill

apache-drill big-data database hadoop jdbc python r sql

Last synced: 14 Nov 2024

https://github.com/manuparra/instalacion-bigdata-upnavarra

Workshop on installing Hadoop, HDFS, Spark, Scala, and R for data mining / ML in multi-node mode

bigdata hadoop hdfs multinode scala setup spark workshop

Last synced: 07 Nov 2024

https://github.com/risdenk/solr-s3a-testing

Apache Solr - S3A Testing

apache hadoop s3 solr

Last synced: 16 Oct 2024

https://github.com/laertispappas/mapreduce_python

TF-IDF Algorithm on Hadoop - Python

hadoop python tfidf

Last synced: 19 Jan 2025

https://github.com/wittline/moving-average-spark

How to Compute Moving Average with Spark

databricks hadoop moving-average spark

Last synced: 14 Oct 2024

https://github.com/chabane/mitosis-microservice-spark-cassandra

Microservice application that uses Apache Spark, Kafka and Cassandra

cassandra dockerfile hadoop jenkinsfile kafka sbt scala spark spark-streaming

Last synced: 15 Nov 2024

https://github.com/ibmstreams/streamsx.parquet

(Incubation) Toolkit providing adapters to Parquet

hadoop ibm-streams parquet stream-processing toolkit

Last synced: 23 Nov 2024

https://github.com/mahmoud-nfz/football-big-data

This is a comprehensive solution for real-time football analytics, running Apache Spark on YARN for both streaming and batch processing, with Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.

hadoop hadoop-hdfs kafka nextjs rethinkdb search-engine spark spark-streaming t3-stack

Last synced: 10 Oct 2024

https://github.com/minhthong582000/my-data-stack

A simple big data stack with Docker

docker docker-compose hadoop spark

Last synced: 13 Jan 2025

https://github.com/nduytg/flink_prometheus_sd

A simple service for discovering Flink clusters on Hadoop YARN

flink flink-clusters flink-prometheus-sd go golang hadoop hadoop-yarn prometheus service-discovery yarn

Last synced: 09 Dec 2024

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline that uses the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 11 Oct 2024

https://github.com/this/docker-hadoop-hive

Kerberized Apache Hadoop, Apache Hive Docker Images

docker hadoop hive kerberos

Last synced: 15 Nov 2024

https://github.com/zoltan-nz/docker-hadoop-ubuntu

Docker Image. Hadoop 2.8.1, Java 8, Ubuntu stable

docker docker-image hadoop java8 ubuntu

Last synced: 12 Oct 2024

https://github.com/davidemiceli/drillnode

Node.js client for Apache Drill

apache-drill bigdata datascience hadoop hdfs node-js nosql

Last synced: 27 Nov 2024

https://github.com/garystafford/dataproc-workflow-templates

Demonstration of Google Cloud Dataproc Workflow Templates

dataproc gcp google-cloud-platform hadoop pyspark spark

Last synced: 06 Dec 2024

https://github.com/jordicenzano/hive-presto-tutorial

Experiments with Hadoop, Hive, and PrestoDB

bigdata docker docker-compose hadoop hive prestodb

Last synced: 06 Jan 2025

https://github.com/HwiLu/Hadoop-cluster

Notes on big data components

hadoop hbase hive yarn

Last synced: 24 Oct 2024

https://github.com/mesmacosta/hive-custom-hook

Example of how to implement a Hive hook

hadoop hive hive-hook java metadata-extraction

Last synced: 11 Nov 2024

https://github.com/serenasensini/docker-apogeo

Repository containing the examples from the book "Docker", published by Apogeo: a guide to deploying applications in software containers, available from 24 September 2020!

apogeo docker flask hadoop kafka laravel nodejs sentiment-analysis sqlite

Last synced: 20 Nov 2024

https://github.com/steveloughran/validate-hadoop-client-artifacts

Build and validate Hadoop release candidates (RCs). Moved into Apache Hadoop itself.

hadoop

Last synced: 15 Nov 2024

https://github.com/skyleaworlder/hadoop-cfg

:elephant: Quick-start scripts (*.sh) for configuring Hadoop 2.10.1 on Ubuntu 20.04

hadoop

Last synced: 15 Nov 2024

https://github.com/cubxxw/big_data

Big data: Hadoop installation and deployment

big-data cluster database git hadoop linux mysql

Last synced: 11 Oct 2024

https://github.com/sneaksanddata/hadoop-fs-wrapper

Python Wrappers for Hadoop FileSystem

distributed-computing hadoop spark

Last synced: 11 Nov 2024

https://github.com/touero/rhodeinae

A Java program for remotely operating HBase tasks.

hadoop hbase java maven

Last synced: 14 Nov 2024

https://github.com/innorealm/hadoop-rust

Hadoop Client in Rust

hadoop rust

Last synced: 13 Dec 2024