Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with hadoop
A curated list of projects in awesome lists tagged with hadoop.
https://github.com/ait-aecid/anomaly-detection-log-datasets
Analysis scripts for log data sets used in anomaly detection.
anomaly-detection bgl hadoop hdfs log-data logs machine-learning python review semi-supervised sequences survey unsupervised
Last synced: 21 Dec 2024
https://github.com/tspannhw/phoenix
Apache Phoenix / HBase Spring Boot Microservices
hadoop hbase hortonworks java-8 phoenix spring spring-boot
Last synced: 11 Dec 2024
https://github.com/zongxr/bigdata-competition
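Third-prize solution for the national big data competition and second-prize solution for the provincial competition. One-click installation scripts that automatically deploy a big data cluster environment, including zookeeper, hadoop, mysql, hive, spark, and some base environments. Tested on real servers with excellent results; only minimal manual intervention, such as entering passwords, is required. Frees up the manpower needed for installation, deployment, and configuration. Also adds several Scala examples that use Spark for data preparation.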
bigdata hadoop hdfs hive mysql scala shell spark wordcount zookeeper
Last synced: 15 Nov 2024
https://github.com/san089/cloudera_material
Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
big-data bigdata cca cca175 certification cloudera flume hadoop hive hive-metastore pyspark spark sqoop sqoop-export sqoop-import sqoop-session
Last synced: 12 Oct 2024
https://github.com/ibm-cloud/biginsights-on-apache-hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
ambari biginsights bigsql hadoop hbase hive ibm-bluemix knox oozie spark spark-streaming webhdfs zeppelin
Last synced: 17 Nov 2024
https://github.com/neoremind/app-on-yarn-demo
Demo of a service-oriented application hosted on a Hadoop YARN cluster for HA and scheduling
Last synced: 28 Oct 2024
https://github.com/hoangsonww/moodify-emotion-music-app
🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile platforms.
artificial-intelligence django django-rest-framework emotion fullstack-development hadoop kubernetes machine-learning mobile-development mongodb music python pytorch react-native reactjs redis restful-api spark tensorflow torch
Last synced: 01 Nov 2024
https://github.com/tomwhite/hadoop-ecosystem
Visualizations of the Hadoop Ecosystem
Last synced: 12 Oct 2024
https://github.com/longshilin/hadoop-mapreduce
Application examples based on MapReduce :ear_of_rice:
Last synced: 10 Nov 2024
https://github.com/odpi/egeria-connector-hadoop-ecosystem
Hadoop ecosystem connectors for Egeria: repository proxy connector for Apache Atlas.
apache-atlas connector egeria hadoop metadata proxy
Last synced: 09 Nov 2024
https://github.com/snowplow/dataflow-runner
Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
amazon-emr flink golang-application hadoop spark
Last synced: 09 Nov 2024
https://github.com/romans-weapon/spear-framework
Rapid ETL/ELT-connectors/pipeline development leveraged on top of Apache Spark
docker-compose hadoop kafka scala shell-script spark
Last synced: 10 Oct 2024
https://github.com/hiejulia/data-pipeline-project
Data pipeline project
amazon-web-services azure bigml classification data-pipeline deployment distributed-systems hadoop java kafka machine-learning mapreduce maven spark streaming
Last synced: 16 Dec 2024
https://github.com/xmlking/cdc-kafka-hadoop
MySQL to NoSQL real time dataflow
architecture cdc change-data-capture data-flow debezium groovy hadoop kafka maxwell mysql nifi
Last synced: 02 Dec 2024
https://github.com/zunzhuowei/qs-hadoop
Learning the big data ecosystem
bigdata elasticsearch hadoop mapreduce spark spark-streaming storm
Last synced: 02 Dec 2024
https://github.com/strongjz/aws-big-data-study
Study Guide for AWS Big Data Specialty Certification
amazon-web-services aws bigdata certification certification-exam certification-prep cloud elasticmapreduce emr hadoop kinesis redshift
Last synced: 08 Nov 2024
https://github.com/aphp/py-hdfs-mount
Mount HDFS with fuse, works with kerberos!
fuse hadoop hdfs kerberos mount mount-hdfs
Last synced: 25 Nov 2024
https://github.com/simbafl/interview-notes
Python notes
data-science hadoop hive machine-learning python spark
Last synced: 06 Nov 2024
https://github.com/smithros/kpi-stuff
Some of my laboratory work at KPI and material connected with it.
ada android asm bigdata computer-engineering cpp cs hadoop java kpi kpi-ua masm32 ntuu-kpi parrallel-computing python realtime-system security system-programming vhdl
Last synced: 23 Oct 2024
https://github.com/hammerlab/spark-util
low-level helpers for Apache Spark libraries and tests
Last synced: 12 Oct 2024
https://github.com/melin/flink-jobserver
REST job server for Apache Flink
flink hadoop hive java kerberos kubernetes yarn
Last synced: 05 Nov 2024
https://github.com/tomwhite/docker-impala
Run Impala in a Docker container.
Last synced: 12 Oct 2024
https://github.com/jishanshaikh4/hadoop-programs
Hadoop Programs for Hadoop/CUDA Lab at MANIT, Bhopal
Last synced: 10 Nov 2024
https://github.com/cclient/kubernetes-hadoop
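k8s hadoop: quickly set up a hadoop/hbase/hive environment on k8s. A very old project for my own use; during Tencent TBDS training I used it as the base (adding kafka/flink) to build a practice environment, so I picked it up again.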
Last synced: 16 Nov 2024
https://github.com/dayyass/pydfs
Distributed File System written in Python
distributed-systems filesystem hadoop hdfs mapreduce python
Last synced: 14 Oct 2024
https://github.com/jishanshaikh4/cuda-programs
CUDA Programs for Hadoop/CUDA Lab at MANIT, Bhopal
Last synced: 10 Nov 2024
https://github.com/sivasamyk/graylog-plugin-output-webhdfs
WebHDFS Output plugin for Graylog
archiving graylog graylog-plugin hadoop webhdfs
Last synced: 15 Oct 2024
https://github.com/cgivre/drillbook
The Official Source Repository for Learning Apache Drill (O'Reilly, 2018)
apache-drill hadoop hbase hive java kafka python python3 sql
Last synced: 22 Dec 2024
https://github.com/manuparra/masterdatcom_bdcc_practice
Practice and Workshop on BigData and Cloud Computing using Docker Containers and OpenNebula. HDFS, hadoop and spark+R
bigdata cloudcomputing containers docker hadoop hdfs linux opennebula practices spark sparkr
Last synced: 07 Nov 2024
https://github.com/confluentinc/kafka-connect-hdfs
Kafka Connect HDFS connector
apache-kafka big-data confluent hadoop hdfs kafka kafka-connect-hdfs kafka-connector streaming
Last synced: 17 Nov 2024
https://github.com/xd-deng/diy-a-cluster
How to Do-It-Yourself A Cluster for Spark & Hadoop
cluster-computing hadoop spark
Last synced: 16 Oct 2024
https://github.com/apache/kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
data-lake hadoop hive jdbc kubernetes spark spark-sql sql thrift
Last synced: 07 Oct 2024
https://github.com/hyeonsangjeon/dataplatform
Hadoop 3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.
hadoop hadoop-cluster hadoop-docker hadoop-ecosystem hadoop-mapreduce hive pyspark-notebook zeppelin-notebook
Last synced: 17 Nov 2024
https://github.com/pasqualesalza/elephant56
A Genetic Algorithms framework for Hadoop MapReduce.
genetic-algorithm hadoop hadoop-mapreduce parallel
Last synced: 18 Dec 2024
https://github.com/manuparra/masterdegreecc_practice
Workshop for the Professional Master's in Computer Science at UGR. Cloud Computing course.
cloudcomputing cluster docker docker-cluster docker-container hadoop hadoop-cluster hdfs opennebula practice virtual-machine
Last synced: 07 Nov 2024
https://github.com/criccomini/hive-metastore-standalone
Apache Hive Metastore in Standalone Mode With Docker
docker github-workflow github-workflows hadoop hcatalog hive hive-metastore presto prestodb trino trinodb
Last synced: 30 Nov 2024
https://github.com/vertica/pstl
Parallel Streaming Transformation Loader
bigdata data-mining data-science etl-pipeline hadoop ingestion realtime-messaging streaming-data vertica
Last synced: 12 Nov 2024
https://github.com/spirals-team/hadoop-benchmark
Docker containers to build a Hadoop infrastructure and experiment with feedback control loops on top of it.
Last synced: 01 Jan 2025
https://github.com/ibmstreams/streamsx.hdfs
This toolkit provides operators and functions for interacting with the Hadoop File System.
hadoop hdfs ibm-streams java stream-processing toolkit
Last synced: 23 Nov 2024
https://github.com/isislab-unisa/sof
Simulation Optimization and exploration Framework on the cloud: SOF
agent-based-simulation hadoop java mapreduce optimization-process simulation-model simulation-optimization sof
Last synced: 15 Nov 2024
https://github.com/x4ax/lxss-install-zeppelin
Step-by-step guide on how to install Zeppelin 0.7.3 on the Linux subsystem (WSL) for Windows 10
hadoop linux-subsystem lxss spark wsl zeppelin
Last synced: 04 Dec 2024
https://github.com/mjstealey/hadoop
Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8
Last synced: 11 Oct 2024
https://github.com/ahmetfurkandemir/data-engineering-project-with-hdfs-and-kafka
Data Engineering Project with Hadoop HDFS and Kafka
data data-engineer data-engineering data-engineering-pipeline docker docker-compose hadoop hadoop-filesystem hadoop-hdfs hdfs hdfs-client hdfs-dfs kafka kafka-consumer kafka-producer kafka-ui kafkaui pipline python python-hdfs-client
Last synced: 16 Nov 2024
https://github.com/lucasbotang/coursera_big_data_for_data_engineers
Assignments for Big Data for Data Engineers specialization on Coursera by Yandex.
Last synced: 25 Nov 2024
https://github.com/mikeroyal/apache-ignite-guide
Apache Ignite Guide
data-science database hadoop hadoop-cluster ignite nosql nosql-data-storage nosql-databases stream-processing streaming
Last synced: 12 Dec 2024
https://github.com/drsnowbird/tensorflow-python3-jupyter
tensorflow-python3-jupyter
docker docker-compose hadoop jupyter jupyter-notebook machine-learning python spark tensorflow tensorflow-board tensorflow-tutorials topic-modeling
Last synced: 14 Nov 2024
https://github.com/perfectlysoft/perfect-hadoop
Perfect Hadoop: WebHDFS, MapReduce & Yarn.
hadoop mapreduce perfect server-side-swift swift webhdfs yarn
Last synced: 13 Nov 2024
https://github.com/saket-sk/semester6-sppu-data-analysis-lab
I installed Hadoop on a virtual machine, and all assignments are performed on Ubuntu. Refer to this repo to complete the Hadoop assignments. A stable internet connection is recommended while working through them.
charts data-visualization hadoop hadoop-assignments hadoop-bigdata-assignments hadoop-framework hadoop-mapreduce plot r tableau
Last synced: 10 Nov 2024
https://github.com/bigconnect/bigconnect
A multi-model Big Data graph store supporting graph, document, key/value, and object models
accumulo bigdata elasticsearch graph-database hadoop
Last synced: 10 Oct 2024
https://github.com/disizmj/node-farmer
A lightweight automation tool built for Linux machines
ansible automation automation-framework big-data centos chef cloud cloud-management code deploy-tool hadoop infrastructure-management installation linux package-management puppet redhat server-management servers ubuntu
Last synced: 25 Nov 2024
https://github.com/amey-thakur/hadoop
HADOOP
amey ameythakur big-data big-data-analytics bigdata bigdataanalytics computer-engineering engineering hadoop mapper megasatish reducer
Last synced: 09 Nov 2024
https://github.com/brunocampos01/programacao-paralela-e-distribuida
Classes and exercises for the courses Parallel and Distributed Programming (INE5645) and Distributed Computing (INE5625).
ditributed hadoop ine ine5625 ine5645 java openmp producer-consumer programacao-paralela socket thread threads ufsc
Last synced: 16 Nov 2024
https://github.com/tritondatacenter/hadoop-manta
Hadoop Filesystem Driver for Manta
drill hadoop hadoop-filesystem joyent manta sqoop triton
Last synced: 05 Nov 2024
https://github.com/imsanjoykb/pyspark-bootcamp
My Practice and project on PySpark
hadoop hadoop-mapreduce pyspark pyspark-machine-learning pyspark-ml pyspark-mllib pyspark-notebook spark-sql spark-streaming sparkjava transformation
Last synced: 12 Oct 2024
https://github.com/nikoshet/monitoring-spark-on-docker
Spark Monitoring With Prometheus And Grafana Using Docker
docker docker-compose grafana hadoop hdfs monitoring node-exporter prometheus spark
Last synced: 09 Nov 2024
https://github.com/prabaprakash/hadoop-map-reduce-code
Apache Hadoop for Windows
Last synced: 14 Nov 2024
https://github.com/piotr-kalanski/big-data-dev-environment-docker
Big Data Development environment based on Docker
big-data docker elasticsearch hadoop kafka kibana spark
Last synced: 27 Oct 2024
https://github.com/prabaprakash/youtube-channel
Configuration files for my YouTube tutors
Last synced: 14 Nov 2024
https://github.com/apache/calcite-site
Apache Calcite Website
big-data calcite geospatial hadoop java sql
Last synced: 07 Oct 2024
https://github.com/gtkcyber/drillworkshop
Learn how to quickly explore your data with Apache Drill
apache-drill big-data database hadoop jdbc python r sql
Last synced: 14 Nov 2024
https://github.com/laertispappas/mapreduce_python
TF-IDF algorithm on Hadoop - Python
Last synced: 19 Jan 2025
https://github.com/wittline/moving-average-spark
How to Compute Moving Average with Spark
databricks hadoop moving-average spark
Last synced: 14 Oct 2024
https://github.com/chabane/mitosis-microservice-spark-cassandra
Microservice application that uses Apache Spark, Kafka and Cassandra
cassandra dockerfile hadoop jenkinsfile kafka sbt scala spark spark-streaming
Last synced: 15 Nov 2024
https://github.com/ibmstreams/streamsx.parquet
(Incubation) Toolkit providing adapters to Parquet
hadoop ibm-streams parquet stream-processing toolkit
Last synced: 23 Nov 2024
https://github.com/mahmoud-nfz/football-big-data
This is a comprehensive solution for real-time football analytics, leveraging Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.
hadoop hadoop-hdfs kafka nextjs rethinkdb search-engine spark spark-streaming t3-stack
Last synced: 10 Oct 2024
https://github.com/minhthong582000/my-data-stack
A simple Big data stack with Docker
docker docker-compose hadoop spark
Last synced: 13 Jan 2025
https://github.com/nduytg/flink_prometheus_sd
A simple service for discovering Flink cluster on Hadoop Yarn
flink flink-clusters flink-prometheus-sd go golang hadoop hadoop-yarn prometheus service-discovery yarn
Last synced: 09 Dec 2024
https://github.com/ren294/log-analysis-project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming
Last synced: 11 Oct 2024
https://github.com/this/docker-hadoop-hive
Kerberized Apache Hadoop, Apache Hive Docker Images
Last synced: 15 Nov 2024
https://github.com/zoltan-nz/docker-hadoop-ubuntu
Docker Image. Hadoop 2.8.1, Java 8, Ubuntu stable
docker docker-image hadoop java8 ubuntu
Last synced: 12 Oct 2024
https://github.com/davidemiceli/drillnode
Node.js client for Apache Drill
apache-drill bigdata datascience hadoop hdfs node-js nosql
Last synced: 27 Nov 2024
https://github.com/garystafford/dataproc-workflow-templates
Demonstration of Google Cloud Dataproc Workflow Templates
dataproc gcp google-cloud-platform hadoop pyspark spark
Last synced: 06 Dec 2024
https://github.com/jordicenzano/hive-presto-tutorial
Experiments with hadoop, hive, and prestoDB
bigdata docker docker-compose hadoop hive prestodb
Last synced: 06 Jan 2025
https://github.com/apache/fluo-yarn
Apache Fluo Yarn
accumulo big-data fluo hacktoberfest hadoop
Last synced: 27 Nov 2024
https://github.com/mesmacosta/hive-custom-hook
Example of how to implement a Hive hook
hadoop hive hive-hook java metadata-extraction
Last synced: 11 Nov 2024
https://github.com/serenasensini/docker-apogeo
Repo containing the examples from the book "Docker", published by Apogeo. A guide to deploying applications in software containers, available from 24 September 2020!
apogeo docker flask hadoop kafka laravel nodejs sentiment-analysis sqlite
Last synced: 20 Nov 2024
https://github.com/steveloughran/validate-hadoop-client-artifacts
Build/validate Hadoop RCs. Moved into Apache Hadoop itself.
Last synced: 15 Nov 2024
https://github.com/skyleaworlder/hadoop-cfg
:elephant: Quick-Start scripts. *.sh about Hadoop 2.10.1 config on Ubuntu 20.04
Last synced: 15 Nov 2024
https://github.com/sneaksanddata/hadoop-fs-wrapper
Python Wrappers for Hadoop FileSystem
distributed-computing hadoop spark
Last synced: 11 Nov 2024
https://github.com/touero/rhodeinae
A Java program for remotely operating HBase tasks.
Last synced: 14 Nov 2024