Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with hadoop

A curated list of projects in awesome lists tagged with hadoop.

https://github.com/donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

aws big-data caffe data-science deep-learning hadoop kaggle keras machine-learning mapreduce matplotlib numpy pandas python scikit-learn scipy spark tensorflow theano

Last synced: 14 Jan 2025

https://github.com/spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

hadoop luigi orchestration-framework python scheduling

Last synced: 13 Jan 2025

https://github.com/tencent/apijson

🏆 Real-time, zero-code, full-featured, and highly secure ORM library 🚀 The backend provides APIs and docs without writing code, and frontend (client) users can customize the data and structure of the returned JSON

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 13 Jan 2025

https://github.com/Tencent/APIJSON

🏆 Real-time, zero-code, full-featured, and highly secure ORM library 🚀 The backend provides APIs and docs without writing code, and frontend (client) users can customize the data and structure of the returned JSON

baas clickhouse crud databricks elasticsearch hadoop hive influxdb low-code lowcode milvus nocode oracle postgresql postgresql-database serverless snowflake sqlserver tdengine tidb

Last synced: 02 Nov 2024

https://github.com/prestodb/presto

The official home of the Presto distributed SQL query engine for big data

big-data data hadoop hive java lakehouse presto query sql

Last synced: 13 Jan 2025

https://github.com/apache/hadoop

Apache Hadoop

hadoop

Last synced: 13 Jan 2025

https://github.com/deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...

artificial-intelligence clojure deeplearning deeplearning4j dl4j gpu hadoop intellij java linear-algebra matrix-library neural-nets python scala spark

Last synced: 13 Jan 2025

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 13 Jan 2025

https://github.com/apache/incubator-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 14 Dec 2024

https://github.com/trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

analytics big-data data-science database databases datalake delta-lake distributed-database distributed-systems hadoop hive iceberg java jdbc presto prestodb query-engine sql trino

Last synced: 13 Jan 2025

https://github.com/wangzhiwubigdata/god-of-bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

azkaban bigdata flink flume hadoop hbase hdfs hive kafka spark zookeeper

Last synced: 14 Jan 2025

https://github.com/wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

azkaban bigdata flink flume hadoop hbase hdfs hive kafka spark zookeeper

Last synced: 30 Oct 2024

https://github.com/linkedin/school-of-sre

At LinkedIn, we use this curriculum to onboard our entry-level hires into the SRE role.

git hadoop linux mysql networking nosql python security sre system-design

Last synced: 14 Jan 2025

https://github.com/h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

automl big-data data-science deep-learning distributed ensemble-learning gbm gpu h2o h2o-automl hadoop java machine-learning naive-bayes opensource pca python r random-forest spark

Last synced: 13 Jan 2025

https://github.com/alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

alluxio data-analysis data-orchestration hadoop memory-speed presto spark tensorflow virtual-distributed-filesystem

Last synced: 13 Jan 2025

https://github.com/Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

alluxio data-analysis data-orchestration hadoop memory-speed presto spark tensorflow virtual-distributed-filesystem

Last synced: 29 Oct 2024

https://github.com/harisekhon/devops-bash-tools

1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux..

api aws bash ci cloudera devops docker gcp git github hacktoberfest hadoop jenkins kafka kubernetes linux mysql perl postgresql terraform

Last synced: 14 Jan 2025

https://github.com/HariSekhon/DevOps-Bash-tools

1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, tmux..

api aws bash ci cloudera devops docker gcp git github hacktoberfest hadoop jenkins kafka kubernetes linux mysql perl postgresql terraform

Last synced: 03 Nov 2024

https://github.com/tomwhite/hadoop-book

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

book hadoop o-reilly

Last synced: 29 Nov 2024

https://github.com/webankfintech/dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

airflow atlas azkaban dataworks davinci dolphinscheduler flink governance griffin hadoop hive hue kettle linkis spark supperset tableau visualis workflow zeppelin

Last synced: 14 Jan 2025

https://github.com/WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

airflow atlas azkaban dataworks davinci dolphinscheduler flink governance griffin hadoop hive hue kettle linkis spark supperset tableau visualis workflow zeppelin

Last synced: 26 Oct 2024

https://github.com/apache/nutch

Apache Nutch is an extensible and scalable web crawler

apache crawling hadoop java nutch web-crawler

Last synced: 14 Jan 2025

https://github.com/luckyzxl2016/movie_recommend

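A Spark-based movie recommendation system, comprising a web crawler project, a website, an admin back-end system, and a Spark recommendation system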

hadoop hive mysql nginx scala scrapy spark-mllib spark-streaming ssm-maven

Last synced: 18 Jan 2025

https://github.com/LuckyZXL2016/Movie_Recommend

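A Spark-based movie recommendation system, comprising a web crawler project, a website, an admin back-end system, and a Spark recommendation system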

hadoop hive mysql nginx scala scrapy spark-mllib spark-streaming ssm-maven

Last synced: 29 Oct 2024

https://github.com/moran1607/bigdataguide

Big data learning: learn big data from scratch, with study videos and interview materials for each stage of the learning path

bigdata flink flume hadoop hbase hive javase kafka scala spark zookeeper

Last synced: 15 Jan 2025

https://github.com/MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, with study videos and interview materials for each stage of the learning path

bigdata flink flume hadoop hbase hive javase kafka scala spark zookeeper

Last synced: 05 Nov 2024

https://github.com/apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hacktoberfest hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 14 Jan 2025

https://github.com/dahuoyzs/javapdf

🍣 100 Java e-books: technical books in PDF (take pride in downloading and reading them, not in merely liking and bookmarking)

hadoop java-pdf jvm mysql mysql-innodb

Last synced: 18 Jan 2025

https://github.com/cdarlint/winutils

winutils.exe, hadoop.dll and hdfs.dll binaries for Hadoop on Windows

binaries hadoop winutils

Last synced: 17 Jan 2025

https://github.com/apache/drill

Apache Drill is a distributed MPP query layer for self describing data

big-data drill hadoop hive java jdbc parquet sql

Last synced: 14 Jan 2025

https://github.com/gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties

accumulo aggregation big-data graph graph-database hadoop hbase parquet spark

Last synced: 13 Nov 2024

https://github.com/gchq/gaffer

A large-scale entity and relation database supporting aggregation of properties

accumulo aggregation big-data graph graph-database hadoop hbase parquet spark

Last synced: 14 Jan 2025

https://github.com/water8394/bigdata-interview

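🎯 🌟 [Big data interview questions] Big data interview questions collected from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks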

bigdata flink hadoop hbase hdfs interview interview-questions kafka mapreduce spark yarn

Last synced: 18 Jan 2025

https://github.com/water8394/BigData-Interview

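🎯 🌟 [Big data interview questions] Big data interview questions collected from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks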

bigdata flink hadoop hbase hdfs interview interview-questions kafka mapreduce spark yarn

Last synced: 30 Oct 2024

https://github.com/collabh/bigdata-growth

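A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.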

bigdata bigdatalearning debezium flink hadoop hbase hdfs hive hudi kafka kudu mapreduce olap spark

Last synced: 16 Jan 2025

https://github.com/collabH/bigdata-growth

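A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.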

bigdata bigdatalearning debezium flink hadoop hbase hdfs hive hudi kafka kudu mapreduce olap spark

Last synced: 31 Oct 2024

https://github.com/apache/carbondata

High performance data store solution

apache big-data carbondata data-format hadoop java scala spark

Last synced: 14 Jan 2025

https://github.com/DTStack/Taier

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

azkaban chunjun cronjob-scheduler dag data-schedule distributed-schedule-system flink hadoop hive job-scheduler scheduler spark task-schedule workflow-scheduling-system

Last synced: 30 Oct 2024

https://github.com/harisekhon/dockerfiles

50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak

apache-drill cassandra consul devops docker dockerhub hacktoberfest hadoop hbase kafka kubernetes linux nagios-plugins presto rabbitmq rabbitmq-cluster solr solrcloud spark zookeeper

Last synced: 16 Jan 2025

https://github.com/HariSekhon/Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak

apache-drill cassandra consul devops docker dockerhub hacktoberfest hadoop hbase kafka kubernetes linux nagios-plugins presto rabbitmq rabbitmq-cluster solr solrcloud spark zookeeper

Last synced: 04 Nov 2024

https://github.com/dtstack/taier

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

azkaban chunjun cronjob-scheduler dag data-schedule distributed-schedule-system flink hadoop hive job-scheduler scheduler spark task-schedule workflow-scheduling-system

Last synced: 16 Jan 2025

https://github.com/wgzhao/addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

clickhouse database etl excel hadoop hdfs hive impala influxdb kudu mysql oracle postgresql sqlserver trino

Last synced: 16 Jan 2025

https://github.com/harisekhon/nagios-plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

aws cassandra cloud cloudera consul docker elasticsearch hacktoberfest hadoop hbase jenkins kafka kubernetes linux mysql nagios-plugins rabbitmq redis solr zookeeper

Last synced: 16 Jan 2025

https://github.com/HariSekhon/nagios-plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

aws cassandra cloud cloudera consul docker elasticsearch hacktoberfest hadoop hbase jenkins kafka kubernetes linux mysql nagios-plugins rabbitmq redis solr zookeeper

Last synced: 16 Nov 2024

https://github.com/teradata/kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

data-lake hadoop kylo nifi spark teradata

Last synced: 17 Jan 2025

https://github.com/Teradata/kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

data-lake hadoop kylo nifi spark teradata

Last synced: 05 Nov 2024

https://github.com/wgzhao/Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

clickhouse data-integrity database datax etl excel hadoop hdfs hive impala influxdb kudu mysql oracle postgresql sqlserver trino

Last synced: 08 Nov 2024

https://github.com/oeljeklaus-you/useractionanalyzeplatform

A big data platform for analyzing e-commerce user behavior

accumulator hadoop java kyro spark spark-sql sparkjava

Last synced: 20 Jan 2025

https://github.com/apache/ozone

Scalable, redundant, and distributed object store for Apache Hadoop

big-data hadoop kubernetes object-store s3 storage

Last synced: 16 Jan 2025

https://github.com/HariSekhon/DevOps-Python-tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

avro aws cloudformation devops docker dockerhub elasticsearch gcf gcp hadoop hbase hdfs json linux parquet pyspark python solr spark travis-ci

Last synced: 07 Nov 2024

https://github.com/tony-framework/TonY

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

deep-learning hadoop hadoop-yarn horovod machine-learning tensorflow

Last synced: 09 Nov 2024

https://github.com/tony-framework/tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

deep-learning hadoop hadoop-yarn horovod machine-learning tensorflow

Last synced: 20 Jan 2025

https://github.com/cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

apache-spark data-parallelism data-science deep-learning distributed-optimizers hadoop keras machine-learning optimization-algorithms tensorflow

Last synced: 28 Sep 2024

https://github.com/absaoss/spline

Data Lineage Tracking And Visualization Solution

bigdata hadoop lineage scala spark tracking visualization

Last synced: 19 Jan 2025

https://github.com/AbsaOSS/spline

Data Lineage Tracking And Visualization Solution

bigdata hadoop lineage scala spark tracking visualization

Last synced: 05 Nov 2024

https://github.com/Esri/gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

hadoop spatial-analysis

Last synced: 01 Nov 2024

https://github.com/raray-chuan/xichuan_note

xichuan's study notes, covering Java, Spring, other commonly used Java frameworks, and big data components 📚

bigdata elk flink hadoop hbase hive java juc jvm kafaka kafka redis spark spring springcloud zabbix zookeeper

Last synced: 19 Jan 2025

https://github.com/linkedin/venice

Venice, Derived Data Platform for Planet-Scale Workloads.

ai database hadoop kafka ml

Last synced: 11 Nov 2024

https://github.com/apache/tez

Apache Tez

apache big-data hadoop java tez

Last synced: 17 Jan 2025

https://github.com/uber/marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

avro-schema data-lake hadoop ingest-data schema-format spark

Last synced: 28 Oct 2024

https://github.com/houshanren/big_data_architect_skills

The skills a big data architect should master

analytics bigdata hadoop skills spark xuan-xing

Last synced: 20 Jan 2025

https://github.com/dromara/cloudeon

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

bigdata cloudnative doris hadoop hdfs kubernetes yarn

Last synced: 18 Jan 2025

https://github.com/dromara/CloudEon

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

bigdata cloudnative doris hadoop hdfs kubernetes yarn

Last synced: 05 Nov 2024

https://github.com/cubefs/compass

Compass is a task diagnosis platform for bigdata

airflow bigdata diagnose dolphinscheduler flink hadoop mapreduce scheduler spark sql

Last synced: 19 Jan 2025

https://github.com/hortonworks/cloudbreak

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

big-data cloud cloudera deployment hacktoberfest hadoop java

Last synced: 15 Nov 2024

https://github.com/kanyun-inc/ytk-learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

distributed factorization-machines gbdt gbm hadoop logistic-regression machine-learning spark

Last synced: 14 Jan 2025

https://github.com/cwensel/cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

hadoop java mapreduce tez

Last synced: 17 Jan 2025

https://github.com/tencent/caelus

Set of Kubernetes solutions for reusing idle resources of nodes by running extra batch jobs

containerd docker hadoop kubernetes runtime yarn

Last synced: 18 Jan 2025

https://github.com/elasticluster/elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

ansible azure cloud cluster clustering ec2 gcp gridengine hadoop hpc python slurm spark

Last synced: 06 Nov 2024

https://github.com/datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale's Introduction to Big Data Processing tutorial | An introductory course for the big data technology track 🎉🎉

bigdata hadoop hbase hdfs hive mapreduce spark

Last synced: 15 Jan 2025

https://github.com/sakserv/hadoop-mini-clusters

hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE

hadoop hadoop-mini-clusters ide java test-automation

Last synced: 19 Jan 2025

https://github.com/GoogleCloudDataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs

Last synced: 25 Oct 2024

https://github.com/googleclouddataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs

Last synced: 15 Jan 2025

https://github.com/brndnmtthws/facebook-hive-udfs

Facebook's Hive UDFs

hadoop hive udf udf-libraries

Last synced: 19 Jan 2025

https://github.com/apache/calcite-avatica

Apache Calcite Avatica

big-data calcite geospatial hadoop java sql

Last synced: 14 Jan 2025

https://github.com/wavestone-cdt/hadoop-attack-library

A collection of pentest tools and resources targeting Hadoop environments

bigdata hadoop pentest

Last synced: 18 Nov 2024

https://github.com/shifuml/shifu

An end-to-end machine learning and data mining framework on Hadoop

bigdata end-to-end-machine-learning gbdt hadoop machine-learning neural-network pipeline random-forest shifu

Last synced: 20 Jan 2025

https://github.com/oeljeklaus-you/javaorbigdata-interview

A compilation of interview topics for Java developers and big data developers

bigdata hadoop interview java spark storm

Last synced: 17 Jan 2025

https://github.com/jasonTangxd/recommendSys

Recommendation project (real-time and offline recommendation)

hadoop kafka mahot storm toos

Last synced: 13 Nov 2024

https://github.com/harisekhon/haproxy-configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

apache-drill cassandra cloudera elasticsearch hacktoberfest hadoop haproxy hbase hive influxdb mapr mysql nosql opentsdb postgresql presto prometheus redis solrcloud zookeeper

Last synced: 15 Jan 2025

https://github.com/HariSekhon/HAProxy-configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Kubernetes, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

apache-drill cassandra cloudera elasticsearch hacktoberfest hadoop haproxy hbase hive influxdb mapr mysql nosql opentsdb postgresql presto prometheus redis solrcloud zookeeper

Last synced: 06 Nov 2024

https://github.com/huangfox/dpkb

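A summary of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse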

flink hadoop hbase hive presto spark

Last synced: 30 Oct 2024

https://github.com/chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

Last synced: 16 Jan 2025