An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with bigdata

A curated list of projects in awesome lists tagged with bigdata.

https://github.com/dataexpert-io/data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

apachespark awesome bigdata data dataengineering sql

Last synced: 28 Sep 2025

https://github.com/juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

bigdata cloud-native distributed-systems filesystem go golang hdfs object-storage posix redis s3 storage

Last synced: 12 May 2025

https://github.com/wangzhiwubigdata/god-of-bigdata

Focused on big data learning and interview preparation: start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

azkaban bigdata flink flume hadoop hbase hdfs hive kafka spark zookeeper

Last synced: 13 May 2025

https://github.com/databendlabs/databend

Data, Analytics & AI. A modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

ai bigdata database lakehouse olap rust serverless snowflake

Last synced: 05 Jan 2026

https://github.com/vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

bigdata data-science dataframe hdf5 machine-learning machinelearning memory-mapped-file pyarrow python tabular-data visualization

Last synced: 12 Dec 2025

https://github.com/apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

apacheflink apachehudi apachespark bigdata data-integration datalake hudi incremental-processing stream-processing

Last synced: 12 May 2025
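
The tagline above names three operations. The sketch below is a rough illustration of what they mean. It is plain Python, not Hudi's API, and the `Record` type, its `key`/`ts` fields and the helper names are hypothetical.

- An upsert keeps the newest version of each record key.
- A delete removes the key.
- An incremental read returns only records whose timestamp is newer than a checkpoint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    key: str                                      # hypothetical record key
    ts: int                                       # hypothetical ordering/write timestamp
    payload: dict = field(default_factory=dict)
    deleted: bool = False                         # True marks a delete for this key


def apply_batch(table: dict[str, Record], batch: list[Record]) -> dict[str, Record]:
    """Return a new table with the batch upserted or deleted by key."""
    result = dict(table)
    for rec in batch:
        current = result.get(rec.key)
        # Ignore late-arriving records older than the stored version.
        if current is not None and current.ts > rec.ts:
            continue
        if rec.deleted:
            result.pop(rec.key, None)   # delete
        else:
            result[rec.key] = rec       # insert or update (upsert)
    return result


def changes_since(table: dict[str, Record], since_ts: int) -> list[Record]:
    """Incremental read: only records with ts newer than since_ts, oldest first."""
    return sorted((r for r in table.values() if r.ts > since_ts), key=lambda r: r.ts)


table = apply_batch({}, [Record("a", 1, {"v": 1}), Record("b", 1, {"v": 2})])
table = apply_batch(table, [Record("a", 2, {"v": 10}), Record("b", 2, deleted=True)])
print(sorted(table))                              # ['a']
print([r.key for r in changes_since(table, 1)])   # ['a']
```

Deletes here leave no tombstone, so a late record for an already deleted key would reappear. A production table format has to track deletions more carefully than this sketch does.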

https://github.com/volcano-sh/volcano

A Cloud Native Batch System (Project under CNCF)

ai batch-systems bigdata gene golang hpc kubernetes machine-learning serving training

Last synced: 03 Oct 2025

https://github.com/dtstack/chunjun

A data integration framework

bigdata data-integration flink framework java

Last synced: 13 May 2025

https://github.com/igaowei/bigdataview

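100+ eye-catching HTML5 big-screen templates for big data visualization, covering industries such as community services, property management, government affairs, transportation, and finance/banking. Billed as the newest, largest, most complete, coolest, and flashiest collection of big data visualization templates on the web. Updates are ongoing.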

bigdata bigdataviewer echarts html-template viewmodel

Last synced: 14 May 2025

https://github.com/liyupi/sql-generator

🔨 Generates structured SQL statements from JSON. Built with Vue3 + TypeScript + Vite + Ant Design + MonacoEditor; the project is simple (heavy on logic, light on UI) and good for practice.

ant-design bigdata hive javascript json monaco-editor mysql spark sql typescript vite vue vue3

Last synced: 14 May 2025

https://github.com/apache/avro

Apache Avro is a data serialization system.

avro bigdata c cplusplus csharp dotnet java perl php python ruby

Last synced: 09 Sep 2025

https://github.com/moran1607/bigdataguide

Learn big data from scratch, with study videos and interview materials for every stage of big data learning.

bigdata flink flume hadoop hbase hive javase kafka scala spark zookeeper

Last synced: 14 May 2025

https://github.com/douban/dpark

A Python clone of Spark: a MapReduce-like framework in Python

bigdata dpark mapreduce python spark stream-processing

Last synced: 29 Oct 2025
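
dpark is described as a MapReduce-like framework. The single-process sketch below shows the three stages such frameworks distribute across workers. It is plain Python, not dpark's API, and all function names are hypothetical.

1. Map each input record to key/value pairs.
2. Shuffle the pairs so that equal keys meet.
3. Reduce each group to a single result.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: list[str]) -> dict[str, int]:
    return reduce_phase(shuffle(chain.from_iterable(map(map_phase, lines))))


print(word_count(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

A real framework runs the map and reduce stages in parallel over partitions of the data. Here everything runs in one process so the data flow is easy to follow.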

https://github.com/griddb/griddb

GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

bigdata database fast griddb iot newsql nosql sql time-series timeseries

Last synced: 13 May 2025

https://github.com/dtstack/flinkstreamsql

Extends the real-time SQL of open-source Flink. It mainly implements joins between streams and dimension tables and supports all native Flink SQL syntax.

bigdata flink sql stream

Last synced: 15 May 2025
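
A stream-to-dimension-table join enriches each streaming event with attributes looked up by key from a slowly changing table. The sketch below is a minimal single-process version, not FlinkStreamSQL's implementation. It assumes a caller-supplied `fetch` function standing in for the dimension store, uses hypothetical field names, and puts an LRU cache in front of the lookups.

```python
from __future__ import annotations

from functools import lru_cache
from typing import Callable, Iterable, Iterator, Optional


def lookup_join(
    stream: Iterable[dict],
    key: str,
    fetch: Callable[[object], Optional[dict]],
    left: bool = True,
    cache_size: int = 1024,
) -> Iterator[dict]:
    """Enrich each event with the dimension row whose key matches event[key]."""
    # Cache lookups so repeated keys do not hit the dimension store again.
    cached_fetch = lru_cache(maxsize=cache_size)(fetch)
    for event in stream:
        dim = cached_fetch(event[key])
        if dim is None:
            # No matching dimension row: keep the event (left join) or drop it (inner join).
            if left:
                yield dict(event)
            continue
        yield {**event, **dim}


dims = {1: {"city": "Paris"}, 2: {"city": "Oslo"}}   # hypothetical dimension table
events = [{"uid": 1, "amt": 5}, {"uid": 2, "amt": 7}, {"uid": 1, "amt": 3}, {"uid": 9, "amt": 1}]
for row in lookup_join(events, "uid", dims.get):
    print(row)
```

The cache trades freshness for throughput: a changed dimension row is not seen until its cache entry is evicted. A time-based expiry is a common addition to this kind of lookup.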

https://github.com/shzlw/poli

An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.

bigdata business-intelligence dashboard data-visualization jdbc reactjs reporting spring-boot sql sql-editor

Last synced: 15 May 2025

https://github.com/byzer-org/byzer-lang

Byzer (former MLSQL): A low-code open-source programming language for data pipeline, analytics and AI.

bigdata machine-learning mlsql sql-like-dsl

Last synced: 15 May 2025

https://github.com/rustfs/rustfs

🚀 High-performance distributed object storage; an alternative to MinIO.

bigdata cloud-native distributed-systems filesystem minio object-storage oss rust s3

Last synced: 25 Dec 2025

https://github.com/jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science ipython ipython-notebook machine-learning mllib notebook pyspark python spark

Last synced: 15 May 2025

https://github.com/water8394/bigdata-interview

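🎯 🌟 [Big Data Interview Questions] A collection of big data interview questions gathered online, with the author's own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.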

bigdata flink hadoop hbase hdfs interview interview-questions kafka mapreduce spark yarn

Last synced: 15 May 2025

https://github.com/collabh/bigdata-growth

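A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.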

bigdata bigdatalearning debezium flink hadoop hbase hdfs hive hudi kafka kudu mapreduce olap spark

Last synced: 14 May 2025

https://github.com/kubernetes-retired/kube-batch

A batch scheduler for Kubernetes for high-performance workloads, e.g. AI/ML, BigData, HPC

bigdata hpc k8s-sig-scheduling kubernetes machine-learning

Last synced: 29 Sep 2025

https://github.com/josonle/coding-now

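Study notes, plus eBooks and video resources I have gone through, and blogs, websites, and tools I have collected and found worthwhile. Topics include the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.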

bigdata coding ebook-collection hadoop-hive java linux notes spark

Last synced: 16 May 2025

https://github.com/zeromicro/cds

Data syncing in golang for ClickHouse.

bigdata clickhouse go golang kafka-consumer

Last synced: 16 May 2025

https://github.com/apache/amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

bigdata datalake lakehouse

Last synced: 14 May 2025

https://github.com/apache/celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

bigdata shuffle spark

Last synced: 14 May 2025

https://github.com/apache/incubator-livy

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

apachelivy bigdata livy spark

Last synced: 12 May 2025

https://github.com/visualpython/visualpython

GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

bigdata chrome-extension code-generator data-analysis jupyter-lab-extension jupyter-notebook-extension jupyterlab-extension pandas python visual-coding

Last synced: 15 May 2025

https://github.com/pingcap/tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV

bigdata spark tidb tikv

Last synced: 14 May 2025

https://github.com/jadianes/spark-movie-lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

big-data bigdata flask movie-recommendation movielens-dataset python spark

Last synced: 12 Apr 2025

https://github.com/intsmaze/flink-boot

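The 懒松鼠 (Lazy Squirrel) Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs in the style of Java web development; Lazy Squirrel makes crossing between the two worlds simpler. It aims to let developers write business code quickly at a lower entry cost, without needing to understand distributed computing theory or the details of the Flink framework. To further speed up large projects built on the scaffold, it integrates the Spring framework for bean management by default. It also bundles frameworks commonly used in microservice and web development, such as the Mybatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry retry framework. See the scaffold's feature list for details.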

bigdata flink flink-boot java java-flink mcv mybatis sping spring-boot spring-retry

Last synced: 27 Mar 2025

https://github.com/gearpump/gearpump

Lightweight real-time big data streaming engine over Akka

akka bigdata scala stream-processing

Last synced: 16 Dec 2025

https://github.com/gangly/datafaker

Datafaker is a tool for generating large-scale test data and streaming test data. It fakes data and inserts it into various data sources.

bigdata datafaker fakedata faker hbase hive kafka mysql oracle postgresql python testing

Last synced: 16 May 2025

https://github.com/absaoss/spline

Data Lineage Tracking And Visualization Solution

bigdata hadoop lineage scala spark tracking visualization

Last synced: 16 May 2025

https://github.com/nationalsecurityagency/datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.

accumulo bigdata java

Last synced: 15 May 2025

https://github.com/unum-cloud/ustore

Multi-modal database replacing MongoDB, Neo4J, and Elastic with one faster ACID solution, with NetworkX and Pandas interfaces and bindings for C99, C++17, Python 3, Java, and GoLang 🗄️

acid apache-arrow arrow big-data bigdata database dataloader document-database graph-database iouring json key-value-store knn-search networkx nosql pandas python search spdk vector-search

Last synced: 11 Apr 2025

https://github.com/simbafl/datawarehouse

From data warehouses to user profiling, from data construction to data applications.

bigdata datawarehouse olap presto sql userprofile

Last synced: 23 Apr 2025

https://github.com/apconw/sanic-web

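A lightweight, full-pipeline large-model application project (Large Model Data Assistant) that is easy to extend and supports large models such as DeepSeek/Qwen2.5. It is a one-stop large-model application development project built on Dify, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI made with Vue3, TypeScript, and Vite 5. It supports LLM-driven graphical Q&A over data via ECharts 📈 and can answer questions about tables in CSV files 📂. It can also connect easily to third-party open-source RAG and retrieval systems 🌐 to support broad general-knowledge Q&A.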

ai bigdata chat chatgpt deepseek-r1 dify echarts large-model-data-assistant llm ollama python qwen rag sanic text2sql vllm vue3

Last synced: 16 May 2025

https://github.com/mvillarrealb/docker-spark-cluster

A simple Spark standalone cluster for testing-environment purposes

bigdata developer-tools docker-compose spark

Last synced: 16 May 2025

https://github.com/minio/sidekick

High Performance HTTP Sidecar Load Balancer

bigdata kubernetes load-balancer minio-servers proxy sidecar sidekick spark

Last synced: 20 Jun 2025

https://github.com/grailbio/bigslice

A serverless cluster computing system for the Go programming language

bigdata cluster computing etl go golang machinelearning mapreduce

Last synced: 21 Apr 2025

https://github.com/raray-chuan/xichuan_note

xichuan's study notes, covering Java, Spring, other commonly used Java frameworks, big data components, and more 📚

bigdata elk flink hadoop hbase hive java juc jvm kafaka kafka redis spark spring springcloud zabbix zookeeper

Last synced: 05 Apr 2025

https://github.com/rdkmaster/jigsaw

Jigsaw (七巧板, "tangram") provides a set of web components based on Angular 5/8/9+. Its main purpose is to help application developers build complex, highly interactive, and user-friendly web pages. Jigsaw supports the development of all applications in ZTE's Big Data Product.

angular bigdata component jigsaw jigsaw-seed typescript webui zte

Last synced: 16 May 2025

https://github.com/Kotlin/kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become part of Apache Spark 3.x.

bigdata kotlin nullability scala spark

Last synced: 13 May 2025

https://github.com/houshanren/big_data_architect_skills

Skills a big data architect should master

analytics bigdata hadoop skills spark xuan-xing

Last synced: 05 Apr 2025

https://github.com/dromara/cloudeon

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underlying resource management and maintenance.

bigdata cloudnative doris hadoop hdfs kubernetes yarn

Last synced: 15 May 2025

https://github.com/mrxujiang/v6.dooring.public

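A big-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own customized big-screen visualization applications.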

antv big-data big-data-analytics bigdata dooring low-code lowcode nodejs react webgl2

Last synced: 05 Apr 2025

https://github.com/zhaoyachao/zdh_web

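A big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.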

bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl

Last synced: 04 Apr 2025

https://github.com/arvados/arvados

An open source platform for managing and analyzing biomedical big data

arvados aws azure bigdata bioinformatics cloud cluster cwl docker gcp genomics go python ruby workflow workflow-engine

Last synced: 17 Dec 2025

https://github.com/cubefs/compass

Compass is a task diagnosis platform for big data.

airflow bigdata diagnose dolphinscheduler flink hadoop mapreduce scheduler spark sql

Last synced: 15 May 2025

https://github.com/pierre94/flink-notes

Flink study notes

bigdata flink flink-notes flinkx

Last synced: 04 Apr 2025

https://github.com/binghe001/bingheguide

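🔥🔥🔥 📚 This repository compiles what the author, 冰河 (Binghe), has learned over many years of development and architecture work at large internet companies. It aims to be a clear, detailed learning tutorial, focusing mainly on core Java, underlying principles, architecture, and penetration-testing techniques. If this repository helps you, please show your support (follow, like, share)!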

aop asm bigdata bytecode ddd dubbo hacker ioc java javafx javassist mybatis mysql spring springboot springcloud springcloudalibaba

Last synced: 16 May 2025

https://github.com/davidesantangelo/api.rss

RSS as RESTful. This service lets you transform an RSS feed into an awesome API.

api bigdata dandelion-api elasticsearch feed machine-learning rails rest-api rss rss-feed ruby semantic-web sidekiq

Last synced: 20 Apr 2025

https://github.com/sksamuel/centurion

Kotlin Bigdata Toolkit

bigdata java kotlin orc parquet

Last synced: 18 Dec 2025

https://github.com/sderosiaux/every-single-day-i-tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

akka architecture bigdata category-theory data-engineering ddd googlecloudplatform java javascript kafka kubernetes microservices reactjs scala spark technology watch

Last synced: 16 May 2025

https://github.com/sirkon/ldetool

Code generator for fast log file parsers

bigdata datamining log-parsing logs-analysis logs-parsing parsing parsing-csv

Last synced: 07 Apr 2025

https://github.com/datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale's introductory tutorial on big data processing | The opening course for the big data technology track 🎉🎉

bigdata hadoop hbase hdfs hive mapreduce spark

Last synced: 09 Apr 2025

https://github.com/Eugene-Mark/bigdata-file-viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

avro bigdata hdfs orc parquet

Last synced: 20 Nov 2025

https://github.com/kkyon/Simple-IT-English

Simple-IT-English: smart wordbook from community for community

bigdata dictonary english-learning english-word simple-it-english site

Last synced: 28 Mar 2025

https://github.com/spotify/big-data-rosetta-code

Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.

bigdata scala scalding scio spark

Last synced: 16 May 2025