Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/huangyueranbbc/SparkDemo
Comprehensive Spark example code (Java, Scala)
bigdata hadoop operator spark spark-sql spark-streaming sparkfun-products sparkjava sparkline sparkp
Last synced: 03 Jul 2024
https://github.com/apache/incubator-gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
arrow clickhouse simd spark-sql vectorization velox
Last synced: 28 Jun 2024
https://github.com/streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
apache-pulsar apache-spark batch-processing data-processing data-science flink spark spark-sql stream-processing structured-streaming
Last synced: 26 Jun 2024
https://github.com/Chabane/bigdata-playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api
Last synced: 17 Jun 2024
https://github.com/mc2-project/opaque-sql
An encrypted data analytics platform
analytics enclave machine-learning privacy security spark spark-sql
Last synced: 15 Jun 2024
https://github.com/japila-books/spark-sql-internals
The Internals of Spark SQL
apache-spark book internals mkdocs-material spark spark-sql
Last synced: 07 Jun 2024
https://github.com/jaceklaskowski/spark-workshop
Apache Spark™ and Scala Workshops
apache-spark spark spark-mllib spark-sql spark-structured-streaming spark-workshops workshop
Last synced: 07 Jun 2024
https://github.com/ploomber/jupysql
Better SQL in Jupyter. 📊
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 02 Jun 2024
https://github.com/getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization
Last synced: 16 May 2024
https://github.com/harryprince/awesome-sparklyr
An awesome collection of sparklyr-related packages
apache-spark awesome big-data dbi machine-learning r r-stats spark-sql sparklyr
Last synced: 14 May 2024
https://github.com/indix/sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Last synced: 30 Apr 2024
https://github.com/streamnative/awesome-pulsar
A curated list of Pulsar tools, integrations and resources.
apache-bookkeeper apache-flink apache-kafka apache-pulsar apache-spark apache-storm elastic-beats grafana-dashboard messaging prometheus pub-sub spark spark-sql spark-structured-streaming
Last synced: 11 Apr 2024
https://github.com/apache/kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
data-lake hacktoberfest hadoop hive jdbc kubernetes spark spark-sql sql thrift
Last synced: 11 Apr 2024
https://github.com/dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
analytics apache-spark azure bigdata csharp databricks dotnet dotnet-core dotnet-standard emr fsharp hdinsight machine-learning microsoft spark spark-sql spark-streaming streaming tpcds tpch
Last synced: 11 Apr 2024
https://github.com/kevinschaich/pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
cheat cheatsheet cheatsheets data data-science docs documentation guide guides pyspark pyspark-tutorial quickstart reference references spark spark-sql
Last synced: 10 Apr 2024
https://github.com/qubole/sparklens
Qubole Sparklens tool for performance tuning Apache Spark
cluster performance performance-analysis performance-metrics performance-tuning performance-visualization scala scheduler scheduling simulation spark spark-applications spark-job spark-ml spark-mllib spark-sql sparkjava
Last synced: 31 Mar 2024
https://github.com/harryprince/geospark
Bring sf to Spark in production
apache-spark gis large-scale-spatial-analysis r spark-sql sparklyr-extension spatial-analysis spatial-queries
Last synced: 20 Mar 2024
https://github.com/almond-sh/almond
A Scala kernel for Jupyter
jupyter jupyter-kernels jupyter-notebook repl scala spark spark-sql
Last synced: 18 Mar 2024