Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with spark-sql

A curated list of projects in awesome lists tagged with spark-sql.

https://github.com/getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization

Last synced: 16 Dec 2024

https://github.com/apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hacktoberfest hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 17 Dec 2024

https://github.com/databricks/learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming

Last synced: 21 Dec 2024

https://github.com/apache/incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

arrow clickhouse simd spark-sql vectorization velox

Last synced: 19 Dec 2024

https://github.com/oeljeklaus-you/useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

accumulator hadoop java kyro spark spark-sql sparkjava

Last synced: 16 Dec 2024

https://github.com/zsvoboda/ngods-stocks

New Generation Opensource Data Stack Demo

cube dagster datahub dbt iceberg metabase python spark spark-sql trino trinodb

Last synced: 19 Dec 2024

https://github.com/microsoft/data-accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

apache-spark azure big-data cosmosdb docker eventhub hdinsight iot iothub kafka kafka-streams nodejs react servicefabric spark spark-sql spark-streaming sparksql streaming streaming-data

Last synced: 20 Dec 2024

https://github.com/chabane/bigdata-playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

angular apache-flink apache-spark avro big-data docker graphql hadoop hbase kafka kops machine-learning mongodb nodejs parquet python scala spark-sql spark-streaming twitter-api

Last synced: 19 Dec 2024

https://github.com/polomarcus/spark-structured-streaming-examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

cassandra kafka spark spark-sql structured-streaming

Last synced: 18 Dec 2024

https://github.com/mc2-project/opaque-sql

An encrypted data analytics platform

analytics enclave machine-learning privacy security spark spark-sql

Last synced: 31 Oct 2024

https://github.com/huangyueranbbc/SparkDemo

Comprehensive Spark example code demo (Java, Scala)

bigdata hadoop operator spark spark-sql spark-streaming sparkfun-products sparkjava sparkline sparkp

Last synced: 30 Oct 2024

https://github.com/sjrusso8/spark-connect-rs

Apache Spark Connect Client for Rust

grpc-client spark spark-connect spark-sql

Last synced: 18 Dec 2024

https://github.com/dbiir/paraflow

A real-time analytical system for ID-associated data

hadoop kafka orc parquet presto spark-sql

Last synced: 21 Nov 2024

https://github.com/wh1isper/sparglim

Sparglim✨ makes PySpark apps configurable and Spark Connect Server deployment easier!

jupyter-magic pyspark spark spark-connect spark-connect-server spark-on-kubernetes spark-sql

Last synced: 18 Dec 2024

https://github.com/learningjournal/spark-streaming-in-scala

Apache Spark 3 - Structured Streaming Course Material

apache-spark big-data bigdata datalake scala spark spark-sql spark-streaming

Last synced: 19 Nov 2024

https://github.com/indix/sparkplug

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌

datapipeline spark spark-sql

Last synced: 07 Nov 2024

https://github.com/astrolabsoftware/spark-fits

FITS data source for Spark SQL and DataFrames

apache-spark fits fitsio hdfs pyspark scala spark-sql

Last synced: 11 Oct 2024

https://github.com/zekeriyyaa/pyspark-structured-streaming-ros-kafka-apachespark-cassandra

Structured streaming applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.

apache-cassandra apache-kafka apache-spark cqlsh data-analysis kafka-consumer kafka-producer pyspark python python3 ros ros-noetic spark-cassandra spark-cassandra-connector spark-kafka-connector spark-kafka-integration spark-sql spark-streaming structured-streaming

Last synced: 12 Oct 2024

https://github.com/luckyzxl2016/spark-example

Examples for Spark 1.6 and Spark 2.2, covering Kafka, Flume, Structured Streaming, Jedis, Elasticsearch, MySQL, and DataFrame

dataframe elasticsearch jedis kafka mysql spark spark-example spark-sql spark-streaming spark-structured-streaming

Last synced: 28 Oct 2024

https://github.com/lifeomic/spark-vcf

Spark VCF data source implementation for DataFrames

dataframe genomics genotype lifeomic spark spark-sql team-clinical-intelligence variants vcf vcf-files

Last synced: 12 Nov 2024

https://github.com/asuiu/sparkorm

ORM for Apache Spark and DataFrames schema manager

orm pyspark pyspark-python python python3 spark spark-orm spark-sql sparkql sqlalchemy sqlalchemy-orm

Last synced: 18 Dec 2024

https://github.com/selimhorri/spark-application

Java application that uses Apache Spark to handle both batch and streaming processing

dataframes-api java mysql spark spark-batch spark-sql spark-streaming

Last synced: 14 Oct 2024

https://github.com/apache/kyuubi-docker

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

data-lake hadoop hive jdbc kubernetes spark spark-sql sql thrift

Last synced: 07 Oct 2024

https://github.com/chaokunyang/bigdata-examples

Big data examples for Spark and Flink

bigdata flink hadoop monitor python samples spark spark-sql sparkml

Last synced: 19 Nov 2024

https://github.com/dirkster99/pynotes

My notebook on using Python with Jupyter Notebook, PySpark, etc.

dataframe jupyter-notebook panda pandas-dataframe parquet pyspark python spark spark-sql sparknlp

Last synced: 17 Oct 2024

https://github.com/jgperrin/net.jgp.books.spark.ch11

Spark in Action, 2nd edition - chapter 11 - Working with SQL

apache-spark java java8 manning spark spark-sql sparkwithjava sql

Last synced: 09 Nov 2024

https://github.com/lucasbotang/coursera_big_data_for_data_engineers

Assignments for Big Data for Data Engineers specialization on Coursera by Yandex.

hadoop hive spark spark-sql

Last synced: 25 Nov 2024

https://github.com/maziyarpanahi/spark2-template

Intellij template to develop Apache Spark 2.x applications

spark-ml spark-sql spark-streaming spark2

Last synced: 06 Dec 2024

https://github.com/varunu28/aadhar-dataset-analysis

Data analysis of AADHAR dataset using Apache Spark

analysis scala spark spark-sql

Last synced: 08 Nov 2024

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 11 Oct 2024

https://github.com/myxof/sparknotes

Spark 2.0 study notes

distributed-computing spark spark-sql

Last synced: 15 Oct 2024

https://github.com/flaviostutz/spark-scala-jupyter

Jupyter notebook server prepared for running Spark with Scala kernels on a remote Spark master

hdfs hdfs-cluster hdfs-docker jupyter jupyter-notebook scala scala-spark spark spark-sql

Last synced: 24 Oct 2024

https://github.com/ren294/covid-data-process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

airflow aws aws-ec2 aws-quicksight big-data big-data-analytics covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi pipeline redpanda spark spark-sql spark-streaming sparksql

Last synced: 11 Oct 2024

https://github.com/mliarakos/spark-typed-ops

Lightweight type-safe operations for Spark

scala scala-macros shapeless spark spark-scala spark-sql

Last synced: 05 Dec 2024

https://github.com/san089/sf-crime-statistics

A Kafka and Spark Streaming integration project: SF Crime Statistics with Spark Streaming

kafka kafka-consumer kafka-producer kafka-python spark-sql spark-streaming

Last synced: 16 Nov 2024

https://github.com/burhanahmed1/big-data-analytics

Practice tasks in Python using Hadoop, MRJob, and PySpark for big data analytics.

apache-spark hadoop hadoop-mapreduce jupyter-notebook mrjob pyspark python spark spark-sql sparksql

Last synced: 11 Oct 2024

https://github.com/multivacplatform/multivac-wikipedia

Wonderful reusable code, libraries, and scripts for processing Wikipedia page views with Apache Spark.

data-frame multivac-wikipedia spark spark-sql wikipedia

Last synced: 13 Nov 2024

https://github.com/emso-exe/comercio_eletronico_brasileiro

Data analysis project on Brazilian e-commerce data made available by Olist via the Kaggle platform.

analise-de-dados ciencia-de-dados data-analytics data-science datascience e-commerce postgres postgresql pyspark python python-3 python3 spark spark-sql sql

Last synced: 15 Nov 2024

https://github.com/ashirwadpradhan/tpsql

Asynchronous execution of SQL queries in parallel

asynchronous-tasks asyncio parallel-processing query-optimization spark-sql sql

Last synced: 16 Nov 2024

https://github.com/lydia-ath/sparklinux

Assignment for an MSc Big Data course

pycharm-community python-programming spark-sql

Last synced: 12 Nov 2024

https://github.com/anant/example-cassandra-spark-sql

Cassandra Data Operations with Spark SQL

cassandra data-operations docker etl spark spark-sql

Last synced: 18 Nov 2024

https://github.com/windi-wulandari/credit-scoring-data-pipeline

This project implements an end-to-end data pipeline designed to manage and analyze large-scale credit scoring data. Using AWS S3 as a scalable storage solution and Databricks for processing, the pipeline leverages the power of Apache Spark through PySpark and Spark SQL to handle data transformation and analysis efficiently.

apache-spark aws aws-s3 credit databricks pyspark spark-sql

Last synced: 07 Nov 2024

https://github.com/angeligareta/spark-hadoop-hbase-overview

First lab for Data-Intensive Computing course at KTH where we are introduced to Apache Spark MLlib and Spark SQL, Hadoop, and HBase.

apache-spark data-intensive hadoop hbase hbase-table id2221 kth scala spark spark-mllib spark-sql

Last synced: 22 Nov 2024

https://github.com/multivacplatform/multivac-pubmed

Updates PubMed articles daily on HDFS using a Spark cluster

apache-spark dataframe hadoop hdfs pubmed pubmed-parser spark-sql yarn

Last synced: 13 Nov 2024

https://github.com/lmouhib/auto-register-spark-ui-k8s

A lightweight operator that automatically exposes the Spark UI and manages its ingress when running Spark on Kubernetes

spark spark-kubernetes spark-sql spark-streaming spark-ui

Last synced: 18 Dec 2024

https://github.com/multivacplatform/multivac-nlp

Testing and benchmarking some of the existing NLP libraries in Apache Spark

nlp spark spark-ml spark-mllib spark-nlp spark-sql stanford-corenlp word2vec

Last synced: 13 Nov 2024

https://github.com/librity/rtjvm_spark_essentials

Rock The JVM - Apache Spark Essentials

apache-spark big-data docker scala spark spark-sql

Last synced: 10 Nov 2024

https://github.com/vitalibo/distributed-heatmap-service

Simple distributed heatmap service on top of Apache HBase

aws hbase hbase-coprocessor heatmap spark spark-sql spring-boot

Last synced: 07 Nov 2024

https://github.com/rakibhhridoy/bigdataanalysiswithapachespark-stockprice

We often have to deal with large datasets, and handling them with traditional methods is tedious and time-consuming. This is where distributed methods such as Apache Spark come in. This repo contains a distributed analysis of stock prices, which form a fairly large dataset.

apache-spark big-data pandas pyspark python spark-sql sprk-api stock stock-price-forecasting

Last synced: 06 Nov 2024

https://github.com/harshoza36/movielens_pyspark

MovieLens dataset analysis using Hadoop and PySpark

big-data-analytics hadoop movielens movielens-data-analysis pyspark spark spark-sql

Last synced: 12 Nov 2024

https://github.com/manojpawar94/spark-scala-examples

Sample programs implemented with Apache Spark, built on the concepts of Spark RDD and Spark SQL DataFrame.

apache-spark spark spark-rdd spark-sql

Last synced: 13 Nov 2024

https://github.com/tuancamtbtx/etl-spark-k8s

ETL With Apache Spark Deployed on K8s

apache k8s spark spark-sql spark-streaming

Last synced: 09 Nov 2024

https://github.com/peteprattis/road-safety-database-with-jdbc-and-spark-rdd

A JDBC application that runs queries in pgAdmin to simulate the functionality of the UK Ministry of Transport's database using Apache Spark RDD for query implementation.

computer-science index java jdbc jdbc-database partitions pgadmin postgresql program query spark spark-sql sparkjava sql student

Last synced: 17 Nov 2024

https://github.com/peteprattis/insurance-company-database-with-jdbc-and-spark-rdd

A JDBC application that runs queries in pgAdmin to simulate the functionality of an insurance company's database using Apache Spark RDD for query implementation.

computer-science java jdbc jdbc-database partitioning partitions postgresql program query spark spark-sql sparkjava sql student

Last synced: 17 Nov 2024

https://github.com/samuelbarbosadev/justweb_technical_test

This is a technical test for the mid-level Python Developer position.

django python spark spark-sql

Last synced: 29 Nov 2024

https://github.com/pregismond/data-analysis-using-spark

Final Project Submission: Data Analysis using Spark

coursera ibm-skills-network pyspark python spark-sql

Last synced: 07 Dec 2024

https://github.com/kayannr/sportstats

Historical Olympics analysis using SQL.

pandasql scala spark-sql sql

Last synced: 22 Nov 2024

https://github.com/sayamalt/amazon-products-api-etl-and-ml-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

apache-spark azure-data-factory azure-data-lake-storage-gen2 azure-databricks data-ingestion delta-lake etl-pipeline extract-transform-load feature-engineering linear-regression machine-learning model-training-and-evaluation regression-models spark-mllib spark-sql

Last synced: 26 Nov 2024

https://github.com/mervat-khaled/etl-apache-spark-nyc-taxi-data

The goal of this project is to perform ETL (Extract, Transform, and Load) on NYC Taxi data and its geographical information using Apache Spark, applying various transformations with Spark's Python API (PySpark) and SQL, and finally saving the processed data as CSV files partitioned by the number of executors in the Spark session.

apache-spark docker-image etl geojson pyspark shaply spark-sql windowfunction

Last synced: 26 Nov 2024

https://github.com/sib-swiss/server-log-analytics

Server log analytics in Scala / Apache Spark

analytics logging spark spark-sql

Last synced: 28 Nov 2024

https://github.com/shrikantnaidu/apache-spark

Spark and Spark ML notebooks

spark spark-ml spark-sql udacity

Last synced: 11 Nov 2024

https://github.com/purcellcjp/home_sales

This project demonstrates the use of Spark SQL to read, query, cache, and analyze home sales data, providing insights into average prices based on various criteria.

big-data cache parquet spark spark-sql sql

Last synced: 03 Dec 2024

https://github.com/adampaternostro/azure-spark-livy

Run a job in Spark 2.x with HDInsight and submit the job through Livy

azure azure-data-lake hdinsight livy spark spark-sql

Last synced: 03 Dec 2024

https://github.com/kaladabrio2020/pyspark-ml-analysis-data

Data analysis and machine learning with PySpark

analysis-data pyspark pyspark-notebook spark-ml-model spark-sql

Last synced: 16 Dec 2024

https://github.com/007tickooayush/salaries_data_mark1

Scala data analysis project using Spark SQL and typed Datasets to calculate salaries by job role, location, and company

hdfs scala spark spark-core spark-sql typed-dataset

Last synced: 05 Dec 2024

https://github.com/29dch/myscalacodesaboutbigdata

My Scala learning code for big data

actor akka kafka scala spark spark-sql spark-streaming

Last synced: 11 Nov 2024