Projects in Awesome Lists tagged with structured-streaming
A curated list of projects in awesome lists tagged with structured-streaming.
https://github.com/lw-lin/coolplayspark
CoolplaySpark: Spark source code analysis, Spark libraries, and more
apache-spark spark spark-streaming sparkcore structured-streaming
Last synced: 14 May 2025
https://github.com/lw-lin/CoolplaySpark
CoolplaySpark: Spark source code analysis, Spark libraries, and more
apache-spark spark spark-streaming sparkcore structured-streaming
Last synced: 04 Apr 2025
https://github.com/databricks/learningsparkv2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming
Last synced: 14 May 2025
https://github.com/japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
apache-spark book internals mkdocs-material spark structured-streaming
Last synced: 05 Apr 2025
https://github.com/azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
apache apache-spark azure bigdata connector continuous databricks event-hubs eventhubs ingestion kafka microsoft real-time scala spark spark-streaming stream streaming structured-streaming
Last synced: 13 Feb 2026
https://github.com/polomarcus/spark-structured-streaming-examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
cassandra kafka spark spark-sql structured-streaming
Last synced: 10 Apr 2025
https://github.com/qubole/kinesis-sql
Kinesis Connector for Structured Streaming
kinesis real-time-processing spark spark-streaming spark-structured-streaming structured-streaming
Last synced: 08 Apr 2025
https://github.com/streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
apache-pulsar apache-spark batch-processing data-processing data-science flink spark spark-sql stream-processing structured-streaming
Last synced: 06 Feb 2026
https://github.com/chermenin/spark-states
Custom state store providers for Apache Spark
apache apache-spark spark spark-streaming spark-structured-streaming state state-store stateful structured-streaming
Last synced: 05 Apr 2025
https://github.com/ibm/kafka-streaming-click-analysis
Use Kafka and Apache Spark streaming to perform click stream analytics
apache-spark clickstream data-science ibm-data-science-experience ibmcode jupyter-notebook kafka spark structured-streaming
Last synced: 03 Oct 2025
https://github.com/astrolabsoftware/fink-broker
Astronomy Broker based on Apache Spark
alerts apache-hbase apache-kafka apache-spark astronomy structured-streaming
Last synced: 17 Jan 2026
https://github.com/zaleslaw/spark-tutorial
How do you build your first Spark application with MLlib, Structured Streaming, GraphFrames, Datasets, and so on? The answer is here!
kafka spark streaming structured-streaming
Last synced: 05 May 2025
https://github.com/heartsavior/spark-sql-kafka-offset-committer
Kafka offset committer for structured streaming query
kafka spark structured-streaming
Last synced: 18 Aug 2025
https://github.com/sankamuk/pysparkcheatsheet
PySpark Cheatsheet
apache-spark deltalake python structured-streaming
Last synced: 06 May 2025
https://github.com/heartsavior/spark-state-tools
Spark Structured Streaming State Tools
apache-spark structured-streaming
Last synced: 29 Oct 2025
https://github.com/aamend/spark-gdelt
Binding the GDELT universe in a Spark environment
analytics gdelt news parser spark structured-streaming
Last synced: 14 Apr 2025
https://github.com/aws-samples/iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
apache-iceberg apache-spark structured-streaming
Last synced: 29 Oct 2025
https://github.com/qubole/spark-state-store
Rocksdb state storage implementation for Structured Streaming.
performance qubole real-time-processing rocksdb scalability spark state-management streaming structured-streaming
Last synced: 11 Jul 2025
https://github.com/qubole/streaminglens
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
cluster-management micro-batches scala sla spark spark-streaming sparklens streaming streaming-pipeline structured-streaming
Last synced: 11 Jul 2025
https://github.com/zekeriyyaa/pyspark-structured-streaming-ros-kafka-apachespark-cassandra
Structured streaming applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.
apache-cassandra apache-kafka apache-spark cqlsh data-analysis kafka-consumer kafka-producer pyspark python python3 ros ros-noetic spark-cassandra spark-cassandra-connector spark-kafka-connector spark-kafka-integration spark-sql spark-streaming structured-streaming
Last synced: 30 Jun 2025
https://github.com/neuw84/structured-streaming-avro-demo
Spark 3.0.0 Structured Streaming Kafka Avro Demo
java spark spark-streaming structured-streaming
Last synced: 02 Apr 2025
https://github.com/qubole/s3-sqs-connector
A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).
s3 scala spark spark-streaming sqs streaming structured-streaming
Last synced: 11 Jul 2025
https://github.com/yjshen/spark-connector-test
A tutorial on how to use pulsar-spark-connector
apache-pulsar apache-spark pulsar-spark-connector sparksql structured-streaming
Last synced: 15 Apr 2025
https://github.com/epishova/structured-streaming-cassandra-sink
An example of how to create and use Cassandra sink in Spark Structured Streaming application
cassandra scala sink spark structured-streaming
Last synced: 09 Jul 2025
https://github.com/rishav273/kafkapysparkanalytics
Real-time ETL pipeline for financial data (Kafka, PySpark).
apache-kafka apache-spark pyspark streaming-analytics structured-streaming
Last synced: 11 Mar 2026
https://github.com/trainingbypackt/big-data-processing-with-apache-spark-elearning
Efficiently tackle large datasets and perform big data analysis with Spark and Python
dataset python rdds spark spark-mllib structured-streaming
Last synced: 10 Apr 2025
https://github.com/lamastex/spark-trend-calculus-examples
Example applications of spark-trend-calculus
apache-spark arbitrary-order-markov-processes delta-lake finance multiple-resolutions structured-streaming time-series trend-reversals trends
Last synced: 14 Apr 2025
https://github.com/astrolabsoftware/fink
Fink documentation website
alerts apache-hbase apache-kafka apache-spark astronomy structured-streaming
Last synced: 30 Apr 2025
https://github.com/sebastianruizm/spark-kafka-cassandra
Demo Spark Structured Streaming + Apache Kafka + Apache Cassandra
cassandra docker kafka spark structured-streaming
Last synced: 26 Apr 2025
https://github.com/neuw84/bds2k17
Repository containing code for the Big Data Spain 2017 technical talk "Towards an Unified API for Spark and the IIoT"
cassandra docker kafka nifi spark structured-streaming zeppelin
Last synced: 09 Apr 2026
https://github.com/amitnema/spark-coach
This project contains learnings and experiments with Apache Spark.
scala spark spark-sql spark-streaming sparksql stream stream-processing streaming streams structured-streaming structured-streaming-kafka
Last synced: 18 May 2026
https://github.com/sankamuk/aws-kinesis-redshift-sparkstream
Spark Structured Streaming from AWS Kinesis and Redshift
aws kinesis pyspark redshift spark structured-streaming terraform
Last synced: 07 May 2026
https://github.com/rezacsedu/Mining-Maximal-Frequent-Pattern-Spark
Implementation of Static mining part of "Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach" Information Sciences, Volume 432, March 2018, Pages 278-300
data-mining data-stream frequent-pattern-mining java maximal-frequent-pattern spark structured-streaming
Last synced: 26 Mar 2025
https://github.com/neuw84/spark-continuous-streaming
Spark 2.3 end-to-end Avro continuous Structured Streaming Kafka demo using Twitter's bijection in Java.
avro bijection kafka spark sparksql structured-streaming
Last synced: 07 Mar 2026
https://github.com/devarshpatel1506/geospatial-analysis-with-spark
Low-Latency Event-Time Analytics: Kafka + Spark Structured Streaming + deck.gl
analytics big-data data-engineering deckgl kafka mongodb nodejs react react-vis realtime-streaming spark structured-streaming websocket
Last synced: 09 Apr 2026
https://github.com/moto123a/real-time-payment-fraud-detection-platform
Enterprise real-time payment fraud detection platform using Kafka, Spark Structured Streaming, Airflow, Iceberg lakehouse concepts, and Redshift-style marts.
airflow data-engineering etl fraud-detection iceberg kafka lakehouse python redshift spark sql streaming structured-streaming
Last synced: 03 Jun 2026
https://github.com/moto123a/enterprise-rail-freight-data-platform
Enterprise-style real-time rail freight data platform using Kafka, Spark Structured Streaming, Airflow Bronze/Silver/Gold, Trino SQL KPIs, and Redshift star schema marts.
airflow data-engineering delta-lake etl iceberg kafka lakehouse python redshift spark sql star-schema streaming structured-streaming trino
Last synced: 03 Jun 2026
https://github.com/seilylook/spark_definition_guide_ch_3
Spark: The Definitive Guide - Chapter 3
Last synced: 06 Mar 2026
https://github.com/sleenguyen/spark-logs-analysis
This is an academic project that aims to create a data streaming pipeline using Spark Structured Streaming, Elasticsearch, and Kibana.
elasticsearch kibana spark structured-streaming structured-streaming-elasticsearch
Last synced: 10 Apr 2026
https://github.com/igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting
Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.
apache-kafka apache-spark bash big-data data-pipeline foot-traffic machine-learning mongodb nyc-mta prediction pyspark python random-forest real-time-streaming spark-sql sparkml streaming-data structured-streaming time-series-forecasting
Last synced: 18 May 2026