An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with structured-streaming

A curated list of projects in awesome lists tagged with structured-streaming.

https://github.com/lw-lin/coolplayspark

CoolPlay Spark: Spark source code analysis, Spark libraries, and more

apache-spark spark spark-streaming sparkcore structured-streaming

Last synced: 14 May 2025

https://github.com/lw-lin/CoolplaySpark

CoolPlay Spark: Spark source code analysis, Spark libraries, and more

apache-spark spark spark-streaming sparkcore structured-streaming

Last synced: 04 Apr 2025

https://github.com/databricks/learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

apache-spark delta-lake mlflow mllib spark spark-mllib spark-sql structured-streaming

Last synced: 14 May 2025

https://github.com/polomarcus/spark-structured-streaming-examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

cassandra kafka spark spark-sql structured-streaming

Last synced: 10 Apr 2025

https://github.com/zaleslaw/spark-tutorial

How to build your first Spark application with MLlib, Structured Streaming, GraphFrames, Datasets, and so on? The answer is here!

kafka spark streaming structured-streaming

Last synced: 05 May 2025

https://github.com/heartsavior/spark-sql-kafka-offset-committer

Kafka offset committer for structured streaming query

kafka spark structured-streaming

Last synced: 18 Aug 2025

https://github.com/heartsavior/spark-state-tools

Spark Structured Streaming State Tools

apache-spark structured-streaming

Last synced: 29 Oct 2025

https://github.com/aamend/spark-gdelt

Binding the GDELT universe in a Spark environment

analytics gdelt news parser spark structured-streaming

Last synced: 14 Apr 2025

https://github.com/aws-samples/iceberg-streaming-examples

This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.

apache-iceberg apache-spark structured-streaming

Last synced: 29 Oct 2025

https://github.com/qubole/streaminglens

Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines

cluster-management micro-batches scala sla spark spark-streaming sparklens streaming streaming-pipeline structured-streaming

Last synced: 11 Jul 2025

https://github.com/zekeriyyaa/pyspark-structured-streaming-ros-kafka-apachespark-cassandra

Structured streaming applied to robot data from a ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.

apache-cassandra apache-kafka apache-spark cqlsh data-analysis kafka-consumer kafka-producer pyspark python python3 ros ros-noetic spark-cassandra spark-cassandra-connector spark-kafka-connector spark-kafka-integration spark-sql spark-streaming structured-streaming

Last synced: 30 Jun 2025

https://github.com/neuw84/structured-streaming-avro-demo

Spark 3.0.0 Structured Streaming Kafka Avro Demo

java spark spark-streaming structured-streaming

Last synced: 02 Apr 2025

https://github.com/qubole/s3-sqs-connector

A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).

s3 scala spark spark-streaming sqs streaming structured-streaming

Last synced: 11 Jul 2025

https://github.com/epishova/structured-streaming-cassandra-sink

An example of how to create and use a Cassandra sink in a Spark Structured Streaming application

cassandra scala sink spark structured-streaming

Last synced: 09 Jul 2025

https://github.com/rishav273/kafkapysparkanalytics

Real-time ETL pipeline for financial data (Kafka, PySpark).

apache-kafka apache-spark pyspark streaming-analytics structured-streaming

Last synced: 11 Mar 2026

https://github.com/trainingbypackt/big-data-processing-with-apache-spark-elearning

Efficiently tackle large datasets and perform big data analysis with Spark and Python

dataset python rdds spark spark-mllib structured-streaming

Last synced: 10 Apr 2025

https://github.com/sebastianruizm/spark-kafka-cassandra

Demo Spark Structured Streaming + Apache Kafka + Apache Cassandra

cassandra docker kafka spark structured-streaming

Last synced: 26 Apr 2025

https://github.com/neuw84/bds2k17

Repository containing code for the Big Data Spain 2017 technical talk "Towards an Unified API for Spark and the IIoT"

cassandra docker kafka nifi spark structured-streaming zeppelin

Last synced: 09 Apr 2026

https://github.com/sankamuk/aws-kinesis-redshift-sparkstream

Spark Structured Streaming from AWS Kinesis and Redshift

aws kinesis pyspark redshift spark structured-streaming terraform

Last synced: 07 May 2026

https://github.com/rezacsedu/Mining-Maximal-Frequent-Pattern-Spark

Implementation of Static mining part of "Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach" Information Sciences, Volume 432, March 2018, Pages 278-300

data-mining data-stream frequent-pattern-mining java maximal-frequent-pattern spark structured-streaming

Last synced: 26 Mar 2025

https://github.com/neuw84/spark-continuous-streaming

Spark 2.3 end-to-end Avro continuous Structured Streaming Kafka demo using Twitter's bijection in Java.

avro bijection kafka spark sparksql structured-streaming

Last synced: 07 Mar 2026

https://github.com/moto123a/real-time-payment-fraud-detection-platform

Enterprise real-time payment fraud detection platform using Kafka, Spark Structured Streaming, Airflow, Iceberg lakehouse concepts, and Redshift-style marts.

airflow data-engineering etl fraud-detection iceberg kafka lakehouse python redshift spark sql streaming structured-streaming

Last synced: 03 Jun 2026

https://github.com/moto123a/enterprise-rail-freight-data-platform

Enterprise-style real-time rail freight data platform using Kafka, Spark Structured Streaming, Airflow Bronze/Silver/Gold, Trino SQL KPIs, and Redshift star schema marts.

airflow data-engineering delta-lake etl iceberg kafka lakehouse python redshift spark sql star-schema streaming structured-streaming trino

Last synced: 03 Jun 2026

https://github.com/seilylook/spark_definition_guide_ch_3

Spark: The Definitive Guide - Chapter 3

spark structured-streaming

Last synced: 06 Mar 2026

https://github.com/sleenguyen/spark-logs-analysis

An academic project that aims to create a data streaming pipeline using Spark Structured Streaming, Elasticsearch, and Kibana.

elasticsearch kibana spark structured-streaming structured-streaming-elasticsearch

Last synced: 10 Apr 2026

https://github.com/igopalakrishna/nyc-subway-foot-traffic-prediction-and-forecasting

Designed and implemented a scalable real-time analytics pipeline using Apache Kafka, Spark Structured Streaming, and MongoDB to simulate NYC MTA turnstile data and forecast real-time subway foot traffic using SparkML Random Forest models.

apache-kafka apache-spark bash big-data data-pipeline foot-traffic machine-learning mongodb nyc-mta prediction pyspark python random-forest real-time-streaming spark-sql sparkml streaming-data structured-streaming time-series-forecasting

Last synced: 18 May 2026