Projects in Awesome Lists tagged with data-streaming
A curated list of projects in awesome lists tagged with data-streaming.
https://github.com/strimzi/strimzi-kafka-operator
Apache Kafka® running on Kubernetes
data-stream data-streaming data-streams hacktoberfest kafka kafka-connect kafka-streams kubernetes kubernetes-controller kubernetes-operator messaging openshift
Last synced: 14 May 2025
https://github.com/strimzi/strimzi
Apache Kafka® running on Kubernetes
data-stream data-streaming data-streams hacktoberfest kafka kafka-connect kafka-streams kubernetes kubernetes-controller kubernetes-operator messaging openshift
Last synced: 05 Mar 2025
https://github.com/superstreamlabs/memphis
Memphis.dev is a highly scalable and effortless data streaming platform
data data-engineering data-pipeline data-stream-processing data-streaming enrichment golang kubernetes message-broker message-bus message-queue messaging-queue microservices schema-registry
Last synced: 12 Jan 2026
https://github.com/apache/inlong
Apache InLong - a one-stop, full-scenario integration framework for massive data
data-streaming event-streaming framework full-scenario-service inlong massive-data-integration one-stop-service
Last synced: 11 Jan 2026
https://github.com/linkedin/brooklin
An extensible distributed system for reliable nearline data streaming at scale
change-data-capture data-streaming distributed-systems java kafka kafka-mirror-maker linkedin scalability
Last synced: 17 Aug 2025
https://github.com/touk/nussknacker
Low-code tool for automating actions on real-time data | Stream processing for the users.
apache-flink automation big-data data-streaming decision-engine decision-making decisioning flink flink-kafka gui kafka low-code lowcode real-time rules-engine scala stream-processing streaming touk
Last synced: 14 May 2025
https://github.com/getindata/flink-http-connector
HTTP connector for Apache Flink. Provides sources and sinks for the DataStream, Table, and SQL APIs.
data-streaming flink flink-sql flink-stream-processing java
Last synced: 12 Jan 2026
https://github.com/getindata/dbt-flink-adapter
Adapter for dbt that executes dbt pipelines on Apache Flink
apache-flink data-streaming dbt streaming-analytics
Last synced: 07 May 2025
https://github.com/dagshub/client
DagsHub client libraries
ai data data-science data-streaming dvc hacktoberfest hacktoberfest2023 keras machine-learning machinelearning mlops python pytorch tensorflow
Last synced: 16 May 2025
https://github.com/eserie/wax-ml
A Python library for machine-learning and feedback loops on streaming data
data-streaming jax machine-learning pandas python reinforcement-learning time-series xarray
Last synced: 26 Apr 2026
https://github.com/pravega/pravega-samples
Sample Applications for Pravega.
data-streaming pravega sample-app streaming-data
Last synced: 27 Apr 2025
https://github.com/build-on-aws/building-apache-kafka-connectors
Sample code that shows the important aspects of developing custom connectors for Kafka Connect. It provides the resources for building, deploying, and running the code on-premises using Docker, as well as running the code in the cloud.
amazon-msk amazon-msk-connect apache-kafka data-streaming java kafka-connect kafka-connector terraform
Last synced: 10 Apr 2025
https://github.com/factorhouse/factorhouse-local
Docker Compose environments for developing modern data platform architectures using Kafka, Flink, Spark, Iceberg, OpenLineage, OpenMetadata, Pinot, ClickHouse, StarRocks + Kpow & Flex by Factor House
clickhouse data-streaming developer-tools docker docker-compose flex flink flink-streaming grafana iceberg kafka kpow lakehouse monitoring openlineage openmetadata pinot postgresql prometheus spark
Last synced: 07 Apr 2026
https://github.com/strimzi/strimzi-canary
Strimzi canary
data-streaming hacktoberfest kafka kubernetes messaging openshift
Last synced: 13 Apr 2025
https://github.com/jaehyeon-kim/flink-demos
Apache Flink (PyFlink) and Related Projects
aws data-streaming docker docker-compose flink kafka kinesis-data-analytics pyflink python real-time-analytics
Last synced: 26 Mar 2025
https://github.com/jaehyeon-kim/kafka-pocs
Apache Kafka and Related Projects
aws data-streaming docker docker-compose kafka kafka-connect msk msk-connect python real-time-analytics
Last synced: 30 Jul 2025
https://github.com/streamnative/streamnative-mcp-server
Developer-friendly MCP server bridging Kafka and Pulsar protocols—built with ❤️ by StreamNative for an agentic, streaming-first future.
apache-kafka apache-pulsar data-streaming mcp mcp-server streamnative
Last synced: 04 Apr 2026
https://github.com/kenthsu/udacity-data-streaming-nanodegree
Udacity Data Streaming Nanodegree Program
apache-kafka data-streaming faust-application kafka-connect kafka-rest-proxy ksql spark-streaming
Last synced: 11 Jul 2025
https://github.com/marian-nmt/sotastream
A library for data streaming and augmentation
data-augmentation data-streaming machine-learning pretraining
Last synced: 29 Jul 2025
https://github.com/ppatierno/kafka-hybrid-iot
Apache Kafka for the Hybrid IoT
amqp apache-camel apache-kafka data-stream-processing data-streaming internet-of-things iot iot-application iot-cloud kafka-streams messaging vertx
Last synced: 18 Mar 2025
https://github.com/jlumbroso/python-random-hash
A simple, time-tested, family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲
analysis-of-algorithms analytic-combinatorics data-streaming flajolet flajolet-martin hash-functions hyperloglog python randomized-algorithm streaming-algorithms
Last synced: 26 Jul 2025
https://github.com/maswag/monaa
A Tool for Timed Pattern Matching with Automata-Based Acceleration
automata data-streaming formal-specification monitoring monitoring-tool regular-expression runtime-verification
Last synced: 18 Jul 2025
https://github.com/jlumbroso/java-random-hash
A simple, time-tested, family of random hash functions in Java, based on CRC32, affine transformations, and the Mersenne Twister. 🎲
data-streaming flajolet flajolet-martin hash-functions hyperloglog java
Last synced: 26 Jul 2025
https://github.com/lexvicacom/monoblok
monoblok is a NATS core-style messaging broker with a built-in stream processing DSL
data-streaming dsl edge edge-computing edn financial-data industrial-iot iot market-data monoblok nats pubsub stream-processing telemetry ticker-data trading-servers
Last synced: 24 May 2026
https://github.com/zabir-nabil/picast
A lightweight, fast data streaming library for Raspberry Pi in Python.
data-streaming raspberry-pi sensors-data-collection server-client-communication video-streaming
Last synced: 26 Jul 2025
https://github.com/improvetheworld/dataflow.net
Unified sync/async stream processing with category-based filtering. Single API for IEnumerable/IAsyncEnumerable with Cases/SelectCase/ForEachCase pattern. Eliminates Rx complexity, enables elegant pipeline composition for real-time data processing.
async-await async-enumerable category-filtering csharp data-streaming dotnet event-processing fluent-api functional-programming linq-extensions observables-alternative pipeline-composition- reactive-programming stream-processing
Last synced: 10 Feb 2026
https://github.com/aymane-maghouti/real-time-data-pipeline-using-kafka
This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.
apache-kafka data-collection data-streaming data-visualization powerbi python real-time real-time-data-pipeline
Last synced: 05 Jul 2025
https://github.com/conduitio/streaming-benchmarks
Benchmarks for Conduit and other data streaming tools.
benchmark conduit data-streaming
Last synced: 30 Jul 2025
https://github.com/jlumbroso/affirmative-sampling
Reference implementation of the Affirmative Sampling algorithm by Jérémie Lumbroso and Conrado Martínez (2022). 🍀
affirmative-sampling cardinality-estimation data-streaming probabilistic-algorithm random-sampling
Last synced: 26 Oct 2025
https://github.com/daq-tools/lorrystream
A lightweight and polyglot stream-processing library, to be used as a data backplane-, message relay-, or pipeline-subsystem.
amqp broker cratedb data-stream data-stream-processing data-stream-processing-framework data-streaming kotori-daq message-broker message-bus message-queue mosquitto mqtt pandas sqlalchemy stream streaming streamz zeromq zmq
Last synced: 30 Apr 2025
https://github.com/codeterrayt/streamguard
StreamGuard is a high-performance data management script using Kafka and MongoDB to efficiently handle and process real-time data streams. Ideal for scenarios like live GPS tracking, it features real-time data processing, reduced database load, and bulk data insertion.
api-integration bulk-insert-query-optimization data-management data-streaming event-driven-architecture high-velocity-data kafka microservices mongodb nodejs nodejs-kafkajs performance-optimization real-time-data-processing scalability system-design system-design-project zookeeper
Last synced: 27 Feb 2026
https://github.com/thiagobarradas/logstash-beats-demo
Elastic Stack with Nginx, Logstash and Beats demo
beats data data-pipepline data-processing data-streaming demo elastic elasticsearch logstash
Last synced: 26 Jun 2025
https://github.com/sidiahmedhabib/e2e-data-engineering
This project is an end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using a variety of powerful tools including Apache Airflow, Apache Kafka, Apache Spark and Cassandra. All components are containerized with Docker for easy deployment and scalability.
apache-airflow apache-kafka apache-spark big-data cassandra data-engineering data-streaming
Last synced: 20 Jul 2025
https://github.com/j3-signalroom/ccaf-kickstarter-flight_consolidator_app-lambda
Demonstrates a best practice implementation for using an AWS Lambda function to deploy a Flink Job Graph to Confluent Cloud for Apache Flink.
apache-kafka apacheflink aws-lambda aws-secrets-manager confluentcloud data-streaming flink kafka
Last synced: 27 Feb 2026
https://github.com/dataflow-operator/dataflow
DataFlow Operator is a Kubernetes operator for streaming data between different data sources with support for message transformations.
clickhouse data-processing data-streaming dataflow etl kafka nessie postgresql trino
Last synced: 08 Jun 2026
https://github.com/dataphos/lib-streamproc
A Go library that exposes executors, interfaces, data structures, and utility functions which, combined, form a universal stream processor that is independent of any specific messaging system.
cloud-native data-stream data-streaming go library messaging
Last synced: 25 Feb 2026
https://github.com/gunnarmorling/streaming-examples
Example projects and demos around data streaming, stream processing, change data capture, and more.
change-data-capture data-engineering data-streaming stream-processing
Last synced: 28 Apr 2025
https://github.com/j3-signalroom/ccaf_kickstarter-flight_consolidator_app-lambda
Demonstrates a best practice implementation for using an AWS Lambda function to deploy a Flink Job Graph to Confluent Cloud for Apache Flink.
apache-kafka apacheflink aws-lambda aws-secrets-manager confluentcloud data-streaming flink kafka
Last synced: 15 Apr 2025
https://github.com/zainali104/distributed-file-system-go
This project offers peer-to-peer, content-addressable distributed file storage in Go, with a peer-to-peer library built from scratch on top of TCP. It also supports data encryption during storage and transmission.
content-addressed-storage data-streaming distributed-file-system distributed-systems go go-routine golang large-file-transfers peer-to-peer tcp
Last synced: 22 Feb 2026
https://github.com/beam-pyio/firehose_pyio
Apache Beam Python I/O connector for Amazon Data Firehose
apache-beam aws data-engineering data-streaming firehose python
Last synced: 05 May 2025
https://github.com/nyoungstudios/multiflow
A Python multithreading library for data processing pipelines, data streaming, etc.
concurrency data-streaming multithreading python thread-pool
Last synced: 14 Mar 2025
https://github.com/pirate-emperor/k2bq
K2BQ is a dataflow pipeline that streams data from Kafka to BigQuery. It uses Google Cloud’s managed Kafka, Dataflow for processing, and BigQuery for real-time analytics, offering scalable, automated data integration for fast insights.
bigquery cloud-computing cloud-infrastructure data-integration data-streaming dataflow google-cloud infrastructure-as-code kafka python realtime-analytics terraform
Last synced: 28 Jan 2026
https://github.com/dataphos/lib-brokers
lib-brokers is a Go library which contains the interfaces used to interact with messaging systems without relying on a specific technology or client library. This library attempts to solve the issue of properly abstracting away the interaction between applications and messaging systems.
cloud-native data-stream data-streaming go jetstream kafka library messaging pubsub pulsar servicebus
Last synced: 22 May 2026
https://github.com/beam-pyio/dynamodb_pyio
Apache Beam Python I/O connector for Amazon DynamoDB
apache-beam aws data-engineering data-streaming dynamodb python
Last synced: 04 Jan 2026
https://github.com/beam-pyio/sqs_pyio
Apache Beam Python I/O connector for Amazon SQS
apache-beam aws data-engineering data-streaming python sqs
Last synced: 05 Jan 2026
https://github.com/mauriciovazquezm/spark_bigdata_architecture_project
Final project for the course 'Architecture for Large Data Volumes', taught in the Bachelor's program in Data Science at ITAM
data-stream-processing data-streaming pyspark python spark time-series
Last synced: 09 May 2026
https://github.com/bdbao/etl-randuser
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra.
apache-airflow apache-kafka apache-spark api cassandra data-streaming docker-compose postgresql python
Last synced: 14 Apr 2026
https://github.com/aleximb/automl-streams-research-paper
AutoML Techniques for Data Streams - Research Paper
Last synced: 19 Mar 2026
https://github.com/bdbao/kafka-vm
This project demonstrates a basic Kafka implementation: using the kafka-python library on an Ubuntu virtual machine, and Change Data Capture (CDC) between two DBMSs via Docker.
apache-kafka data-streaming docker mysql postgresql python
Last synced: 16 Apr 2026
https://github.com/dumbremadhura/apache-kafka-city-bike-project
Real-time data pipeline using Apache Kafka and Python to stream Citi Bike NYC station data. Demonstrates producing and consuming messages via Kafka, containerized with Docker. Built as part of a hands-on Kafka learning project.
citibikenyc data-streaming docker gbfs kafka kafka-python real-time
Last synced: 29 Apr 2026
https://github.com/joeycumines/go-longpoll
Package longpoll supports batching, e.g. receiving as many values as possible from a channel.
asynchronous backend-development batch-processing channel channels concurrency context-handler data-streaming error-handling event-driven go golang goroutines long-polling message-queue microbatch performance-optimization real-time streaming-data timeout-manager
Last synced: 25 Mar 2025
https://github.com/shiningflash/kafka-python-messaging-engine
A Python-based Kafka messaging engine with Avro serialization, Schema Registry integration, and Redpanda support for real-time data streaming.
avro confluent-kafka data-pipeline data-streaming event-driven event-driven-architecture kafka kafka-python python real-time-data real-time-messaging redpanda schema-registry
Last synced: 15 May 2026
https://github.com/lupusruber/crypto_stats
A project that provides a cloud-native solution for ingesting, transforming, and visualizing cryptocurrency data, utilizing modern tools and workflows for scalability and automation.
data-engineering data-streaming etl-pipeline gcp terraform
Last synced: 15 May 2026
https://github.com/serdaraltin/fusion-bridge
Establishes and manages communication between different hardware components and software layers, ensuring seamless data exchange and synchronization.
data-streaming motion-tracking protocol-interfaces real-time-communication serial-communication
Last synced: 14 Mar 2025
https://github.com/omerada/kafka-rabbitmq-redis-elastichsearch-turkce-kaynak
Comprehensive technical documentation and examples for modern distributed systems with Kafka, RabbitMQ, Redis, and Elasticsearch.
cache cloud-native data-streaming distributed-systems elasticsearch event-driven-architecture kafka message-queue microservices nosql rabbitmq real-time-data redis scalable-systems search-engine turkce-kaynak
Last synced: 06 May 2026
https://github.com/richardbnk/telegram_tool
Utility functions for seamless message streaming and automation with Telegram.
data-streaming messaging streaming telegram telegram-bot
Last synced: 08 Oct 2025
https://github.com/ppatierno/devday-meet-apache-kafka
Meet Apache Kafka
apache-kafka data-streaming event-stream internet-of-things iot kafka kafka-connect kafka-streams messaging
Last synced: 20 Jan 2026
https://github.com/cloaky233/datastreams
Real-time cat/dog image classifier using Kafka and CNN. Sender uploads images to Kafka, receiver processes with pre-trained model, returns predictions via Kafka. Demonstrates distributed, scalable image processing with instant feedback. Uses TensorFlow, OpenCV, and Kafka for efficient, asynchronous communication.
binary-classification data-streaming kafka kafka-basic
Last synced: 11 Oct 2025
https://github.com/pranavbarthwal/kafka
Kafka is an open-source software platform for storing, processing, and analyzing streaming data in real time. It's used to build data pipelines and applications that can adapt to data streams.
apache-kafka data-streaming distributed-systems documentation kafka system-design
Last synced: 16 Jun 2025
https://github.com/omalperera/thermosensor-data-streamer
data-simulation data-streaming iot json-records real-time scala
Last synced: 27 Oct 2025
https://github.com/tashi-2004/apache-flink-spark-data-streaming
This project showcases a real-time data streaming pipeline using Apache Flink, Apache Spark, and Grafana. It streams data, stores it in Parquet format, and performs aggregations for insights, with seamless visualization via Grafana dashboards.
apache-flink apache-spark data-aggregation data-analysis data-science data-streaming data-visualization flink flink-stream-processing flink-streaming grafana-dashboard grafana-plugin pyflink python3
Last synced: 09 Feb 2026
https://github.com/pyladiesams/kafka-clients-processing-data-streams-aug2025
Learn how to set up a Kafka producer client, then process the data to make it ready for downstream consumers. Discuss the basics of Kafka and get a handle on the different ways to process the data.
data-stream-processing data-streaming kafka kafka-consumer kafka-producer
Last synced: 20 Jul 2025
https://github.com/night-fury-me/real-time-vehicle-data-processing
An implementation of a real-time vehicle data processing pipeline that efficiently manages and analyzes vehicle data through a cohesive system.
bigquery cpp data-engineering data-streaming flink grpc kafka python real-time-data-processing
Last synced: 02 Jan 2026
https://github.com/mitgar14/etl-workshop-3
Workshop #3 (Machine Learning and Data Streaming) for the ETL course using scikit-learn to develop the ML model and Apache Kafka to manage the data streaming process.
data-enginner data-science data-streaming etl kafka machine-learning pandas python sklearn sqlalchemy
Last synced: 16 Apr 2026
https://github.com/factorhouse/apac-roadshow-2026
Building Resilient Event-Driven Systems with Kafka and Flink workshop
cdc change-data-capture data-streaming debezium ecommerce event-driven factorhouse flink instaclustr kafka postgresql workshop
Last synced: 14 Feb 2026
https://github.com/ayemunhossain/stream-buffers-in-nodejs
This project implements and demonstrates how streams and buffers work together in Node.js.
buffers data-stream-processing data-streaming nodejs nodejs-buffers nodejs-streams stream streams-buffers
Last synced: 02 May 2026
https://github.com/to-infinitee/real-time-data-system-arch
The architecture ingests data via Kafka, processes it in real-time with Spark Streaming, and stores it in Cassandra and Hadoop HDFS. It supports direct data push to apps using WebSockets/HTTP Streaming, with a front-end built on Spring Boot, Bootstrap.js, and Chart.js.
backend data-streaming frontend kafka real-time rest-api websocket
Last synced: 05 Mar 2026
https://github.com/richardbnk/raspberry_pi
Python Operations for Raspberry Pi in Internet of Things (IoT) Applications
data-processing data-streaming dht11-temperature-sensor hivemq hivemq-cloud raspberry-pi relay-switch
Last synced: 17 Apr 2026
https://github.com/richardbnk/hive_mq_streaming
Streaming data seamlessly using the HiveMQ Broker for efficient communication and IoT integration.
data-streaming hive hivemq hivemq-cloud hivemq-mqtt-client
Last synced: 27 Mar 2025
https://github.com/hiabhishek1888/ninjafilesystem
This project is a peer-to-peer decentralized file storage in Go.
data-streaming decentralized-applications distributed-file-system go go-routine golang peer-to-peer-file-sharing tpc
Last synced: 29 Apr 2026
https://github.com/akurgat/energy_trading_prediction
data-streaming matplotlib mlp-classifier mlp-regressor python tensorflow
Last synced: 04 Jul 2025
https://github.com/eli64s/pyflink-poc
PyFlink data stream processing utilities 🐿
apache-flink data-stream-processing data-streaming data-streams pyflink real-time-data
Last synced: 08 Apr 2025
https://github.com/intogit/ninjafilesystem
This project is a peer-to-peer distributed file storage in Go.
content-addressable-storage data-streaming distributed-file-system go go-routine golang peer-to-peer-file-sharing tpc
Last synced: 18 Mar 2025
https://github.com/thaitechtales/kafka
This repository serves as a collection of projects demonstrating expertise in Apache Kafka, a distributed event-streaming platform. The projects aim to highlight real-time data integration and stream processing solutions.
apache-kafka data-streaming distributed-systems event-driven kafka messaging real-time stream-processing
Last synced: 22 Feb 2026