Projects in Awesome Lists tagged with data-streaming

A curated list of projects in awesome lists tagged with data-streaming.

https://github.com/apache/inlong

Apache InLong - a one-stop, full-scenario integration framework for massive data

data-streaming event-streaming framework full-scenario-service inlong massive-data-integration one-stop-service

Last synced: 11 Jan 2026

https://github.com/linkedin/brooklin

An extensible distributed system for reliable nearline data streaming at scale

change-data-capture data-streaming distributed-systems java kafka kafka-mirror-maker linkedin scalability

Last synced: 17 Aug 2025

https://github.com/getindata/flink-http-connector

HTTP connector for Apache Flink. Provides sources and sinks for the DataStream, Table, and SQL APIs.

data-streaming flink flink-sql flink-stream-processing java

Last synced: 12 Jan 2026

https://github.com/getindata/dbt-flink-adapter

Adapter for dbt that executes dbt pipelines on Apache Flink

apache-flink data-streaming dbt streaming-analytics

Last synced: 07 May 2025

https://github.com/eserie/wax-ml

A Python library for machine-learning and feedback loops on streaming data

data-streaming jax machine-learning pandas python reinforcement-learning time-series xarray

Last synced: 26 Apr 2026

https://github.com/pravega/pravega-samples

Sample Applications for Pravega.

data-streaming pravega sample-app streaming-data

Last synced: 27 Apr 2025

https://github.com/build-on-aws/building-apache-kafka-connectors

Sample code that shows the important aspects of developing custom connectors for Kafka Connect. It provides the resources for building, deploying, and running the code on-premises using Docker, as well as running the code in the cloud.

amazon-msk amazon-msk-connect apache-kafka data-streaming java kafka-connect kafka-connector terraform

Last synced: 10 Apr 2025

https://github.com/factorhouse/factorhouse-local

Docker Compose environments for developing modern data platform architectures using Kafka, Flink, Spark, Iceberg, OpenLineage, OpenMetadata, Pinot, ClickHouse, StarRocks + Kpow & Flex by Factor House

clickhouse data-streaming developer-tools docker docker-compose flex flink flink-streaming grafana iceberg kafka kpow lakehouse monitoring openlineage openmetadata pinot postgresql prometheus spark

Last synced: 07 Apr 2026

https://github.com/streamnative/streamnative-mcp-server

Developer-friendly MCP server bridging Kafka and Pulsar protocols—built with ❤️ by StreamNative for an agentic, streaming-first future.

apache-kafka apache-pulsar data-streaming mcp mcp-server streamnative

Last synced: 04 Apr 2026

https://github.com/marian-nmt/sotastream

A library for data streaming and augmentation

data-augmentation data-streaming machine-learning pretraining

Last synced: 29 Jul 2025

https://github.com/jlumbroso/python-random-hash

A simple, time-tested family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲

analysis-of-algorithms analytic-combinatorics data-streaming flajolet flajolet-martin hash-functions hyperloglog python randomized-algorithm streaming-algorithms

Last synced: 26 Jul 2025
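The construction named in the description above (a base hash such as CRC32, made into a family by affine transformations whose coefficients come from a seeded Mersenne Twister) can be sketched as follows. This is a minimal illustration under stated assumptions, not the python-random-hash API: the function name `make_hash_family`, the prime modulus 2^61 - 1, and the choice to hash `str(value)` are all hypothetical choices made for this example.

```python
import random
import zlib

# Hypothetical modulus: the Mersenne prime 2^61 - 1, larger than any CRC32 output.
PRIME = (1 << 61) - 1


def make_hash_family(count, seed=0, bits=32):
    """Return `count` hash functions h_i(x) = ((a_i * crc32(x) + b_i) mod PRIME) mod 2**bits.

    Sketch only: not the python-random-hash library's interface.
    """
    # CPython's random.Random is a Mersenne Twister; seeding makes the family reproducible.
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]
    mask = (1 << bits) - 1

    def make(a, b):
        def h(value):
            # Base hash: CRC32 of the value's UTF-8 string form.
            base = zlib.crc32(str(value).encode("utf-8"))
            # Affine transformation modulo a prime, then truncation to `bits` bits.
            return ((a * base + b) % PRIME) & mask
        return h

    return [make(a, b) for a, b in params]


hashes = make_hash_family(3, seed=42)
print([h("apple") for h in hashes])
```

Re-creating the family with the same seed yields the same functions, which is what streaming estimators such as Flajolet-Martin or HyperLogLog need when several passes or machines must agree on the hashes.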

https://github.com/maswag/monaa

A Tool for Timed Pattern Matching with Automata-Based Acceleration

automata data-streaming formal-specification monitoring monitoring-tool regular-expression runtime-verification

Last synced: 18 Jul 2025

https://github.com/jlumbroso/java-random-hash

A simple, time-tested family of random hash functions in Java, based on CRC32, affine transformations, and the Mersenne Twister. 🎲

data-streaming flajolet flajolet-martin hash-functions hyperloglog java

Last synced: 26 Jul 2025

https://github.com/lexvicacom/monoblok

monoblok is a NATS core-style messaging broker with a built-in stream processing DSL

data-streaming dsl edge edge-computing edn financial-data industrial-iot iot market-data monoblok nats pubsub stream-processing telemetry ticker-data trading-servers

Last synced: 24 May 2026

https://github.com/zabir-nabil/picast

A lightweight, fast data-streaming library for Raspberry Pi in Python.

data-streaming raspberry-pi sensors-data-collection server-client-communication video-streaming

Last synced: 26 Jul 2025

https://github.com/improvetheworld/dataflow.net

Unified sync/async stream processing with category-based filtering. Single API for IEnumerable/IAsyncEnumerable with Cases/SelectCase/ForEachCase pattern. Eliminates Rx complexity, enables elegant pipeline composition for real-time data processing.

async-await async-enumerable category-filtering csharp data-streaming dotnet event-processing fluent-api functional-programming linq-extensions observables-alternative pipeline-composition- reactive-programming stream-processing

Last synced: 10 Feb 2026

https://github.com/aymane-maghouti/real-time-data-pipeline-using-kafka

This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.

apache-kafka data-collection data-streaming data-visualization powerbi python real-time real-time-data-pipeline

Last synced: 05 Jul 2025

https://github.com/conduitio/streaming-benchmarks

Benchmarks for Conduit and other data streaming tools.

benchmark conduit data-streaming

Last synced: 30 Jul 2025

https://github.com/jlumbroso/affirmative-sampling

Reference implementation of the Affirmative Sampling algorithm by Jérémie Lumbroso and Conrado Martínez (2022). 🍀

affirmative-sampling cardinality-estimation data-streaming probabilistic-algorithm random-sampling

Last synced: 26 Oct 2025

https://github.com/daq-tools/lorrystream

A lightweight and polyglot stream-processing library, to be used as a data backplane-, message relay-, or pipeline-subsystem.

amqp broker cratedb data-stream data-stream-processing data-stream-processing-framework data-streaming kotori-daq message-broker message-bus message-queue mosquitto mqtt pandas sqlalchemy stream streaming streamz zeromq zmq

Last synced: 30 Apr 2025

https://github.com/codeterrayt/streamguard

StreamGuard is a high-performance data management script using Kafka and MongoDB to efficiently handle and process real-time data streams. Ideal for scenarios like live GPS tracking, it features real-time data processing, reduced database load, and bulk data insertion.

api-integration bulk-insert-query-optimization data-management data-streaming event-driven-architecture high-velocity-data kafka microservices mongodb nodejs nodejs-kafkajs performance-optimization real-time-data-processing scalability system-design system-design-project zookeeper

Last synced: 27 Feb 2026

https://github.com/sidiahmedhabib/e2e-data-engineering

This project is an end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using a variety of powerful tools including Apache Airflow, Apache Kafka, Apache Spark and Cassandra. All components are containerized with Docker for easy deployment and scalability.

apache-airflow apache-kafka apache-spark big-data cassandra data-engineering data-streaming

Last synced: 20 Jul 2025

https://github.com/j3-signalroom/ccaf-kickstarter-flight_consolidator_app-lambda

Demonstrates a best-practice implementation of using an AWS Lambda function to deploy a Flink job graph to Confluent Cloud for Apache Flink.

apache-kafka apacheflink aws-lambda aws-secrets-manager confluentcloud data-streaming flink kafka

Last synced: 27 Feb 2026

https://github.com/dataflow-operator/dataflow

DataFlow Operator is a Kubernetes operator for streaming data between different data sources with support for message transformations.

clickhouse data-processing data-streaming dataflow etl kafka nessie postgresql trino

Last synced: 08 Jun 2026

https://github.com/dataphos/lib-streamproc

A Go library that exposes executors, interfaces, data structures, and utility functions which together form a universal stream processor, independent of any specific messaging system.

cloud-native data-stream data-streaming go library messaging

Last synced: 25 Feb 2026

https://github.com/gunnarmorling/streaming-examples

Example projects and demos around data streaming, stream processing, change data capture, and more.

change-data-capture data-engineering data-streaming stream-processing

Last synced: 28 Apr 2025

https://github.com/j3-signalroom/ccaf_kickstarter-flight_consolidator_app-lambda

Demonstrates a best-practice implementation of using an AWS Lambda function to deploy a Flink job graph to Confluent Cloud for Apache Flink.

apache-kafka apacheflink aws-lambda aws-secrets-manager confluentcloud data-streaming flink kafka

Last synced: 15 Apr 2025

https://github.com/zainali104/distributed-file-system-go

This project implements peer-to-peer, content-addressable distributed file storage in Go, with a peer-to-peer library built from scratch on top of TCP. It also supports encrypting data in storage and in transit.

content-addressed-storage data-streaming distributed-file-system distributed-systems go go-routine golang large-file-transfers peer-to-peer tcp

Last synced: 22 Feb 2026

https://github.com/beam-pyio/firehose_pyio

Apache Beam Python I/O connector for Amazon Data Firehose

apache-beam aws data-engineering data-streaming firehose python

Last synced: 05 May 2025

https://github.com/nyoungstudios/multiflow

A Python multithreading library for data processing pipelines, data streaming, etc.

concurrency data-streaming multithreading python thread-pool

Last synced: 14 Mar 2025

https://github.com/pirate-emperor/k2bq

K2BQ is a dataflow pipeline that streams data from Kafka to BigQuery. It uses Google Cloud’s managed Kafka, Dataflow for processing, and BigQuery for real-time analytics, offering scalable, automated data integration for fast insights.

bigquery cloud-computing cloud-infrastructure data-integration data-streaming dataflow google-cloud infrastructure-as-code kafka python realtime-analytics terraform

Last synced: 28 Jan 2026

https://github.com/dataphos/lib-brokers

lib-brokers is a Go library which contains the interfaces used to interact with messaging systems without relying on a specific technology or client library. This library attempts to solve the issue of properly abstracting away the interaction between applications and messaging systems.

cloud-native data-stream data-streaming go jetstream kafka library messaging pubsub pulsar servicebus

Last synced: 22 May 2026

https://github.com/beam-pyio/dynamodb_pyio

Apache Beam Python I/O connector for Amazon DynamoDB

apache-beam aws data-engineering data-streaming dynamodb python

Last synced: 04 Jan 2026

https://github.com/beam-pyio/sqs_pyio

Apache Beam Python I/O connector for Amazon SQS

apache-beam aws data-engineering data-streaming python sqs

Last synced: 05 Jan 2026

https://github.com/mauriciovazquezm/spark_bigdata_architecture_project

Final project for the course 'Architecture for Large Data Volumes', taught in the Bachelor's program in Data Science at ITAM

data-stream-processing data-streaming pyspark python spark time-series

Last synced: 09 May 2026

https://github.com/bdbao/etl-randuser

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra.

apache-airflow apache-kafka apache-spark api cassandra data-streaming docker-compose postgresql python

Last synced: 14 Apr 2026

https://github.com/aleximb/automl-streams-research-paper

AutoML Techniques for Data Streams - Research Paper

automl data-streaming

Last synced: 19 Mar 2026

https://github.com/bdbao/kafka-vm

This project demonstrates a basic Kafka implementation: using the kafka-python library on an Ubuntu virtual machine, and Change Data Capture (CDC) between two DBMSs via Docker.

apache-kafka data-streaming docker mysql postgresql python

Last synced: 16 Apr 2026

https://github.com/dumbremadhura/apache-kafka-city-bike-project

Real-time data pipeline using Apache Kafka and Python to stream Citi Bike NYC station data. Demonstrates producing and consuming messages via Kafka, containerized with Docker. Built as part of a hands-on Kafka learning project.

citibikenyc data-streaming docker gbfs kafka kafka-python real-time

Last synced: 29 Apr 2026

https://github.com/shiningflash/kafka-python-messaging-engine

A Python-based Kafka messaging engine with Avro serialization, Schema Registry integration, and Redpanda support for real-time data streaming.

avro confluent-kafka data-pipeline data-streaming event-driven event-driven-architecture kafka kafka-python python real-time-data real-time-messaging redpanda schema-registry

Last synced: 15 May 2026

https://github.com/lupusruber/crypto_stats

A project that provides a cloud-native solution for ingesting, transforming, and visualizing cryptocurrency data, utilizing modern tools and workflows for scalability and automation.

data-engineering data-streaming etl-pipeline gcp terraform

Last synced: 15 May 2026

https://github.com/serdaraltin/fusion-bridge

Establishes and manages communication between different hardware components and software layers, ensuring seamless data exchange and synchronization.

data-streaming motion-tracking protocol-interfaces real-time-communication serial-communication

Last synced: 14 Mar 2025

https://github.com/richardbnk/telegram_tool

Utility functions for seamless message streaming and automation with Telegram.

data-streaming messaging streaming telegram telegram-bot

Last synced: 08 Oct 2025

https://github.com/cloaky233/datastreams

Real-time cat/dog image classifier using Kafka and CNN. Sender uploads images to Kafka, receiver processes with pre-trained model, returns predictions via Kafka. Demonstrates distributed, scalable image processing with instant feedback. Uses TensorFlow, OpenCV, and Kafka for efficient, asynchronous communication.

binary-classification data-streaming kafka kafka-basic

Last synced: 11 Oct 2025

https://github.com/pranavbarthwal/kafka

Kafka is an open-source software platform for storing, processing, and analyzing streaming data in real time. It's used to build data pipelines and applications that can adapt to data streams.

apache-kafka data-streaming distributed-systems documentation kafka system-design

Last synced: 16 Jun 2025

https://github.com/tashi-2004/apache-flink-spark-data-streaming

This project showcases a real-time data streaming pipeline using Apache Flink, Apache Spark, and Grafana. It streams data, stores it in Parquet format, and performs aggregations for insights, with seamless visualization via Grafana dashboards.

apache-flink apache-spark data-aggregation data-analysis data-science data-streaming data-visualization flink flink-stream-processing flink-streaming grafana-dashboard grafana-plugin pyflink python3

Last synced: 09 Feb 2026

https://github.com/pyladiesams/kafka-clients-processing-data-streams-aug2025

Learn how to set up a Kafka producer client, then process the data to make it ready for downstream consumers. Discuss the basics of Kafka and get a handle on the different ways to process the data.

data-stream-processing data-streaming kafka kafka-consumer kafka-producer

Last synced: 20 Jul 2025

https://github.com/night-fury-me/real-time-vehicle-data-processing

A repository that contains an implementation of a Real-Time Vehicle Data Processing Pipeline that efficiently manages and analyzes vehicle data through a cohesive system.

bigquery cpp data-engineering data-streaming flink grpc kafka python real-time-data-processing

Last synced: 02 Jan 2026

https://github.com/mitgar14/etl-workshop-3

Workshop #3 (Machine Learning and Data Streaming) for the ETL course using scikit-learn to develop the ML model and Apache Kafka to manage the data streaming process.

data-enginner data-science data-streaming etl kafka machine-learning pandas python sklearn sqlalchemy

Last synced: 16 Apr 2026

https://github.com/ayemunhossain/stream-buffers-in-nodejs

This project implements and demonstrates how streams and buffers work together in Node.js.

buffers data-stream-processing data-streaming nodejs nodejs-buffers nodejs-streams stream streams-buffers

Last synced: 02 May 2026

https://github.com/to-infinitee/real-time-data-system-arch

The architecture ingests data via Kafka, processes it in real-time with Spark Streaming, and stores it in Cassandra and Hadoop HDFS. It supports direct data push to apps using WebSockets/HTTP Streaming, with a front-end built on Spring Boot, Bootstrap.js, and Chart.js.

backend data-streaming frontend kafka real-time rest-api websocket

Last synced: 05 Mar 2026

https://github.com/richardbnk/raspberry_pi

Python Operations for Raspberry Pi in Internet of Things (IoT) Applications

data-processing data-streaming dht11-temperature-sensor hivemq hivemq-cloud raspberry-pi relay-switch

Last synced: 17 Apr 2026

https://github.com/richardbnk/hive_mq_streaming

Streaming data seamlessly using the HiveMQ Broker for efficient communication and IoT integration.

data-streaming hive hivemq hivemq-cloud hivemq-mqtt-client

Last synced: 27 Mar 2025

https://github.com/thaitechtales/kafka

This repository serves as a collection of projects demonstrating expertise in Apache Kafka, a distributed event-streaming platform. The projects aim to highlight real-time data integration and stream processing solutions.

apache-kafka data-streaming distributed-systems event-driven kafka messaging real-time stream-processing

Last synced: 22 Feb 2026