Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-streaming

a curated list of awesome streaming frameworks, applications, etc
https://github.com/eric-erki/awesome-streaming

  • Table of Contents

    • Streaming Engine

      • Apache Apex - unified platform for big data stream and batch processing.
      • Apache Flink - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
      • Apache Heron (incubating) - a realtime, distributed, fault-tolerant stream processing engine from Twitter.
      • Apache Samza - distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
      • Apache Spark Streaming - makes it easy to build scalable fault-tolerant streaming applications.
      • Apache Storm - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
      • AthenaX - Uber's Stream Analytics Framework used in production
      • Faust - stream processing library, porting the ideas from Kafka Streams to Python
      • Gearpump - lightweight real-time distributed streaming engine built on Akka.
      • Hazelcast Jet - A general purpose distributed data processing engine, built on top of Hazelcast.
      • hailstorm - distributed stream processing with exactly-once semantics based on Storm.
      • mantis - Netflix's platform to build an ecosystem of realtime stream processing applications
      • mupd8 (muppet) - MapReduce-style framework for processing fast/streaming data.
      • Onyx - Distributed, masterless, high performance, fault tolerant data processing.
      • s4 - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
      • SABER - Window-Based Hybrid CPU/GPU Stream Processing Engine.
      • SPQR - dynamic framework for processing high-volume data streams through pipelines.
      • tigon - high throughput real-time streaming processing framework built on Hadoop and HBase.
      • Teknek - Simple, elegant stream processing with an interactive prototyping shell, SOL (Stream Operator Language).
      • Trill - Trill is a high-performance one-pass in-memory streaming analytics engine from Microsoft Research.
    • Streaming Library

      • FS2(prev. 'Scalaz-Stream') - Compositional, streaming I/O library for Scala.
      • monix - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.
      • Streamline - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components to focus on business logic.
      • StreamAlert - Airbnb's Real-time Data Analysis and Alerting.
      • Swave - A lightweight Reactive Streams Infrastructure Toolkit for Scala.
      • Streamz - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.
      • Stream Ops - A fully embeddable data streaming engine and stream processing API for Java.
    • Streaming Application

      • straw - A platform for real-time streaming search.
      • storm-crawler - Web crawler SDK based on Apache Storm.
    • IoT

      • sensorbee - lightweight stream processing engine for IoT.
      • Apache Edgent - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices, which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem, from the center to the edge; open sourced by IBM.
      • Apache StreamPipes - a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
    • DSL

      • Apache Beam - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.
      • coast - a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
      • Esper - component for complex event processing (CEP) and event series analysis.
      • summingbird - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
      • Streamparse - lets you run Python code against real-time streams of data via Apache Storm.
    • Data Pipeline

      • camus - LinkedIn's Kafka -> HDFS pipeline.
      • databus - LinkedIn's source-agnostic distributed change data capture system.
      • flume - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
      • metaq - Taobao's highly available, high-performance distributed messaging system.
      • NATS streaming - fast disk-backed messaging solution
      • nsq - realtime distributed messaging platform designed to operate at scale, handling billions of messages per day.
      • suro - data pipeline service for collecting, aggregating, and dispatching large volumes of application events, including log data.
      • LogDevice - a high-performance distributed system by Facebook for streaming and storing sequential data, using a log structure.
      • StreamSets Data Collector - continuous big data ingestion infrastructure that reads from and writes to a large number of end-points, including S3, JDBC, Hadoop, Kafka, Cassandra and many others.
      • Apache Kafka - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
      • brooklin - a distributed system from LinkedIn for streaming data between heterogeneous source and destination systems with high reliability and throughput at scale (replaced databus).
      • Apache Pulsar - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
    • Online Machine Learning

      • Apache Samoa - distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
      • DataSketches - sketches library from Yahoo!.
      • streamDM - mining Big Data streams using Spark Streaming from Huawei.
      • StreamingBandit - Provides a webserver to quickly set up and evaluate possible solutions to contextual multi-armed bandit (cMAB) problems.
      • StormCV - enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.
      • trident-ml - realtime online machine learning library based on Trident.
      • yurita - Anomaly detection framework built on Spark Structured Streaming from PayPal.
    • Streaming SQL

      • pipelinedb - An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
      • squall - Squall executes SQL queries on top of Storm for doing online processing.
      • StreamCQL - Continuous Query Language on RealTime Computation System.
      • KSQL - a Streaming SQL Engine for Apache Kafka.
    • Benchmark

      • storm-perf-test - a simple storm performance/stress test.
      • streaming-benchmarks - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, etc.
      • flotilla - Automated message queue orchestration for scaled-up benchmarking.
      • storm-benchmark - a set of benchmarks to test Storm performance.
    • Toolkit

      • pulsar - Actor based event driven concurrent framework for Python.
      • aeron - efficient reliable unicast and multicast message transport.
      • StreamFlow - stream processing tool designed to help build and monitor processing workflows.
      • samza-luwak - uses Luwak, a stored-query engine built on Lucene, to implement full-text search on streams.
      • Turbine - tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream.
      • akka - toolkit and runtime for building highly concurrent, distributed, and resilient message-driven application on the JVM.
    • Closed Source

      • Amazon Kinesis Streams - real-time, fully managed and scalable data stream engine provided by AWS.
      • concord - a distributed stream processing framework built in C++ on top of Apache Mesos.
      • jubatus - distributed processing framework and streaming machine learning library.
      • millwheel - framework for building low-latency data-processing applications that is widely used at Google.
      • Azure Stream Analytics - real-time data stream engine provided by Microsoft Azure.
    • Readings