https://github.com/bkirwi/coast
Experiments in Streaming
- Host: GitHub
- URL: https://github.com/bkirwi/coast
- Owner: bkirwi
- License: apache-2.0
- Created: 2014-09-26T17:47:21.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-08-27T23:16:52.000Z (about 8 years ago)
- Last Synced: 2024-08-01T17:31:18.305Z (3 months ago)
- Language: Scala
- Size: 473 KB
- Stars: 60
- Watchers: 15
- Forks: 3
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-streaming - coast - a DSL that builds DAGs on top of Samza and provides exactly-once semantics. (Table of Contents / DSL)
README
# coast
In this dark stream-processing landscape, `coast` is a ray of light.
## Why `coast`?
- **Simple:** `coast` provides a simple streaming model with strong ordering and
  exactly-once semantics. This straightforward behaviour extends across multiple
  machines, state aggregations, and even between independent jobs, making it
  easier to reason about how your entire system behaves.

- **Easy:** Streams are built up and wired together using a concise, idiomatic
  Scala API. These dataflow graphs can be as small or as large as you like:
  no need to cram all your logic in one big job, or to write a bunch
  of single-stage jobs and track their relationships by hand.

- **Kafkaesque:** `coast`'s core abstractions are patterned after Kafka's
  data model, and it's designed to fit comfortably in the middle of a larger
  Kafka-based infrastructure. By taking advantage of Kafka's messaging
  guarantees, `coast` can implement [exactly-once semantics][impossible]
  for messages and state without a heavy coordination cost.

## Quick Introduction
`coast`'s streams are closely patterned after Kafka's topics: a stream has
multiple partitions, and each partition has an ordered series of values. A
stream can have any number of partitions, each of which has a unique key.
You can create a stream by pulling data from a topic, but `coast` also
has a rich API for building derivative streams: applying transformations,
merging streams together, regrouping, aggregating state, or performing joins.
Once you've defined a stream you like, you can give it a name and publish it
out to another topic.

By defining streams and networking them together, it's possible to
express arbitrarily complex dataflow graphs, including cycles and joins. You can
use the resulting graph in multiple ways: print it out as a GraphViz image,
unit-test your logic using a simple in-memory implementation, or compile it
to multiple [Samza jobs][samza] and run them on a cluster.

Sound promising? You might be interested in:
- The [heavily-commented 'Twitter reach' example][twitter-reach],
which walks through all the pieces of a real job.
- A [fork of the `hello-samza` project][hello-coast] with setup and deployment instructions.
- Some [wiki documentation][wiki] on [the core concepts][wiki-overview],
nuances of the [graph-builder API][wiki-flow],
or [details on the Samza integration][wiki-samza].

[samza]: http://samza.apache.org/
[hello-coast]: https://github.com/bkirwi/incubator-samza-hello-samza/tree/hello-coast
[twitter-reach]: core/src/main/scala/com/monovore/example/coast/TwitterReach.scala
[impossible]: http://ben.kirw.in/2014/11/28/kafka-patterns/
[wiki]: https://github.com/bkirwi/coast/wiki
[wiki-overview]: https://github.com/bkirwi/coast/wiki/Overview
[wiki-flow]: https://github.com/bkirwi/coast/wiki/Flow
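
The stream model from the introduction (partitions with unique keys, each holding an ordered series of values) can be pictured with plain Scala collections. This is only a rough mental model: `StreamModel`, `Stream`, `transform`, and `runningSum` are made-up names for illustration and are not part of `coast`'s API, and nothing here involves Kafka or Samza.

```scala
// Illustrative model only: none of these names come from coast's API.
object StreamModel {

  // A stream is a set of partitions; each partition has a unique key
  // and holds an ordered series of values.
  type Stream[K, V] = Map[K, Vector[V]]

  // Apply a transformation to every value, keeping the partitioning
  // and the order within each partition.
  def transform[K, A, B](stream: Stream[K, A])(f: A => B): Stream[K, B] =
    stream.map { case (key, values) => key -> values.map(f) }

  // Aggregate state per partition: emit the running total after each value.
  def runningSum[K](stream: Stream[K, Int]): Stream[K, Int] =
    stream.map { case (key, values) => key -> values.scanLeft(0)(_ + _).tail }

  def main(args: Array[String]): Unit = {
    val words: Stream[String, String] =
      Map("a" -> Vector("x", "yy"), "b" -> Vector("zzz"))

    // Transformations never move values between partitions.
    val lengths = transform(words)(_.length)
    assert(lengths == Map("a" -> Vector(1, 2), "b" -> Vector(3)))

    // State is aggregated independently for each key.
    assert(runningSum(lengths) == Map("a" -> Vector(1, 3), "b" -> Vector(3)))
  }
}
```

Because every operation works one partition at a time, the order of values within a key is preserved, which is the property the ordering guarantee describes.
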
[wiki-samza]: https://github.com/bkirwi/coast/wiki/Samza

## Getting Started
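The 0.2.0 release is published on Bintray.
If you're using Maven, you'll want to point your `pom.xml` at the repo:

```xml
<repository>
  <id>bintray-coast</id>
  <url>https://dl.bintray.com/bkirwi/maven</url>
</repository>
```

...and add `coast` to your dependencies:

```xml
<dependency>
  <groupId>com.monovore</groupId>
  <artifactId>coast-samza_2.10</artifactId>
  <version>0.2.0</version>
</dependency>
```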
*Mutatis mutandis*, the same goes for SBT and Gradle.
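
For SBT, the equivalent settings might look like the sketch below. It uses the repository URL and coordinates given above with standard sbt syntax; it has not been checked against the published artifacts, so treat it as a starting point.

```scala
// build.sbt (sketch; coordinates taken from the Maven instructions above)

// Point sbt at the Bintray repository that hosts the release
resolvers += "bintray-coast" at "https://dl.bintray.com/bkirwi/maven"

// A single % is used because the artifact name already carries the _2.10 Scala suffix
libraryDependencies += "com.monovore" % "coast-samza_2.10" % "0.2.0"
```
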
## Mandatory Word Count Example
```scala
val Sentences = Topic[Source, String]("sentences")

val WordCounts = Topic[String, Int]("word-counts")

val graph = Flow.build { implicit builder =>

  Sentences.asSource
    .flatMap { _.split("\\s+") }
    .map { _ -> 1 }
    .groupByKey
    .streamTo("words")
    .sum.updates
    .sinkTo(WordCounts)
}
```

## Future Work
If you're interested in what the future holds for `coast` --
or have questions or bugs to report --
come on over to the [issue tracker][issues].

[issues]: https://github.com/bkirwi/coast/issues