https://github.com/xmlking/beam-examples

Apache Beam / Google Dataflow Examples

# beam-examples

A set of example **Streaming** and **Batch** job implementations built with **Apache Beam**.

`Dataflow brings life to Datalakes`

![DataLake with Cloud Dataflow](./docs/dataflow.png)

### Features
1. Monorepo (apps, libs) project that showcases a workspace setup with multiple apps and shared libraries
2. **Polyglot** - supports multiple languages (Java, Kotlin)
3. Supports building a `FatJar` for submitting jobs from a CI environment
4. Cloud native (run locally, run on the cloud, deploy as a template for Google Cloud Dataflow)
5. Multiple runtimes (Flink, Spark, Google Cloud Dataflow, Hazelcast Jet)

### Prerequisites
> See [PLAYBOOK](./docs/PLAYBOOK.md)

### Quick Start

Run the WordCount Kotlin example:

    gradle :apps:wordcount:run --args="--runner=DirectRunner --inputFile=./src/test/resources/data/input.txt --output=./build/output.txt"

The WordCount pipeline runs locally and writes its output to the `apps/wordcount/build` directory.
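
To check the result of the local run, a small helper like the one below can print the first lines of each output file. This is a minimal sketch, not part of the repository: the function name `show_wordcount_output` is hypothetical, and it assumes the output files start with `output.txt`. Beam's `TextIO` normally appends a shard suffix such as `-00000-of-00001` to the name given in `--output`, so the sketch matches on the prefix rather than the exact file name.

```shell
# Hypothetical helper (not part of this repository): print the first lines
# of every WordCount output file found in a directory.
# Assumption: file names start with "output.txt", possibly followed by a
# shard suffix such as -00000-of-00001.
show_wordcount_output() {
  dir="${1:-apps/wordcount/build}"
  found=0
  for f in "$dir"/output.txt*; do
    # When the glob matches nothing, the pattern itself is returned; skip it.
    [ -f "$f" ] || continue
    found=1
    echo "== $f"
    head -n 5 "$f"
  done
  if [ "$found" -eq 0 ]; then
    echo "no output files found in $dir" >&2
    return 1
  fi
}
```

Called without arguments from the repository root, `show_wordcount_output` looks in `apps/wordcount/build`; pass a different directory as the first argument if the output went elsewhere.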

The WordCount pipeline can also run on Google Cloud Dataflow if you have a GCP project and credentials set up on your local machine.

    PROJECT_ID=
    GCS_BUCKET=
    export GOOGLE_APPLICATION_CREDENTIALS=

    gradle :apps:wordcount:run --args="--runner=DataflowRunner --project=$PROJECT_ID --gcpTempLocation=gs://$GCS_BUCKET/dataflow/wordcount/temp/ --stagingLocation=gs://$GCS_BUCKET/dataflow/wordcount/staging/ --inputFile=gs://$GCS_BUCKET/dataflow/wordcount/input/shakespeare.txt --output=gs://$GCS_BUCKET/dataflow/wordcount/output/output.txt"
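
Because the Dataflow command depends on three variables being filled in, it can help to fail early when one of them is empty. The sketch below is an illustration only: `require_set` and `dataflow_args` are hypothetical helper names, not part of this repository. It assembles the same argument string as the command above from `PROJECT_ID` and `GCS_BUCKET`.

```shell
# Hypothetical helpers (not part of this repository).

# require_set NAME VALUE: fail with a message if VALUE is empty.
require_set() {
  if [ -z "$2" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# dataflow_args: print the --args string for the Dataflow run shown above,
# or return non-zero if a required variable is missing.
dataflow_args() {
  require_set PROJECT_ID "${PROJECT_ID:-}" || return 1
  require_set GCS_BUCKET "${GCS_BUCKET:-}" || return 1
  require_set GOOGLE_APPLICATION_CREDENTIALS "${GOOGLE_APPLICATION_CREDENTIALS:-}" || return 1

  # All job paths share this bucket prefix.
  base="gs://$GCS_BUCKET/dataflow/wordcount"
  echo "--runner=DataflowRunner --project=$PROJECT_ID --gcpTempLocation=$base/temp/ --stagingLocation=$base/staging/ --inputFile=$base/input/shakespeare.txt --output=$base/output/output.txt"
}
```

One way to use it is `args="$(dataflow_args)" && gradle :apps:wordcount:run --args="$args"`, so that Gradle is only invoked when all three variables are set.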

The `inputFile` option has a default value defined in the WordCount options, so the pipeline will run with that input file and produce output files in

### Reference

1. [Apache Beam Programming Guide](https://beam.apache.org/documentation/programming-guide/)
2. https://github.com/xmlking/micro-apps
3. https://github.com/sfeir-open-source/kbeam
4. https://github.com/thinhha/gcp-data-project-template
5. https://google.github.io/flogger/best_practice
6. https://github.com/apache/beam/tree/master/examples/kotlin