https://github.com/xmlking/beam-examples
Apache Beam / Google Dataflow Examples
- Host: GitHub
- URL: https://github.com/xmlking/beam-examples
- Owner: xmlking
- Created: 2019-11-21T16:16:41.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-03-09T04:25:01.000Z (over 6 years ago)
- Last Synced: 2025-03-25T06:28:16.848Z (about 1 year ago)
- Topics: apache-beam, beam, dataflow, gcd, gradle, kotlin, monorepo
- Language: Kotlin
- Homepage:
- Size: 193 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# beam-examples
A set of example **streaming** and **batch** job implementations built with **Apache Beam**.
`Dataflow brings life to Datalakes`

### Features
1. Monorepo (apps, libs) project that showcases a workspace setup with multiple apps and shared libraries
2. **Polyglot** - supports multiple languages (Java, Kotlin)
3. Supports building a `FatJar` for submitting jobs from a CI environment
4. Cloud native (run locally, run in the cloud, or deploy as a template for GCD)
5. Multiple runtimes (Flink, Spark, Google Cloud Dataflow, Hazelcast Jet)
### Prerequisites
> see [PLAYBOOK](./docs/PLAYBOOK.md)
### Quick Start
Run the WordCount Kotlin example:

    gradle :apps:wordcount:run --args="--runner=DirectRunner --inputFile=./src/test/resources/data/input.txt --output=./build/output.txt"

The WordCount pipeline runs locally and writes its output to the `apps/wordcount/build` directory.
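
To confirm that the local run produced something, the build directory can be checked for files that start with the output name. The sketch below is an illustration only: `list_wordcount_outputs` is a hypothetical helper that is not part of this repository. It matches on a prefix because Beam's text sink normally appends a shard suffix such as `-00000-of-00001` to the output name unless the pipeline turns sharding off, and the README does not say which applies here.

```shell
# list_wordcount_outputs is a hypothetical helper, not part of this repository.
# It prints every file in a directory whose name starts with a given prefix
# and returns non-zero when no such file exists.
list_wordcount_outputs() {
  # Defaults follow the local Quick Start command; both can be overridden.
  local dir="${1:-apps/wordcount/build}"
  local prefix="${2:-output.txt}"
  local status=1
  local f

  for f in "$dir/$prefix"*; do
    # When nothing matches, the pattern is left unexpanded, so skip it.
    [ -e "$f" ] || continue
    echo "$f"
    status=0
  done

  return "$status"
}
```

After the Gradle task finishes, calling `list_wordcount_outputs` from the repository root prints the matching files, or exits with a non-zero status if the run wrote nothing.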
The WordCount pipeline can also run on Google Cloud Dataflow if you have a Google Cloud project configured on your local machine.

    PROJECT_ID=
    GCS_BUCKET=
    export GOOGLE_APPLICATION_CREDENTIALS=
    gradle :apps:wordcount:run --args="--runner=DataflowRunner --project=$PROJECT_ID --gcpTempLocation=gs://$GCS_BUCKET/dataflow/wordcount/temp/ --stagingLocation=gs://$GCS_BUCKET/dataflow/wordcount/staging/ --inputFile=gs://$GCS_BUCKET/dataflow/wordcount/input/shakespeare.txt --output=gs://$GCS_BUCKET/dataflow/wordcount/output/output.txt"

The `inputFile` option is defined by default in WordCount options, so that it will run with the input file and produce output files in
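
Because `PROJECT_ID` and `GCS_BUCKET` are left blank in the Dataflow command, a submission with either one unset would expand to paths such as `gs:///dataflow/...`. The following is a minimal sketch of a guard around that command, not something this repository ships: `build_dataflow_args` and `print_submit_command` are hypothetical names, and the bucket layout is copied from the command shown in this README. It only prints the Gradle invocation so it can be reviewed before running, and it does not validate `GOOGLE_APPLICATION_CREDENTIALS`.

```shell
# build_dataflow_args and print_submit_command are hypothetical helpers,
# not part of this repository.

# Build the pipeline arguments, refusing to continue when a required
# variable is empty or unset.
build_dataflow_args() {
  if [ -z "${PROJECT_ID:-}" ]; then
    echo "error: PROJECT_ID is not set" >&2
    return 1
  fi
  if [ -z "${GCS_BUCKET:-}" ]; then
    echo "error: GCS_BUCKET is not set" >&2
    return 1
  fi

  # Bucket layout taken from the Dataflow command in this README.
  local base="gs://${GCS_BUCKET}/dataflow/wordcount"
  echo "--runner=DataflowRunner" \
       "--project=${PROJECT_ID}" \
       "--gcpTempLocation=${base}/temp/" \
       "--stagingLocation=${base}/staging/" \
       "--inputFile=${base}/input/shakespeare.txt" \
       "--output=${base}/output/output.txt"
}

# Dry run: print the Gradle command instead of executing it.
print_submit_command() {
  local args
  args="$(build_dataflow_args)" || return 1
  printf 'gradle :apps:wordcount:run --args="%s"\n' "$args"
}
```

With both variables set, `print_submit_command` prints a single `gradle :apps:wordcount:run --args="..."` line that can be copied into the terminal. With either one missing, it reports the missing name on standard error and returns a non-zero status.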
### Reference
1. [Apache Beam Programming Guide](https://beam.apache.org/documentation/programming-guide/)
2. https://github.com/xmlking/micro-apps
3. https://github.com/sfeir-open-source/kbeam
4. https://github.com/thinhha/gcp-data-project-template
5. https://google.github.io/flogger/best_practice
6. https://github.com/apache/beam/tree/master/examples/kotlin