https://github.com/ev2900/flink_kinesis_data_analytics

Apache Flink examples designed to be run by AWS Kinesis Data Analytics (KDA).

aws flink flink-examples flink-sql flink-stream-processing flink-streaming flinksql kinesis


# Kinesis Data Analytics Lab

Processing real-time data with Apache Flink via Kinesis Data Analytics

YouTube video(s)
1. [Send Data to Kinesis from a Python Script][12]
2. Optional - [Send Data to Kinesis from a KDA Notebook][22]
3. [Create a Kinesis Data Analytics Studio and Upload a Notebook][13]
4. [Running the Interactive Flink Zeppelin Notebook][16]
5. [Deploy a Kinesis Data Analytics Studio Notebook][21]

## Data Producer

***Note*** if you want to get started without setting up a Kinesis Data Stream and loading data into it, or setting up a [data simulator][19], use the [sql_1.13_DataGen.zpln][18] notebook. This Zeppelin notebook uses the Flink [DataGen][17] connector to generate data within the Zeppelin notebook **without needing a connection to Kinesis or Kafka**.
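To give a feel for what a DataGen-backed table looks like, here is a minimal sketch in Python that assembles the kind of `CREATE TABLE` statement you could paste into a `%flink.ssql` paragraph. The table name, columns and rate are made up for illustration and are not taken from the notebook; only the `'connector' = 'datagen'` and `'rows-per-second'` options come from the Flink DataGen connector.

```python
def datagen_table_ddl(table_name, columns, rows_per_second=10):
    """Build a Flink SQL CREATE TABLE statement backed by the DataGen connector.

    columns is a list of (column_name, sql_type) tuples.
    """
    column_sql = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table_name} (\n"
        f"  {column_sql}\n"
        f") WITH (\n"
        f"  'connector' = 'datagen',\n"
        f"  'rows-per-second' = '{rows_per_second}'\n"
        f")"
    )


# Hypothetical table and columns, chosen only for this example.
demo_ddl = datagen_table_ddl(
    "taxi_trips_demo",
    [("trip_id", "BIGINT"), ("fare_amount", "DOUBLE")],
    rows_per_second=5,
)
print(demo_ddl)
```

The printed statement can be run in a Flink SQL paragraph; the connector then fills the table with random values at the requested rate.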

To get started with Apache Flink via Kinesis Data Analytics (KDA), a Kinesis Data Stream with sample data is required. The [```kinesis_data_producer```][1] folder provides two Python scripts that read the CSV file [```yellow_tripdata_2020-01.csv```][3] in the [```data```][2] folder and stream each line of the file as a JSON record/message to the Kinesis Data Stream you specify.

Two variations of this Python data producer are provided.
* [```NycTaxi_Producer_Cloud9_JSON.py```][4]
* [```NycTaxi_Producer_Desktop_JSON.py```][5]

The two scripts/programs are very similar. A few differences exist depending on whether you want to run the producer application(s) from your local computer/laptop or from [Cloud9][6].

For a step-by-step walkthrough, view the YouTube video [Send Data to Kinesis from a Python Script][12].
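The idea behind the producer scripts (read a CSV, send each row as a JSON record) can be sketched as follows. This is not the code from the repository: the function names, the stand-in client, the sample columns and the choice of partition key are all assumptions made for illustration. The only real API relied on is the `put_record(StreamName=..., Data=..., PartitionKey=...)` call of the boto3 Kinesis client.

```python
import csv
import io
import json


def csv_rows_as_json(csv_file):
    """Yield one JSON string per CSV data row, keyed by the header names."""
    for row in csv.DictReader(csv_file):
        yield json.dumps(row)


def stream_csv(client, stream_name, csv_file):
    """Send every CSV row to the stream as a JSON record; return the count sent.

    client is anything with a Kinesis-style put_record method,
    for example boto3.client("kinesis").
    """
    sent = 0
    for line_number, payload in enumerate(csv_rows_as_json(csv_file), start=1):
        client.put_record(
            StreamName=stream_name,
            Data=payload.encode("utf-8"),
            # Assumption: the row number is used as the partition key so that
            # records spread across shards; the real scripts may differ.
            PartitionKey=str(line_number),
        )
        sent += 1
    return sent


class RecordingClient:
    """Stand-in for boto3.client("kinesis") so the sketch runs without AWS."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))


# Two made-up rows; the column names are illustrative only.
sample_csv = io.StringIO("trip_id,fare_amount\n1,12.5\n2,7.0\n")
demo_client = RecordingClient()
sent_count = stream_csv(demo_client, "my-demo-stream", sample_csv)
print(sent_count, demo_client.records[0][1])
```

To send to a real stream you would pass `boto3.client("kinesis")` instead of the stand-in client and open the CSV file with `open(path, newline="")`.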

**An alternative method to send sample data to a Kinesis Data Stream - without the need to set up the Python data producer(s)** described above - is to use the [```Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln```][20] notebook in KDA Studio. This notebook can be uploaded and has instructions to send sample data from S3 to a Kinesis Data Stream.

To benefit the most from the sample Flink code / labs provided, it is important that you can easily start and stop a Python data producer.
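One way to make a producer easy to stop is to check a stop flag between records. The sketch below is an assumption about how this could be done, not how the provided scripts work; `run_producer`, `fake_send` and the delay parameter are hypothetical names.

```python
import itertools
import threading


def run_producer(send_one, records, stop_event, delay_seconds=0.0):
    """Call send_one(record) for each record until the records run out
    or stop_event is set. Returns the number of records sent."""
    sent = 0
    for record in records:
        if stop_event.is_set():
            break
        send_one(record)
        sent += 1
        if delay_seconds:
            # wait() returns early if the event is set during the pause.
            stop_event.wait(delay_seconds)
    return sent


stop = threading.Event()
delivered = []


def fake_send(record):
    """Pretend to send a record; request a stop after the third one."""
    delivered.append(record)
    if len(delivered) == 3:
        stop.set()


# itertools.count() is an endless record source, so only the stop flag ends the loop.
sent_total = run_producer(fake_send, itertools.count(), stop)
print(sent_total)
```

In a real script the flag could be set from an `except KeyboardInterrupt` handler, so pressing Ctrl+C ends the loop cleanly and the producer can be restarted later.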

## Interactive KDA Flink Zeppelin Notebook(s)

The [```interactive_KDA_flink_zeppelin_notebook```][7] folder provides [Zeppelin][8] notebooks that are designed to work with [Kinesis Data Analytics Studio][9]. Deploy a Kinesis Data Analytics Studio instance and upload the Zeppelin (.zpln) notebook(s).

Note - within the [```interactive_KDA_flink_zeppelin_notebook```][7] folder are subfolders
* [```Flink v1.11```][10]
* [```Flink v1.13```][11]

Use the subfolder that matches the version of Flink your notebook is configured to use. I would recommend using Flink v1.13.

To upload the notebook:

*(Screenshot: upload_notebook)*

Once the notebook is uploaded and opened in Zeppelin, run it one cell at a time.

*(Screenshot: interactive_notebook)*

For a step-by-step walkthrough of running the notebook, view the YouTube video [Running the Interactive Flink Zeppelin Notebook][16].
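Before uploading, it can help to see which cells a notebook contains and in what order. The sketch below assumes a `.zpln` file is a JSON document with a top-level `paragraphs` list whose entries carry a `text` field, which is how Zeppelin stores notes as far as I know; check this against your own files. The sample note is made up for the example.

```python
import json


def list_paragraphs(note_json):
    """Return (index, first_line) for each paragraph in a Zeppelin note."""
    note = json.loads(note_json)
    summary = []
    for index, paragraph in enumerate(note.get("paragraphs", []), start=1):
        text = paragraph.get("text") or ""
        lines = text.strip().splitlines()
        first_line = lines[0] if lines else ""
        summary.append((index, first_line))
    return summary


# A made-up two-cell note standing in for a real .zpln file.
sample_note = json.dumps({
    "name": "demo",
    "paragraphs": [
        {"text": "%flink.ssql\nCREATE TABLE demo (id BIGINT) WITH ('connector' = 'datagen')"},
        {"text": "%flink.ssql(type=update)\nSELECT * FROM demo"},
    ],
})
paragraph_summary = list_paragraphs(sample_note)
for index, first_line in paragraph_summary:
    print(index, first_line)
```

For a real notebook, read the file contents with `open(path, encoding="utf-8").read()` and pass them to `list_paragraphs`.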

## Deployable KDA Flink Zeppelin Notebook(s)

Kinesis Data Analytics Studio provides an excellent development environment. When you are ready to deploy your application, Kinesis Data Analytics Studio has a mechanism to build and deploy your notebook code as a long-running Kinesis Data Analytics application.

To deploy your notebook

Ensure that when you created your notebook environment you configured the ```Deploy as application configuration - optional``` setting with a valid S3 bucket.

*(Screenshot: deploy_config)*

To access this configuration menu during the creation of your studio notebook, select ```Create with custom settings``` instead of the default ```Quick create with sample code```. Follow the setup prompts and on ```Step 3 - Configure``` select an S3 bucket for the ```Deploy as application configuration - optional``` setting.

With this configured, in your Zeppelin notebook select ```Build deployable and export to Amazon S3```

*(Screenshot: build_action)*

Once the build is complete, select ```Deploy deployable as Kinesis Analytics application```

*(Screenshot: deploy_action)*

When the deployment is complete, you will see the application under the analytics application section of Kinesis Data Analytics.

*(Screenshot: deployed)*
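The same check can be done from a script instead of the console. A minimal sketch, assuming the boto3 `kinesisanalyticsv2` client and its `list_applications` call, which returns `ApplicationSummaries` and, when there are more pages, a `NextToken`. The stand-in client and the application names below are made up so the example runs without AWS access.

```python
def find_applications(client, name_fragment):
    """Return (name, status) for every application whose name contains name_fragment."""
    matches = []
    request = {}
    while True:
        page = client.list_applications(**request)
        for summary in page.get("ApplicationSummaries", []):
            if name_fragment in summary["ApplicationName"]:
                matches.append((summary["ApplicationName"], summary["ApplicationStatus"]))
        next_token = page.get("NextToken")
        if not next_token:
            return matches
        # Ask for the next page of results.
        request = {"NextToken": next_token}


class FakeAnalyticsClient:
    """Stand-in for boto3.client("kinesisanalyticsv2"); returns two made-up pages."""

    def list_applications(self, **request):
        if "NextToken" not in request:
            return {
                "ApplicationSummaries": [
                    {"ApplicationName": "other-app", "ApplicationStatus": "READY"}
                ],
                "NextToken": "page-2",
            }
        return {
            "ApplicationSummaries": [
                {"ApplicationName": "my-notebook-app", "ApplicationStatus": "RUNNING"}
            ]
        }


found = find_applications(FakeAnalyticsClient(), "notebook")
print(found)
```

With real credentials you would pass `boto3.client("kinesisanalyticsv2")` and search for part of the name you gave the deployed notebook application.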

## Future Improvements Planned for this Repository
* YouTube video - DataGen based interactive_KDA_flink_zeppelin_notebook [sql_1.13_DataGen.zpln][18]
* [Versioned Tables][15]
* Examples for Managed Streaming for Kafka (MSK)

[1]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/tree/main/kinesis_data_producer
[2]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/tree/main/data
[3]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/data/yellow_tripdata_2020-01.csv
[4]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/kinesis_data_producer/NycTaxi_Producer_Cloud9_JSON.py
[5]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/kinesis_data_producer/NycTaxi_Producer_Desktop_JSON.py
[6]:https://aws.amazon.com/cloud9/
[7]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook
[8]:https://flink.apache.org/news/2020/06/15/flink-on-zeppelin-part1.html
[9]:https://aws.amazon.com/blogs/aws/introducing-amazon-kinesis-data-analytics-studio-quickly-interact-with-streaming-data-using-sql-python-or-scala/
[10]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.11
[11]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.13
[12]:https://www.youtube.com/watch?v=pPCg6SWhv-0
[13]:https://www.youtube.com/watch?v=5--oWB2udCc
[14]:https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/raw/
[15]:https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/concepts/versioned_tables/
[16]:https://youtu.be/dO9GFcAy-YM
[17]:https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/datagen/
[18]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/blob/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.13/sql_1.13_DataGen.zpln
[19]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/tree/main/kinesis_data_producer
[20]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/blob/main/kinesis_data_producer/Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln
[21]:https://youtu.be/0GO8drcWv3c
[22]:https://youtu.be/oAQO8cmip7Q