{"id":18359117,"url":"https://github.com/ev2900/flink_kinesis_data_analytics","last_synced_at":"2025-08-25T21:16:53.589Z","repository":{"id":85139174,"uuid":"411347160","full_name":"ev2900/Flink_Kinesis_Data_Analytics","owner":"ev2900","description":"Apache Flink examples designed to be run by AWS Kinesis Data Analytics (KDA). ","archived":false,"fork":false,"pushed_at":"2023-12-04T19:55:36.000Z","size":18151,"stargazers_count":10,"open_issues_count":0,"forks_count":11,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-06T13:37:02.718Z","etag":null,"topics":["aws","flink","flink-examples","flink-sql","flink-stream-processing","flink-streaming","flinksql","kinesis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ev2900.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-28T15:54:08.000Z","updated_at":"2024-03-27T13:57:19.000Z","dependencies_parsed_at":"2024-11-05T22:33:21.060Z","dependency_job_id":null,"html_url":"https://github.com/ev2900/Flink_Kinesis_Data_Analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ev2900/Flink_Kinesis_Data_Analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FFlink_Kinesis_Data_Analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FFlink_Kinesis_Data_Analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FFlink_Kinesis_Data_Analyti
cs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FFlink_Kinesis_Data_Analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ev2900","download_url":"https://codeload.github.com/ev2900/Flink_Kinesis_Data_Analytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ev2900%2FFlink_Kinesis_Data_Analytics/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265253235,"owners_count":23735090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","flink","flink-examples","flink-sql","flink-stream-processing","flink-streaming","flinksql","kinesis"],"created_at":"2024-11-05T22:21:01.163Z","updated_at":"2025-07-14T06:34:09.224Z","avatar_url":"https://github.com/ev2900.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kinesis Data Analytics Lab\n\n\u003cimg width=\"85\" alt=\"map-user\" src=\"https://img.shields.io/badge/views-0000-green\"\u003e \u003cimg width=\"125\" alt=\"map-user\" src=\"https://img.shields.io/badge/unique visits-0000-green\"\u003e\n\nProcessing real-time data via Kinesis Data Analytics - Apache Flink\n\nYouTube video(s)\n1.    [Send Data to Kinesis from a Python Script][12]\n2.    Optional - [Send Data to Kinesis from a KDA Notebook][22]\n3.    [Create a Kinesis Data Analytics Studio and Upload a Notebook][13]\n4.    [Running the Interactive Flink Zeppelin Notebook][16]\n5.    
[Deploy a Kinesis Data Analytics Studio Notebook][21]\n\n## Data Producer\n\n***Note*** if you want to get started without setting up a Kinesis Data Stream \u0026 loading data into the stream / setting up a [data simulator][19], use the [sql_1.13_DataGen.zpln][18] notebook. This Zeppelin notebook uses the Flink [DataGen][17] connector to generate data within the Zeppelin notebook **without needing a connection to Kinesis or Kafka**.\n\nTo get started with Apache Flink via Kinesis Data Analytics (KDA), a Kinesis Data Stream with sample data is required. The [```kinesis_data_producer```][1] folder provides two Python scripts that read the data from the CSV file [```yellow_tripdata_2020-01.csv```][3] in the [```data```][2] folder and stream each line in the file as a JSON record/message to a specified Kinesis Data Stream.\n\nTwo variations of this Python data producer are provided.\n* [```NycTaxi_Producer_Cloud9_JSON.py```][4]\n* [```NycTaxi_Producer_Desktop_JSON.py```][5]\n\nThe two scripts/programs are very similar. A few differences exist depending on whether you want to run the producer application(s) from your local computer/laptop or from [Cloud9][6].\n\nFor a step-by-step walkthrough, view the YouTube video [Send Data to Kinesis from a Python Script][12] \n\n**An alternative method to send sample data to a Kinesis Data Stream - without the need to set up the Python data producer(s)** described above - is to use the [```Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln```][20] notebook in KDA Studio. This notebook can be uploaded and has instructions to send sample data from S3 to a Kinesis Data Stream.\n\nTo benefit the most from the sample Flink code / labs provided, it is important that you can easily start and stop a Python data producer. 
\n\n## Interactive KDA Flink Zeppelin Notebook(s) \n\nThe [```interactive_KDA_flink_zeppelin_notebook```][7] folder provides [Zeppelin][8] notebooks that are designed to work with [Kinesis Data Analytics Studio][9]. Deploy a Kinesis Data Analytics Studio instance and upload the Zeppelin (.zpln) notebook(s). \n\nNote - within the [```interactive_KDA_flink_zeppelin_notebook```][7] folder are subfolders \n* [```Flink v1.11```][10]\n* [```Flink v1.13```][11]\n\nUse the subfolder that matches the version of Flink your notebook is configured to use. I would recommend using Flink v1.13.\n\nTo upload the notebook\n\n\u003cimg width=\"795\" alt=\"upload_notebook\" src=\"https://user-images.githubusercontent.com/5414004/137349377-80cc961e-e918-4c31-85c5-bfaac7bf3da9.png\"\u003e\n\nOnce uploaded and opened in Zeppelin, run the notebook one cell at a time\n\n\u003cimg width=\"788\" alt=\"interactive_notebook\" src=\"https://user-images.githubusercontent.com/5414004/137350050-c962e127-f198-4819-9e2a-55dfc16571ed.PNG\"\u003e\n\nFor a step-by-step walkthrough of running the notebook, view the YouTube video [Running the Interactive Flink Zeppelin Notebook][16]\n\n## Deployable KDA Flink Zeppelin Notebook(s)\n\nKinesis Data Analytics Studio provides an excellent development environment. When you are ready to deploy your application, Kinesis Data Analytics Studio has a mechanism to build and deploy your notebook code as a long-running Kinesis Data Analytics application. \n\nTo deploy your notebook \n\nEnsure that when you created your notebook environment you configured the ```Deploy as application configuration - optional``` setting with a valid S3 bucket.\n\n\u003cimg width=\"614\" alt=\"deploy_config\" src=\"https://user-images.githubusercontent.com/5414004/137352921-d16fc081-4190-4e42-b978-26b247139f86.png\"\u003e\n\nTo access this configuration menu during the creation of your Studio notebook, select ```Create with custom settings``` instead of the default ```Quick create with sample code```. 
Follow the setup prompts and on ```Step 3 - Configure``` select an S3 bucket for the ```Deploy as application configuration - optional``` setting.\n\nWith this configured, in your Zeppelin notebook select ```Build deployable and export to Amazon S3``` \n\n\u003cimg width=\"783\" alt=\"build_action\" src=\"https://user-images.githubusercontent.com/5414004/137355086-2317d761-0d75-444f-af90-0a4a042b575c.png\"\u003e\n\nOnce the build is complete, select ```Deploy deployable as Kinesis Analytics application``` \n\n\u003cimg width=\"782\" alt=\"deploy_action\" src=\"https://user-images.githubusercontent.com/5414004/137355601-e8c495d2-ea88-4420-b21f-6f3e95801a7d.png\"\u003e\n\nWhen the deployment is complete, you will see the application under the analytics application section of Kinesis Data Analytics \n\n\u003cimg width=\"739\" alt=\"deployed\" src=\"https://user-images.githubusercontent.com/5414004/137364432-45a2fa75-09bf-4c28-82e0-d007c6cd66b7.png\"\u003e\n\n## Future Improvements Planned for this Repository\n* YouTube video - DataGen based interactive_KDA_flink_zeppelin_notebook [sql_1.13_DataGen.zpln][18]\n* [Versioned Tables][15]\n* Examples for Managed Streaming for Kafka 
(MSK)\n\n[1]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/tree/main/kinesis_data_producer\n[2]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/tree/main/data\n[3]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/data/yellow_tripdata_2020-01.csv\n[4]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/kinesis_data_producer/NycTaxi_Producer_Cloud9_JSON.py\n[5]:https://github.com/ev2900/Kinesis_Data_Analytics_Lab/blob/main/kinesis_data_producer/NycTaxi_Producer_Desktop_JSON.py\n[6]:https://aws.amazon.com/cloud9/\n[7]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook\n[8]:https://flink.apache.org/news/2020/06/15/flink-on-zeppelin-part1.html\n[9]:https://aws.amazon.com/blogs/aws/introducing-amazon-kinesis-data-analytics-studio-quickly-interact-with-streaming-data-using-sql-python-or-scala/\n[10]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.11\n[11]:https://github.com/ev2900/Apache_Flink_via_Kinesis_Data_Analytics/tree/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.13\n[12]:https://www.youtube.com/watch?v=pPCg6SWhv-0\n[13]:https://www.youtube.com/watch?v=5--oWB2udCc\n[14]:https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/raw/\n[15]:https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/concepts/versioned_tables/\n[16]:https://youtu.be/dO9GFcAy-YM\n[17]:https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/datagen/\n[18]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/blob/main/interactive_KDA_flink_zeppelin_notebook/Flink%20v1.13/sql_1.13_DataGen.zpln\n[19]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/tree/main/kinesis_data_producer\n[20]:https://github.com/ev2900/Flink_Kinesis_Data_Analytics/blob/main/kinesis_data_producer/Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln\n[21]:https://youtu.be/0G
O8drcWv3c\n[22]:https://youtu.be/oAQO8cmip7Q\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fev2900%2Fflink_kinesis_data_analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fev2900%2Fflink_kinesis_data_analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fev2900%2Fflink_kinesis_data_analytics/lists"}