https://github.com/kenthsu/udacity-data-streaming-nanodegree
Udacity Data Streaming Nanodegree Program
- Host: GitHub
- URL: https://github.com/kenthsu/udacity-data-streaming-nanodegree
- Owner: KentHsu
- Created: 2021-02-17T05:54:15.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-02-20T08:16:03.000Z (over 5 years ago)
- Last Synced: 2025-04-06T04:32:02.387Z (about 1 year ago)
- Topics: apache-kafka, data-streaming, faust-application, kafka-connect, kafka-rest-proxy, ksql, spark-streaming
- Language: Python
- Size: 624 KB
- Stars: 22
- Watchers: 1
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Udacity - Data Streaming Nanodegree Program
Build the skills to process data in real time by developing fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streaming.
* Understand the components of data streaming systems. Ingest data in real time using Apache Kafka and Spark, and run analysis on it.
* Use the Faust Stream Processing Python library to build a real-time stream-based application. Compile real-time data and run live analytics, as well as draw insights from reports generated by the streaming console.
* Learn about the Kafka ecosystem, and the types of problems each solution is designed to solve. Use the Confluent Kafka Python library for simple topic management, production, and consumption.
* Explain the components of Spark Streaming (architecture and API), integrate Apache Spark Structured Streaming and Apache Kafka, manipulate data using Spark, and read DataFrames in the Spark Streaming Console.
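
As a rough illustration of the "production and consumption" part of the list above, here is a minimal sketch using the `confluent-kafka` Python package. The topic name `example.events`, the broker address `localhost:9092`, and the group id `example-group` are placeholders chosen for this example, not values from the course. The Kafka imports sit inside the functions so the JSON helpers can be tried without a broker.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON, the form used here for message values."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(raw: bytes) -> dict:
    """Inverse of encode_event, applied to a consumed message value."""
    return json.loads(raw.decode("utf-8"))


def produce_events(events, topic="example.events", bootstrap="localhost:9092"):
    """Send each event to the topic. Needs confluent-kafka and a running broker."""
    from confluent_kafka import Producer  # lazy import: helpers above work without it

    producer = Producer({"bootstrap.servers": bootstrap})
    for event in events:
        producer.produce(topic, value=encode_event(event))
    producer.flush()  # wait until queued messages have been delivered


def consume_events(topic="example.events", bootstrap="localhost:9092", limit=10):
    """Collect up to `limit` events, stopping at the first poll that times out."""
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap,
        "group.id": "example-group",      # placeholder consumer group
        "auto.offset.reset": "earliest",  # new groups start from the oldest message
    })
    consumer.subscribe([topic])
    received = []
    try:
        while len(received) < limit:
            message = consumer.poll(1.0)
            if message is None:
                break  # nothing arrived within one second
            if message.error():
                continue  # skip error events in this sketch
            received.append(decode_event(message.value()))
    finally:
        consumer.close()
    return received
```

A long-running consumer would normally keep polling instead of stopping at the first timeout; the early exit only keeps the sketch finite.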
## Course 1 - Data Ingestion with Apache Kafka
Demonstrate knowledge of data streaming tools, including Kafka consumers, producers, and topics; Kafka Connect sources and sinks; Kafka REST Proxy for producing data over REST; data schemas with JSON and Apache Avro/Schema Registry; stream processing with the Faust Python library; and stream processing with KSQL.
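
To make "producing data over REST" concrete, the sketch below builds (but does not send) a produce request for Kafka REST Proxy using only the Python standard library. It assumes the v2 JSON embedded format; the base URL `http://localhost:8082` and the topic `example.events` are placeholders for this example.

```python
import json
import urllib.request


def build_produce_request(base_url: str, topic: str, values: list) -> urllib.request.Request:
    """Build a POST request that asks REST Proxy to publish `values` to `topic`."""
    # REST Proxy expects a "records" array, one object per message.
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # Tells the proxy the message values are embedded as JSON.
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )
```

With a proxy running, passing the returned object to `urllib.request.urlopen` would send it, for example `urllib.request.urlopen(build_produce_request("http://localhost:8082", "example.events", [{"station": "A"}]))`.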
### Contents
* Introduction to Stream Processing
* Apache Kafka
* Data Schemas and Apache Avro
* Kafka Connect and REST Proxy
* Stream Processing Fundamentals
* Stream Processing with Faust
* KSQL
### Projects
* Optimize Chicago Bus and Train Availability Using Kafka
## Course 2 - Streaming API Development and Documentation
Grow expertise in streaming data systems by building a continuous application with Spark Structured Streaming: consume and process data from Apache Kafka, create a DataFrame as an aggregation of source DataFrames, sink a composite DataFrame to Kafka, and visually inspect a data sink for accuracy.
### Contents
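
The Kafka-to-Spark-to-Kafka flow described above might look roughly like the following PySpark sketch. It is a simplified pass-through and does not join or aggregate several DataFrames. The topic names, the `customer`/`score` fields, and the checkpoint path are invented for illustration, and running it assumes a Spark session with the Kafka connector package (`spark-sql-kafka`) available.

```python
def kafka_source_options(bootstrap: str, topic: str) -> dict:
    """Options for Spark's Kafka source: which brokers, which topic, where to start."""
    return {
        "kafka.bootstrap.servers": bootstrap,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def start_passthrough(spark, bootstrap="localhost:9092",
                      source_topic="example.source", sink_topic="example.sink",
                      checkpoint_dir="/tmp/example-checkpoint"):
    """Read JSON events from one topic, parse them, and write them to another."""
    # Lazy imports so the options helper can be used without PySpark installed.
    from pyspark.sql.functions import col, from_json, struct, to_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Hypothetical event schema for this example.
    schema = StructType([
        StructField("customer", StringType()),
        StructField("score", DoubleType()),
    ])

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap, source_topic))
        .load()
    )

    # Kafka delivers the value as binary: cast to string, then parse the JSON.
    parsed = (
        raw.select(from_json(col("value").cast("string"), schema).alias("event"))
        .select("event.*")
    )

    # The Kafka sink expects a column named "value".
    out = parsed.select(to_json(struct("customer", "score")).alias("value"))

    return (
        out.writeStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap)
        .option("topic", sink_topic)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Swapping the Kafka sink for `.format("console")` is a common way to inspect the stream visually while developing.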
* Streaming DataFrames
* Joins and JSON
* Redis, Base64 and JSON
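
The last topic combines two encodings. As a small sketch of the idea, the function below unwraps a JSON message whose field holds a Base64-encoded JSON document. The field name `payload` and the sample event are assumptions made for this example; the actual message layout used in the course may differ.

```python
import base64
import json


def decode_embedded_json(message: str, field: str = "payload") -> dict:
    """Parse the outer JSON, Base64-decode one field, and parse the inner JSON."""
    outer = json.loads(message)
    inner_bytes = base64.b64decode(outer[field])
    return json.loads(inner_bytes.decode("utf-8"))


# Build a sample message the same way in reverse, then decode it.
inner = {"customer": "c1", "score": 2.5}
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")
sample_message = json.dumps({"payload": encoded})
decoded = decode_embedded_json(sample_message)
```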
### Projects
* Evaluate Human Balance with Spark Streaming