https://github.com/kenthsu/udacity-data-streaming-nanodegree

apache-kafka data-streaming faust-application kafka-connect kafka-rest-proxy ksql spark-streaming

# Udacity - Data Streaming Nanodegree Program

Build up-to-date skills for processing data in real time by gaining fluency in modern data engineering tools such as Apache Spark, Apache Kafka, Spark Streaming, and Kafka Streaming.

* Understand the components of data streaming systems. Ingest data in real time using Apache Kafka and Spark, and run analysis on it.
* Use the Faust stream processing Python library to build a real-time, stream-based application. Compile real-time data, run live analytics, and draw insights from reports generated by the streaming console.
* Learn about the Kafka ecosystem and the types of problems each solution is designed to solve. Use the Confluent Kafka Python library for simple topic management, production, and consumption.
* Explain the components of Spark Streaming (architecture and API), integrate Apache Spark Structured Streaming and Apache Kafka, manipulate data using Spark, and read DataFrames in the Spark Streaming Console.
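
As an illustration of the topic management, production, and consumption mentioned above, here is a minimal sketch using the `confluent-kafka` Python client. It is not taken from the course material: the broker address, the `example-group` consumer group, and the helper names are assumptions chosen for the example, and the Kafka-facing functions need a running broker.

```python
import json


def client_config(bootstrap_servers: str = "localhost:9092") -> dict:
    """Shared client settings; the broker address is an assumption."""
    return {"bootstrap.servers": bootstrap_servers}


def encode_event(event: dict) -> bytes:
    """Serialize an event as UTF-8 JSON, used here as the message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def create_topic(topic: str, config: dict) -> None:
    # Imported lazily so the helpers above work without the client installed.
    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient(config)
    futures = admin.create_topics(
        [NewTopic(topic, num_partitions=1, replication_factor=1)]
    )
    # Raises if creation failed, for example because the topic already exists.
    futures[topic].result()


def produce(topic: str, events: list, config: dict) -> None:
    from confluent_kafka import Producer

    producer = Producer(config)
    for event in events:
        producer.produce(topic, value=encode_event(event))
    producer.flush()  # block until queued messages have been delivered


def consume(topic: str, config: dict, max_messages: int = 10) -> list:
    from confluent_kafka import Consumer

    consumer = Consumer(
        {**config, "group.id": "example-group", "auto.offset.reset": "earliest"}
    )
    consumer.subscribe([topic])
    received, empty_polls = [], 0
    try:
        # Stop after enough messages or after ten consecutive empty polls.
        while len(received) < max_messages and empty_polls < 10:
            message = consumer.poll(1.0)
            if message is None:
                empty_polls += 1
                continue
            empty_polls = 0
            if message.error():
                continue  # skip error events in this sketch
            received.append(json.loads(message.value()))
    finally:
        consumer.close()
    return received
```

With a broker available, calling `create_topic`, `produce`, and `consume` in that order with the same topic name and `client_config()` exercises all three operations.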

## Course 1 - Data Ingestion with Apache Kafka
Demonstrate knowledge of data streaming tools, including Kafka consumers, producers, and topics; Kafka Connect sources and sinks; Kafka REST Proxy for producing data over REST; data schemas with JSON and Apache Avro/Schema Registry; stream processing with the Faust Python library; and stream processing with KSQL.
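
To make "producing data over REST" concrete, the sketch below builds a produce request for the Kafka REST Proxy v2 API using only the Python standard library. The proxy URL, topic name, and record contents are hypothetical, and the JSON embedded format is assumed (an Avro payload would use a different content type and carry a schema).

```python
import json
import urllib.request


def build_produce_request(
    proxy_url: str, topic: str, values: list
) -> urllib.request.Request:
    """Build a POST to /topics/<topic> carrying JSON-encoded records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request; this needs a running REST Proxy at the target URL."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical proxy address, topic, and record, for illustration only.
request = build_produce_request(
    "http://localhost:8082", "example.events", [{"status": "arrived"}]
)
print(request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.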

### Contents
* Introduction to Stream Processing
* Apache Kafka
* Data Schemas and Apache Avro
* Kafka Connect and REST Proxy
* Stream Processing Fundamentals
* Stream Processing with Faust
* KSQL

### Project
* Optimize Chicago Bus and Train Availability Using Kafka

## Course 2 - Streaming API Development and Documentation
Grow expertise in streaming data systems by building a continuous application with Spark Structured Streaming: consume and process data from Apache Kafka, create a DataFrame as an aggregation of source DataFrames, sink the composite DataFrame to Kafka, and visually inspect the data sink for accuracy.
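
The flow described above (read from Kafka, combine source DataFrames, sink the result back to Kafka) could look roughly like the following PySpark sketch. It is not the course project: the topic names, column names, and checkpoint path are hypothetical, and running it assumes a reachable Kafka broker and the Spark Kafka connector package on the classpath.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for a Kafka streaming source; all values are assumptions."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
    }


def run_join(bootstrap_servers: str = "localhost:9092") -> None:
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json

    spark = SparkSession.builder.appName("example-join").getOrCreate()

    def read_topic(topic: str, schema: str):
        raw = (
            spark.readStream.format("kafka")
            .options(**kafka_source_options(bootstrap_servers, topic))
            .load()
        )
        # Kafka values arrive as bytes: cast to string, then parse the JSON.
        parsed = from_json(col("value").cast("string"), schema).alias("event")
        return raw.select(parsed).select("event.*")

    # Hypothetical topics and schemas (DDL strings), for illustration only.
    customers = read_topic("example.customers", "customer STRING, birth_year INT")
    scores = read_topic("example.scores", "customer STRING, score DOUBLE")

    # Inner stream-stream join on the shared column.
    joined = customers.join(scores, on="customer")

    query = (
        # The Kafka sink expects a string or binary `value` column.
        joined.selectExpr("customer AS key", "to_json(struct(*)) AS value")
        .writeStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", "example.joined")
        .option("checkpointLocation", "/tmp/example-join-checkpoint")
        .start()
    )
    query.awaitTermination()
```

Swapping the Kafka sink for `.format("console")` is a common way to inspect the joined rows visually before writing them back to a topic.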

### Contents
* Streaming DataFrames
* Joins and JSON
* Redis, Base64 and JSON
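
The last lesson title pairs Base64 with JSON. One situation where the two meet is a JSON envelope whose field holds a Base64-encoded JSON document. The sketch below decodes such a payload with the standard library; the envelope shape and the field names (`key`, `payload`, `customer`, `score`) are assumptions made for illustration, not the course's actual data format.

```python
import base64
import json


def decode_nested(envelope_json: str, field: str = "payload") -> dict:
    """Parse the outer JSON, Base64-decode one field, parse the inner JSON."""
    envelope = json.loads(envelope_json)
    inner_bytes = base64.b64decode(envelope[field])
    return json.loads(inner_bytes.decode("utf-8"))


# Build a hypothetical envelope, then decode it again.
inner = {"customer": "a@example.com", "score": 7.5}
encoded = base64.b64encode(json.dumps(inner).encode("utf-8")).decode("ascii")
envelope = json.dumps({"key": "customer-1", "payload": encoded})

decoded = decode_nested(envelope)
print(decoded["customer"], decoded["score"])
```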

### Project
* Evaluate Human Balance with Spark Streaming