https://github.com/taiwotman/pysparkstreaming

Demonstrate pyspark structured programming using template design pattern

Host: GitHub
URL: https://github.com/taiwotman/pysparkstreaming
Owner: taiwotman
Created: 2018-09-09T02:12:39.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-09-14T17:21:37.000Z (almost 8 years ago)
Last Synced: 2025-04-04T09:44:48.654Z (over 1 year ago)
Topics: analytics, data, pyspark, streaming, wordcount
Homepage:
Size: 20.5 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Pyspark Streaming

**An example PySpark project.**

This project demonstrates PySpark structured programming using the template design pattern.
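As a rough illustration of the template design pattern mentioned above, here is a minimal sketch in plain Python. The class and method names (`StreamJob`, `WordCountJob`, `extract`, `transform`, `load`) are hypothetical and are not taken from this repository; the point is only that the base class fixes the order of the steps while subclasses supply the details.

```python
from abc import ABC, abstractmethod
from collections import Counter


class StreamJob(ABC):
    """Hypothetical template: run() fixes the step order, subclasses fill in the steps."""

    def run(self, lines):
        # The template method: the sequence below is the same for every job.
        records = self.extract(lines)
        result = self.transform(records)
        return self.load(result)

    @abstractmethod
    def extract(self, lines):
        """Turn raw input lines into records."""

    @abstractmethod
    def transform(self, records):
        """Compute the job's result from the records."""

    def load(self, result):
        # Optional hook with a default: hand the result back unchanged.
        return result


class WordCountJob(StreamJob):
    """Hypothetical concrete job that counts whitespace-separated words."""

    def extract(self, lines):
        return [word for line in lines for word in line.split()]

    def transform(self, records):
        return dict(Counter(records))


counts = WordCountJob().run(["spark spark", "stream spark"])
print(counts)
```

A new job would subclass `StreamJob` and override only the steps that differ, leaving `run()` untouched.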

**Install or upgrade pip**

curl https://bootstrap.pypa.io/get-pip.py | python

**Running the Spark job with spark-submit on the command line**

mkdir ./dist
cp ./src/main.py ./dist/
cd ./src && zip -x main.py -r ../dist/wordcount .
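The packaging commands above could also be scripted. The sketch below roughly mirrors them with the Python standard library, assuming the layout implied by the commands (a `src` folder containing `main.py`, and a `dist` output folder); the function name `package` and its parameters are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def package(src: Path, dist: Path, entry: str = "main.py",
            archive: str = "wordcount.zip") -> Path:
    """Copy the entry script to dist/ and zip the rest of src/ next to it."""
    dist.mkdir(parents=True, exist_ok=True)   # mkdir ./dist
    shutil.copy2(src / entry, dist / entry)   # cp ./src/main.py ./dist/
    archive_path = dist / archive
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            relative = path.relative_to(src)
            # Like `zip -x main.py`: leave the entry script out of the archive.
            if path.is_file() and relative != Path(entry):
                zf.write(path, relative.as_posix())
    return archive_path
```

Calling `package(Path("src"), Path("dist"))` from the project root would produce `dist/main.py` and `dist/wordcount.zip`.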

source venv/bin/activate

**To run this PySpark Streaming application, execute the following command from your $SPARK_HOME folder:**

./bin/spark-submit wordcount.py localhost 9999
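For orientation, a script invoked this way might look like the sketch below. It follows the socket word count example from the Structured Streaming Programming Guide referenced at the end of this README and is not necessarily this repository's `wordcount.py`; the helper names `parse_args` and `run` are hypothetical.

```python
import sys


def parse_args(argv):
    """Return (host, port) from ['<host>', '<port>']; raise ValueError otherwise."""
    if len(argv) != 2:
        raise ValueError("usage: wordcount.py <host> <port>")
    return argv[0], int(argv[1])


def run(host, port):
    # Imported here so the module can be loaded on a machine without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("StructuredNetworkWordCount").getOrCreate()

    # Each line received on the socket becomes a row with one string column, `value`.
    lines = (spark.readStream.format("socket")
             .option("host", host)
             .option("port", port)
             .load())

    # Split each line on spaces and emit one row per word.
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # "complete" mode reprints the full running-count table after every trigger.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()


if __name__ == "__main__":
    try:
        host, port = parse_args(sys.argv[1:])
    except ValueError as err:
        print(err, file=sys.stderr)
    else:
        run(host, port)
```

Writing to the console sink keeps the sketch easy to inspect; a real job would likely write to a durable sink instead.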

**To feed the stream, open a separate terminal and start netcat (it must be listening before the Spark job connects):**

nc -lk 9999

**Then, start typing your events, for example:**

For the first second, type

apple apple apple

During the next second, type

orange orange apple

Skip the third second; during the fourth second, type

mango mango mango
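To see what a running word count would report for input like this, the sketch below replays the first two lines above in plain Python. It assumes the job keeps cumulative totals across batches (as the guide's complete-mode example does); the actual job in this repository may aggregate differently, for example per time window. The function name `running_counts` is hypothetical.

```python
from collections import Counter


def running_counts(batches):
    """Return the cumulative word totals after each batch of input."""
    totals = Counter()
    snapshots = []
    for line in batches:
        totals.update(line.split())
        snapshots.append(dict(totals))
    return snapshots


snapshots = running_counts(["apple apple apple", "orange orange apple"])
for number, snapshot in enumerate(snapshots, start=1):
    print(f"after batch {number}: {snapshot}")
# after batch 1: {'apple': 3}
# after batch 2: {'apple': 4, 'orange': 2}
```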

**References**

Structured Streaming Programming Guide: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#creating-streaming-dataframes-and-streaming-datasets

Creating a PySpark project with pytest, pyenv, and egg files: https://medium.com/@mrpowers/creating-a-pyspark-project-with-pytest-pyenv-and-egg-files-d2709eb1604c
