https://github.com/korniichuk/pyspark
The pyspark stack of ready-to-run Apache PySpark in Docker
- Host: GitHub
- URL: https://github.com/korniichuk/pyspark
- Owner: korniichuk
- License: unlicense
- Created: 2016-11-11T02:53:28.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2018-05-31T18:11:07.000Z (over 7 years ago)
- Last Synced: 2025-01-17T08:28:42.447Z (9 months ago)
- Size: 8.79 KB
- Stars: 2
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
.. contents:: Table of contents
   :depth: 2

Short Description
=================
Apache PySpark

Full description
================
The `ubuntu:xenial <https://hub.docker.com/_/ubuntu>`_ Docker image with Apache PySpark for the dataops utility.

GitHub
======
The `korniichuk/pyspark <https://github.com/korniichuk/pyspark>`_ repo.

Docker Hub
==========
The `korniichuk/pyspark <https://hub.docker.com/r/korniichuk/pyspark>`_ repo.

Quickstart
==========
Bash
----
Start a container with an interactive Bash shell::

    $ docker run -it korniichuk/pyspark bash
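If you need host files inside the container, Docker's standard ``-v`` flag can mount a local directory (a minimal sketch; the ``/data`` mount point is an arbitrary example, not a path the image defines)::

    $ docker run -it -v "$PWD":/data korniichuk/pyspark bash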
PySpark
-------
Start a container with an interactive PySpark shell::

    $ docker run -it korniichuk/pyspark \
          /usr/local/src/spark-2.0.1-bin-hadoop2.7/bin/pyspark

Try the following command, which should return 1000::

    >>> sc.parallelize(range(1000)).count()
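For a slightly richer smoke test, you can chain a transformation before the action (a minimal sketch; any small RDD computation works, and this one should return 285, the sum of the squares of 0 through 9)::

    >>> sc.parallelize(range(10)).map(lambda x: x * x).sum()
    285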
Python
------
Start a container with an interactive Python shell::

    $ docker run -it korniichuk/pyspark python

Then create a local SparkContext::

    >>> from pyspark import SparkConf, SparkContext
    >>> conf = SparkConf().setMaster("local[*]")
    >>> sc = SparkContext(conf=conf)

And run the following command, which should also return 1000::

    >>> sc.parallelize(range(1000)).count()
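When you are done, it is good practice to release the context (``SparkContext.stop()`` is standard PySpark API; this mainly matters if you create more than one context in a session)::

    >>> sc.stop()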