https://github.com/korniichuk/pyspark

The pyspark stack of ready-to-run Apache PySpark in Docker

Host: GitHub
URL: https://github.com/korniichuk/pyspark
Owner: korniichuk
License: unlicense
Created: 2016-11-11T02:53:28.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2018-05-31T18:11:07.000Z (over 7 years ago)
Last Synced: 2025-01-17T08:28:42.447Z (9 months ago)
Homepage:
Size: 8.79 KB
Stars: 2
Watchers: 2
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE

.. contents:: Table of contents
   :depth: 2

Short Description
=================

Apache PySpark

Full description
================

The ``ubuntu:xenial`` Docker image with Apache PySpark for the dataops utility.

GitHub
======

The `korniichuk/pyspark <https://github.com/korniichuk/pyspark>`_ repo.

Docker Hub
==========

The ``korniichuk/pyspark`` repo.

Quickstart
==========

Bash
----

Start a container with an interactive Bash shell::

    $ docker run -it korniichuk/pyspark bash

PySpark
-------

Start a container with an interactive PySpark shell::

    $ docker run -it korniichuk/pyspark \
            /usr/local/src/spark-2.0.1-bin-hadoop2.7/bin/pyspark

Try the following command, which should return 1000::

    >>> sc.parallelize(range(1000)).count()
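
As a rough mental model of the command above, the sketch below mimics in plain Python what ``parallelize`` followed by ``count`` does conceptually: split the data into partitions, count each one independently, and sum the partial counts. It does not use Spark. The names ``split_into_partitions`` and ``distributed_count`` and the choice of 4 partitions are illustrative assumptions, not part of the image or of the PySpark API.

```python
def split_into_partitions(data, num_partitions):
    """Split a sequence into contiguous, near-equal slices."""
    items = list(data)
    n = len(items)
    return [
        items[(i * n) // num_partitions:((i + 1) * n) // num_partitions]
        for i in range(num_partitions)
    ]


def distributed_count(data, num_partitions=4):
    """Count each partition independently, then sum the partial counts."""
    partitions = split_into_partitions(data, num_partitions)
    # In Spark each partial count would run as a separate task;
    # here they simply run one after another.
    partial_counts = [sum(1 for _ in part) for part in partitions]
    return sum(partial_counts)


total = distributed_count(range(1000), num_partitions=4)
print(total)
```

The total does not depend on the number of partitions, which is why the PySpark command returns 1000 regardless of how many cores the container sees.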

Python
------

Start a container with an interactive Python shell::

    $ docker run -it korniichuk/pyspark python
    >>> from pyspark import SparkConf, SparkContext
    >>> conf = SparkConf().setMaster("local[*]")
    >>> sc = SparkContext(conf=conf)

And run the following command, which should also return 1000::

    >>> sc.parallelize(range(1000)).count()
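
The interactive session above can also be wrapped in two small functions, which makes it easier to reuse and ensures the context is stopped afterwards. This is a minimal sketch, not something shipped with the image: the function names ``count_range`` and ``main`` and the application name ``quickstart-count`` are assumptions made for illustration. PySpark is imported inside ``main`` so that the helper can be loaded on machines where PySpark is not installed.

```python
def count_range(sc, n=1000):
    """Distribute range(n) as an RDD on the given context and count it."""
    return sc.parallelize(range(n)).count()


def main():
    """Create a local SparkContext, run the count, and always stop the context."""
    # Imported here so that defining these functions does not require PySpark.
    from pyspark import SparkConf, SparkContext

    # "local[*]" matches the session above; the app name is an illustrative choice.
    conf = SparkConf().setMaster("local[*]").setAppName("quickstart-count")
    sc = SparkContext(conf=conf)
    try:
        print(count_range(sc))
    finally:
        sc.stop()  # release the local Spark resources even if the job fails
```

Inside the container, calling ``main()`` runs the same count as the interactive example. Passing the context into ``count_range`` keeps the Spark setup separate from the job logic.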
