https://github.com/datitran/spark-tdd-example

A simple Spark TDD example
pyspark python spark tdd

Host: GitHub
URL: https://github.com/datitran/spark-tdd-example
Owner: datitran
License: mit
Created: 2016-03-20T16:57:38.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-09-19T03:34:07.000Z (almost 8 years ago)
Last Synced: 2025-05-01T02:48:20.557Z (about 2 months ago)
Topics: pyspark, python, spark, tdd
Language: Jupyter Notebook
Size: 30.3 KB
Stars: 26
Watchers: 5
Forks: 12
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# A simple PySpark example using TDD

This is a minimal example of how to apply Test-Driven Development (TDD) to PySpark, Spark's Python API.

## Getting Started

1. Install Apache Spark with Homebrew: `brew install apache-spark`

2. Change the logging settings to reduce console output:

   - `cd /usr/local/Cellar/apache-spark/2.1.0/libexec/conf`

   - `cp log4j.properties.template log4j.properties`

   - In `log4j.properties`, change the root log level from `INFO` to `ERROR`: `log4j.rootCategory=ERROR, console`

3. Add this to your bash profile, adjusting the version in the path to match the one Homebrew installed: `export SPARK_HOME="/usr/local/Cellar/apache-spark/2.1.0/libexec/"`

4. Run the tests with nosetests (`-v` for verbose output, `-s` to show anything the tests print): `nosetests -vs test_clustering.py`

## Dependencies

- [Apache Spark](http://spark.apache.org/) 2.1.0

- [Python](https://www.python.org/) 3.5

- [nose](http://nose.readthedocs.io/en/latest/) 1.3.7

## Copyright

See [LICENSE](LICENSE) for details.

Copyright (c) 2017 [Dat Tran](http://www.dat-tran.com/).
