https://github.com/maprihoda/learning-spark


apache-spark data-analysis data-science data-wrangling machine-learning pyspark python


Following [Learning Spark](https://github.com/databricks/LearningSparkV2), while adding my own modifications and code snippets and elaborating on the exercises.

Download the datasets from https://github.com/databricks/LearningSparkV2/subfolder and place them somewhere on your local drive, then set `DATA_DIRECTORY` in setting.py to that location. Mine is:

```python
import os
DATA_DIRECTORY = os.path.join(os.environ["HOME"], "data", "learning-spark")
```
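As a rough sketch of how the setting might be used, the snippet below builds dataset paths from `DATA_DIRECTORY` and fails early if a file is missing. The `LEARNING_SPARK_DATA` environment variable, the helpers `dataset_path` and `require_dataset`, and the file name `example.csv` are all hypothetical and only serve as illustration; they are not part of this repository.

```python
import os

# Default mirrors the README; LEARNING_SPARK_DATA is a hypothetical override
# so the location can be changed without editing the settings file.
DATA_DIRECTORY = os.environ.get(
    "LEARNING_SPARK_DATA",
    os.path.join(os.path.expanduser("~"), "data", "learning-spark"),
)


def dataset_path(*parts):
    """Join path components onto DATA_DIRECTORY (hypothetical helper)."""
    return os.path.join(DATA_DIRECTORY, *parts)


def require_dataset(*parts):
    """Return the dataset path, raising early if it does not exist."""
    path = dataset_path(*parts)
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"Dataset not found at {path}; check DATA_DIRECTORY"
        )
    return path


# "example.csv" is a placeholder name, not an actual file from the book.
print(dataset_path("example.csv"))
```

A path returned by `require_dataset` can then be passed to a reader such as `spark.read.csv(path, header=True)`, assuming a `SparkSession` named `spark` is already available.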