https://github.com/maprihoda/learning-spark
apache-spark data-analysis data-science data-wrangling machine-learning pyspark python
- Host: GitHub
- URL: https://github.com/maprihoda/learning-spark
- Owner: maprihoda
- Created: 2020-12-21T19:28:50.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-21T19:30:36.000Z (over 5 years ago)
- Last Synced: 2025-03-29T17:44:03.436Z (about 1 year ago)
- Topics: apache-spark, data-analysis, data-science, data-wrangling, machine-learning, pyspark, python
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Following [Learning Spark](https://github.com/databricks/LearningSparkV2), while adding my own modifications and code snippets and elaborating on the exercises.
Download the datasets from https://github.com/databricks/LearningSparkV2/subfolder, place them somewhere on your local hard drive, then set `DATA_DIRECTORY` in setting.py to that location. Mine is:
```python
import os
DATA_DIRECTORY = os.path.join(os.environ["HOME"], "data", "learning-spark")
```
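A minimal sketch of how the setting might be consumed, assuming the scripts build dataset paths from `DATA_DIRECTORY`. The `LEARNING_SPARK_DATA` environment variable, the `dataset_path` helper, and the file name are all hypothetical and not part of this repository:

```python
import os

# Default mirrors the setting above; LEARNING_SPARK_DATA is a hypothetical
# override so the location can change without editing the file.
DATA_DIRECTORY = os.environ.get(
    "LEARNING_SPARK_DATA",
    os.path.join(os.path.expanduser("~"), "data", "learning-spark"),
)


def dataset_path(*parts):
    """Join path components onto DATA_DIRECTORY (hypothetical helper)."""
    return os.path.join(DATA_DIRECTORY, *parts)


# Example file name only; substitute a file you actually downloaded.
csv_path = dataset_path("example", "sample.csv")

# With PySpark installed, the path could then be handed to a reader, e.g.
# spark.read.csv(csv_path, header=True)
```

Using `os.path.expanduser("~")` instead of `os.environ["HOME"]` is a design choice in this sketch: it also resolves on systems where `HOME` is not set.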