Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/maprihoda/learning-spark
Last synced: 8 days ago
- Host: GitHub
- URL: https://github.com/maprihoda/learning-spark
- Owner: maprihoda
- Created: 2020-12-21T19:28:50.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-12-21T19:30:36.000Z (about 4 years ago)
- Last Synced: 2024-12-31T17:01:50.442Z (25 days ago)
- Topics: apache-spark, data-analysis, data-science, data-wrangling, machine-learning, pyspark, python
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Following [Learning Spark](https://github.com/databricks/LearningSparkV2), while adding my own modifications and code snippets and elaborating on the exercises.
Download the datasets from https://github.com/databricks/LearningSparkV2/subfolder, place them somewhere on your local drive, then set `DATA_DIRECTORY` in setting.py to that location. Mine is:
```python
import os
DATA_DIRECTORY = os.path.join(os.environ["HOME"], "data", "learning-spark")
```
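Since every exercise reads from this directory, it can help to build dataset paths in one place and fail early when a file is missing. The following is a minimal sketch, not code from this repository: the helper name `dataset_path` and its `base` parameter are hypothetical, and it uses `os.path.expanduser("~")` rather than `os.environ["HOME"]` on the assumption that `HOME` may be unset on some systems.

```python
import os
from pathlib import Path

# Assumption: same directory layout as the DATA_DIRECTORY example above.
# expanduser("~") is used because the HOME variable is not always set.
DATA_DIRECTORY = Path(os.path.expanduser("~")) / "data" / "learning-spark"


def dataset_path(*parts, base=DATA_DIRECTORY):
    """Hypothetical helper: join path parts under the data directory.

    Raises FileNotFoundError if the resulting path does not exist,
    so a misconfigured DATA_DIRECTORY is caught before Spark is started.
    """
    path = Path(base).joinpath(*parts)
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found: {path}")
    return str(path)
```

A call such as `dataset_path("some-folder", "some-file.csv")` (placeholder names) returns a plain string path that could then be passed to a Spark reader.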