https://github.com/fpopic/hf-interview-challenge
(Interview) Mixin Data Engineering & Data Science with PySpark
data-engineering data-science pyspark python recipes spark
- Host: GitHub
- URL: https://github.com/fpopic/hf-interview-challenge
- Owner: fpopic
- Created: 2017-09-02T08:08:26.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-09-02T08:09:11.000Z (over 7 years ago)
- Last Synced: 2025-01-10T19:42:13.499Z (4 months ago)
- Topics: data-engineering, data-science, pyspark, python, recipes, spark
- Language: HTML
- Homepage:
- Size: 282 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
1. Run the shopping cart Python code (local):
```
cd shop
python3 main.py
```

2. Run the recipes job with Spark (standalone):
```
cd recipes_etl
spark-submit spark_recipes_job.py --input data/recipes.json --output beefs.parquet
```

3. Run the recipes job with Spark (YARN):
```
cd recipes_etl
spark-submit --master yarn --deploy-mode client spark_recipes_job.py --input data/recipes.json --output beefs.parquet
```
This should be tested on a real cluster!

4. Run the recipes job in a Jupyter notebook:
```
cd recipes_etl
jupyter notebook recipes_notebook.ipynb
```
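
The contents of `spark_recipes_job.py` are not shown in this README. As a rough illustration only, a PySpark job that accepts the `--input` and `--output` flags used in steps 2 and 3 could be structured along the following lines. The `ingredients` column name, the "beef" filter (guessed from the `beefs.parquet` output name), the app name, and the function names are all assumptions, not the repository's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input/--output flags used in the spark-submit commands."""
    parser = argparse.ArgumentParser(description="Hypothetical recipes ETL sketch")
    parser.add_argument("--input", required=True, help="Path to the recipes JSON file")
    parser.add_argument("--output", required=True, help="Path for the Parquet output")
    return parser.parse_args(argv)


def run(input_path, output_path):
    """Read recipes, keep the ones mentioning beef, and write them as Parquet."""
    # Imported lazily so that argument parsing works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("recipes_etl_sketch").getOrCreate()
    try:
        # spark.read.json expects one JSON object per line by default.
        recipes = spark.read.json(input_path)
        # ASSUMPTION: each recipe has a string column named "ingredients".
        beefs = recipes.filter(F.lower(F.col("ingredients")).contains("beef"))
        beefs.write.mode("overwrite").parquet(output_path)
    finally:
        spark.stop()


def main(argv=None):
    """Entry point: parse the flags, then run the job."""
    args = parse_args(argv)
    run(args.input, args.output)
```

To run a file like this with `spark-submit`, it would also need the usual `if __name__ == "__main__": main()` guard at the bottom. Keeping the Spark imports inside `run` means the flag parsing can be checked on a machine without Spark.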