https://github.com/hiejulia/data-engineer
- Host: GitHub
- URL: https://github.com/hiejulia/data-engineer
- Owner: hiejulia
- Created: 2020-09-02T14:43:27.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-04T19:32:51.000Z (about 5 years ago)
- Last Synced: 2025-02-08T21:46:22.079Z (8 months ago)
- Language: Jupyter Notebook
- Size: 1.43 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Format data
- CSV, JSON, Avro, RCFile, Parquet

## jars
- avro
- pig-udf
- piggybank
- json-simple
- jackson-mapper-asl
- elephant-bird-hadoop
## Join
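A broadcast join, the strategy listed in this section, avoids shuffling the large side of a join by shipping a copy of the small table to every worker and probing it locally. The repository's own join code is not shown here, so the following is only a minimal plain-Python sketch of the idea; `broadcast_join`, `orders`, and `customers` are hypothetical names and sample data, and the sketch assumes the small table has unique keys.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def broadcast_join(
    large_partitions: Iterable[List[Tuple[int, str]]],
    small_table: List[Tuple[int, str]],
) -> Iterator[Tuple[int, str, str]]:
    """Inner-join every partition of the large side against a small table."""
    # Build the lookup once. In a cluster, this dict is what would be
    # "broadcast" to each worker so the large side is never shuffled.
    # Assumption: keys in small_table are unique.
    lookup: Dict[int, str] = dict(small_table)
    for partition in large_partitions:
        for key, value in partition:
            # Probe the local copy; rows without a match are dropped.
            if key in lookup:
                yield key, value, lookup[key]


# Hypothetical sample data: two partitions of a large table, one small table.
orders = [[(1, "book"), (2, "pen")], [(1, "lamp"), (3, "desk")]]
customers = [(1, "alice"), (2, "bob")]

joined = list(broadcast_join(orders, customers))
print(joined)
```

The approach only pays off when the small side fits comfortably in each worker's memory; otherwise a shuffle-based join is the usual fallback.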
- broadcast join

## Links
- localhost:50070
- localhost:50070/explorer.html/user/hive/

## How to run
- Run python script
- Run Spark program
  - spark-submit .py
- Run MR job

## Batch processing
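The README lists an MR job among the things to run but does not show one. As a rough illustration of what such a batch job does, the sketch below imitates the map, shuffle, and reduce phases of a word count in plain Python, with no Hadoop dependency. All function names and the sample input are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Collapse the grouped values for one key into a single total."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over a batch of input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical input batch.
sample_lines = ["spark and hive", "hive and pig"]
word_counts = run_job(sample_lines)
print(word_counts)
```

In a real cluster the mapper and reducer would run as separate tasks over HDFS splits; here they are ordinary functions so the data flow is easy to follow.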
## Real time processing
## ML model API