https://github.com/hiejulia/data-engineer


# data-engineer

## Data formats
- CSV, JSON, Avro, RCFile, Parquet
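As a small illustration of moving records between two of the formats listed above, here is a sketch that converts CSV text into JSON lines using only the Python standard library. The function name `csv_to_json_lines` and the sample data are hypothetical and not taken from this repository. Avro, RCFile, and Parquet need third-party libraries and are not covered here.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into a list of JSON strings, one per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every CSV value is read as a string; no type inference is attempted.
    return [json.dumps(row, sort_keys=True) for row in reader]


# Hypothetical sample data, for illustration only.
sample = "id,name\n1,alice\n2,bob\n"
for line in csv_to_json_lines(sample):
    print(line)
```

Note that CSV carries no type information, so `id` stays a string in the JSON output unless it is cast explicitly.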

## Jars
- avro
- pig-udf
- piggybank
- json-simple
- jackson-mapper-asl
- elephant-bird-hadoop

## Join
- broadcast join
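The idea behind a broadcast join can be sketched in plain Python: the small table is turned into an in-memory lookup that every worker receives, so the large table is joined partition by partition without a shuffle. This is only a conceptual sketch. The function name, the sample tables, and the assumption that keys in the small table are unique are all illustrative. In PySpark the same intent is usually expressed by wrapping the small DataFrame in `pyspark.sql.functions.broadcast` inside a join.

```python
def broadcast_hash_join(large_partitions, small_table, key):
    """Inner-join each partition of a large table against a small, fully replicated table."""
    # Build the lookup once; on a cluster this dict is what would be shipped to every worker.
    # Assumption: `key` is unique in the small table.
    lookup = {row[key]: row for row in small_table}
    joined = []
    for partition in large_partitions:
        for row in partition:
            match = lookup.get(row[key])
            if match is not None:  # inner join: drop rows without a match
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: orders split into two partitions, plus a small dimension table.
orders = [
    [{"country": "VN", "order_id": 1}, {"country": "US", "order_id": 2}],
    [{"country": "FR", "order_id": 3}],
]
countries = [
    {"country": "VN", "name": "Vietnam"},
    {"country": "US", "name": "United States"},
]
print(broadcast_hash_join(orders, countries, "country"))
```

The approach only pays off when the small table fits comfortably in each worker's memory.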

## Links
- localhost:50070 (HDFS NameNode web UI)
- localhost:50070/explorer.html/user/hive/ (HDFS file browser, `/user/hive/` directory)

## How to run
- Run a Python script
- Run a Spark program
  - `spark-submit .py`
- Run an MR (MapReduce) job
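To make the MR step concrete, here is a minimal word-count sketch in plain Python that mimics the map, shuffle (sort by key), and reduce phases locally. It does not use Hadoop, and it is not one of this repository's jobs. The names `mapper`, `reducer`, and `run_job` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each word; input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines):
    """Run map, then a local sort standing in for the shuffle, then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    print(run_job(["spark and hadoop", "Spark jobs"]))
```

On a real cluster the framework performs the sort and grouping between the two phases. Here `sorted` plays that role.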

## Batch processing

## Real time processing

## ML model API