Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/JagadeeshwaranM/Data_Engineering_Simplified
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/JagadeeshwaranM/Data_Engineering_Simplified
- Owner: JagadeeshwaranM
- Created: 2023-04-16T08:48:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-05-01T20:49:43.000Z (over 1 year ago)
- Last Synced: 2024-07-27T17:55:21.162Z (about 2 months ago)
- Language: Python
- Size: 10.1 MB
- Stars: 624
- Watchers: 28
- Forks: 141
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
**Data Engineering Roadmap**
1. Learn SQL...
Aggregations with GROUP BY
Joins (INNER, LEFT, FULL OUTER)
Window functions
Common table expressions etc.
You can learn from https://www.w3schools.com/
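The SQL topics in step 1 can be tried without installing a database. The following is a minimal sketch using Python's built-in `sqlite3` module; the `customers` and `orders` tables and their rows are hypothetical, made up only for illustration, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory tables, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# The CTE aggregates orders per customer with GROUP BY, the LEFT JOIN keeps
# customers that have no orders, and RANK() is a window function over totals.
query = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT c.name,
       COALESCE(t.total, 0) AS total,
       RANK() OVER (ORDER BY COALESCE(t.total, 0) DESC) AS spend_rank
FROM customers AS c
LEFT JOIN totals AS t ON t.customer_id = c.id
ORDER BY spend_rank, c.name
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Swapping `LEFT JOIN` for `INNER JOIN` drops the customer with no orders, which is a quick way to see the difference between the two join types.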
2. Learn Python/Scala...
Learn the basics: for/while loops, if statements,
functional programming, abstract methods, traits
Learn libraries like NumPy, pandas, scikit-learn etc.
You can learn from https://lnkd.in/gSz45km5
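As a rough illustration of the basics listed in step 2, the sketch below puts a plain loop next to its functional equivalent and shows an abstract method in Python. The `Transform`, `Double`, and `AddOne` classes and the `run_pipeline` helper are hypothetical names chosen for this example; traits are a Scala feature and are not shown here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from functools import reduce


class Transform(ABC):
    """Hypothetical base class: every pipeline step must implement apply()."""

    @abstractmethod
    def apply(self, value: int) -> int:
        ...


class Double(Transform):
    def apply(self, value: int) -> int:
        return value * 2


class AddOne(Transform):
    def apply(self, value: int) -> int:
        return value + 1


def run_pipeline(steps: list[Transform], values: list[int]) -> list[int]:
    # reduce() threads each value through the steps in order.
    return [reduce(lambda acc, step: step.apply(acc), steps, v) for v in values]


# A plain loop with an if statement ...
evens_loop = []
for n in range(6):
    if n % 2 == 0:
        evens_loop.append(n)

# ... and the same selection written functionally with filter().
evens_functional = list(filter(lambda n: n % 2 == 0, range(6)))

result = run_pipeline([Double(), AddOne()], evens_functional)
print(evens_loop, evens_functional, result)
```

Trying to instantiate `Transform()` directly raises `TypeError`, which is how the abstract method forces subclasses to provide `apply`.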
3. Learn distributed computing...
Hadoop versions/Hadoop architecture
Fault tolerance in Hadoop
Read about and understand MapReduce processing.
Learn the optimizations used in MapReduce etc.
4. Learn data ingestion tools...
Learn Sqoop/Kafka/NiFi
Understand their functionality and job running mechanism.
5. Learn data processing/NoSQL...
Spark architecture/RDDs/DataFrames/Datasets.
Lazy evaluation, DAGs/lineage graph/optimization techniques
YARN utilization/Spark Streaming etc.
6. Learn data warehousing...
Understand how Hive stores and processes data
Different file formats/compression techniques.
Partitioning/bucketing.
Different UDFs available in Hive.
SCD concepts.
E.g. HBase, Cassandra
7. Learn job orchestration...
Learn Airflow/Oozie
Learn about workflows/cron etc.
8. Learn cloud computing...
Learn Azure/AWS/GCP.
Understand the significance of the cloud in #dataengineering
Learn Azure Synapse/Redshift/BigQuery
Learn ingestion tools/pipeline tools like ADF etc.
9. Learn the basics of CI/CD and Linux commands...
Read about Kubernetes/Docker and why they matter in data engineering.
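To make the MapReduce processing named in step 3 more concrete, here is a small single-machine sketch of the map, shuffle, and reduce phases as a word count. It is plain Python, not Hadoop; the function names and the two input lines are assumptions made up for illustration, and a real cluster would run these phases in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Collapse all values for one key into a single count.
    return key, sum(values)


lines = ["spark and hadoop", "hadoop and hive"]  # hypothetical input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```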