https://github.com/JagadeeshwaranM/Data_Engineering_Simplified



**Data Engineering Roadmap**

1. Learn SQL
Aggregations with GROUP BY
Joins (INNER, LEFT, FULL OUTER)
Window functions
Common table expressions (CTEs), etc.

You can learn from https://www.w3schools.com/
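
The SQL topics in step 1 can be tried without installing a database server. The following is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database; the schema and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0);
""")

# Aggregation with GROUP BY: total order amount per customer_id.
totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# LEFT JOIN: customers without orders are kept, with an order count of 0.
order_counts = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# CTE plus window function: running total of each customer's orders.
running = conn.execute("""
    WITH customer_orders AS (
        SELECT c.name, o.id AS order_id, o.amount
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.id
    )
    SELECT name, order_id,
           SUM(amount) OVER (PARTITION BY name ORDER BY order_id) AS running_total
    FROM customer_orders
    ORDER BY name, order_id
""").fetchall()

print(totals)
print(order_counts)
print(running)
conn.close()
```

The LEFT JOIN result is the one to study: the customer with no orders still appears, which an INNER JOIN would drop.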

2. Learn Python/Scala
Learn the basics: for/while loops and if statements
Functional programming, abstract methods, traits
Learn libraries like NumPy, pandas, scikit-learn, etc.

You can learn from https://lnkd.in/gSz45km5
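
As a small illustration of the basics in step 2, the sketch below combines a loop, a conditional, an abstract method, and functional-style `map`/`filter`/`reduce` using only the standard library. The `Transformer` and `Doubler` classes are hypothetical names chosen for this example; a Scala trait would play a roughly comparable role to the abstract base class here.

```python
from abc import ABC, abstractmethod
from functools import reduce


class Transformer(ABC):
    """Abstract base class: subclasses must implement apply()."""

    @abstractmethod
    def apply(self, value: int) -> int:
        ...


class Doubler(Transformer):
    """Concrete implementation of the abstract method."""

    def apply(self, value: int) -> int:
        return value * 2


# Loop and conditional: collect the even numbers from 1 to 10.
evens = []
for n in range(1, 11):
    if n % 2 == 0:
        evens.append(n)

# Functional style: map, filter, and reduce instead of explicit loops.
doubled = list(map(Doubler().apply, evens))
large = list(filter(lambda x: x > 10, doubled))
total = reduce(lambda acc, x: acc + x, large, 0)

print(evens, doubled, large, total)
```

Instantiating `Transformer` directly raises a `TypeError`, which is how Python enforces that the abstract method gets implemented.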

3. Learn distributed computing
Hadoop versions and Hadoop architecture
Fault tolerance in Hadoop
Read about and understand MapReduce processing.
Learn the optimizations used in MapReduce, etc.
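
To make the MapReduce processing model in step 3 concrete, here is a single-process sketch of a word count in plain Python. It is not Hadoop code: the function names and input lines are made up for illustration, and a real cluster would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data big ideas", "data pipelines move data"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

One common optimization is a combiner, which applies the reduce logic to each mapper's local output before the shuffle so that less data crosses the network.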

4. Learn data ingestion tools
Learn Sqoop/Kafka/NiFi
Understand what each tool does and how its jobs run.

5. Learn data processing/NoSQL
Spark architecture, RDDs, DataFrames, Datasets
Lazy evaluation, DAGs, lineage graphs, optimization techniques
YARN resource utilization, Spark Streaming, etc.
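
Lazy evaluation and lineage in step 5 are easier to grasp with a toy model. The sketch below is plain Python, not the Spark API: `LazyDataset` is a hypothetical class that records transformations and only runs them when an action (`collect`) is called, which is the general idea behind Spark's transformations and actions.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, lineage=()):
        self._source = source
        self._lineage = lineage  # tuple of (name, function) steps

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing yet.
        return LazyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        # Transformation: also only extends the recorded lineage.
        return LazyDataset(self._source, self._lineage + (("filter", fn),))

    def lineage(self):
        """Names of the recorded steps, in order."""
        return [name for name, _ in self._lineage]

    def collect(self):
        # Action: replay the recorded steps over the source data.
        data = iter(self._source)
        for name, fn in self._lineage:
            data = map(fn, data) if name == "map" else filter(fn, data)
        return list(data)


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has executed at this point
result = ds.collect()             # the action triggers execution

print(ds.lineage(), calls_before_action, result)
```

Because the dataset keeps its lineage rather than its results, a lost result can be recomputed by replaying the steps, which is the intuition behind lineage-based fault tolerance.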

6. Learn data warehousing
Understand how Hive stores and processes data
Different file formats and compression techniques
Partitioning and bucketing
The different UDFs available in Hive
SCD (slowly changing dimension) concepts
NoSQL store examples: HBase, Cassandra
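
Of the SCD concepts in step 6, Type 2 (keep full history by closing the old row and adding a new one) is the one most often asked about. The following is a minimal in-memory sketch, assuming a dimension stored as a list of dictionaries. The `apply_scd2` function, the column names, and the sample customer are all hypothetical; a warehouse would do this with SQL or a merge operation.

```python
from datetime import date


def apply_scd2(dimension, key, new_attrs, effective):
    """Close the current row for `key` if its attributes changed, then add a new row."""
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dimension  # nothing changed, history stays as-is
            row["valid_to"] = effective
            row["is_current"] = False
    dimension.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": effective,
        "valid_to": None,  # open-ended: this is the current version
        "is_current": True,
    })
    return dimension


customer_dim = []
apply_scd2(customer_dim, 101, {"city": "Chennai"}, date(2023, 1, 1))
apply_scd2(customer_dim, 101, {"city": "Bengaluru"}, date(2024, 6, 1))
apply_scd2(customer_dim, 101, {"city": "Bengaluru"}, date(2024, 7, 1))  # no change

for row in customer_dim:
    print(row)
```

After the three calls the dimension holds two rows for customer 101: the closed Chennai row and the current Bengaluru row. The third call adds nothing because the attributes did not change.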

7. Learn job orchestration
Learn Airflow/Oozie
Learn about workflows, cron, etc.
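
At its core, a workflow in step 7 is a set of tasks with dependencies that must run in a valid order. The sketch below shows that idea with the standard-library `graphlib` module (Python 3.9 or newer). It is not Airflow or Oozie code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical pipeline).
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
    "report": {"load"},
}


def run_task(name, log):
    """Stand-in for real work: just record that the task ran."""
    log.append(name)


execution_log = []
for task in TopologicalSorter(dependencies).static_order():
    run_task(task, execution_log)

print(execution_log)
```

A scheduler then decides when the whole workflow starts. With cron, for example, the expression `0 2 * * *` means every day at 02:00.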

8. Learn cloud computing
Learn Azure/AWS/GCP.
Understand the significance of the cloud in data engineering.
Learn Azure Synapse/Redshift/BigQuery.
Learn ingestion and pipeline tools like ADF (Azure Data Factory), etc.

9. Learn the basics of CI/CD and Linux commands
Read about Kubernetes and Docker, and how crucial they are in data engineering.