https://github.com/sibendud/csi_2024_dataengineering

Data Engineering internship repository. The topics listed below are the skills applied to complete the assigned tasks over 8 weeks, including the final project.

adf adls azure-pipelines databricks docker ipynb json numpy pandas-python parquet-avro pipeline pyspark python sql sql-server


# Celebal Summer Internship - 2024 (MAY - JULY)

# WEEK1 & WEEK2
**Task Assigned**
- `Problem solving in Python on HackerRank.`
- `Total 10 + 10 questions.`
- `Saving the solution files on the local machine and uploading them as a .zip file.`

# WEEK3 & WEEK4
**Task Assigned**
- `Problem solving in SQL on HackerRank.`
- `Total 10 + 10 questions.`
- `Saving the solution files on the local machine and uploading them as a .zip file.`

# WEEK5
**Task Assigned**
- `Advanced concepts using existing SQL databases, scheduling triggers, and using pipelines to automate the process.`
- `Full task uploaded:` [week5_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week5) `use the method given there to run the scheduler.`
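
The method actually used for this week is in the linked folder. As a rough, independent illustration of the general idea of running a pipeline step on a schedule, here is a minimal sketch using Python's standard `sched` module. The names `run_pipeline` and `schedule_runs` are hypothetical and do not come from the repository, and the "pipeline" only records that it ran.

```python
import sched
import time


def run_pipeline(log):
    """Hypothetical pipeline step: it only records that it ran."""
    log.append("pipeline run")


def schedule_runs(scheduler, interval_seconds, runs, log):
    """Queue `runs` executions of run_pipeline, spaced interval_seconds apart."""
    for i in range(runs):
        delay = interval_seconds * (i + 1)
        scheduler.enter(delay, 1, run_pipeline, argument=(log,))


if __name__ == "__main__":
    log = []
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    schedule_runs(scheduler, 0.1, 3, log)
    scheduler.run()  # blocks until every queued run has executed
    print(log)
```

A real deployment would more likely rely on the scheduling facility of the database or orchestration tool in use; this sketch only shows the trigger-then-run shape.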

# WEEK6
**Task Assigned**
- `Concepts of ADF (Azure Data Factory).`
- `Configuring FTP and creating an incremental pipeline to automate the given task.`
- `Full task uploaded:` [week6_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week6) `follow the steps explained in the DOC.`

# WEEK7
**Task Assigned**
- `The tasks are explained in the file; Task1 is to load the file.`
- `Task2 cannot be done without an Azure subscription; the reason is also explained in a DOC file.`
- `Both the Python script and the DOC file:` [week7_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week7)

# WEEK8
**Task Assigned**
- `NYC Taxi Dataset Analysis`
- `Loading the dataset into DBFS, flattening JSON fields, writing the flattened file as an external Parquet table.`
- `Full process` [week8_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week8)
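
The Databricks notebook in the linked folder holds the actual process. To show what flattening JSON fields amounts to, here is a small plain-Python sketch that turns nested objects into a single level of underscore-joined column names. The `flatten` helper and the sample trip record are hypothetical and are not taken from the NYC Taxi dataset or the repository.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into one dict with joined key names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Hypothetical nested record, for illustration only.
    raw = '{"trip": {"distance": 2.5, "fare": {"amount": 12.0, "tip": 2.0}}, "vendor": 1}'
    print(flatten(json.loads(raw)))
```

In PySpark the same result is typically reached by selecting nested struct fields through dotted column paths and aliasing them to flat names before writing the DataFrame out as Parquet.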

# PROJECT
- `Information about the project:` [Project_Info](https://drive.google.com/drive/folders/1AUrcdQkk6MW2v-fvkAVDlOhEKolb7Z1I)
- `Completed project:` [Project](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/Project)