https://github.com/sibendud/csi_2024_dataengineering
Data Engineering internship repository: the topics listed below are the skills applied to complete the assigned tasks over 8 weeks, including the final project.
adf adls azure-pipelines databricks docker ipynb json numpy pandas-python parquet-avro pipeline pyspark python sql sql-server
- Host: GitHub
- URL: https://github.com/sibendud/csi_2024_dataengineering
- Owner: SibenduD
- Created: 2024-07-16T19:00:03.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-18T14:53:51.000Z (4 months ago)
- Last Synced: 2024-10-11T15:24:52.161Z (about 1 month ago)
- Topics: adf, adls, azure-pipelines, databricks, docker, ipynb, json, numpy, pandas-python, parquet-avro, pipeline, pyspark, python, sql, sql-server
- Language: Jupyter Notebook
- Homepage:
- Size: 3.58 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Celebal Summer Internship - 2024 (MAY - JULY)
# WEEK1 & WEEK2
**Task Assigned**
- `Problem solving in Python on HackerRank.`
- `Total 10 + 10 questions.`
- `Saving the solutions on the local machine and uploading them as a .zip file.`
# WEEK3 & WEEK4
**Task Assigned**
- `Problem solving in SQL on HackerRank.`
- `Total 10 + 10 questions.`
- `Saving the solutions on the local machine and uploading them as a .zip file.`
# WEEK5
**Task Assigned**
- `Advanced concepts using existing SQL databases, scheduling triggers, and using pipelines to automate the process.`
- `Full task uploaded:` [week5_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week5) `use the method given there to run the scheduler`
# WEEK6
**Task Assigned**
- `Concepts of ADF (Azure Data Factory)`
- `Configuring FTP and creating an incremental pipeline to automate the given task.`
- `Full task uploaded:` [week6_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week6) `follow the steps explained in the DOC`
# WEEK7
**Task Assigned**
- `The tasks are explained in the file; Task1 covers loading the file`
- `Task2 cannot be done without an Azure subscription; the reason is explained in a DOC file`
- `Both the Python script & the DOC file:` [week7_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week7)
# WEEK8
**Task Assigned**
- `NYC Taxi Dataset Analysis`
- `Loading the dataset into DBFS, flattening the JSON fields, writing the flattened file as an external Parquet table`
- `Full process:` [week8_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week8)
# PROJECT
- `Info about project` [Project_Info](https://drive.google.com/drive/folders/1AUrcdQkk6MW2v-fvkAVDlOhEKolb7Z1I)
- `Done` [Project](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/Project)
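
The "flatten JSON fields" step from the Week 8 task above can be illustrated without a cluster. The sketch below is not the code from the repository's notebooks: it uses only Python's standard library, and the function name `flatten_record`, the `_` separator, and the sample record are assumptions made up for illustration. In Databricks the same idea would more likely be expressed by selecting nested columns of a PySpark DataFrame before writing the result as Parquet.

```python
import json


def flatten_record(record, parent_key="", sep="_"):
    """Return a flat dict whose keys join each nested path with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, name, sep))
        else:
            # Scalars and lists are kept as-is; lists are not exploded here.
            flat[name] = value
    return flat


# Hypothetical trip record; the field names are illustrative only.
sample = json.loads(
    '{"trip_id": 1, "fare": {"amount": 12.5, "tip": 2.0},'
    ' "pickup": {"location": {"zone": "A"}}}'
)
flat_sample = flatten_record(sample)
print(flat_sample)
```

Each nested path becomes one column-like key (for example `fare_amount`), which is the shape a tabular format such as Parquet expects. Arrays are deliberately left untouched in this sketch; handling them would need a separate explode step.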