{"id":15195474,"url":"https://github.com/sibendud/csi_2024_dataengineering","last_synced_at":"2026-03-07T09:32:25.165Z","repository":{"id":248754356,"uuid":"829599606","full_name":"SibenduD/CSI_2024_DataEngineering","owner":"SibenduD","description":"Internship on Data Engineering where below topics are applied skills that are used to complete the given tasks through out 8 weeks including the project.","archived":false,"fork":false,"pushed_at":"2024-07-18T14:53:51.000Z","size":3750,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-13T01:34:58.635Z","etag":null,"topics":["adf","adls","azure-pipelines","databricks","docker","ipynb","json","numpy","pandas-python","parquet-avro","pipeline","pyspark","python","sql","sql-server"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SibenduD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-16T19:00:03.000Z","updated_at":"2024-07-18T14:53:54.000Z","dependencies_parsed_at":"2024-07-22T01:33:57.461Z","dependency_job_id":null,"html_url":"https://github.com/SibenduD/CSI_2024_DataEngineering","commit_stats":null,"previous_names":["sibendud/csi_2024_dataengineering"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SibenduD%2FCSI_2024_DataEngineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SibenduD%2FCSI_2024_DataEngineering/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SibenduD%2FCSI_2024_DataEngineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SibenduD%2FCSI_2024_DataEngineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SibenduD","download_url":"https://codeload.github.com/SibenduD/CSI_2024_DataEngineering/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242751909,"owners_count":20179421,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adf","adls","azure-pipelines","databricks","docker","ipynb","json","numpy","pandas-python","parquet-avro","pipeline","pyspark","python","sql","sql-server"],"created_at":"2024-09-27T23:24:13.937Z","updated_at":"2026-03-07T09:32:25.124Z","avatar_url":"https://github.com/SibenduD.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Celebal Summer Internship - 2024 (MAY - JULY)\n\n# WEEK1 \u0026 WEEK2 \n**Task Assigned**\n- `Problem Solving in Python on Hackerrank.`\n- `Total 10 + 10 Questions.`\n- `Saving file in local machine and uploading a .zip file.`\n\n\n# WEEK3 \u0026 WEEK4\n**Task Assigned**\n- `Problem Solving in SQL on Hackerrank.`\n- `Total 10 + 10 Questions.`\n- `Saving file in local machine and uploading a .zip file.`\n\n\n# WEEK5\n**Task Assigned**\n- `Advanced concept using existing SQL databases , Scheduling Triggers and using of Pipelines to automate the process.`\n- `Full task uploaded` 
[week5_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week5) `use the method described there to run the scheduler`\n\n# WEEK6\n**Task Assigned**\n- `Concept on ADF - Azure Data Factory`\n- `Configuring FTP and creating an incremental Pipeline to automate the given task.`\n- `Full task uploaded`  [week6_task](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week6) `follow the steps explained in the DOC`\n\n# WEEK7\n**Task Assigned**\n- `Tasks are explained in the file; Task1 is to load the file`\n- `Task2 cannot be done without an Azure subscription; the reason is explained in a DOC file`\n- `Both Python script \u0026 DOC file` [week7_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week7)\n\n# WEEK8\n**Task Assigned**\n- `NYC Taxi Dataset Analysis`\n- `Loading dataset into DBFS, flattening JSON fields, writing flattened file as external Parquet table`\n- `Full process` [week8_tasks](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/week8)\n\n# PROJECT\n- `Info about project` [Project_Info](https://drive.google.com/drive/folders/1AUrcdQkk6MW2v-fvkAVDlOhEKolb7Z1I)\n- `Done` [Project](https://github.com/SibenduD/CSI_2024_DataEngineering/tree/main/Project)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsibendud%2Fcsi_2024_dataengineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsibendud%2Fcsi_2024_dataengineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsibendud%2Fcsi_2024_dataengineering/lists"}