{"id":19078599,"url":"https://github.com/ac-gomes/data-engineering-with-databricks","last_synced_at":"2025-04-30T04:50:20.806Z","repository":{"id":159067808,"uuid":"477940790","full_name":"ac-gomes/data-engineering-with-databricks","owner":"ac-gomes","description":"A simple boilerplate for data engineering and data analysis training in Databricks.","archived":false,"fork":false,"pushed_at":"2023-09-30T00:08:37.000Z","size":74,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-30T04:50:04.983Z","etag":null,"topics":["data-analysis","data-engineering","databricks","databricks-notebooks","pyspark","python","unit-testing"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ac-gomes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-05T01:59:43.000Z","updated_at":"2024-11-12T12:34:25.000Z","dependencies_parsed_at":"2024-11-09T02:11:01.029Z","dependency_job_id":"e2bd3efe-d0ed-4e2b-a42b-f685471a47b9","html_url":"https://github.com/ac-gomes/data-engineering-with-databricks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ac-gomes%2Fdata-engineering-with-databricks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ac-gomes%2Fdata-engineering-with-databricks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ac-gomes%2Fdata-engineering-with-databricks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ac-gomes%2Fdata-engineering-with-databricks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ac-gomes","download_url":"https://codeload.github.com/ac-gomes/data-engineering-with-databricks/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251644827,"owners_count":21620630,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-engineering","databricks","databricks-notebooks","pyspark","python","unit-testing"],"created_at":"2024-11-09T02:10:55.240Z","updated_at":"2025-04-30T04:50:20.774Z","avatar_url":"https://github.com/ac-gomes.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"## Project Overview\nThis template was developed to support my own learning when I started studying Databricks/Spark, and I am now making it available to give other absolute beginners a good starting experience. You can use it even if you don't yet know how to create a DataFrame or a directory for writing files, tables, and databases.\n\n## What does this template do?\n- Creates 3 PySpark DataFrames for practicing relational data transformations\n- Creates 4 folders for writing data: ```[current user directory, raw, structured, curated]```\n- Creates 1 database in the structured zone\n- Exposes the source code so you can study and learn from it\n- Resets the environment (includes functions to clean up your environment)\n- Shows how to create tables in a Hive database; see the ```02-Table-Reference``` notebook\n- Demonstrates Python unit testing with unittest on Databricks\n\n## Notebooks\n1. [Config-DataFrame](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-DataFrame.ipynb)\n1. [Config-Directories](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Directories.ipynb)\n1. [Config-Database](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Database.ipynb)\n1. [common](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/common.ipynb)\n1. [Reset-Environment](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Reset-Environment.ipynb)\n1. [Helpers](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Helpers.ipynb)\n1. [Test](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test.ipynb)\n1. [Test_Runner](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test_Runner.ipynb)\n1. [01-Training_Python](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Training/01-Training_Python.ipynb)\n1. [02-Table-Reference](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/02-Table-Reference.ipynb)\n\n## How to use it?\n- Import the data-engineering.dbc file into your [Databricks Community account](https://community.cloud.databricks.com/) and run the ```01-Training_Python``` notebook.\n\n## Feel free to contribute 😃\n\n## Enjoy it!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fac-gomes%2Fdata-engineering-with-databricks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fac-gomes%2Fdata-engineering-with-databricks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fac-gomes%2Fdata-engineering-with-databricks/lists"}