Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ac-gomes/data-engineering-with-databricks
A simple boilerplate for data engineering and data analysis training in Databricks.
data-analysis data-engineering databricks databricks-notebooks pyspark python unit-testing
- Host: GitHub
- URL: https://github.com/ac-gomes/data-engineering-with-databricks
- Owner: ac-gomes
- License: mit
- Created: 2022-04-05T01:59:43.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-09-30T00:08:37.000Z (over 1 year ago)
- Last Synced: 2023-09-30T03:46:15.623Z (over 1 year ago)
- Topics: data-analysis, data-engineering, databricks, databricks-notebooks, pyspark, python, unit-testing
- Homepage:
- Size: 72.3 KB
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Project Overview
This template was developed to support my own learning when I started studying Databricks/Spark, and I am now making it available to give other absolute beginners a good experience. You can start using it even if you don't yet know how to create a DataFrame or a new directory for writing files, tables, and databases.

## What does this template do?
- Create 3 PySpark DataFrames for relational data transformation practice
- Create 4 folders to write data ```[current user directory, raw, structured, curated]```
- Create 1 database in the structured zone
- You can see the source code and learn from it
- Reset the environment (includes functions to clean up your environment)
- Show how to create tables (in a Hive database); see the 02-Table-Reference notebook
- Python Unit Testing with unittest on Databricks

## Notebooks
1. [Config-DataFrame](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-DataFrame.ipynb)
1. [Config-Directories](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Directories.ipynb)
1. [Config-Database](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Database.ipynb)
1. [common](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/common.ipynb)
1. [Reset-Environment](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Reset-Environment.ipynb)
1. [Helpers](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Helpers.ipynb)
1. [Test](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test.ipynb)
1. [Test_Runner](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test_Runner.ipynb)
1. [01-Training_Python](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Training/01-Training_Python.ipynb)
1. [02-Table-Reference](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/02-Table-Reference.ipynb)

## How to use it?
- Just import the data-engineering.dbc file into your [Databricks Community account](https://community.cloud.databricks.com/) and run the ```01-Training_Python``` notebook.

## Feel free to contribute 😃
## Enjoy it!
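
If you are new to the unit-testing item in the feature list, here is a minimal sketch of how `unittest` can be run inside a notebook cell. It is not taken from the Test or Test_Runner notebooks: the helper `build_zone_paths` and the test class are hypothetical names made up for illustration, loosely modelled on the raw, structured, and curated folders mentioned above. The sketch builds the suite explicitly and runs it with `TextTestRunner`, because `unittest.main()` parses `sys.argv` and calls `sys.exit()` by default, which tends to be inconvenient in a notebook.

```python
import unittest


def build_zone_paths(base_dir):
    """Hypothetical helper (not from the repo): map each zone to a path."""
    base = base_dir.rstrip("/")  # tolerate a trailing slash on the base path
    return {zone: f"{base}/{zone}" for zone in ("raw", "structured", "curated")}


class BuildZonePathsTest(unittest.TestCase):
    def test_returns_three_zones(self):
        paths = build_zone_paths("/tmp/user")
        self.assertEqual(sorted(paths), ["curated", "raw", "structured"])

    def test_strips_trailing_slash(self):
        paths = build_zone_paths("/tmp/user/")
        self.assertEqual(paths["raw"], "/tmp/user/raw")


def run_tests():
    # Load the test case explicitly instead of calling unittest.main(),
    # so the notebook kernel is not stopped by sys.exit().
    suite = unittest.TestLoader().loadTestsFromTestCase(BuildZonePathsTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


result = run_tests()
```

The returned `result` object can be inspected afterwards, for example with `result.wasSuccessful()`, to decide whether a later cell should continue.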