https://github.com/ac-gomes/data-engineering-with-databricks

A simple boilerplate for data engineering and data analysis training in Databricks.

data-analysis data-engineering databricks databricks-notebooks pyspark python unit-testing

## Project Overview
I built this template to support my own learning when I started studying Databricks and Spark, and I am now sharing it to give other absolute beginners a smoother start. You can use it even if you do not yet know how to create a DataFrame or a new directory for writing files, tables, and databases.

## What does this template do?
- Create 3 PySpark DataFrames for practicing relational data transformations
- Create 4 folders for writing data ```[current user directory, raw, structured, curated]```
- Create 1 database in the structured zone
- You can see the source code and learn from it
- Reset the environment (includes functions to clean it up)
- Learn how to create tables (in a Hive database); see the ```02-Table-Reference``` notebook
- Practice Python unit testing with ```unittest``` on Databricks
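
The folder layout above can be sketched locally with the standard library. This is only an illustration, not the code from the ```Config-Directories``` notebook: the function name `create_zones`, the user name `some_user`, and the choice to nest the three zones under the user directory are all assumptions. On Databricks the same idea would typically go through `dbutils.fs.mkdirs` instead of `pathlib`.

```python
import tempfile
from pathlib import Path

# Zone names taken from the list above; nesting them under the
# user directory is an assumption made for this sketch.
ZONES = ("raw", "structured", "curated")


def create_zones(user_dir):
    """Create the user directory plus one subfolder per zone; return their paths."""
    user_dir = Path(user_dir)
    user_dir.mkdir(parents=True, exist_ok=True)
    paths = {"user": user_dir}
    for zone in ZONES:
        zone_path = user_dir / zone
        zone_path.mkdir(exist_ok=True)  # safe to re-run
        paths[zone] = zone_path
    return paths


# Use a throwaway temp directory so the sketch never touches real data.
base = Path(tempfile.mkdtemp())
paths = create_zones(base / "some_user")  # "some_user" is a placeholder
print(sorted(p.name for p in paths["user"].iterdir()))
# prints ['curated', 'raw', 'structured']
```

Because `exist_ok=True` is set, calling `create_zones` a second time leaves the existing folders untouched, which is convenient when a setup notebook is re-run.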

## Notebooks
1. [Config-DataFrame](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-DataFrame.ipynb)
1. [Config-Directories](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Directories.ipynb)
1. [Config-Database](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Config-Database.ipynb)
1. [common](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/common.ipynb)
1. [Reset-Environment](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Reset-Environment.ipynb)
1. [Helpers](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Includes/Helpers.ipynb)
1. [Test](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test.ipynb)
1. [Test_Runner](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Test/Test_Runner.ipynb)
1. [01-Training_Python](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/Training/01-Training_Python.ipynb)
1. [02-Table-Reference](https://github.com/ac-gomes/data-engineering-with-databricks/blob/master/data-engineering/02-Table-Reference.ipynb)
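
The contents of the ```Test``` and ```Test_Runner``` notebooks are not reproduced here, but the general pattern for running ```unittest``` from a notebook cell can be sketched as follows. Calling `unittest.main()` in a notebook is awkward because it parses command-line arguments and calls `sys.exit()` by default, so this sketch loads a suite and runs it explicitly. The function `add_full_name`, the test class, and the helper `run_tests` are hypothetical names chosen for illustration.

```python
import io
import unittest


def add_full_name(row):
    """Hypothetical transformation: return a copy of row with a 'full_name' key."""
    out = dict(row)
    out["full_name"] = f"{row['first_name']} {row['last_name']}"
    return out


class AddFullNameTest(unittest.TestCase):
    def test_builds_full_name(self):
        result = add_full_name({"first_name": "Ada", "last_name": "Lovelace"})
        self.assertEqual(result["full_name"], "Ada Lovelace")

    def test_does_not_mutate_input(self):
        row = {"first_name": "Ada", "last_name": "Lovelace"}
        add_full_name(row)
        self.assertNotIn("full_name", row)


def run_tests(test_case):
    """Run one TestCase class explicitly instead of calling unittest.main()."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    stream = io.StringIO()  # capture the report so it can be printed in the cell
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return result, stream.getvalue()


result, report = run_tests(AddFullNameTest)
print(report)
```

Returning the `TestResult` object lets a runner notebook check `result.wasSuccessful()` and raise an error if any test failed.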

## How to use it?
- Import the data-engineering.dbc file into your [Databricks Community account](https://community.cloud.databricks.com/) and run the ```01-Training_Python``` notebook.

## Feel free to contribute 😃

## Enjoy it!