https://github.com/nabilshadman/etl-python-sql-airflow

Build ETL pipelines using Python, Pandas, and SQL to extract, transform, and load data from sources, with exercises and job scheduling practices
airflow data-engineering data-pipeline etl python sql

# ETL in Python and SQL

## Course Overview
This repository contains the course materials for [**ETL in Python and SQL**](https://www.linkedin.com/learning/etl-in-python-and-sql/), taught by Jennifer Ebe, a data engineer with over 5 years of experience. The course is designed to help you build robust systems that gather, transform, and store data efficiently for actionable insights. It focuses on:

- **Extracting Data**: Using Python to gather data from various sources.
- **Transforming Data**: Leveraging tools like pandas and SQL to explore, clean, and standardize data.
- **Loading Data**: Storing processed data in target systems.
- **Scheduling Jobs**: Automating ETL workflows with Python and shell scripts.

Each chapter includes hands-on challenges to reinforce learning through practical experience.
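
As a rough illustration of the extract, transform, and load steps listed above, here is a minimal sketch using pandas and Python's built-in `sqlite3` module. It is not taken from the course notebooks: the inline sample data, the `orders` table name, and the `extract`/`transform`/`load` helper names are all assumptions made for this example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical raw input; in practice this would be a CSV file or an API response.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(source) -> pd.DataFrame:
    """Read raw rows from a CSV file path or file-like object."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize customer names and drop rows that have no amount."""
    out = df.copy()
    out["customer"] = out["customer"].str.strip().str.title()
    out = out.dropna(subset=["amount"])
    return out.reset_index(drop=True)


def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> int:
    """Write the cleaned rows to a SQL table and return the stored row count."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")  # in-memory target, for illustration only
clean = transform(extract(io.StringIO(RAW_CSV)))
rows_loaded = load(clean, conn, "orders")
print(clean)
print("rows loaded:", rows_loaded)
```

Keeping each stage in its own function makes it easier to test the stages separately and to swap the in-memory database for a real target later.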

---

## Repository Structure

```plaintext
etl-python-sql-airflow/
├── Chapter_1/              # Introduction to ETL concepts and Python basics
│   ├── *.ipynb             # Jupyter notebooks for code demonstrations
│   └── sample_data.csv     # Sample data for exercises
├── Chapter_2/              # Extracting and transforming data
│   ├── *.ipynb             # Notebooks for practical ETL tasks
│   └── *.xlsx, *.csv       # Example datasets
├── Chapter_3/              # Data validation and relational modeling
│   └── *.ipynb             # Advanced ETL tasks and data checks
├── Chapter_4/              # Scheduling and automation
│   ├── *.ipynb             # Scheduling tutorials
│   └── *.sh                # Shell scripts for job automation
├── .git/                   # Git repository metadata
└── README.md               # Course overview and instructions
```

---

## Getting Started

### Prerequisites
- Python 3.8+
- Jupyter Notebook
- pandas, openpyxl, and SQL-related libraries
- Basic understanding of Python and SQL
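
To confirm the prerequisites above before opening a notebook, a small check along these lines may help. It is only a sketch: the import names (`pandas`, `openpyxl`, `notebook`) are assumed to match the packages listed, and the unspecified "SQL-related libraries" are not checked because this README does not name them.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 8)
# Import names assumed for the listed prerequisites; adjust to your setup.
REQUIRED_PACKAGES = ["pandas", "openpyxl", "notebook"]


def python_ok(minimum=REQUIRED_PYTHON) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_packages(names) -> list:
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```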

### Installation
1. Clone this repository:
```bash
git clone https://github.com/nabilshadman/etl-python-sql-airflow.git
cd etl-python-sql-airflow
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Running Notebooks
1. Navigate to a chapter folder:
```bash
cd Chapter_1
```
2. Open the desired notebook:
```bash
jupyter notebook 01_03_end.ipynb
```

---

## Course Content
1. **Chapter 1**: Understanding ETL and exploring data with Python
2. **Chapter 2**: Data extraction and transformation using pandas and SQL
3. **Chapter 3**: Validating and modeling data in relational systems
4. **Chapter 4**: Automating and scheduling ETL workflows
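
To give a flavor of the Chapter 3 topic (validating and modeling data in relational systems), the sketch below uses Python's built-in `sqlite3` module to let the database itself reject invalid rows. The `customers` and `orders` tables, their columns, and the `try_insert` helper are hypothetical and are not drawn from the course notebooks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# A small relational model: every order must reference an existing customer,
# and amounts must be non-negative.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Carol")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 10.5), (11, 2, 7.25)])


def try_insert(row) -> bool:
    """Attempt to insert an order; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


rejected_unknown_customer = not try_insert((12, 99, 5.0))  # no customer 99
rejected_negative_amount = not try_insert((13, 1, -1.0))   # violates CHECK
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print("unknown customer rejected:", rejected_unknown_customer)
print("negative amount rejected:", rejected_negative_amount)
print("orders stored:", order_count)
```

Declaring the rules as constraints means the checks run on every insert, whichever script performs the load.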

---

## License
This repository is for educational purposes only. All materials are copyrighted by Jennifer Ebe.

---