https://github.com/nabilshadman/etl-python-sql-airflow
Build ETL pipelines using Python, Pandas, and SQL to extract, transform, and load data from sources, with exercises and job scheduling practices
airflow data-engineering data-pipeline etl python sql
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/nabilshadman/etl-python-sql-airflow
- Owner: nabilshadman
- Created: 2025-01-21T08:56:08.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-21T09:19:39.000Z (over 1 year ago)
- Last Synced: 2025-03-21T04:15:01.127Z (over 1 year ago)
- Topics: airflow, data-engineering, data-pipeline, etl, python, sql
- Language: Jupyter Notebook
- Homepage:
- Size: 417 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# ETL in Python and SQL
## Course Overview
This repository contains the course materials for [**ETL in Python and SQL**](https://www.linkedin.com/learning/etl-in-python-and-sql/), taught by Jennifer Ebe, a data engineer with over 5 years of experience. The course is designed to help you build robust systems that gather, transform, and store data efficiently for actionable insights. It focuses on:
- **Extracting Data**: Using Python to gather data from various sources.
- **Transforming Data**: Leveraging tools like pandas and SQL to explore, clean, and standardize data.
- **Loading Data**: Storing processed data in target systems.
- **Scheduling Jobs**: Automating ETL workflows with Python.
Each chapter includes hands-on challenges to reinforce learning through practical experience.
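The extract, transform, and load stages listed above can be sketched in a few lines. This is a minimal illustration and not the course's own code: it uses only the standard library (`csv` and `sqlite3`) instead of pandas so that it runs without extra installs, and the `RAW_CSV` data, the `orders` table, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would be a CSV file or an API response.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, tidy names, cast types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the job is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three stages in order."""
    load(transform(extract(RAW_CSV)), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage in its own function makes it easier to test the stages separately and to call `run_pipeline` from a scheduler later.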
---
## Repository Structure
```plaintext
etl-python-sql-airflow/
├── Chapter_1/ # Introduction to ETL concepts and Python basics
│ ├── *.ipynb # Jupyter notebooks for code demonstrations
│ └── sample_data.csv # Sample data for exercises
├── Chapter_2/ # Extracting and transforming data
│ ├── *.ipynb # Notebooks for practical ETL tasks
│ └── *.xlsx, *.csv # Example datasets
├── Chapter_3/ # Data validation and relational modeling
│ └── *.ipynb # Advanced ETL tasks and data checks
├── Chapter_4/ # Scheduling and automation
│ ├── *.ipynb # Scheduling tutorials
│ └── *.sh # Shell scripts for job automation
├── .git/ # Git repository metadata
└── README.md # Course overview and instructions
```
---
## Getting Started
### Prerequisites
- Python 3.8+
- Jupyter Notebook
- pandas, openpyxl, and SQL-related libraries
- Basic understanding of Python and SQL
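Before opening the notebooks, it can help to confirm that the environment meets these prerequisites. The sketch below is an assumption-laden helper, not part of the repository: the import names `notebook`, `pandas`, and `openpyxl` are assumed to match the installed packages, and `sqlite3` is used as a stand-in for the unspecified SQL-related libraries.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 8)
# Assumed import names; sqlite3 stands in for "SQL-related libraries".
PACKAGES = ["notebook", "pandas", "openpyxl", "sqlite3"]


def python_ok(version_info=sys.version_info, minimum=REQUIRED_PYTHON):
    """Return True if the interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if not python_ok():
    print("Python 3.8 or newer is required")
missing = missing_packages(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable")
```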
### Installation
1. Clone this repository:
```bash
git clone https://github.com/nabilshadman/etl-python-sql-airflow.git
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
### Running Notebooks
1. Navigate to a chapter folder:
```bash
cd Chapter_1
```
2. Open the desired notebook:
```bash
jupyter notebook 01_03_end.ipynb
```
---
## Course Content
1. **Chapter 1**: Understanding ETL and exploring data with Python
2. **Chapter 2**: Data extraction and transformation using pandas and SQL
3. **Chapter 3**: Validating and modeling data in relational systems
4. **Chapter 4**: Automating and scheduling ETL workflows
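As an illustration of the kind of validation Chapter 3 refers to, the sketch below runs three common data checks with SQL: missing values, duplicate keys, and rows that reference a parent record that does not exist. The `customers` and `orders` tables, their columns, and the check names are hypothetical and are not taken from the course notebooks.

```python
import sqlite3

# Each check is a query that counts offending rows; zero means the check passes.
CHECKS = {
    "null_customer_id": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "duplicate_order_id": (
        "SELECT COUNT(*) FROM "
        "(SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1)"
    ),
    "orphan_orders": (
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON o.customer_id = c.customer_id "
        "WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL"
    ),
}


def run_checks(conn):
    """Return a dict mapping each check name to its count of offending rows."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


# Hypothetical sample data with one problem of each kind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 9.5), (11, NULL, 3.0), (11, 2, 4.0), (12, 99, 1.0);
""")

for name, count in run_checks(conn).items():
    status = "ok" if count == 0 else f"{count} offending row(s)"
    print(f"{name}: {status}")
```

Expressing each check as a counting query keeps the validation logic in SQL, so the same checks can be run against the target database after each load.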
---
## License
This repository is for educational purposes only. All materials are copyrighted by Jennifer Ebe.
---