https://github.com/airscholar/footballdataengineering
An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.
- Host: GitHub
- URL: https://github.com/airscholar/footballdataengineering
- Owner: airscholar
- Created: 2023-10-02T19:11:25.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-02T19:22:44.000Z (over 2 years ago)
- Last Synced: 2025-03-24T02:21:58.018Z (about 1 year ago)
- Topics: apache-airflow, azure-data-factory, azure-data-lake-gen2, azure-databricks, azure-synapse-analytics, data-engineering, dataengineering
- Language: Python
- Homepage: https://www.youtube.com/watch?v=tKIXUqz17W8
- Size: 469 KB
- Stars: 22
- Watchers: 2
- Forks: 19
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Football Data Engineering
This Python-based project crawls data from Wikipedia using Apache Airflow, cleans it, and pushes it to Azure Data Lake for processing.
## Table of Contents
1. [System Architecture](#system-architecture)
2. [Requirements](#requirements)
3. [Getting Started](#getting-started)
4. [Running the Code With Docker](#running-the-code-with-docker)
5. [How It Works](#how-it-works)
6. [Video](#video)
## System Architecture

## Requirements
- Python 3.9 (minimum)
- Docker
- PostgreSQL
- Apache Airflow 2.6 (minimum)
## Getting Started
1. Clone the repository.
```bash
git clone https://github.com/airscholar/FootballDataEngineering.git
```
2. Install Python dependencies.
```bash
pip install -r requirements.txt
```
## Running the Code With Docker
1. Start the services with Docker Compose:
```bash
docker compose up -d
```
2. Trigger the DAG on the Airflow UI.
## How It Works
1. Fetches data from Wikipedia.
2. Cleans the data.
3. Transforms the data.
4. Pushes the data to Azure Data Lake.
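
The cleaning and transformation steps above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the column names (`stadium`, `capacity`), the sample rows, and the function names are all hypothetical. The fetch and upload steps are left out because they need network access and Azure credentials.

```python
import csv
import io
import re

# Hypothetical sample rows standing in for a table scraped from Wikipedia.
RAW_ROWS = [
    {"stadium": " Example Arena[1] ", "capacity": "80,000"},
    {"stadium": "Sample Park", "capacity": "45,500[note 2]"},
    {"stadium": "", "capacity": "n/a"},
]

# Matches bracketed footnote markers such as [1] or [note 2].
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def clean_rows(rows):
    """Remove footnote markers and extra whitespace; skip rows with no name."""
    cleaned = []
    for row in rows:
        tidy = {key: FOOTNOTE.sub("", value).strip() for key, value in row.items()}
        if tidy["stadium"]:
            cleaned.append(tidy)
    return cleaned


def transform_rows(rows):
    """Convert the capacity column from text such as '80,000' to an int."""
    return [
        {"stadium": row["stadium"], "capacity": int(row["capacity"].replace(",", ""))}
        for row in rows
    ]


def to_csv(rows):
    """Serialize rows to CSV text, ready to be written to storage."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["stadium", "capacity"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline(raw_rows):
    """Chain the clean, transform, and serialize steps in order."""
    return to_csv(transform_rows(clean_rows(raw_rows)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

In an Airflow deployment, each of these functions would typically run as its own task, with the final CSV written to the data lake instead of printed.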
## Video
[Watch the video on YouTube](https://www.youtube.com/watch?v=tKIXUqz17W8)