https://github.com/an4pdm/sportstats-pipeline
SportStats Pipeline is an ETL project that collects, processes, and stores daily sports event data using the TheSportsDB API. The pipeline is orchestrated with Apache Airflow and stores the data in a MySQL database.
- Host: GitHub
- URL: https://github.com/an4pdm/sportstats-pipeline
- Owner: An4PDM
- Created: 2025-02-14T19:52:39.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-14T19:54:06.000Z (8 months ago)
- Last Synced: 2025-02-14T20:29:03.598Z (8 months ago)
- Topics: airflow, airflow-dags, api, etl, mysql, pandas, pipeline, python
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SportStats Pipeline 📊 (in progress)
SportStats Pipeline is a data pipeline project designed to collect, process, and store sports-related data. This project aims to automate the extraction and transformation of sports events, results, and statistics, providing structured and reliable datasets for analysis.
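As an illustration of the transform step, here is a minimal sketch that flattens TheSportsDB-style event records into a tidy table with pandas. The field names (`idEvent`, `strEvent`, `intHomeScore`, etc.) follow TheSportsDB's event schema; the function name and column choices are assumptions for illustration, not taken from this repository:

```python
import pandas as pd


def transform_events(raw_events):
    """Flatten raw TheSportsDB event dicts into a tidy DataFrame.

    Note: field names follow TheSportsDB's event schema; the output
    columns here are illustrative choices, not this repo's actual schema.
    """
    rows = [
        {
            "event_id": e.get("idEvent"),
            "event": e.get("strEvent"),
            "date": e.get("dateEvent"),
            "home_team": e.get("strHomeTeam"),
            "away_team": e.get("strAwayTeam"),
            "home_score": e.get("intHomeScore"),
            "away_score": e.get("intAwayScore"),
        }
        for e in raw_events or []
    ]
    df = pd.DataFrame(rows)
    if not df.empty:
        # The API returns scores as strings (or None for unplayed games),
        # so convert them to nullable integers and parse the date column.
        df["home_score"] = pd.to_numeric(df["home_score"], errors="coerce").astype("Int64")
        df["away_score"] = pd.to_numeric(df["away_score"], errors="coerce").astype("Int64")
        df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df


if __name__ == "__main__":
    sample = [{"idEvent": "1", "strEvent": "A vs B", "dateEvent": "2025-02-14",
               "strHomeTeam": "A", "strAwayTeam": "B",
               "intHomeScore": "2", "intAwayScore": "1"}]
    print(transform_events(sample))
```

A frame in this shape can then be written to MySQL (e.g. via `DataFrame.to_sql`) or dumped to files, matching the storage options listed below.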
## 🚀 Features
- ⏱️ Daily automated extraction of sports events
- 🔄 ETL process (Extract, Transform, Load)
- 💾 Storage of structured data in a database or file system
- 🧹 Data cleaning and transformation logic
- 🔁 Scheduled execution with Apache Airflow

## 🛠️ Technologies Used
- **Python**
- **Apache Airflow**
- **Pandas / NumPy**
- **Requests**
- **MySQL** (adaptable)
- **TheSportsDB API** *(or other public sports API)*

## ⚙️ How to Run
1. Clone the repository:
```bash
git clone https://github.com/An4PDM/SportStats-Pipeline.git
cd SportStats-Pipeline
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up and start Apache Airflow (basic setup):
```bash
export AIRFLOW_HOME=$(pwd)/airflow
airflow db init
airflow users create \
--username admin \
--password admin \
--firstname First \
--lastname Last \
--role Admin \
--email admin@example.com
airflow webserver --port 8080   # run in one terminal
airflow scheduler               # run in a second terminal
```
5. Open http://localhost:8080 and trigger the sports DAG.

## 🧪 Tests
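Individual steps can also be exercised outside Airflow. As a sketch, a standalone extract step might look like the following; the `eventsday.php` endpoint and the `3` test key follow TheSportsDB's public API docs, but the function names are illustrative assumptions, not this repository's actual code:

```python
import json
import urllib.parse
import urllib.request

# "3" is TheSportsDB's public test key (see their API docs); replace
# with your own key for production use.
BASE_URL = "https://www.thesportsdb.com/api/v1/json/3"


def events_url(day, sport=None):
    """Build the events-by-day URL for a YYYY-MM-DD date string."""
    params = {"d": day}
    if sport:
        params["s"] = sport
    return f"{BASE_URL}/eventsday.php?{urllib.parse.urlencode(params)}"


def fetch_events(day, sport=None):
    """Fetch the raw event list for one day (returns [] when nothing is scheduled)."""
    with urllib.request.urlopen(events_url(day, sport), timeout=30) as resp:
        payload = json.load(resp)
    return payload.get("events") or []


if __name__ == "__main__":
    for event in fetch_events("2025-02-14", sport="Soccer"):
        print(event.get("strEvent"))
```
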
You can test individual components (e.g., the scripts in `scripts/`) by running them directly:
```bash
python scripts/extract.py
python scripts/transform.py
```

## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.