https://github.com/m77rahman/uk-transit-weather

TfL + Open-Meteo ETL → DuckDB → Streamlit. Hourly GitHub Actions.

# UK Transit + Weather

## Overview

This project is a small end-to-end data workflow that combines live transit and weather data into a structured reporting pipeline.

It pulls data from:
- **TfL API** for transit information
- **Open-Meteo API** for hourly weather data

The workflow then:
- extracts and cleans the data
- applies basic validation and reliability checks
- stores the data in **DuckDB**
- presents the outputs through a **Streamlit dashboard**
- runs on a schedule using **GitHub Actions**

The project was built to strengthen practical skills in ETL-style thinking, API integration, structured data handling, dashboard delivery, and workflow reliability.

---

## Project Goal

The goal of this project is to show how live external data can be collected, transformed, stored, and presented in a way that is structured and repeatable.

Rather than building a one-off script, this project was designed as a lightweight data workflow with:
- clear project structure
- scheduled execution
- validation and error handling
- dashboard output
- tests and documentation

---

## Data Sources

### TfL API
Used to retrieve transit-related data.

### Open-Meteo API
Used to retrieve hourly weather data.

---

## Workflow

### 1. Extract
The pipeline requests data from the TfL and Open-Meteo APIs.
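
The extract step can be sketched with the standard library alone. The README does not name the exact routes, so the two endpoint constants, the `timezone` value, and the helper names `build_weather_url` and `fetch_json` are assumptions chosen for illustration, not the project's actual code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints: the README names the APIs but not the routes it calls.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"
TFL_STATUS_URL = "https://api.tfl.gov.uk/Line/Mode/tube/Status"


def build_weather_url(latitude: float, longitude: float, variables: list[str]) -> str:
    """Build an Open-Meteo forecast URL requesting the given hourly variables."""
    query = urlencode({
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(variables),  # Open-Meteo takes a comma-separated list
        "timezone": "Europe/London",    # assumed; keeps timestamps in local time
    })
    return f"{OPEN_METEO_URL}?{query}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body; raises on HTTP or network errors."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_json(build_weather_url(51.5074, -0.1278, ["temperature_2m"]))` would return the weather payload, and `fetch_json(TFL_STATUS_URL)` the transit one. Keeping URL construction separate from the network call makes the former easy to test offline.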

### 2. Transform
The data is cleaned and structured into a more consistent format for downstream use.
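
As one possible illustration of this step: Open-Meteo returns hourly data column-wise, with a `time` list alongside one list per variable. The sketch below reshapes that into one record per hour. The function name `hourly_to_rows` is hypothetical, and the project's real transform may differ.

```python
def hourly_to_rows(payload: dict) -> list[dict]:
    """Turn Open-Meteo's column-oriented "hourly" block into one dict per hour."""
    hourly = payload.get("hourly", {})
    times = hourly.get("time", [])
    columns = {name: values for name, values in hourly.items() if name != "time"}
    rows = []
    for index, timestamp in enumerate(times):
        row = {"time": timestamp}
        for name, values in columns.items():
            # A column shorter than "time" yields None instead of an IndexError.
            row[name] = values[index] if index < len(values) else None
        rows.append(row)
    return rows
```

Row-shaped records are easier to validate one at a time and map directly onto table inserts.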

### 3. Validate
Basic checks are applied to improve reliability and reduce the chance of poor-quality outputs.

Examples include:
- missing value checks
- consistency checks
- handling failed or incomplete API responses
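
A minimal sketch of checks like those listed above, assuming each record is a dict with a `time` key and a `temperature_2m` reading. The function name `validate_rows`, the required fields, and the duplicate-timestamp rule are illustrative choices rather than the project's documented rules.

```python
def validate_rows(rows: list[dict], required: tuple[str, ...] = ("time", "temperature_2m")):
    """Split rows into (valid, problems); problems are human-readable strings."""
    valid, problems = [], []
    seen_times = set()
    for index, row in enumerate(rows):
        # Missing-value check: every required field must be present and not None.
        missing = [key for key in required if row.get(key) is None]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
            continue
        # Consistency check: each timestamp should appear only once.
        if row["time"] in seen_times:
            problems.append(f"row {index}: duplicate timestamp {row['time']}")
            continue
        seen_times.add(row["time"])
        valid.append(row)
    return valid, problems
```

Returning the problems instead of raising lets a scheduled run load the good rows and still report what was rejected. A failed or empty API response simply produces no valid rows.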

### 4. Load
The cleaned data is stored in **DuckDB** for structured querying and dashboard use.

### 5. Present
A **Streamlit dashboard** displays the processed outputs in a user-friendly way.

### 6. Automate
The workflow runs on an hourly schedule through **GitHub Actions**, so the data is refreshed in a repeatable way.

---

## Tech Stack

- **Python**
- **TfL API**
- **Open-Meteo API**
- **DuckDB**
- **Streamlit**
- **GitHub Actions**
- **Pytest** (for testing)

---

## Repository Structure

```text
uk-transit-weather/
├── .github/workflows/   # Scheduled GitHub Actions workflow
├── src/                 # Pipeline and application source code
├── tests/               # Test files
├── .env.example         # Example environment variables
├── requirements.txt     # Project dependencies
└── README.md            # Project documentation