https://github.com/antoinegiraud/dbt_bixi_opendata
dbt model & transformation on Montreal's bixi bikeshare opendata (rentals & GBFS)
- Host: GitHub
- URL: https://github.com/antoinegiraud/dbt_bixi_opendata
- Owner: AntoineGiraud
- Created: 2024-08-22T09:28:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-20T12:36:43.000Z (4 days ago)
- Last Synced: 2025-01-20T13:45:10.349Z (4 days ago)
- Size: 4.95 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Bixi's OpenData Modelisation
Here is a [dbt-core](https://github.com/dbt-labs/dbt-core) project that loads & transforms [Bixi OpenData](https://bixi.com/fr/donnees-ouvertes/) thanks to [DuckDB](https://duckdb.org/) 🦆
### Viz' exploration
I used Power BI to explore the transformed data, offloaded to `.parquet` *(~4.7 times lighter than `.csv`)*.
After the pandemic, Montrealers really went back to Bixi 🥳

![Explore Montréal Bixi rentals with Power BI](./montreal_bixi_rentals.png)
## Data sources
### Bixi Rentals OpenData ([link](https://bixi.com/fr/donnees-ouvertes/))
- 🚲 **Rentals V1** : from 2014 to 2021
  > for station info, join to the yearly station file on `station_code`
- ⛽ **Stations V1** : from 2014 to 2021
  > 1 station code per year
- 🚲 **Rentals V2** : from 2022 to 2024+
  > start/end station info on each rental\
  *-> 2.7 times heavier `.csv`: 1.4 GB -> 0.5 GB*\
  *-> 2.3 times heavier `.parquet`: 250 MB -> 106 MB*

### GIS reference data
- **Municipal sectors** : from the 2013 Origin-Destination (OD) survey (cf. [Données Québec](https://www.donneesquebec.ca/recherche/dataset/artm-secteurs-municipaux-od13/resource/95ab084b-727e-4322-9433-0fed7baa690d))
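To attach each station to a municipal sector, one option is a point-in-polygon test on the station coordinates. The sketch below assumes that approach and uses plain ray casting in Python. The polygon, the coordinates and the `sector_of` helper are hypothetical, not the project's code or the real OD 2013 geometry. Inside the dbt models, DuckDB's `spatial` extension (`ST_Contains`) would likely be the more natural tool.

```python
from typing import Optional

# A polygon is a list of (lon, lat) vertices; the ring is implicitly closed.
Polygon = list[tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: toggle 'inside' each time a ray going east from the point crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def sector_of(lon: float, lat: float, sectors: dict[str, Polygon]) -> Optional[str]:
    """Hypothetical helper: name of the first sector containing the point, else None."""
    for name, polygon in sectors.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical square sector around downtown Montréal (made-up geometry)
sectors = {
    "downtown (example)": [(-73.58, 45.49), (-73.55, 45.49), (-73.55, 45.52), (-73.58, 45.52)],
}

print(sector_of(-73.567, 45.505, sectors))  # → downtown (example)
print(sector_of(-73.60, 45.505, sectors))   # → None
```

Ray casting handles any simple polygon, convex or not. Real sector files often contain multipolygons and holes, which is one more reason to prefer a spatial extension over hand-rolled code.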
### GBFS scraping (one day)
> GBFS means *General Bikeshare Feed Specification*, it's a standardized data feed for shared mobility system availability (cf. [Github > MobilityData/gbfs](https://github.com/MobilityData/gbfs))
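To give an idea of what such a feed contains, here is a minimal Python sketch that reads a GBFS `station_status` payload and computes each station's share of docks occupied by bikes. The sample payload is made up. The field names (`last_updated`, `ttl`, `data.stations`, `station_id`, `num_bikes_available`, `num_docks_available`, `last_reported`) follow the GBFS `station_status` feed. The `occupancy` helper and its definition of occupancy are hypothetical.

```python
import json
from typing import Optional

# Made-up payload shaped like a GBFS station_status.json feed
SAMPLE_FEED = """
{
  "last_updated": 1690000000,
  "ttl": 10,
  "data": {
    "stations": [
      {"station_id": "1", "num_bikes_available": 6, "num_docks_available": 9, "last_reported": 1689999990},
      {"station_id": "2", "num_bikes_available": 0, "num_docks_available": 0, "last_reported": 1689999000}
    ]
  }
}
"""


def occupancy(feed_text: str) -> dict[str, Optional[float]]:
    """Map station_id -> bikes / (bikes + free docks); None when no capacity is reported."""
    feed = json.loads(feed_text)
    result: dict[str, Optional[float]] = {}
    for station in feed["data"]["stations"]:
        bikes = station["num_bikes_available"]
        docks = station["num_docks_available"]
        total = bikes + docks
        result[station["station_id"]] = bikes / total if total else None
    return result


print(occupancy(SAMPLE_FEED))  # → {'1': 0.4, '2': None}
```

Each scrape is a snapshot, so availability "over time" means stacking many such snapshots keyed on `last_updated`.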
#### Max Halford's GBFS scraping
Max Halford launched web scraping of 76 bike-sharing systems around the globe in summer 2023 (cf. his [bike sharing forecasting training set](https://maxhalford.github.io/blog/bike-sharing-forecasting-training-set/) article).
Montréal was added at the end of spring. To be added: reworking & exploring Bixi's station availability over time.
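```sql
-- example: fetch Toulouse station_status 🦆
-- reading s3:// paths relies on DuckDB's httpfs extension
INSTALL httpfs;
LOAD httpfs;
-- the bucket lives on Google Cloud Storage, reached through its S3-compatible endpoint
SET s3_endpoint = 'storage.googleapis.com';
FROM read_parquet('s3://bike-sharing-history/toulouse/**/*.parquet');
```

## Schema/DB steps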
- **raw** : raw tables loaded as is from `.csv`
- **stg** : staging (intermediate) tables
- **dtm** : data mart tables, ready for analytics & reporting

![dbt lineage](./dbt_lineage.png)
If needed, see the [DBeaver logical data model (MLD)](./dbeaver_table_mld.png).
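V1 rentals only carry station codes, and station details live in a separate yearly file. A staging step therefore has to join each rental to the station list of the same year. Below is a minimal, runnable sketch of that join. It uses Python's built-in `sqlite3` as a stand-in for DuckDB. The table and column names (`raw_rentals_v1`, `raw_stations_v1`, `start_station_code`, `end_station_code`, `year`, `duration_sec`) are hypothetical, not necessarily those of the dbt models. The sketch also assumes station codes are only meaningful within their yearly file.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical raw tables: rentals reference stations by code, stations come one file per year
con.executescript("""
CREATE TABLE raw_stations_v1 (year INTEGER, station_code INTEGER, name TEXT, latitude REAL, longitude REAL);
CREATE TABLE raw_rentals_v1 (start_date TEXT, start_station_code INTEGER, end_station_code INTEGER, duration_sec INTEGER);

INSERT INTO raw_stations_v1 VALUES
  (2019, 6001, 'Station A', 45.51, -73.57),
  (2019, 6002, 'Station B', 45.52, -73.58),
  (2020, 6001, 'Station A (moved)', 45.50, -73.56);
INSERT INTO raw_rentals_v1 VALUES
  ('2019-06-01 08:00', 6001, 6002, 600),
  ('2020-06-01 09:00', 6001, 6001, 300);
""")

# Staging step: match on station code AND on the rental's year
STG_RENTALS_SQL = """
SELECT r.start_date,
       s_start.name AS start_station_name,
       s_end.name   AS end_station_name,
       r.duration_sec
FROM raw_rentals_v1 AS r
LEFT JOIN raw_stations_v1 AS s_start
       ON s_start.station_code = r.start_station_code
      AND s_start.year = CAST(strftime('%Y', r.start_date) AS INTEGER)
LEFT JOIN raw_stations_v1 AS s_end
       ON s_end.station_code = r.end_station_code
      AND s_end.year = CAST(strftime('%Y', r.start_date) AS INTEGER)
ORDER BY r.start_date
"""

for row in con.execute(STG_RENTALS_SQL):
    print(row)
# → ('2019-06-01 08:00', 'Station A', 'Station B', 600)
# → ('2020-06-01 09:00', 'Station A (moved)', 'Station A (moved)', 300)
```

The `LEFT JOIN` keeps rentals whose code is missing from that year's station file, so they can be flagged rather than silently dropped. In DuckDB, `year()` on a timestamp column would replace the `strftime` cast.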
### Loading
DuckDB really shines with its speed & local OLAP capabilities.
Here is the 🚲 V1 rentals (2014 - 2021) load & offload to `.parquet`:
- `.csv` is **4.5** times heavier than `.parquet`
- `.json` is **2.7** times heavier than `.csv`

![Bixi rentals loading with DuckDB 🦆](./load_and_offload.png)
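Much of the gap between `.json` and `.csv` sizes is structural, assuming the JSON export writes one object per row. Such a file repeats every column name on every record, while a CSV writes its header once. The stdlib-only sketch below measures that effect on a few synthetic rental rows with made-up values and hypothetical column names. It does not reproduce the project's measured ratios. `.parquet` (columnar, compressed) is out of reach of the standard library, so it is left out.

```python
import csv
import io
import json

# Synthetic rental rows (made-up values, hypothetical column names)
rows = [
    {"start_date": f"2019-06-{day:02d} 08:00", "start_station_code": 6001 + day,
     "end_station_code": 6100 + day, "duration_sec": 300 + 10 * day, "is_member": day % 2}
    for day in range(1, 29)
]


def csv_bytes(records: list[dict]) -> int:
    """Size of the records written as CSV: header once, then values only."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return len(buf.getvalue().encode("utf-8"))


def ndjson_bytes(records: list[dict]) -> int:
    """Size of the records written as newline-delimited JSON: keys repeated on every line."""
    text = "\n".join(json.dumps(r) for r in records) + "\n"
    return len(text.encode("utf-8"))


ratio = ndjson_bytes(rows) / csv_bytes(rows)
print(f"CSV: {csv_bytes(rows)} B, NDJSON: {ndjson_bytes(rows)} B, ratio: {ratio:.1f}x")
```

The ratio grows with the length of column names relative to values, which is why it varies from one dataset to another.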