
---
title: "Exploring jittering and routing options for converting origin-destination data into route networks: towards accurate estimates of movement at the street level"
bibliography: foss4g2022.bib
author: Robin Lovelace, Rosa Félix, Dustin Carlino
output: github_document
# output:
# bookdown::pdf_document2:
# template: ISPRStemplate.tex
# keep_tex: true
editor_options:
markdown:
wrap: sentence
csl: ispr-from-harvard.csl
---

```{r, eval=FALSE, echo=FALSE}
unzip("ISPRSguidelines_authors_fullpaper_latex_2021_09_09.zip")
tinytex::pdflatex("ISPRSguidelines_authors_fullpaper.tex")
rmarkdown::render("README.Rmd")

file.rename("README.pdf", "foss4g-paper-jittering.pdf")
browseURL("foss4g-paper-jittering.pdf")
piggyback::pb_upload("foss4g-paper-jittering.pdf")
system("gh release upload v1 foss4g-paper-jittering.pdf --clobber")
system("gh release download 1")
rbbt::bbt_update_bib(path_rmd = "README.Rmd", path_bib = "foss4g2022.bib")
file.edit("README.tex")
```

```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
echo = FALSE,
message = FALSE,
cache = TRUE,
warning = FALSE,
fig.align = "center"
# eval = FALSE
)
```

```{r, include=FALSE}
# devtools::install_github("itsleeds/od")
library(sf)
library(tmap)
library(tidyverse)
library(stplanr)
library(cyclestreets)
# rbbt::bbt_update_bib(path_rmd = "README.Rmd", path_bib = "foss4g2022.bib")
```

Note: this has been submitted to the academic track of FOSS4G. See https://osf.io/4yxj7/ for the preprint.

# Introduction

Origin-destination (OD) datasets provide information on aggregate travel patterns between zones and geographic entities, and can be obtained from a wide range of sources making them one of the most commonly used geographic inputs in applied transport planning [@alexander_validation_2015].
OD datasets are often 'implicitly geographic', containing identification codes of the geographic objects from which trips start and end.
OD data are provided in this way, without exact coordinates of origins and destinations, for good reasons: historically, computational resources constrained analysis options, meaning that data reduction (by converting thousands of travel survey responses into a more compact aggregate OD dataset) was important; and privacy considerations prevent the disclosure of exact trip start and end points [@boyce_forecasting_2015].

A common approach to converting OD datasets to geographic entities, for example represented using the simple features standard [@ogcopengeospatialconsortiuminc_opengis_2011] and saved in file formats such as GeoPackage and GeoJSON, is to represent each OD record as a straight line between zone centroids.
This approach to representing OD datasets on the map has been used since at least the 1950s [@boyce_forecasting_2015] and, despite the development of various methods to add value to OD datasets by sampling start and end points and 'connectors' within each zone [@lovelace_jittering_2022b] (discussed below), centroid-based geographic representations of OD data are still dominant [@rae_spatial_2009; @tennekes_design_2021].
Before explaining the methods, it is worth defining terms:

- **Origins**: locations of trip departure, typically stored as ID codes linking to zones

- **Destinations**: trip destinations, also stored as ID codes linking to zones

- **Attributes**: the number of trips made between each 'OD pair' and additional attributes such as route distance between each OD pair

- **Jittering**: The combined process of 'splitting' OD pairs representing many trips into multiple 'sub OD' pairs (disaggregation) and assigning origins and destinations to multiple unique points within each zone

Beyond simply visualising aggregate travel patterns, centroid-based geographic desire lines are also used as the basis of many transport modelling processes.
The following steps can be used to convert OD datasets into route networks, in a process that can generate nationally scalable results [@morgan_travel_2020]:

- OD data converted into centroid-based geographic desire lines

- Calculation of routes for each desire line, with start and end points at zone centroids

- Aggregation of routes into route networks, with values on each segment representing the total amount of travel ('flow') on that part of the network, using functions such as `overline()` in the open source R package `stplanr` [@lovelace_stplanr_2018]
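
The aggregation step can be illustrated with a minimal base R sketch. It is not the `overline()` implementation, which operates on route geometries; it assumes, purely for illustration, that each route has already been reduced to a sequence of hypothetical segment IDs, and it sums the trips passing over each segment.

```r
# Toy routes: each is a vector of (hypothetical) network segment IDs,
# with the number of trips of the OD pair it was routed for.
routes = list(
  list(segments = c("a", "b", "c"), trips = 10),
  list(segments = c("b", "c", "d"), trips = 5),
  list(segments = c("e"),           trips = 2)
)

# One row per (route, segment) combination
long = do.call(rbind, lapply(routes, function(r) {
  data.frame(segment = r$segments, trips = r$trips)
}))

# Total flow per segment: the value a route network carries on each segment
rnet = aggregate(trips ~ segment, data = long, FUN = sum)
rnet
```

Segments shared by several routes (here `b` and `c`) accumulate the trips of every route using them, which is why flows concentrate near centroids when all routes start and end there.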

This approach is tried and tested:
the OD $\rightarrow$ desire line $\rightarrow$ route $\rightarrow$ route network processing pipeline forms the basis of the route network results in the Propensity to Cycle Tool, an open source and publicly available map-based web application for informing strategic cycle network investment, 'visioning' and prioritisation [@lovelace_propensity_2017; @goodman_scenarios_2019].
However, the approach has some key limitations:

- Flows are concentrated on transport network segments leading to zone centroids, creating distortions in the results and preventing the simulation of the diffuse networks that are particularly important for walking and cycling

- The results are highly dependent on the size and shape of geographic zones used to define OD data

- The approach is inflexible, providing few options to people who want to use valuable OD datasets in different ways

To overcome these limitations, methods of 'jittering' OD data have been developed [@lovelace_jittering_2022b].
While the results from analysis of route networks generated from jittered OD data in that paper were promising, the input datasets were small and the technique was not evaluated with reference to ground truth data.
This raised the question "Are the jittered results measurably better when compared with counter datasets on the network?" [@lovelace_jittering_2022b].

This question was partially addressed during a presentation and subsequent proceedings published as part of the GISRUK conference [@lovelace_assessing_2022].
However, the input dataset used for that conference paper was small and overly focussed on Edinburgh.
Furthermore, only a single routing option was used, raising the question:
what is the relative importance of geographic OD data pre-processing (jittering) and routing options when preparing route networks to support strategic sustainable transport plans?
We set out to address this question in this paper.

## Software and reproducibility

In this paper we present results generated using the `odjitter` Rust crate.
We developed an interface to R in the `odjitter` R package (not on CRAN at the time of writing) that can form the basis of implementations in other languages that interface with the highly efficient Rust implementation.
The results presented in this paper are fully reproducible.
See the paper's GitHub repository at https://github.com/Robinlovelace/foss4g22/ for implementation details and to reproduce the results.

# Approach

## Jittering

Jittering is a comparatively simple approach to OD data preprocessing, compared with 'connector' based methods [@jafari_investigation_2015].
For each OD pair, the jittering approach consists of the following steps (given the required inputs: a disaggregation threshold, which is a single number greater than one, and sub-points from which origin and destination points are sampled):

1. Checks if the number of trips (for a given 'disaggregation key', e.g. 'walking') is greater than the disaggregation threshold.
2. If so, the OD pair is disaggregated: it is divided into as many pieces ('sub-OD pairs') as needed for the trip count of each piece (the original count divided by the number of sub-OD pairs) to be below the disaggregation threshold.
3. For each sub-OD pair (or each original OD pair if no disaggregation took place) origin and destination locations are randomly sampled from sub-points which optionally have weights representing relative probability of trips starting and ending there.
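
The three steps can be sketched in base R as follows. This is a simplified illustration, not the `odjitter` implementation or API: the function name `jitter_od_pair()` and its arguments are hypothetical, the number of sub-OD pairs is assumed to be the trip count divided by the threshold and rounded up (so each piece is at or below the threshold), and sub-points are assumed to be sampled with replacement.

```r
# Hypothetical helper, for illustration only (not part of odjitter)
jitter_od_pair = function(trips, threshold, origin_pts, dest_pts,
                          origin_w = NULL, dest_w = NULL) {
  # Steps 1-2: disaggregate only if trips exceed the threshold
  n_sub = if (trips > threshold) ceiling(trips / threshold) else 1
  # Step 3: sample one origin and one destination sub-point per sub-OD pair,
  # optionally weighted by the relative probability of each sub-point
  o = sample(nrow(origin_pts), n_sub, replace = TRUE, prob = origin_w)
  d = sample(nrow(dest_pts), n_sub, replace = TRUE, prob = dest_w)
  data.frame(
    trips = trips / n_sub,
    ox = origin_pts$x[o], oy = origin_pts$y[o],
    dx = dest_pts$x[d], dy = dest_pts$y[d]
  )
}

# Toy sub-points for two zones (random coordinates, invented for the example)
set.seed(42)
zone_a = data.frame(x = runif(20), y = runif(20))
zone_b = data.frame(x = runif(20, 2, 3), y = runif(20))

sub_od = jitter_od_pair(trips = 1200, threshold = 500, zone_a, zone_b)
nrow(sub_od)       # 3 sub-OD pairs
sum(sub_od$trips)  # the total number of trips is preserved
```

Lower thresholds produce more sub-OD pairs, and therefore more routes to calculate, which is the trade-off between network diffuseness and computational cost.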

This approach has been implemented efficiently in the Rust crate `odjitter`, the source code of which can be found at .

## Case study

Lisbon, Portugal, is a city with about half a million residents. In 2018, when a mobility survey was carried out, only about 0.5% of trips were made by bicycle. However, investments in cycling infrastructure, reaching 150 km of cycling network in 2021, and the implementation of a dock-based bike-sharing system had a major impact on cycling levels [@felix_build_2020a].

Cyclist counts were performed yearly from 2017 to 2021 at more than 65 locations in Lisbon during morning and afternoon peak hours (8-10 am and 5-7 pm). In 2021, these were carried out in October.
The 67 locations, shown in Figure \ref{lisbonmap}, were chosen considering the existing and planned cycling infrastructure, as well as places without cycling infrastructure that already had some presence of cyclists.

```{r lisbon1, include=FALSE, cache=TRUE, fig.cap="Illustration of jittered (left) compared with unjittered (right) origin-destination data.", out.width="100%"}
od_all = readRDS(url("https://github.com/U-Shift/biclar/releases/download/0.0.1/TRIPSmode_freguesias.Rds"))
zones = readRDS(url("https://github.com/U-Shift/biclar/releases/download/0.0.1/FREGUESIASgeo.Rds"))
osm_data_region = readRDS(url("https://github.com/U-Shift/biclar/releases/download/0.0.1/osm_data_region.Rds"))
lisbon_limit = st_read("Lisboa_limite.gpkg") %>% st_transform(4326)
```

```{r include=FALSE, cache=TRUE}
## For Lisbon only
lisbon_zones = zones %>% filter(Concelho == "Lisboa")
od_lisbon = od_all %>%
filter(DICOFREor11 %in% lisbon_zones$Dicofre & DICOFREde11 %in% lisbon_zones$Dicofre)
od_lisbon_with_bikes = od_lisbon %>% filter(Bike > 0)
od_lisbon_sf = od::od_to_sf(od_lisbon_with_bikes, lisbon_zones) #desire lines

set.seed(42)
od_lisbon_jittered = odjitter::jitter( #jitter
od = od_lisbon_with_bikes,
zones = lisbon_zones,
subpoints = osm_data_region,
disaggregation_key = "Total",
disaggregation_threshold = round(max(od_lisbon_with_bikes$Total) + 1) ##30? 50? 100?
)
od_lisbon_jittered_500 = odjitter::jitter( #jitter
od = od_lisbon_with_bikes,
zones = lisbon_zones,
subpoints = osm_data_region,
disaggregation_key = "Total",
disaggregation_threshold = 500 ##30? 50? 100?
)
od_lisbon_jittered_200 = odjitter::jitter( #jitter
od = od_lisbon_with_bikes,
zones = lisbon_zones,
subpoints = osm_data_region,
disaggregation_key = "Total",
disaggregation_threshold = 200 ##30? 50? 100?
)
# nrow(od_lisbon_jittered) # 9042 (17784 with 100 disagreg_thr)
```

```{r}
counters = readxl::read_excel("DadosAbertos_IST_CML_ContagensCiclistas_20172021.xlsx", sheet = "Out2021")
counters_sf = counters %>%
filter(TurnoNor %in% c("M1", "M2", "T1", "T2")) %>%
group_by(Ponto) %>%
summarise(SumCiclistas = sum(SumCiclistas, na.rm = TRUE), lon = mean(lon), lat = mean(lat)) %>%
select(-Ponto) %>%
sf::st_as_sf(coords = c("lon", "lat"), crs = 4326)
```

```{r lisbonmap, fig.cap="\\label{lisbonmap}Cycling infrastructure in Lisbon as of October 2021 and location of cyclist counters.", fig.ncol=2, out.width="100%"}
bikelanes = readRDS("Ciclovias2021out.Rds")
bikelanes = bikelanes %>% filter(TIPOLOGIA == "Ciclovia segregada") #do not show the sharrow ones with traffic
plot(lisbon_limit$geom, border="grey60")
plot(bikelanes, lwd = 0.6, col = "darkgreen", add = TRUE)
plot(counters_sf, col = "red", add = TRUE)
```

## Methods

We use data from a mobility survey [@IMOB] at district level (Lisbon has 24 districts), including `r round(sum(od_lisbon_with_bikes$Bike))` daily bicycle trips, represented by 122 desire lines.
The cycling count data include `r as.integer(sum(counters_sf$SumCiclistas))` cyclist passages in total across the 67 locations (a single trip may pass more than one location).

Routes were computed using three routing services for comparison: [_CycleStreets_](https://cyclestreets.net), which relies on the 2022 road network from OpenStreetMap; the [`r5` engine](https://ipeagit.github.io/r5r/) [@pereira_r5r_2021]; and the _Google Maps_ service.
Routes were calculated using reproducible code available in the GitHub repo associated with this paper thanks to the `stplanr`, `r5r` and `cyclestreets` R packages that provide interfaces to these routing engines.

Regarding the routing options, CycleStreets provides three options for cycling routes ("fastest", "balanced" and "quietest"), while r5r uses the Level of Traffic Stress (LTS), ranging from 1 (lowest stress, most bicycle friendly) to 4 (highest stress, least bicycle friendly) [@mekuria2012low]. Google Maps does not provide such profile options for bicycle routing.
In this research we compared CycleStreets' "quietest" and "fastest" modes, and LTS 2 and 4 [@mekuria2012low; @desjardins_correlates_2022].

This was an iterative process, and not all options were tested due to the computational requirements. We started by generating routes with CycleStreets for the three routing profiles and for unjittered OD data, jittered data with no disaggregation, and jittered data with a disaggregation threshold of 500 trips. Then we compared the results with routes generated by r5r for two levels of traffic stress (2 and 4), and with routes generated by Google. A further disaggregation threshold of 200 trips was also compared with the previous results, for routes generated with CycleStreets ("quietest" profile) and with r5r (LTS 2).

Results were then assessed. Count data were compared with the resulting route networks (which carry the number of bike trips on each segment, derived from the mobility survey data) by taking the value of the segment nearest to each counter and using R^2^ as the measure of fit.
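
The comparison can be sketched in base R on toy data. The analysis code in this document uses `sf::st_join()` with `sf::st_nearest_feature` on real geometries; the sketch below instead assumes planar point coordinates standing in for counters and segment midpoints, and every value in it is invented for illustration.

```r
# Toy data: counter locations with observed counts, and route network
# segment midpoints with modelled flows (all values invented)
counters = data.frame(x = c(0, 1, 2, 3), y = c(0, 1, 0, 1),
                      observed = c(120, 40, 300, 80))
segments = data.frame(x = c(0.1, 0.9, 2.2, 3.1, 5),
                      y = c(0.1, 1.1, 0.1, 0.8, 5),
                      modelled = c(100, 50, 250, 90, 999))

# Index of the nearest segment for each counter (squared Euclidean distance)
nearest = vapply(seq_len(nrow(counters)), function(i) {
  which.min((segments$x - counters$x[i])^2 + (segments$y - counters$y[i])^2)
}, integer(1))

# Attach the modelled flow of the nearest segment to each counter
counters$modelled = segments$modelled[nearest]

# R^2 of a simple linear fit of observed counts on modelled flows
r_squared = summary(lm(observed ~ modelled, data = counters))$r.squared
r_squared
```

For a single predictor, this R^2^ equals the squared Pearson correlation between observed and modelled values, so it measures how well the modelled flows track the pattern of the counts rather than their absolute magnitude.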

# Results

We generated route networks based on a range of different jittering parameters and routing options.
The results presented in this section not only report estimates of model-counter fit but also provide an indication of the type of networks generated, through route network maps.
Figures \ref{poltlisbon1}, \ref{poltlisbon2} and \ref{poltlisbon3} show the difference between desire lines generated with the centroid-based approach and with the jittering approach, for bike trips in Lisbon.

```{r jitteredoverview1, echo=FALSE, fig.cap="\\label{poltlisbon1}Trips represented with desire lines from centroids of 24 areas. The red circles represent the counter locations.", fig.ncol=2, out.width="100%"}
plot(lisbon_limit$geom, border="grey60")
plot(od_lisbon_sf$geometry, lwd = 0.2, add = TRUE)
plot(counters_sf, col = "red", add = TRUE)
```

```{r jitteredoverview2, echo=FALSE, fig.cap="\\label{poltlisbon2}Trips represented with jittered desire lines, with no disaggregation.", fig.ncol=2, out.width="100%"}
plot(lisbon_limit$geom, border="grey60")
plot(od_lisbon_jittered$geometry, lwd = 0.2, add = TRUE)
plot(counters_sf, col = "red", add = TRUE)
```

```{r jitteredoverview3, echo=FALSE, fig.cap="\\label{poltlisbon3}Trips represented with jittered desire lines, with a disaggregation threshold of 500 trips.", fig.ncol=2, out.width="100%"}
plot(lisbon_limit$geom, border="grey60")
plot(od_lisbon_jittered_500$geometry, lwd = 0.1, add = TRUE)
plot(counters_sf, col = "red", add = TRUE)
```

```{r, eval=FALSE, echo=FALSE}
# Routing unjittered:
routes_unjittered_quietest = route(l = od_lisbon_sf , route_fun = journey, plan = "quietest")
write_rds(routes_unjittered_quietest, "routes_unjittered_quietest.Rds")
routes_unjittered_balanced = route(l = od_lisbon_sf , route_fun = journey, plan = "balanced")
write_rds(routes_unjittered_balanced, "routes_unjittered_balanced.Rds")
routes_unjittered_fastest = route(l = od_lisbon_sf , route_fun = journey, plan = "fastest")
write_rds(routes_unjittered_fastest, "routes_unjittered_fastest.Rds")
# Routing jittered (no disaggregation):
routes_jittered_quietest = route(l = od_lisbon_jittered , route_fun = journey, plan = "quietest")
write_rds(routes_jittered_quietest, "routes_jittered_quietest.Rds")
routes_jittered_balanced = route(l = od_lisbon_jittered , route_fun = journey, plan = "balanced")
write_rds(routes_jittered_balanced, "routes_jittered_balanced.Rds")
routes_jittered_fastest = route(l = od_lisbon_jittered , route_fun = journey, plan = "fastest")
write_rds(routes_jittered_fastest, "routes_jittered_fastest.Rds")
# Routing jittered 500:
routes_jittered_500_quietest = route(l = od_lisbon_jittered_500 , route_fun = journey, plan = "quietest")
write_rds(routes_jittered_500_quietest, "routes_jittered_500_quietest.Rds")
routes_jittered_500_balanced = route(l = od_lisbon_jittered_500 , route_fun = journey, plan = "balanced")
write_rds(routes_jittered_500_balanced, "routes_jittered_500_balanced.Rds")
routes_jittered_500_fastest = route(l = od_lisbon_jittered_500 , route_fun = journey, plan = "fastest")
write_rds(routes_jittered_500_fastest, "routes_jittered_500_fastest.Rds")

routes_jittered_500_google = route(l = od_lisbon_jittered_500 , route_fun = stplanr::route_google, mode = "bicycling")
write_rds(routes_jittered_500_google, "routes_jittered_500_google.Rds")

# Routing jittered 200:
routes_jittered_200_quietest = route(l = od_lisbon_jittered_200 , route_fun = journey, plan = "quietest")
write_rds(routes_jittered_200_quietest, "routes_jittered_200_quietest.Rds")

```

```{r eval=FALSE, include=FALSE}
#routes with r5r
options(java.parameters = '-Xmx8G') #memory max 8GB
options(java.home="C:/Program Files/Java/jdk-11.0.11/")
library(r5r)
library(stplanr)

r5r_lts = setup_r5(data_path = "r5r_paper/", overwrite = TRUE) #includes osm from june 2022

#jittered routes with r5r, selection of LTS 2 and 4

od_lisbon_jittered_500_points = line2df(od_lisbon_jittered_500)
od_lisbon_jittered_500_OR = od_lisbon_jittered_500_points[,c(1,2,3)]
names(od_lisbon_jittered_500_OR) = c("id", "lon", "lat")
od_lisbon_jittered_500_DE = od_lisbon_jittered_500_points[,c(1,4,5)]
names(od_lisbon_jittered_500_DE) = c("id", "lon", "lat")

od_lisbon_jittered_500_r5r = od_lisbon_jittered_500
od_lisbon_jittered_500_r5r$id = 1:nrow(od_lisbon_jittered_500_r5r)

routes_jittered_500_lts1 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_jittered_500_OR,
destinations = od_lisbon_jittered_500_DE,
mode = "BICYCLE",
# mode_egress = "WALK",
# departure_datetime = Sys.time(),
# time_window = 1L,
# suboptimal_minutes = 0L,
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
# walk_speed = 3.6,
bike_speed = 12,
# max_rides = 3,
max_lts = 1, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_jittered_500_lts1 = routes_jittered_500_lts1 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_jittered_500_r5r %>% st_drop_geometry(), by="id")
routes_jittered_500_lts1 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_jittered_500_lts1)),
geometry = routes_jittered_500_lts1$geometry
)
write_rds(routes_jittered_500_lts1, "routes_jittered_500_lts1.Rds")

routes_jittered_500_lts2 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_jittered_500_OR,
destinations = od_lisbon_jittered_500_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 2, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_jittered_500_lts2 = routes_jittered_500_lts2 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_jittered_500_r5r %>% st_drop_geometry(), by="id")
routes_jittered_500_lts2 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_jittered_500_lts2)),
geometry = routes_jittered_500_lts2$geometry
)
write_rds(routes_jittered_500_lts2, "routes_jittered_500_lts2.Rds")

routes_jittered_500_lts3 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_jittered_500_OR,
destinations = od_lisbon_jittered_500_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 3, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_jittered_500_lts3 = routes_jittered_500_lts3 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_jittered_500_r5r %>% st_drop_geometry(), by="id")
routes_jittered_500_lts3 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_jittered_500_lts3)),
geometry = routes_jittered_500_lts3$geometry
)
write_rds(routes_jittered_500_lts3, "routes_jittered_500_lts3.Rds")

routes_jittered_500_lts4 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_jittered_500_OR,
destinations = od_lisbon_jittered_500_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 4, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_jittered_500_lts4 = routes_jittered_500_lts4 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_jittered_500_r5r %>% st_drop_geometry(), by="id")
routes_jittered_500_lts4 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_jittered_500_lts4)),
geometry = routes_jittered_500_lts4$geometry
)
write_rds(routes_jittered_500_lts4, "routes_jittered_500_lts4.Rds")

# unjittered routes with r5r, selection of LTS 2 and 4

od_lisbon_unjittered_points = line2df(od_lisbon_sf) # desire lines (sf object); line2df() needs line geometries
od_lisbon_unjittered_OR = od_lisbon_unjittered_points[,c(1,2,3)]
names(od_lisbon_unjittered_OR) = c("id", "lon", "lat")
od_lisbon_unjittered_DE = od_lisbon_unjittered_points[,c(1,4,5)]
names(od_lisbon_unjittered_DE) = c("id", "lon", "lat")

od_lisbon_unjittered_r5r = od_lisbon_with_bikes
od_lisbon_unjittered_r5r$id = 1:nrow(od_lisbon_unjittered_r5r)

routes_unjittered_lts2 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_unjittered_OR,
destinations = od_lisbon_unjittered_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 2, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_unjittered_lts2 = routes_unjittered_lts2 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_unjittered_r5r %>% st_drop_geometry(), by="id")
routes_unjittered_lts2 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_unjittered_lts2)),
geometry = routes_unjittered_lts2$geometry
)
write_rds(routes_unjittered_lts2, "routes_unjittered_lts2.Rds")

routes_unjittered_lts4 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_unjittered_OR,
destinations = od_lisbon_unjittered_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 4, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_unjittered_lts4 = routes_unjittered_lts4 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_unjittered_r5r %>% st_drop_geometry(), by="id")
routes_unjittered_lts4 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_unjittered_lts4)),
geometry = routes_unjittered_lts4$geometry
)
write_rds(routes_unjittered_lts4, "routes_unjittered_lts4.Rds")

#jittered routes with r5r for 200, selection of LTS 2

od_lisbon_jittered_200_points = line2df(od_lisbon_jittered_200)
od_lisbon_jittered_200_OR = od_lisbon_jittered_200_points[,c(1,2,3)]
names(od_lisbon_jittered_200_OR) = c("id", "lon", "lat")
od_lisbon_jittered_200_DE = od_lisbon_jittered_200_points[,c(1,4,5)]
names(od_lisbon_jittered_200_DE) = c("id", "lon", "lat")

od_lisbon_jittered_200_r5r = od_lisbon_jittered_200
od_lisbon_jittered_200_r5r$id = 1:nrow(od_lisbon_jittered_200_r5r)

routes_jittered_200_lts2 = detailed_itineraries(
r5r_lts,
origins = od_lisbon_jittered_200_OR,
destinations = od_lisbon_jittered_200_DE,
mode = "BICYCLE",
fare_structure = NULL,
max_fare = Inf,
max_walk_time = Inf,
max_bike_time = Inf,
max_trip_duration = 180L, #in minutes
bike_speed = 12,
max_lts = 2, #1 - quietest, 4 - hardcore
shortest_path = TRUE, #FALSE?
all_to_all = FALSE,
n_threads = Inf,
verbose = FALSE,
progress = TRUE,
drop_geometry = FALSE,
output_dir = NULL
)
routes_jittered_200_lts2 = routes_jittered_200_lts2 %>% mutate(id = as.integer(from_id)) %>%
select(id, total_duration, total_distance, route) %>%
left_join(od_lisbon_jittered_200_r5r %>% st_drop_geometry(), by="id")
routes_jittered_200_lts2 = sf::st_as_sf(
as.data.frame(sf::st_drop_geometry(routes_jittered_200_lts2)),
geometry = routes_jittered_200_lts2$geometry
)
write_rds(routes_jittered_200_lts2, "routes_jittered_200_lts2.Rds")

```
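
The chunk above repeats the same origin and destination preparation for each set of desire lines. A hypothetical helper, named `od_points_for_r5r()` here and not part of `r5r` or `stplanr`, could factor that step out. The sketch assumes, as the chunk above does, that the input columns are ordered as ID, origin longitude, origin latitude, destination longitude, destination latitude; the toy coordinates are invented.

```r
# Hypothetical helper: split a table of line end points into the
# origin and destination tables (id, lon, lat) used for routing
od_points_for_r5r = function(points) {
  stopifnot(ncol(points) >= 5)
  origins = points[, c(1, 2, 3)]
  destinations = points[, c(1, 4, 5)]
  names(origins) = c("id", "lon", "lat")
  names(destinations) = c("id", "lon", "lat")
  list(origins = origins, destinations = destinations)
}

# Toy input standing in for the output of line2df() on desire lines
pts = data.frame(L1 = 1:2,
                 fx = c(-9.15, -9.14), fy = c(38.72, 38.73),
                 tx = c(-9.13, -9.12), ty = c(38.74, 38.75))
od_pts = od_points_for_r5r(pts)
od_pts$origins
od_pts$destinations
```

Selecting columns by position mirrors the original code; selecting by name would be more robust if the column order of the input ever changed.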

```{r}
routes_unjittered_quietest = readRDS("routes_unjittered_quietest.Rds")
routes_unjittered_balanced = readRDS("routes_unjittered_balanced.Rds")
routes_unjittered_fastest = readRDS("routes_unjittered_fastest.Rds")
routes_jittered_quietest = readRDS("routes_jittered_quietest.Rds")
routes_jittered_balanced = readRDS("routes_jittered_balanced.Rds")
routes_jittered_fastest = readRDS("routes_jittered_fastest.Rds")
routes_jittered_500_quietest = readRDS("routes_jittered_500_quietest.Rds")
routes_jittered_500_balanced = readRDS("routes_jittered_500_balanced.Rds")
routes_jittered_500_fastest = readRDS("routes_jittered_500_fastest.Rds")
routes_jittered_500_lts2 = readRDS("routes_jittered_500_lts2.Rds")
routes_jittered_500_lts4 = readRDS("routes_jittered_500_lts4.Rds")
routes_unjittered_lts2 = readRDS("routes_unjittered_lts2.Rds")
routes_unjittered_lts4 = readRDS("routes_unjittered_lts4.Rds")
routes_jittered_500_google = readRDS("routes_jittered_500_google.Rds")
routes_jittered_200_quietest = readRDS("routes_jittered_200_quietest.Rds")
routes_jittered_200_lts2 = readRDS("routes_jittered_200_lts2.Rds")

rnet_unjittered_quietest = overline(routes_unjittered_quietest, attrib = "Bike")
rnet_unjittered_balanced = overline(routes_unjittered_balanced, attrib = "Bike")
rnet_unjittered_fastest = overline(routes_unjittered_fastest, attrib = "Bike")

rnet_jittered_quietest = overline(routes_jittered_quietest, attrib = "Bike")
rnet_jittered_balanced = overline(routes_jittered_balanced, attrib = "Bike")
rnet_jittered_fastest = overline(routes_jittered_fastest, attrib = "Bike")

rnet_jittered_500_quietest = overline(routes_jittered_500_quietest, attrib = "Bike")
rnet_jittered_500_balanced = overline(routes_jittered_500_balanced, attrib = "Bike")
rnet_jittered_500_fastest = overline(routes_jittered_500_fastest, attrib = "Bike")
rnet_jittered_200_quietest = overline(routes_jittered_200_quietest, attrib = "Bike")

rnet_jittered_500_lts2 = overline(routes_jittered_500_lts2, attrib = "Bike")
rnet_jittered_500_lts4 = overline(routes_jittered_500_lts4, attrib = "Bike")
rnet_unjittered_lts2 = overline(routes_unjittered_lts2, attrib = "Bike")
rnet_unjittered_lts4 = overline(routes_unjittered_lts4, attrib = "Bike")
rnet_jittered_200_lts2 = overline(routes_jittered_200_lts2, attrib = "Bike")

rnet_jittered_500_google = overline(routes_jittered_500_google, attrib = "Bike")
```

```{r, echo=FALSE}
# rnet_quiet = readRDS(url("https://github.com/U-Shift/biclar/releases/download/0.0.1/rnet_enmac_region_quietest_top_20000.Rds"))

counters_sf_joined = st_join(counters_sf,
rnet_unjittered_quietest %>% rename(Bikes_unjittered_quietest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_unjittered_balanced %>% rename(Bikes_unjittered_balanced = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_unjittered_fastest %>% rename(Bikes_unjittered_fastest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_quietest %>% rename(Bikes_jittered_quietest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_balanced %>% rename(Bikes_jittered_balanced = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_fastest %>% rename(Bikes_jittered_fastest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_quietest %>% rename(Bikes_jittered_500_quietest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_balanced %>% rename(Bikes_jittered_500_balanced = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_fastest %>% rename(Bikes_jittered_500_fastest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_lts2 %>% rename(Bikes_jittered_500_lts2 = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_lts4 %>% rename(Bikes_jittered_500_lts4 = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_unjittered_lts2 %>% rename(Bikes_unjittered_lts2 = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_unjittered_lts4 %>% rename(Bikes_unjittered_lts4 = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_500_google %>% rename(Bikes_jittered_500_google = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_200_quietest %>% rename(Bikes_jittered_200_quietest = Bike),
join = sf::st_nearest_feature)
counters_sf_joined = st_join(counters_sf_joined,
rnet_jittered_200_lts2 %>% rename(Bikes_jittered_200_lts2 = Bike),
join = sf::st_nearest_feature)
# # head(counters_sf_joined)
# corrplot::corrplot(counters_sf_joined %>% sf::st_drop_geometry())
# counters_sf_joined %>%
# sf::st_drop_geometry() %>%
# plot()
```
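
After the joins above, the modelled flow of each scenario sits in a column prefixed with `Bikes_`. The base R sketch below shows one way the fit could be tabulated per scenario. It uses a toy data frame standing in for the joined counters table with its geometry dropped, and all values are invented rather than taken from the paper's results.

```r
# Toy stand-in for the joined counters table (values invented)
joined = data.frame(
  SumCiclistas                = c(120, 40, 300, 80, 150),
  Bikes_unjittered_quietest   = c(10, 0, 90, 60, 5),
  Bikes_jittered_500_quietest = c(30, 12, 70, 25, 40)
)

# All scenario columns share the "Bikes_" prefix
scenario_cols = grep("^Bikes_", names(joined), value = TRUE)

# Squared Pearson correlation between counts and each scenario's flows
fit = sapply(scenario_cols, function(col) {
  cor(joined$SumCiclistas, joined[[col]], use = "complete.obs")^2
})

# Scenarios ordered from best to worst fit
sort(fit, decreasing = TRUE)
```

Looping over the column names keeps the comparison in one place as scenarios are added, instead of computing each fit by hand.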

Figures \ref{map1}, \ref{map2}, \ref{map3} and \ref{map4} show examples of route networks from unjittered OD pairs and from jittered OD pairs with a disaggregation threshold of 500 trips, for different routing providers, together with the counter locations.

```{r map1, echo=FALSE, message=FALSE, warning=FALSE, fig.ncol=2, out.width="100%", fig.cap="\\label{map1}Route network from unjittered desire lines, with routes from CycleStreets, using the quietest routing option."}
library(tmap)
library(biclar)

#map with route network and counters location. rnet lwd = Bikes

tm_shape(rnet_unjittered_quietest) +
tmap::tm_lines(
id = NULL,
lwd = "Bike",
scale = 15,
col = "Bike",
palette = cols4all::c4a(palette = "mako") #choose a darker one!
) +
tm_shape(counters_sf) + tm_bubbles(
size = "SumCiclistas",
alpha = 0,
border.col = "red",
col = NA,
border.lwd = 1.5
)
```

```{r map2, echo=FALSE, message=FALSE, warning=FALSE, fig.ncol=2, out.width="100%", fig.cap="\\label{map2}Route network from jittered desire lines with a disaggregation threshold of 500 trips, with routes from CycleStreets, using the quietest routing option."}

tm_shape(rnet_jittered_500_quietest) +
tmap::tm_lines(
id = NULL,
lwd = "Bike",
scale = 15,
col = "Bike",
palette = cols4all::c4a(palette = "mako")
)+
tm_shape(counters_sf) + tm_bubbles(
size = "SumCiclistas",
alpha = 0,
border.col = "red",
col = NA,
border.lwd = 1.5
)
```

```{r map3, echo=FALSE, message=FALSE, warning=FALSE, fig.ncol=2, out.width="100%", fig.cap="\\label{map3}Route network from jittered desire lines with disaggregation of 500 trips, with routes from r5r, level of traffic stress 2 (quiet) routing option."}

tm_shape(rnet_jittered_500_lts2) +
tmap::tm_lines(
id = NULL,
lwd = "Bike",
scale = 15,
col = "Bike",
palette = cols4all::c4a(palette = "mako")
)+
tm_shape(counters_sf) + tm_bubbles(
size = "SumCiclistas",
alpha = 0,
border.col = "red",
col = NA,
border.lwd = 1.5
)
```

```{r map4, echo=FALSE, message=FALSE, warning=FALSE, fig.ncol=2, out.width="100%", fig.cap="\\label{map4}Route network from jittered desire lines with disaggregation of 500 trips, with routes from Google."}

tm_shape(rnet_jittered_500_google) +
tmap::tm_lines(
id = NULL,
lwd = "Bike",
scale = 15,
col = "Bike",
palette = cols4all::c4a(palette = "mako")
)+
tm_shape(counters_sf) + tm_bubbles(
size = "SumCiclistas",
alpha = 0,
border.col = "red",
col = NA,
border.lwd = 1.5
)
```

```{r map5, echo=FALSE, message=FALSE, warning=FALSE, fig.ncol=2, out.width="100%", fig.cap="\\label{map5}Route network from jittered desire lines with disaggregation of 200 trips, with routes from r5r, level of traffic stress 2 (quiet) routing option."}

tm_shape(rnet_jittered_200_lts2) +
tmap::tm_lines(
id = NULL,
lwd = "Bike",
scale = 15,
col = "Bike",
palette = cols4all::c4a(palette = "mako")
)+
tm_shape(counters_sf) + tm_bubbles(
size = "SumCiclistas",
alpha = 0,
border.col = "red",
col = NA,
border.lwd = 1.5
)
```

When comparing the route network generated from unjittered desire lines (Figure \ref{map1}) with those generated from jittered desire lines (Figures \ref{map2}, \ref{map3} and \ref{map4}), we find that the jittered route networks are more diffuse, rather than concentrated on a few routes. For cycling and walking, this results in more realistic routes. Nevertheless, we are aware that the "quietest" and LTS 2 (quieter than LTS 4) routing options give a higher weight to existing cycling infrastructure, so the resulting route network can resemble the silhouette of the cycling network (see Figure \ref{lisbonmap}). In fact, cyclists tend to opt for cycling infrastructure when it is available, even if it compromises the directness of their trips [@Broach2012].
We also observed that the "fastest" and LTS 4 routing options fit the count data less well than the "quietest" and LTS 2 options.

Regarding the different disaggregation levels, a route network built from jittered desire lines with a disaggregation level of 200 trips is shown in Figure \ref{map5}, which results in a more diffuse network.
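
The effect of the disaggregation level can be illustrated with a minimal base R sketch.
It assumes that the level acts as a maximum number of trips per output desire line, so that larger OD pairs are split into several rows whose trips sum to the original total.
The helper `disaggregate_od()` and the trip counts are hypothetical and are not part of the `odjitter` implementation.

```r
# Hypothetical helper (not part of odjitter): split one OD pair into rows
# that each carry no more than `threshold` trips, preserving the total.
disaggregate_od = function(trips, threshold) {
  n_rows = max(1, ceiling(trips / threshold))
  rep(trips / n_rows, n_rows)
}

# Toy trip counts for three OD pairs
od_trips = c(120, 500, 1300)

# A threshold of 500 leaves the first two pairs intact and splits the third
lapply(od_trips, disaggregate_od, threshold = 500)

# A threshold of 200 produces more, smaller rows and therefore more routes
lapply(od_trips, disaggregate_od, threshold = 200)
```

Under this assumption, lowering the threshold increases the number of desire lines to be routed, which is why lower thresholds tend to give more diffuse networks at a higher computational cost.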

Although useful for visualizing the complex and spatially diffuse reality of travel patterns, we found that the most valuable use of jittering is as a pre-processing stage before routing and route network generation.
Route networks generated from jittered desire lines are more diffuse, and potentially more realistic, than centroid-based desire lines.

We also found that the approach, implemented in Rust and with bindings to R and Python (in progress), is fast.
Benchmarks show that the approach can 'jitter' desire lines representing millions of trips in a major city in less than a minute on consumer hardware.

We also found that the results of jittering depend on the geographic input datasets representing start points and trip attractors, and the use of weights.
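
The role of weights can be sketched in base R as weighted random sampling of candidate subpoints within a zone.
The subpoint table, road types and weights below are invented for illustration and do not reflect the settings used in this study, in which all road network vertices were used.

```r
set.seed(42)

# Hypothetical candidate subpoints inside one zone, each with a weight
subpoints = data.frame(
  id = 1:5,
  type = c("residential", "residential", "primary", "service", "residential"),
  weight = c(3, 3, 1, 1, 3)
)

# Sample a jittered origin for each of 1000 desire lines;
# sample() rescales `prob`, so the weights need not sum to one
sampled = sample(subpoints$id, size = 1000, replace = TRUE, prob = subpoints$weight)

# Share of desire lines starting at each subpoint
share = prop.table(table(factor(sampled, levels = subpoints$id)))
round(share, 2)
```

Subpoints with higher weights receive proportionally more trip ends, so the choice of weights directly shapes where routes start and finish.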

Table \ref{tableresults} shows the fit between the counter data and the modeled route networks for different routing and jittering parameters. We can observe that jittered OD pairs with disaggregation provide a better fit.

```{r}
results = tibble::tribble(
~`Jittering`, ~`Routing`, ~`Nrow`, ~`Correlation (r)`,
"Unjittered", "quietest", nrow(od_lisbon_sf), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_unjittered_quietest),
"Unjittered", "balanced", nrow(od_lisbon_sf), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_unjittered_balanced),
"Unjittered", "fastest", nrow(od_lisbon_sf), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_unjittered_fastest),
"Unjittered", "LTS2", nrow(od_lisbon_sf), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_unjittered_lts2),
"Unjittered", "LTS4", nrow(od_lisbon_sf), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_unjittered_lts4),
"Jittered, no disaggregation", "quietest", nrow(od_lisbon_jittered), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_quietest),
"Jittered, no disaggregation", "balanced", nrow(od_lisbon_jittered), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_balanced),
"Jittered, no disaggregation", "fastest", nrow(od_lisbon_jittered), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_fastest),
"Jittered, 500 disaggregation", "quietest", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_quietest),
"Jittered, 500 disaggregation", "balanced", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_balanced),
"Jittered, 500 disaggregation", "fastest", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_fastest),
"Jittered, 500 disaggregation", "LTS2", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_lts2),
"Jittered, 500 disaggregation", "LTS4", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_lts4),
"Jittered, 500 disaggregation", "Google", nrow(od_lisbon_jittered_500), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_500_google),
"Jittered, 200 disaggregation", "quietest", nrow(od_lisbon_jittered_200), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_200_quietest),
"Jittered, 200 disaggregation", "LTS2", nrow(od_lisbon_jittered_200), cor(counters_sf_joined$SumCiclistas, counters_sf_joined$Bikes_jittered_200_lts2),
)
knitr::kable(results, digits = 2, booktabs = TRUE, caption = "\\label{tableresults}Results showing counter/model fit for route networks generated from different routing and jittering parameters",
linesep = c("", "", "","", "\\addlinespace","","", "\\addlinespace","", "", "","","", "\\addlinespace"))
```
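
The fit values in Table \ref{tableresults} are computed with `cor()`, which returns Pearson's correlation coefficient.
The sketch below, using made-up counts for five hypothetical counters, shows how that value relates to the coefficient of determination of a simple linear model.

```r
# Toy values only: observed counts and modeled flows at five hypothetical counters
observed = c(120, 340, 80, 560, 210)
modelled = c(100, 300, 150, 480, 260)

r = cor(observed, modelled)   # Pearson's correlation coefficient
r_squared = r^2               # share of variance explained by a simple linear fit

# For a linear model with one predictor and an intercept,
# the reported R-squared equals the squared correlation
fit = lm(observed ~ modelled)
all.equal(r_squared, summary(fit)$r.squared)
```

Because the squared value is never larger in magnitude than the correlation itself, the two measures should not be compared directly across studies.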

A higher level of disaggregation (a threshold of 200 trips) does not provide a better fit than a lower level of disaggregation (500 trips). This might be explained by the routing profiles used in the routing engines and by the location of the cycling counters, most of which are on existing cycling infrastructure.
Although a more diffuse route network is expected for active transportation modes, the available data and computed routes are usually closer to where cycling infrastructure exists. Other data should be used to validate this hypothesis, such as counts from more widely distributed counter locations and/or actual cyclists' routes (for example, bike sharing trip routes), although access to such data is not usually guaranteed for research purposes.

The results from our analysis suggest that investment in cycle infrastructure is particularly important in a few key locations where cycling potential is high yet provision is poor.
These locations are highlighted in Figure \@ref(fig:segments), which was generated using information from three key sources:

- Estimates of cycling potential, generated using the jittering $\rightarrow$ routing $\rightarrow$ route network methods presented in this paper.
- Estimates of quietness of links on the network, computed with the open source cyclestreets R package [@desjardins_correlates_2022].
- Local knowledge, which was used to visually inspect the resulting networks and identify key 'severance' points in the network [@mindell_chapter_2020].

```{r segments, fig.cap="Segments on the transport network of Lisbon where investment in new cycling infrastructure should be prioritised according to the route networks generated using methods presented in this paper, alongside local knowledge.", out.width="100%"}
knitr::include_graphics("figures/priority-segments.jpeg")
```

Figure \@ref(fig:segments) highlights the policy-relevant nature of this research.
A key finding is that, combined with local knowledge and detailed data on existing transport infrastructure, which can be used to generate metrics such as Level of Traffic Stress (LTS) [@wang_does_2016] and Cycling Level of Service (CLoS) [@deegan_cycling_2015], route networks generated from jittered, disaggregated, and appropriately routed OD data can help prioritise investment where it is most needed.
Results were presented to stakeholders working in the local area who said that these new results would support their investment plans.

The overall result was the finding that OD jittering methods first developed by @lovelace_jittering_2022b are not enough on their own to generate accurate route networks.
Jittering leads to more spatially diffuse route networks than networks generated from the common approach of routing from and to zone centroids.
However, the results presented in this section show that careful consideration of routing options is needed in addition to evidence-based selection of jittering parameters.

# Conclusion

Building on previous work [@lovelace_jittering_2022b], we have explored the relative importance of jittering and routing options for generating accurate route network level estimates of movement, down to the street level.
In corroboration with previous research, we found that jittering leads to more spatially diverse geographic representations of travel between zones and estimates of flow down to the link level [@lovelace_assessing_2022].
A new finding was that jittering alone cannot be guaranteed to generate accurate route network level results: appropriate routing options should be tested and identified.

The results were generated only for a single city and we did not explore the full parameter space (alternative subpoint weighting parameters in the jittering process are discussed below).
For these reasons, we cannot draw specific and universally applicable conclusions about the optimal settings for accurate route network generation in other cities: it should be remembered that route networks and cycling preferences vary from city to city [@buehler_bikeway_2016].
However, although our findings were based on a single case study, Lisbon, Portugal, the findings have implications for future work using OD data to support evidence-based investment in sustainable transport infrastructure [e.g. @vybornova_automated_2022a].
The main conclusion is that both careful translation of OD data to geographic start and end locations and disaggregation and careful selection of routing options are needed *in combination* to ensure that route networks derived from OD data are diffuse and accurate.

Accurate route network representations of transport systems are needed to support investment in a variety of transport interventions [@morgan_travel_2020].
We have focused in this study on cycleway networks because a complete cycle network represents one of the most cost-effective ways to reduce car dependence and associated environmental, economic, social and health costs [@waldykowski_sustainable_2022].
Cycleway *networks*, rather than simply isolated routes or other geographically sparse interventions, are vital for successful active travel investment [@buehler_bikeway_2016].
Our results are therefore highly policy-relevant, contributing to established methods of adding value to OD data to support sustainable transport planning [@lovelace_propensity_2017; @larsen_build_2013; @mohammed_origindestination_2022].

The research presented in this paper is not without limitations.
We did not explore the full range of jittering and routing options available due to time and computational resource constraints.
Specifically, varying the type and weights of origin and destination subpoints, as advocated in @lovelace_jittering_2022b, could lead to improved fit.
This would require filtering the subpoints used to include only certain types of nodes on the road network (all vertices on the road network were used as the basis for both origin subpoints and destination subpoints in this study, see [documentation](https://github.com/dabreegster/odjitter) in the `odjitter` Rust crate for details).
Future work could explore the use of including only residential roads, or increasing the weight associated with residential roads, in the origin subpoints, for example.
Likewise, destination subpoints and associated weights could be altered to prioritise key trip attractors such as schools and commercial centres.
Another limitation is the simplistic measure of accuracy used in this study.
Accuracy was inferred from goodness-of-fit between aggregated flow values at 67 counter locations and modeled flow on the nearest segment on the network.
Future work could use alternative measures of fit such as root-mean-square error (RMSE) and more sophisticated ways of comparing observed counter values to modeled networked values, e.g. using inverse distance weighted measures associated with links in close proximity to each counter, with empirically derived bandwidths.
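
As a rough illustration of these alternatives, the base R sketch below computes RMSE and an inverse distance weighted (IDW) modeled flow for a single counter.
The function names, the 50 m bandwidth, the distance floor of 1 m and all values are assumptions made for this example rather than empirically derived settings.

```r
# Root-mean-square error between observed counts and modeled flows
rmse = function(observed, modelled) {
  sqrt(mean((observed - modelled)^2))
}

# Inverse distance weighted modeled flow at one counter, using only links
# within `bandwidth` metres; both defaults are illustrative assumptions
idw_flow = function(flows, distances, bandwidth = 50, power = 1) {
  keep = distances <= bandwidth
  if (!any(keep)) return(NA_real_)
  # Floor distances at 1 m to avoid division by zero for coincident links
  w = 1 / pmax(distances[keep], 1)^power
  sum(w * flows[keep]) / sum(w)
}

# Toy observed and modeled values at three hypothetical counters
observed = c(120, 340, 80)
modelled = c(100, 300, 150)
rmse(observed, modelled)

# Hypothetical links near one counter: flows and distances in metres
idw_flow(flows = c(200, 50, 400), distances = c(5, 20, 120), bandwidth = 50)
```

Unlike a nearest-segment join, the IDW value draws on all links within the bandwidth, which may reduce sensitivity to small positional errors in counter locations.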

More broadly, the quality of the underlying route network data is imperfect.
Efforts to improve the underlying OpenStreetMap data will continue to overcome this limitation, not just in Lisbon but worldwide [@barrington-leigh_world_2017].
This will improve the results over time because all routing engines used in this study, except for Google's routing service, use OSM data.
Furthermore, alternative data sources and methods could be used to generate more accurate road networks [e.g. @leninisha_water_2015].
Future work should seek to test a wider range of jittering parameters in multiple case study areas with larger ground truth datasets.
Other fit measures, such as GEH or SQV statistics, may also be used to compare count data with simulated traffic volumes.
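
For reference, the GEH statistic for a modeled volume $M$ and an observed count $C$ is $\sqrt{2(M - C)^2 / (M + C)}$.
A minimal sketch in base R follows, with toy values; note that GEH is conventionally applied to hourly volumes, so applying it to the aggregated counts used here would be an adaptation.

```r
# GEH statistic for modeled (M) and observed (C) volumes
geh = function(modelled, observed) {
  sqrt(2 * (modelled - observed)^2 / (modelled + observed))
}

# Toy values at three hypothetical counters
modelled = c(100, 120, 300)
observed = c(100, 80, 340)
geh(modelled, observed)
```

Because the difference is scaled by the combined volume, GEH tolerates larger absolute errors on busy links than on quiet ones.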

Despite these limitations, and the need for future academic work, the results are already useful.
Imperfect data-driven evidence is better than no systematic evidence, especially when practitioners are aware of the mechanisms underlying route network level estimates of travel behavior such as those presented in this paper.
A benefit of the approach is that it is based on open source software and reproducible code, allowing others to build on the methods [@lovelace_open_2020].
Indeed, a next step building directly on the research presented in this paper is to use the results to support strategic cycle network planning in Lisbon and the wider area.
In parallel to efforts to improve route network representations of transport systems, we therefore advocate for the use of the approach presented in this paper, and related methods [e.g. @cooper_predictive_2018; @vybornova_automated_2022a], in support of more evidence-based investment in sustainable transport infrastructure at city, regional and national scales worldwide.

\section*{ACKNOWLEDGEMENTS}\label{ACKNOWLEDGEMENTS}

We thank Lisbon Municipal Government and Transport Infrastructure Ireland for funding this research.

# References