https://github.com/robinlovelace/odjitter

Paper outlining the concept of 'jittering' methods for adding value to origin-destination data
Host: GitHub
URL: https://github.com/robinlovelace/odjitter
Owner: Robinlovelace
Created: 2021-05-11T07:39:14.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2022-04-23T11:19:48.000Z (about 3 years ago)
Last Synced: 2025-02-14T13:24:11.211Z (5 months ago)
Language: TeX
Homepage: https://osf.io/qux6g/
Size: 14.4 MB
Stars: 4
Watchers: 5
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

---

title: "Jittering: A computationally efficient method for generating realistic route networks from origin-destination data"

output: github_document

# output:

#   bookdown::pdf_document2:

#     keep_tex: true

bibliography: references.bib

author: Robin Lovelace, Rosa Félix, Dustin Carlino

---

```{r, include=FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  echo = FALSE,

  message = FALSE,

  cache = TRUE,

  warning = FALSE,

  fig.align = "center"

)

# devtools::install_github("itsleeds/od")

library(sf)

library(tmap)

library(dplyr)

library(stplanr)

```

```{r, eval=FALSE}

file.copy("README.html", "jittering-paper.html")

file.copy("README.pdf", "jittering-paper-resubmission3.pdf", TRUE)

# file.copy("README.tex", "jittering-paper-resubmission2.tex")

piggyback::pb_upload("jittering-paper-resubmission3.pdf")

piggyback::pb_download_url("jittering-paper-resubmission3.pdf")

# https://github.com/Robinlovelace/odjitter/releases/download/1/jittering-paper-resubmission2.pdf

# Generate citations (requires Zotero)

rbbt::bbt_update_bib("README.Rmd", "references.bib")

# trackdown::upload_file(file = "README.Rmd", path_output = "README.docx")

# trackdown::update_file(file = "README.Rmd", path_output = "README.docx")

# create diff

system("latexdiff jittering-paper-resubmission2.tex README.tex > diff-resubmission3.tex")

tinytex::pdflatex("diff-resubmission3.tex")

browseURL("diff-resubmission3.pdf")

piggyback::pb_upload("diff-resubmission3.pdf")

piggyback::pb_download_url("diff-resubmission3.pdf")

zip("jittering-paper-code.zip", files = c("odjitter-paper.R", "odjitter-paper.sh"))

piggyback::pb_upload("jittering-paper-code.zip")

```

[![.github/workflows/render-rmarkdown.yaml](https://github.com/Robinlovelace/odjitter/actions/workflows/render-rmarkdown.yaml/badge.svg)](https://github.com/Robinlovelace/odjitter/actions/workflows/render-rmarkdown.yaml)

# Abstract {-}

Origin-destination (OD) datasets are often represented as 'desire lines' between zone centroids.

This paper presents a 'jittering' approach to pre-processing and conversion of OD data into geographic desire lines that (1) samples unique origin and destination locations for each OD pair, and (2) splits 'large' OD pairs into 'sub-OD' pairs.

Reproducible findings, based on the open source odjitter Rust crate, show that route networks generated from jittered desire lines are more geographically diffuse than route networks generated by 'unjittered' data.

We conclude that the approach is a computationally efficient and flexible way to simulate transport patterns, particularly relevant for modelling active modes.

Further work is needed to validate the approach and to find optimal settings for sampling and disaggregation.

# Questions

Origin-destination (OD) datasets are widely used in transport planning to efficiently represent aggregate travel behavior.

Despite the emergence of 'big data' sources such as massive GPS datasets, OD data continues to play an established --- if not central --- role in 21^st^ century transport planning and modelling.

Recent applications range from analysis of the evolution of urban activity and shared mobility services over time [e.g. @shi_exploring_2019; @li_effects_2019] to inference of congestion and mode split [@bachir_inferring_2019; @gao_method_2021].

Much has been written on optimal zoning systems for, and geographic representations of, OD data [e.g. @openshaw_optimal_1977; @boyce_forecasting_2015].

Recent papers have presented new methods for OD dataset validation [@alexander_validation_2015], aggregation [@he_simple_2018; @liu_snn_2021], disaggregation [@katranji_mobility_2016] and location of 'connectors' joining zone center points (centroids) with the surrounding network [@jafari_investigation_2015].

Broadly, there are two approaches to converting OD data into geographic representations for transport modelling:

1. Centroid to centroid representations, a common approach involving the simplifying assumption that all trip destinations and origins can be represented by (sometimes population weighted or aggregated) zone centroids [@guo_origindestination_2014; @martin_origindestination_2018].

2. Subdividing zones (also referred to as transport analysis zones, TAZ) for which data is available into subzone centroids [@opie_commodityspecific_2009], or adding 'centroid connectors' (or simply 'connectors') "between trip ends and zonal anchors" using stochastic or deterministic approaches [@leurent_stochastic_2011; @friedrich_methods_2009].

In this paper we present a new approach which, unlike established approaches which convert centroid based desire lines to routes and then route networks [@morgan_travel_2020], allows the user to adjust start and end locations based on variables such as transport network density, residential density or size of commercial buildings acting as trip attractors.

This 'jittering' approach is flexible, enabling the user to adjust the level of disaggregation, the location of start and end points from which disaggregate OD pairs are sampled, and weights representing the importance of different trip 'originators' and 'attractors'.

 

OD data jittering is a simple, transparent and flexible pre-processing stage that aims to represent the diffuse nature of travel patterns.

This is particularly important when designing for active travel [@buehler_bikeway_2016], which explains the choice of input data used to illustrate the technique in this paper. The approach was developed in response to feedback from Edinburgh City Council (who funded a project based on the research) that route networks based on the Propensity to Cycle Tool approach [@lovelace_propensity_2017] were too sparse.

Jittered OD data can be used with existing transport modelling workflows developed around centroid-based methods, as the basis of route network assignment, uptake modelling, and route network generation workflows [@morgan_travel_2020].

We refer to the approach as jittering, a term borrowed from data visualization, where it means adding random noise to data [@wickham_ggplot2_2016].

```{r haystack, fig.cap="Illustration of how geographic visualisation and routing can add value to OD datasets and make them more policy relevant."}

# todo: update this if needed, commented out to save space and cut to the chase!

# knitr::include_graphics("https://user-images.githubusercontent.com/1825120/142071229-81358e26-5e8d-437e-9ef8-91704a4e690f.png")

```

The jittering approach presented in this paper was motivated by the following question:

> How can OD data representing trips between large geographic zones be used more effectively, to generate diffuse route networks of current or potential flow to inform local interventions?

# Methods {#methods}

The approach was developed to support public sector transport planning in Edinburgh, UK.

The original study area was Edinburgh City Council, a major economic hub with ambitious [plans](https://www.edinburgh.gov.uk/downloads/file/30073/active-travel-investment-programme-update-october-2021) for investment in active travel, which makes evidence showing where investment will be most beneficial key to decision making.

For the purposes of this study we focus on a comparatively small area around central Edinburgh.

We focus in this paper on walking trips in this central area for two reasons: much research into route networks has focused on cycling, and walking trips tend to be short, creating a need to convert aggregated OD datasets into diffuse route network representations of travel.

Input datasets developed for this paper can be downloaded using reproducible code that accompanies the paper; see code at [url to be included on publication] to fully reproduce the findings.

```{r get-osm-data, eval=FALSE}

road_network_area = osmextract::oe_get_network(place = "Scotland", mode = "cycling")

road_network = road_network_area[edinburgh_region, ]

saveRDS(road_network, "road_network.Rds")

road_network_touching = road_network[zones_touching, ]

nrow(road_network) # 35k

nrow(road_network_touching) # 6k

table(road_network_touching$highway)

road_network_min = road_network_touching %>% 

  # filter(str_detect(string = highway, pattern = "cycle|prim|sec|tert"))

  filter(highway %in% c("primary", "secondary", "tertiary"))

nrow(road_network_min) # 800

plot(road_network_min["highway"])

saveRDS(road_network_min, "road_network_min.Rds")

piggyback::pb_upload("road_network_min.Rds", repo = "itsleeds/od")

piggyback::pb_download_url("road_network_min.Rds", repo = "itsleeds/od")

```

```{r read-inputs}

# u = "https://github.com/ITSLeeds/od/releases/download/0.2.1/od_iz_ed.Rds"

# f = basename(u)

# if(!file.exists(f)) download.file(u, f)

# od = readRDS("od_iz_ed.Rds")

# readr::write_csv(od, "od_iz_ed.csv")

# piggyback::pb_upload("od_iz_ed.csv", repo = "itsleeds/od")

# piggyback::pb_download_url("od_iz_ed.csv", repo = "itsleeds/od")

# od = readr::read_csv("https://github.com/ITSLeeds/od/releases/download/v0.3.1/od_iz_ed.csv")

od = readr::read_csv("https://github.com/Robinlovelace/odjitter/releases/download/1/od_central.csv")

# head(od)

# u = "https://github.com/ITSLeeds/od/releases/download/0.2.1/iz_zones11_ed.Rds"

# f = basename(u)

# if(!file.exists(f)) download.file(u, f)

# zones = readRDS("iz_zones11_ed.Rds")

# sf::write_sf(zones, "iz_zones11_ed.geojson")

# piggyback::pb_upload("iz_zones11_ed.geojson", repo = "itsleeds/od")

# zones = sf::read_sf("https://github.com/ITSLeeds/od/releases/download/v0.3.1/iz_zones11_ed.geojson")

zones = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/zones.geojson")

# head(zones)

# u = "https://github.com/ITSLeeds/od/releases/download/0.2.1/iz_cents11_ed.Rds"

# f = basename(u)

# if(!file.exists(f)) download.file(u, f)

# centroids = readRDS(f)

# sf::write_sf(centroids, "iz_centroids11_ed.geojson")

# piggyback::pb_upload("iz_centroids11_ed.geojson", repo = "itsleeds/od")

# centroids = sf::read_sf("https://github.com/ITSLeeds/od/releases/download/v0.3.1/iz_centroids11_ed.geojson")

# centroids = centroids[zones, ]

# nrow(centroids) # 71

# sf::write_sf(centroids, "centroids.geojson")

# piggyback::pb_upload("centroids.geojson")

centroids = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/centroids.geojson")

# u = "https://github.com/ITSLeeds/od/releases/download/v0.3.1/road_network_min.Rds"

# f = basename(u)

# if(!file.exists(f)) download.file(u, f)

# road_network_min = readRDS(f)

# sf::write_sf(road_network_min, "road_network_min.geojson")

# piggyback::pb_upload("road_network_min.geojson", repo = "itsleeds/od")

road_network_min = sf::read_sf("https://github.com/ITSLeeds/od/releases/download/v0.3.1/road_network_min.geojson")

road_network_buffer = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/road_network_buffer.geojson")

```

```{r read-region}

# lads_uk = ukboundaries::lad2018

# # # names(lads_uk)

# lads_scotland = lads_uk %>%

#   filter(str_detect(lau118cd, "S"))

# # saveRDS(lads_scotland, "lads_scotland.Rds")

# # library(dplyr)

# # lads_scotland = readRDS("lads_scotland.Rds")

# # piggyback::pb_upload("lads_scotland.Rds")

# 

# edinburgh_region = lads_scotland %>%

#   dplyr::filter(lau118nm == "Edinburgh, City of")

# # saveRDS(edinburgh_region, "edinburgh_region.Rds")

# sf::write_sf(edinburgh_region, "edinburgh_region.geojson")

# piggyback::pb_upload("edinburgh_region.geojson", repo = "itsleeds/od")

central_edinburgh = tmaptools::geocode_OSM(q = "edinburgh", as.sf = TRUE)

central_edinburgh_5km = sf::st_buffer(central_edinburgh, dist = 5000)

edinburgh_region = sf::read_sf("https://github.com/ITSLeeds/od/releases/download/v0.3.1/edinburgh_region.geojson")

zones_centroids = sf::st_centroid(zones)

zones_centroids_5km = zones_centroids[central_edinburgh_5km, ]

# zones = zones %>% 

#   filter(InterZone %in% zones_centroids_5km$InterZone)

# road_network_buffer = road_network_touching[zones, ]

# saveRDS(road_network_buffer, "road_network_buffer.Rds")

# sf::write_sf(road_network_buffer, "road_network_buffer.geojson")

# piggyback::pb_upload("road_network_buffer.geojson")

# sum(zones$TotPop2011) # 476626

m1 = tm_shape(zones) + tm_polygons("TotPop2011", title = "Population", palette = "viridis") +

  tm_scale_bar() +

  tm_minimap(zoomLevelOffset = -7)

# tmap_leaflet(m1)

```

```{r izs, fig.cap="Overview of the study region with the population from the 2011 Census at the level of Intermediate Zones corresponding to fill colour.", out.width="50%", eval=FALSE}

# See interactive map online at https://rpubs.com/RobinLovelace/843442

knitr::include_graphics("figures/overview-zones-central.png")

```

Beyond the zone data illustrated in Figure \@ref(fig:od), the input dataset consisted of open access OD data from the 2011 census.

The OD data can be represented in tabular form and, when start and end points are assigned to centroids within each zone, as geographic entities, as illustrated by the sample of three OD pairs presented in Figure \@ref(fig:od).

To generate the route networks presented in Figure \@ref(fig:rneted) we used the Open Source Routing Machine (OSRM) with the profile set to 'foot'.
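
As a minimal sketch of this tabular-to-geographic conversion, the base R snippet below looks up centroid coordinates for each OD pair and derives the straight-line length of the resulting desire line. The zone codes, coordinates and trip counts are hypothetical and chosen only for illustration; the code accompanying this paper uses `od::od_to_sf()` for this step.

```r
# Hypothetical OD table: zone codes and walking trip counts (illustrative values)
od_example = data.frame(
  geo_code1 = c("Z1", "Z1", "Z2"),
  geo_code2 = c("Z2", "Z3", "Z3"),
  foot = c(120, 40, 75)
)

# Hypothetical zone centroids in projected coordinates (metres)
centroids_example = data.frame(
  code = c("Z1", "Z2", "Z3"),
  x = c(0, 1000, 500),
  y = c(0, 0, 800)
)

# Look up the centroid row for each origin and destination code
o = match(od_example$geo_code1, centroids_example$code)
d = match(od_example$geo_code2, centroids_example$code)

# Each row now describes a straight desire line from (ox, oy) to (dx, dy)
desire_lines = cbind(
  od_example,
  ox = centroids_example$x[o], oy = centroids_example$y[o],
  dx = centroids_example$x[d], dy = centroids_example$y[d]
)

# Straight-line (Euclidean) length of each desire line in metres
desire_lines$length_m = sqrt(
  (desire_lines$dx - desire_lines$ox)^2 + (desire_lines$dy - desire_lines$oy)^2
)
```

Because every trip between a pair of zones shares the same two coordinates, all of the flow is concentrated on a single line, which is the limitation that jittering addresses.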

```{r odsf}

# head(centroids)

od_sf = od::od_to_sf(od, centroids)

# od_sf_central = od_sf %>% 

#   filter(geo_code1 %in% zones$InterZone) %>% 

#   filter(geo_code2 %in% zones$InterZone) 

# od_sf = od_sf_central %>%

#   top_n(n = 500, wt = foot)

# od_central = od_sf %>% sf::st_drop_geometry()

# nrow(od_central) # 514

# write_csv(od_central, "od_central.csv")

# piggyback::pb_upload("od_central.csv")

od_sf_top3 = od_sf %>%

  filter(geo_code1 != geo_code2) %>% 

  top_n(n = 3, wt = all) %>%

  select(geo_code1, geo_code2, all, foot, bicycle, bus, car_driver) %>% 

  arrange(desc(all))

centroids_top = centroids %>% 

  filter(InterZone %in% c(od_sf_top3$geo_code1, od_sf_top3$geo_code2))

# tmap_mode("view")

zones_in_top3 = zones %>% 

  filter(InterZone %in% c(od_sf_top3$geo_code1, od_sf_top3$geo_code2))

k = od_sf_top3 %>%

  sf::st_drop_geometry() %>%

  kableExtra::kable()

# k

# zones_touching = zones[zones_in_top3, ]

# saveRDS(zones_touching, "zones_touching.Rds")

bbox = sf::st_bbox(zones_in_top3)

m1 = tm_shape(od_sf_top3, bbox = bbox) +

  tm_lines("foot", palette = "Set1", breaks = c(100, 200, 300, 400), lwd = 6) +

  tm_shape(zones) +

  tm_borders() +

  tm_shape(zones_in_top3) +

  tm_text("InterZone", size = 0.8) +

  tm_scale_bar()

# m1

# writeLines(k, "/tmp/kable.html")

# browseURL("/tmp/kable.html")

```

```{r od, fig.cap="Illustration of input data in tabular (bottom right, inset) and geographic form (in the map). Note how the ID codes in the first two columns of the table correspond with IDs in the zone data and how the cells in the 'foot' column are represented geographically on the map.", fig.show='hold', out.width="80%"}

knitr::include_graphics(c(

  "figures/od-top-3-zones-metafigure.png"

  # "figures/od-top-3-table.png",

  # "figures/od-top-3.png"

))

```

The key elements of the jittering approach outlined in this paper are described in the following sub-sections, and are perhaps best understood visually, as illustrated in each of the faceted maps in Figure \@ref(fig:jitters).

The subfigures show the flexibility of the approach, with C) and D) demonstrating the use of vertices on the road network as start and end points, building on the observation from spatial network analysis that the density of the transport network is a reasonable proxy for travel demand [@cooper_predictive_2018].

Other refinements, including weighted sub-points, could be used when suitable data sources (e.g. building footprint areas) are available.

```{r jitters, fig.cap="Illustration of jittering and disaggregation of origin-destination (OD) data with a minimal input dataset. Subfigure A) shows the conventional way of representing OD data as desire lines between zone centroids. Subfigures B) and C) show the same desire lines but with jittered origin and destination locations based on simple random sampling of points and sampling locations on the road network. Subfigure D) shows the combined impact of disaggregation and jittering. Zone limits are represented in grey, while road network is in green.", out.width="80%"}

set.seed(2021)

fn = c(

  "A) Centroid based desire lines",

  "B) Jittered desire lines (random point sampling)",

  "C) Jittered desire lines (sample from network)",

  "D) Jittered desire lines (with disaggregation)"

)

od_top3_jittered = od::od_jitter(od_sf_top3, z = zones)

od_top3_road = od::od_jitter(od_sf_top3, z = zones, road_network_min, max_per_od = 1000)

od_top3_disaggregated = od::od_jitter(od_sf_top3, z = zones, road_network_min, max_per_od = 100)

# od_top3_disaggregated = od::od_jitter(od_sf_top3, z = zones, road_network_min, max_per_od = 200, population_column = "foot")

m1 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(od_sf_top3, bbox = bbox) +

  tm_lines("foot", palette = "Set1", breaks = c(100, 200, 300, 400), lwd = 6, title.col = "Walking trips per day")  +

  tm_shape(zones_in_top3) +

  tm_text("InterZone", size = 0.4) +

  tm_layout(title = "A) Single origin and destination point per zone", title.bg.color =  "white")

m2 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(od_top3_jittered, bbox = bbox) +

  tm_lines("foot", palette = "Set1", breaks = c(100, 200, 300, 400), lwd = 6, title.col = "Walking trips per day") +

  tm_layout(title = "B) Randomised origin and destination points", title.bg.color =  "white")

m3 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(road_network_min, bbox = bbox) +

  tm_lines(col = "darkgreen") +

  tm_shape(od_top3_road) +

  tm_lines("foot", palette = "Set1", breaks = c(100, 200, 300, 400), lwd = 6, title.col = "Walking trips per day") +

  tm_layout(title = "C) Randomised points sampled from transport network", title.bg.color =  "white", legend.position = c("right", "bottom"))

m4 = tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(road_network_min, bbox = bbox) +

  tm_lines(col = "darkgreen") +

  tm_shape(od_top3_disaggregated, bbox = bbox) +

  tm_lines("foot", palette = "Set1", lwd = 4, breaks = c(50, 60, 70, 80), title.col = "Walking trips per day") +

  tm_layout(title = "D) Jittered result with disaggregation and points on network", title.bg.color =  "white", legend.position = c("right", "bottom"))

# todo: add a 4th figure showing sampling on the network

# m3

tmap_arrange(m1, m2, m3, m4, nrow = 2)

od_combined = rbind(

  od_sf_top3 %>% transmute(foot, type = fn[1]),

  od_top3_jittered %>% transmute(foot, type = fn[2]),

  od_top3_road %>% transmute(foot, type = fn[3]),

  od_top3_disaggregated %>% transmute(foot, type = fn[4])

)

```

```{r desire}

# od_top_100 = od %>% 

#   top_n(n = 100, wt = all)

# desire_lines = od::od_to_sf(x = od_top_100, z = centroids)

# nrow(desire_lines)

# plot(desire_lines)

# subpoints = sf::st_sample(x = zones, size = 10000)

```

```{r jittered}

# desire_lines_jittered = od::od_jitter(od = desire_lines, z = zones)

# plot(desire_lines$geometry)

# plot(desire_lines_jittered$geometry)

```

## Sampling origin and destination points

Key to jittering is ensuring that each trip starts and ends in a different place.

To do this, there must be 'sub-points' within each zone, one for each trip origin and destination.

The simplest approach is simple random spatial sampling, as illustrated in Figure \@ref(fig:jitters) (B), which involves generating random coordinate pairs.

This approach has the advantage of simplicity, requiring no additional datasets, but has the disadvantage that it may lead to unrealistic start and end points, e.g. with trips simulated to start in rivers or uninhabited wilderness areas.
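
The simple random sampling step can be sketched with `sf::st_sample()`, as below. The square zone and the number of trips are hypothetical, and the snippet illustrates the idea only; it is not the implementation used in the `od` package or the `odjitter` crate.

```r
library(sf)

set.seed(2021)  # fix the random number generator so the sketch is repeatable

# Hypothetical zone: a 1 km square in a projected CRS (EPSG:27700, British National Grid)
zone = st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)))),
  crs = 27700
)

# One random sub-point per simulated trip end
n_trips = 5
sub_points = st_sample(zone, size = n_trips)

# Every sampled point falls inside the zone, but nothing stops a point
# landing in a river or a park: the sampler knows only the zone boundary
inside = st_within(sub_points, zone, sparse = FALSE)
all(inside)
```

The sampler needs no data other than the zone geometry, which is both the strength and the weakness described above.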

To overcome the limitations of the simple random sampling approach, the universe of possible coordinates from which trips can originate and end can be reduced by providing another geographic input dataset.

This dataset could contain known trip attractors such as city centers and work places, as well as tightly defined residential 'subzones'.

For highly disaggregated flows in cases where accurate building datasets are available, building footprints could also be used.

A useful, and widely available [@barrington-leigh_world_2017], input for subsampling is a transport road network, as illustrated in Figure \@ref(fig:jitters) (C).
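
A minimal sketch of sampling from the road network is shown below: the vertices of the network lines become the pool of candidate start and end points. The two-road network is hypothetical, and the snippet is an illustration under that assumption, not the code path used by `od::od_jitter()`.

```r
library(sf)

set.seed(2021)

# Hypothetical road network: two short linestrings in projected coordinates (metres)
roads = st_sfc(
  st_linestring(rbind(c(0, 0), c(100, 0), c(200, 50))),
  st_linestring(rbind(c(100, 0), c(100, 150))),
  crs = 27700
)

# Extract every vertex as an X/Y row and drop duplicates where roads meet
vertices = unique(st_coordinates(roads)[, c("X", "Y")])

# Sample trip start points from the vertices rather than from open space
n_trips = 3
picked = vertices[sample(nrow(vertices), size = n_trips, replace = TRUE), , drop = FALSE]
picked
```

Sampling with replacement means that a vertex can be chosen more than once; in practice a denser network provides more candidate points and so a more diffuse set of desire lines.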

Additional refinements to the stochastic selection of origin and destination based on weights relating to other datasets are possible, as discussed in the final section.

## Disaggregation

Both of the jittering techniques outlined above generate more diffuse route networks.

However, a problem with OD datasets is that they are often highly variable: one OD pair could represent 1 trip, while another could represent 1000 trips.

To overcome this problem a process of disaggregation can be used, resulting in additional OD pairs within each pair of zones. 

The results of disaggregation are illustrated geographically in Figure \@ref(fig:jitters) (D) and in terms of changes to attributes, in Tables \@ref(tab:dis1) and \@ref(tab:dis2).

As shown in those tables, updated attributes can be calculated by dividing previous trip counts by the number of OD pairs in the disaggregated representation of the data, 5 in this case.

To determine how many disaggregated OD pairs each original OD pair is split into, a maximum threshold was set: an OD pair with a total trip count exceeding this threshold (set at 100 in this case) is split into the minimum number of disaggregated OD pairs needed to bring the number of trips per pair below the threshold.
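
The threshold rule can be sketched in a few lines of base R. The helper names `n_sub_pairs()` and `split_trips()` are hypothetical, the trip counts are illustrative, and the sketch assumes that a sub-OD pair may carry a trip count equal to the threshold; the `od` package and the `odjitter` crate may handle that edge case differently.

```r
# Number of sub-OD pairs needed so that no pair exceeds the threshold
n_sub_pairs = function(trips, max_per_od) {
  pmax(1, ceiling(trips / max_per_od))
}

# Split one OD pair's trip count evenly across its sub-OD pairs
split_trips = function(trips, max_per_od) {
  n = n_sub_pairs(trips, max_per_od)
  rep(trips / n, times = n)
}

split_trips(450, max_per_od = 100)  # five sub-pairs of 90 trips each
split_trips(80, max_per_od = 100)   # below the threshold: left as one pair
```

Dividing evenly keeps the total number of trips unchanged, so zone-to-zone totals in the disaggregated data still match the original OD table.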

```{r}

od_to_disag = od_sf_top3 %>%

  sf::st_drop_geometry() %>% 

  slice(1) %>%

  transmute(representation = "original", geo_code1, geo_code2, all, foot)

od_disaggregated = od_top3_disaggregated %>%

  sf::st_drop_geometry() %>% 

  filter(o_agg == od_to_disag$geo_code1) %>% 

  filter(d_agg == od_to_disag$geo_code2) %>% 

  transmute(representation = "disaggregated", geo_code1 = o_agg, geo_code2 = d_agg, all, foot)

```

```{r dis1}

# kableExtra::kable(od_to_disag, caption = "Attribute data associated with an OD pair before disaggregation.")

# gt::gt(od_to_disag, caption = "Attribute data associated with an OD pair before disaggregation.")

knitr::kable(od_to_disag, caption = "Attribute data associated with an OD pair before disaggregation.", booktabs = TRUE)

```

```{r dis2}

# knitr::kable(od_disaggregated, caption = "Attribute data associated with an OD pair after disaggregation.")

# gt::gt(od_disaggregated, caption = "Attribute data associated with an OD pair after disaggregation.")

knitr::kable(od_disaggregated, caption = "Attribute data associated with an OD pair after disaggregation.", booktabs = TRUE)

```

# Findings

We found that jittering generates desire lines, and route networks, that are more geographically diffuse than those resulting from the established centroid-based approach.

Figure \@ref(fig:jittered514) shows the use of simple random sampling and sampling nodes on transport networks with reference to a real world example.

While the simple random sampling method of jittering presented in Figure \@ref(fig:jittered514) (B) may be appropriate in some specific cases, we advocate using pre-defined sub-points.

Using sub-points representing vertices on the transport network, as illustrated in Figures \@ref(fig:jittered514) C and D, is supported by 'spatial network analysis' (SNA) approaches to transport modelling [e.g. @cooper_predictive_2018].

Weighted points representing trip origins and destinations such as houses and commercial buildings could also be used.
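
Weighted sampling of this kind can be sketched with base R's `sample()` and its `prob` argument. The sub-points and their weights (imagined here as floor areas) are hypothetical; no weighted results are reported in this paper.

```r
set.seed(2021)

# Hypothetical candidate sub-points with a weight, e.g. building floor area in m^2
sub_points = data.frame(
  id = c("house", "office", "shop"),
  weight = c(100, 800, 100)
)

# Draw destinations with probability proportional to weight
# (sample() rescales the weights so they need not sum to one)
n_trips = 1000
picked = sample(sub_points$id, size = n_trips, replace = TRUE, prob = sub_points$weight)

# Share of trips ending at each sub-point, expected to be near 0.1, 0.8 and 0.1
shares = table(picked) / n_trips
shares
```

With weights of this kind, large trip attractors receive proportionally more trip ends while smaller buildings are still represented.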

```{r jittered514, fig.cap="Results showing the conversion of OD data to geographic desire lines using population weighted centroids for origins and destinations (A) and jittered results. The jittered results illustrate jittering with simple random sampling of origin and destination locations (B), sampling on the network (C), and sampling on the network plus disaggregation of OD pairs representing more than 100 trips (D)."}

# sum(od_sf$foot) / sum(od_sf_central$foot) # 80%

# qtm(zones) +

# tm_shape(od_sf) +

#   tm_lines() 

bbox = tmaptools::bb(od_sf, ext = 1.2)

od_sf_jittered = od::od_jitter(od_sf, z = zones)

od_sf_road = od::od_jitter(od_sf, z = zones, road_network_buffer, max_per_od = 1000)

od_sf_disaggregated = od::od_jitter(od_sf, z = zones, road_network_buffer, max_per_od = 100)

# od_sf_disaggregated = od::od_jitter(od_sf, z = zones, road_network_min, max_per_od = 200, population_column = "foot")

m1 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(od_sf, bbox = bbox) +

  tm_lines(lwd = "foot", scale = 3, title.lwd = "Walking trips per day", legend.lwd.show = FALSE) +

  tm_layout(title = "A) Single origin and destination point per zone", title.bg.color =  "white", legend.position = c("right", "bottom"), legend.bg.alpha = 0.3)

m2 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(od_sf_jittered, bbox = bbox) +

  tm_lines(lwd = "foot", scale = 3, title.lwd = "Walking trips per day", legend.lwd.show = FALSE) +

  tm_layout(title = "B) Randomised origin and destination points", title.bg.color =  "white", legend.position = c("right", "bottom"), legend.bg.alpha = 0.3)

m3 =  tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(road_network_buffer, bbox = bbox) +

  tm_lines(col = "darkgreen", lwd = 0.1) +

  tm_shape(od_sf_road) +

  tm_lines(lwd = "foot", scale = 3, title.lwd = "Walking trips per day", legend.lwd.show = FALSE) +

  tm_layout(title = "C) Randomised points sampled from transport network", title.bg.color =  "white", legend.position = c("right", "bottom"), legend.bg.alpha = 0.3)

m4 = tm_shape(zones, bbox = bbox) +

  tm_borders(col = "grey") +

  tm_shape(road_network_buffer, bbox = bbox) +

  tm_lines(col = "darkgreen", lwd = 0.1) +

  tm_shape(od_sf_disaggregated, bbox = bbox) +

  tm_lines(lwd = "foot", scale = 3, title.lwd = "Walking trips per day", legend.lwd.show = FALSE) +

  tm_layout(title = "D) Jittered result with disaggregation and points on network", title.bg.color =  "white", legend.position = c("right", "bottom"), legend.bg.color = "white", legend.bg.alpha = 0.3)

# todo: add a 4th figure showing sampling on the network

# m3

tmap_arrange(m1, m2, m3, m4, nrow = 2)

```

```{r, eval=FALSE}

routes_od = route(l = od_sf, route_fun = route_osrm, osrm.profile = "foot")

sf::write_sf(routes_od, "routes_od.geojson")

routes_jittered = route(l = od_sf_jittered, route_fun = route_osrm, osrm.profile = "foot")

sf::write_sf(routes_jittered, "routes_jittered.geojson")

routes_road = route(l = od_sf_road, route_fun = route_osrm, osrm.profile = "foot")

sf::write_sf(routes_road, "routes_road.geojson")

routes_disaggregated = route(l = od_sf_disaggregated, route_fun = route_osrm, osrm.profile = "foot", wait = 0.01)

sf::write_sf(routes_disaggregated, "routes_disaggregated.geojson")

f = list.files(pattern = "geojson")

piggyback::pb_upload(f)

piggyback::pb_download_url(f)

```

The results of converting the desire lines to routes and then route networks are illustrated in Figure \@ref(fig:rneted), which shows progressively more diffuse networks.

Greater disaggregation leads to more diffuse networks as shown in Figure \@ref(fig:rneted) (D).
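
The conversion from routes to a route network amounts to summing flows over routes that share a segment. The code accompanying this paper does this with `overline()` from the stplanr package; the base R sketch below shows the same idea on a hypothetical table of segment identifiers and trip counts.

```r
# Hypothetical routes: each row is one network segment used by one route,
# carrying that route's walking trips
route_segments = data.frame(
  route = c("r1", "r1", "r2", "r2", "r3"),
  segment = c("a", "b", "b", "c", "d"),
  foot = c(100, 100, 60, 60, 40)
)

# Sum trips over routes that share a segment to get segment-level flows
rnet = aggregate(foot ~ segment, data = route_segments, FUN = sum)
rnet
```

Segment `b` is used by two routes and so carries their combined flow. Jittering and disaggregation produce more distinct routes, which spreads flow across more segments and yields the more diffuse networks described above.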

The advantages of this approach include simplicity, low computational cost and flexibility, with disaggregation (and network diffusion) levels adjusted depending on requirements.

Disadvantages relate to the use of random number generators (RNG), which can reduce reproducibility and influence findings. The first drawback can be overcome by setting a 'seed', which makes the findings reproducible; the second can be mitigated by generating more than one set of results and testing how much they vary.

Jittering is particularly well suited to modelling walking and cycling, which require diffuse networks.
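
Both mitigations can be sketched in base R. The function `jitter_run()` below is a hypothetical stand-in for a jittering run (it only draws random coordinates) and is not part of the `od` package or the `odjitter` crate.

```r
# Stand-in for a jittering run: draws random coordinates for n trip ends
# (hypothetical helper, for illustration only)
jitter_run = function(n, seed) {
  set.seed(seed)
  cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
}

# Same seed, same result: the run is reproducible
identical(jitter_run(10, seed = 1), jitter_run(10, seed = 1))

# Different seeds give different results; repeating over several seeds
# shows how much a summary statistic depends on the random draw
mean_x = sapply(1:5, function(s) mean(jitter_run(10, seed = s)[, "x"]))
range(mean_x)
```

Reporting the spread of a statistic across seeds, rather than a single run, gives an indication of how sensitive a finding is to the random component.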

Taking disaggregation further, the approach can generate one desire line per trip that could feed into agent based models (ABM) such as A/B Street and MATSim [@abstreet_2022; @horni_multiagent_2016].

Jittering has few input data requirements, enabling its use in situations where sub-zones are unavailable.

```{r rneted, fig.cap="Route network results derived from non-jittered OD data (A) and OD data that has been jittered (B to D). The route network results correspond to the desire lines shown in Figure 4, with start and end points sampled from: random locations in geographic space (B); nodes on the transport network (C); and nodes on the network plus disaggregation of OD pairs representing more than 100 trips (D).", out.width="100%"}

# manual approach

# routes_od = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/routes_od.geojson")

# routes_jittered = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/routes_jittered.geojson")

# routes_road = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/routes_road.geojson")

# routes_disaggregated = sf::read_sf("https://github.com/Robinlovelace/odjitter/releases/download/1/routes_disaggregated.geojson")

f = c(

  "routes_od",

  "routes_jittered",

  "routes_road",

  "routes_disaggregated"

)

i = 1

rnets = purrr::map_dfr(seq(length(f)), function(i) {

  fi = f[i]

  u = paste0("https://github.com/Robinlovelace/odjitter/releases/download/1/", fi, ".geojson")

  routes = sf::read_sf(u)

  rnet = overline(routes, attrib = "foot")

  rnet$type = fn[i]

  rnet

})

# summary(rnets$foot)

tm_shape(rnets %>% mutate(foot = case_when(foot < 50 ~ 50, TRUE ~ foot)), bbox = tmaptools::bb(rnets, 0.5)) +

  tm_lines(lwd = "foot", scale = 15, lwd.legend = c(100, 500, 1000, 2000), title.lwd = "Walking trips per day") +

  tm_facets("type", free.scales.line.lwd = FALSE) +

  tm_layout(legend.outside.position = "top", legend.outside.size = 0.2)

# knitr::include_graphics("figures/rneted-updated.png")

```

```{r sumtable, eval=FALSE}

rnets$length_km = as.numeric(sf::st_length(rnets)) / 1000

rnets_summary = rnets %>%

  sf::st_drop_geometry() %>% 

  # shorten the labels before grouping: dplyr cannot modify a grouping variable

  mutate(type = stringr::str_sub(type, 1, 3)) %>% 

  group_by(type) %>% 

  summarise(

    `Network length (km)` = sum(length_km),

    `Average flow per segment` = mean(foot),

    `Standard deviation` = sd(foot)

  )

rnets_summary$`N. OD pairs` = c(rep(nrow(od_sf), 3), nrow(od_sf_disaggregated))

rnets_summary %>%

  select(type, `N. OD pairs`, everything()) %>% 

  knitr::kable(booktabs = TRUE, digits = 0, caption = "Summary of desire line and route network level results.")

```

This is, to the best of our knowledge, the first time that stochastic spatial sampling and disaggregation of OD data have been described in a single approach.

The approach is implemented in the open source Rust crate [`odjitter`](https://github.com/dabreegster/odjitter).

Implementations in the R packages [`od`](https://itsleeds.github.io/od/) and [`odjitter`](https://github.com/dabreegster/odjitter/tree/main/r), an interface to the Rust implementation, enable others to reproduce the findings, raising the possibility of interfaces to other languages.

The results also raise research questions, including: 

- Are the jittered results measurably better when compared with counter datasets on the network?

- How would results from jittering OD data compare in other situations, e.g. to model motor traffic?

- Which jittering settings (including sampling strategies and levels of disaggregation) represent the best 'bang for buck' in terms of network accuracy relative to computational requirements?

- And can further refinements, for example sampling with weights to increase the proportion of trips associated with large buildings and commercial centers, or modifying disaggregation threshold values depending on variables such as zone size, improve results?

Before further refinements are made, we advocate empirical research to validate the jittering approach outlined in this paper as a foundation for further work on OD data pre-processing and disaggregation.

Such research requires case studies that have both good open OD data and good observed travel behavior data, for example from manual and automatic counters at point locations on the network [@lindsey_minnesota_2013] and other sources of data such as trajectory datasets from GPS devices [@zheng_big_2016].

# References