https://github.com/dkogan/culdesacs



#+OPTIONS: tex:dvipng

* Reference

This is all described better (with pictures!) in a blog post:

http://notes.secretsauce.net/notes/2015/08/16_least-convenient-location-in-los-angeles-from-koreatown.html

* Overview

A conversation with a friend raised the question of finding the point in LA's
road network that's most inconvenient to get to. Here /inconvenient/ is a vague
notion describing a closed residential neighborhood full of dead ends; the
furthest of these dead ends would be most inconvenient indeed. This repository
attempts to answer that question.

I want /inconvenient/ to mean

#+BEGIN_QUOTE
Furthest to reach via the road network, but nearest as-the-crow-flies.
#+END_QUOTE

Note that this type of metric is not a universal one, but is relative to a
particular starting point. This makes sense, however: a place that's
inconvenient from one starting point could be very convenient from another.

This metric could be expressed in many ways. I keep it simple, and compute a
relative inefficiency coefficient:

=(d_road - d_direct) / d_direct=

Thus the goal is to find a location within a given radius of the starting point
that maximizes this relative inefficiency.
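
As a quick illustration of the coefficient, here is a minimal Python sketch. The
function name =inefficiency= and the distances are made up for this example;
they are not results from this repository. Note that the coefficient is
undefined at the starting point itself, where =d_direct= is 0.

#+BEGIN_SRC python
def inefficiency(d_road, d_direct):
    """Relative inefficiency: (d_road - d_direct) / d_direct."""
    if d_direct <= 0:
        # The starting point itself has d_direct == 0; the ratio is undefined there
        raise ValueError("d_direct must be positive")
    return (d_road - d_direct) / d_direct

# Hypothetical point: 3000 m away by road, 1000 m as the crow flies
print(inefficiency(3000.0, 1000.0))

# A point reached by a perfectly straight road has no inefficiency at all
print(inefficiency(1000.0, 1000.0))
#+END_SRC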

* Approach

I use [[http://www.openstreetmap.org][OpenStreetMap]] for the road data. This is all aimed at bicycling, so I'm
looking at all roads except freeways and ones marked private. I /am/ looking at
footpaths, trails, etc.

Once I have the road network, I run [[https://en.wikipedia.org/wiki/Dijkstra's_algorithm][Dijkstra's Algorithm]] to compute the shortest
path from my starting point to every other point on the map. Then I can easily
compute the inefficiency for each such point, and pick the point with the
highest inefficiency. I use OSM nodes as the "points". It is possible that the
location I'm looking for is in between a pair of nodes, but the nodes should be
close enough together for this not to matter. Also, the "distance" between
adjacent nodes could take into account terrain type, elevation, road type and so
on. I ignore all that, and simply look at the distance.
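
The idea can be sketched in a few lines of Python. This is only an illustration
on a toy network, not the code in this repository: the function name
=shortest_distances=, the dict-based graph and the edge lengths are all
assumptions made for the example. It uses Python's =heapq= and simply skips
stale queue entries instead of updating them in place.

#+BEGIN_SRC python
import heapq

def shortest_distances(neighbors, start):
    """Dijkstra's algorithm.

    neighbors maps each node to a list of (neighbor, edge_length) pairs.
    Returns a dict mapping every reachable node to its shortest road distance.
    """
    dist = {start: 0.0}
    done = set()
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node in done:
            continue  # stale entry: this node was already finalized
        done.add(node)
        for nbr, length in neighbors.get(node, ()):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return dist

# Toy undirected network (hypothetical): a-b is 1, b-c is 2, a-c is 5
toy = {
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("a", 5.0), ("b", 2.0)],
}
print(shortest_distances(toy, "a"))
#+END_SRC

Here the shortest route from =a= to =c= goes through =b=, so the direct 5-unit
edge is never used.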

* Implementation

Each step in the process lives in its own program. This simplifies
implementation and makes it easy to work on each piece separately.

** Data import

First I query OSM. This is done with the =query.pl= script. It takes in the
center point and the query radius. The query uses the [[http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL][OSM Overpass query
language]]. I use this simple query, filling in the center point and radius:

#+BEGIN_EXAMPLE
[out:json];

way
["highway"]
["highway" !~ "motorway|motorway_link" ]
["access" !~ "private" ]
["access" !~ "no" ]
(around:$rad,$lat,$lon);

(._;>;);

out;
#+END_EXAMPLE
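
Filling in the template is plain string substitution. The Python sketch below
shows one way to do it; it is not how =query.pl= works internally. The name
=build_query= and the miles-to-meters conversion are assumptions for this
example, based on the Overpass =around= filter taking its radius in meters.

#+BEGIN_SRC python
METERS_PER_MILE = 1609.344

def build_query(lat, lon, radius_m):
    """Fill the center point and the radius (in meters) into the query above."""
    return f"""[out:json];

way
["highway"]
["highway" !~ "motorway|motorway_link" ]
["access" !~ "private" ]
["access" !~ "no" ]
(around:{radius_m:.0f},{lat},{lon});

(._;>;);

out;
"""

# 20 miles around 3rd/New Hampshire
print(build_query(34.0690448, -118.292924, 20 * METERS_PER_MILE))
#+END_SRC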

Sample invocation:

#+BEGIN_EXAMPLE
$ ./query.pl --center 34.0690448,-118.292924 --rad 20miles
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 .....

$ ls -lhrt *(om[1])
-rw-r--r-- 1 dima dima 81M Aug 14 00:44 query_34.0690448_-118.292924_20miles.json

#+END_EXAMPLE

** Data massaging

Now I need to take the OSM query results, and manipulate them into a form
readable by the Dijkstra's algorithm solver. This is done by the
=massage_input.pl= script. This script does nothing interesting, but it does it
inefficiently, so it's CPU and RAM-hungry and takes a few minutes. Sample
invocation:

#+BEGIN_EXAMPLE
$ ./massage_input.pl query_34.0690448_-118.292924_20miles.json > query_34.0690448_-118.292924_20miles.net
#+END_EXAMPLE

*** Neighbor list representation

An implementation choice here was how to represent the neighbor list for a node.
I want the main computation (next section) to be able to query this very
quickly, and I don't want the list to take much space, and I don't want to
fragment my memory with many small allocations. Thus I have a single contiguous
array of integers =neighbor_pool=. Each node has a single integer index into
this pool. At this index the =neighbor_pool= contains a list of node indices
that are neighbors of the node in question. A special node index of -1 signifies
the end of the neighbor list for that node.
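
To make the layout concrete, here is a small Python sketch of the same idea. The
helper names =build_neighbor_pool= and =neighbors= and the toy adjacency lists
are invented for this example; the real data structure lives in the C code.

#+BEGIN_SRC python
from array import array

def build_neighbor_pool(adjacency):
    """adjacency[i] is the list of neighbor indices of node i.

    Returns (neighbor_pool, first), where first[i] is the index in the pool
    at which the neighbor list of node i starts.
    """
    pool = array("i")   # one contiguous block of integers
    first = array("i")  # one integer index per node
    for nbrs in adjacency:
        first.append(len(pool))
        pool.extend(nbrs)
        pool.append(-1)  # -1 marks the end of this node's neighbor list
    return pool, first

def neighbors(pool, first, node):
    """Yield the neighbors of node by walking the pool up to the -1 marker."""
    i = first[node]
    while pool[i] != -1:
        yield pool[i]
        i += 1

# Toy network: node 0 is connected to nodes 1 and 2
adjacency = [[1, 2], [0], [0]]
pool, first = build_neighbor_pool(adjacency)
print(list(pool))                       # [1, 2, -1, 0, -1, 0, -1]
print(list(first))                      # [0, 3, 5]
print(list(neighbors(pool, first, 0)))  # [1, 2]
#+END_SRC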

** Inefficiency coefficient computation

I now feed the massaged data to Dijkstra's algorithm implemented in =compute.c=.
I need a priority queue where elements can be inserted, removed and updated.
Apparently most heap implementations don't have an 'update' mechanism, so it
took a little while to find a working one. I ended up using [[https://en.wikipedia.org/wiki/B-heap][phk's b-heap]]
implementation from the [[https://www.varnish-cache.org/trac/browser/lib/libvarnish/binary_heap.c][varnish source tree]]. It stores arbitrary pointers
(64-bit on my box); 32-bit indices into a pool would be more efficient, but this
is fast enough.
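
For illustration, this is roughly what an updatable priority queue looks like.
The sketch is a plain binary heap in Python that remembers where each item
sits, so a key can be changed in place. It is not phk's b-heap, and the class
and method names (=IndexedMinHeap=, =insert=, =update=, =pop_min=) are invented
for this example.

#+BEGIN_SRC python
class IndexedMinHeap:
    """Binary min-heap that tracks item positions to support key updates."""

    def __init__(self):
        self._heap = []  # list of [key, item]
        self._pos = {}   # item -> index of that item in self._heap

    def __len__(self):
        return len(self._heap)

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i][0] >= self._heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child][0] < self._heap[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, item, key):
        self._heap.append([key, item])
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def update(self, item, key):
        """Change the key of an item already in the heap."""
        i = self._pos[item]
        old = self._heap[i][0]
        self._heap[i][0] = key
        if key < old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def pop_min(self):
        """Remove and return (item, key) for the smallest key."""
        key, item = self._heap[0]
        last = self._heap.pop()
        del self._pos[item]
        if self._heap:
            # Move the last element to the root and restore heap order
            self._heap[0] = last
            self._pos[last[1]] = 0
            self._sift_down(0)
        return item, key

heap = IndexedMinHeap()
heap.insert("a", 5.0)
heap.insert("b", 3.0)
heap.insert("c", 4.0)
heap.update("a", 1.0)  # a shorter path to "a" was found
while len(heap):
    print(heap.pop_min())
#+END_SRC

In Dijkstra's algorithm the =update= call is what happens when a shorter path to
an already-queued node is discovered.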

Sample invocation:

#+BEGIN_EXAMPLE
$ ./compute < query_34.0690448_-118.292924_20miles.net > query_34.0690448_-118.292924_20miles.out

$ head -n 2 query_34.0690448_-118.292924_20miles.out
34.069046 -118.292923 0.000000 0.000000
34.070034 -118.292931 109.863564 109.863564
#+END_EXAMPLE

The output is all nodes, sorted by the road distance to the node. The columns
are lat,lon,d_road,d_direct.

*** Distance from latitude/longitude pairs

One implementation note here is how to compute the distance between two
latitude/longitude pairs. The most direct way is to convert each
latitude/longitude pair into a unit vector, compute the dot product, take the
arccos and multiply by the radius of the Earth. This requires 9 trigonometric
operations and relies on the arccos of a number close to 1, which is inaccurate.
One could instead compute the arcsin of the magnitude of the cross-product, but
this requires even more computation. I want something simpler:

#+BEGIN_EXAMPLE
dist = Rearth * angle

cos(angle) = dot(v0,v1) = dot( (cos(lon0)*cos(lat0), sin(lon0)*cos(lat0), sin(lat0)),
                               (cos(lon1)*cos(lat1), sin(lon1)*cos(lat1), sin(lat1)) ) =

           = cos(lat0)*cos(lat1) * ( cos(lon0)*cos(lon1) + sin(lon0)*sin(lon1) ) +
             sin(lat0)*sin(lat1) =

           = cos(lat0)*cos(lat1) * cos(diff_lon) + sin(lat0)*sin(lat1)

cos(diff_lon) ~ 1 - diff_lon^2/2 so

cos(angle) ~ cos(lat0)*cos(lat1) + sin(lat0)*sin(lat1) - diff_lon^2/2*cos(lat0)*cos(lat1) =
           = cos(diff_lat) - cos(lat0)*cos(lat1)*diff_lon^2/2 ~
           ~ 1 - diff_lat^2/2 - diff_lon^2/2*cos(lat0)*cos(lat1)

cos(angle) ~ 1 - angle^2/2, so

angle^2 ~ diff_lat^2 + diff_lon^2*cos(lat0)*cos(lat1)

angle ~ sqrt(diff_lat^2 + diff_lon^2 * cos(lat0)*cos(lat1))

#+END_EXAMPLE

This is nice and simple. Is it sufficiently accurate? This Python script tests
it:

#+BEGIN_SRC python
import numpy as np

lat0, lon0 = 34.0690448, -118.292924  # 3rd/New Hampshire
lat1, lon1 = 33.93, -118.4314         # LAX

# Convert degrees to radians
lat0, lon0, lat1, lon1 = [x * np.pi / 180.0 for x in (lat0, lon0, lat1, lon1)]

Rearth = 6371000  # radius of the Earth, in meters

# Unit vectors pointing at each location
v0 = np.array((np.cos(lat0)*np.cos(lon0), np.cos(lat0)*np.sin(lon0), np.sin(lat0)))
v1 = np.array((np.cos(lat1)*np.cos(lon1), np.cos(lat1)*np.sin(lon1), np.sin(lat1)))

# Simplified formula derived above
dist_approx = np.sqrt((lat0-lat1)**2 + (lon0-lon1)**2 * np.cos(lat0)*np.cos(lat1)) * Rearth
# Reference: arccos of the dot product of the unit vectors
dist_accurate = np.arccos(np.inner(v0, v1)) * Rearth

print(dist_approx)
print(dist_accurate)
print(dist_approx - dist_accurate)
#+END_SRC

Between Koreatown and LAX there's quite a bit of difference in both latitude and
longitude. Both methods say the distance is about 20km, with a disagreement of
3mm. This is plenty good enough.

* Results

I want to find the least convenient location from the intersection of New
Hampshire and 3rd street in Los Angeles within 20 miles or so.

The output of =compute= is sorted by road distance from the start. I prepend the
coefficient of inconvenience, re-sort the list and take the 50 most inconvenient
locations by invoking

#+BEGIN_EXAMPLE