https://github.com/dkogan/culdesacs
- Host: GitHub
- URL: https://github.com/dkogan/culdesacs
- Owner: dkogan
- Created: 2015-08-13T03:25:19.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-08-16T08:10:45.000Z (almost 11 years ago)
- Last Synced: 2026-02-11T02:25:37.892Z (5 months ago)
- Language: C
- Size: 137 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.org
README
#+OPTIONS: tex:dvipng
* Reference
This is all described better (with pictures!) in a blog post:
http://notes.secretsauce.net/notes/2015/08/16_least-convenient-location-in-los-angeles-from-koreatown.html
* Overview
A friend and I were talking, and the question came up of finding the point in
LA's road network that's most inconvenient to get to, with /inconvenient/ being
a vague notion describing a closed residential neighborhood full of dead ends;
the furthest of these dead ends would be most inconvenient indeed. This
repository attempts to answer that question.
I want /inconvenient/ to mean:
#+BEGIN_QUOTE
Furthest to reach via the road network, but nearest as-the-crow-flies.
#+END_QUOTE
Note that this metric is not universal: it is relative to a particular starting
point. This makes sense, however: a location that's inconvenient to reach from
one starting point could be very convenient from another.
This metric could be expressed in many ways. I keep it simple, and compute a
relative inefficiency coefficient:
=(d_road - d_direct) / d_direct=
Thus the goal is to find a location within a given radius of the starting point
that maximizes this relative inefficiency.
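As a concrete illustration (a Python sketch, not code from this repository), the
coefficient is a one-liner. How to treat the starting point itself, where
=d_direct= is 0 and the ratio is undefined, is my own assumption here: the
function simply returns =None=.

#+BEGIN_SRC python
def inefficiency(d_road, d_direct):
    """Relative inefficiency coefficient (d_road - d_direct) / d_direct.

    Both distances must be in the same units. Returns None when d_direct is 0
    (the starting point itself), where the ratio is undefined.
    """
    if d_direct == 0:
        return None
    return (d_road - d_direct) / d_direct

# A dead end 1 km away as the crow flies but 3 km away by road:
print(inefficiency(3000.0, 1000.0))  # 2.0
#+END_SRC

A coefficient of 2 thus means the road trip is three times as long as the
straight line; 0 means the road goes straight there.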
* Approach
I use [[http://www.openstreetmap.org][OpenStreetMap]] for the road data. This is all aimed at bicycling, so I'm
looking at all roads except freeways and roads marked private or no-access. I
/am/ looking at footpaths, trails, etc.
Once I have the road network, I run [[https://en.wikipedia.org/wiki/Dijkstra's_algorithm][Dijkstra's Algorithm]] to compute the shortest
path from my starting point to every other point on the map. Then I can easily
compute the inefficiency for each such point, and pick the point with the
highest inefficiency. I use OSM nodes as the "points". It is possible that the
location I'm looking for is in between a pair of nodes, but adjacent nodes are
close enough together that this doesn't matter. Also, the "distance" between
adjacent nodes could take into account terrain type, elevation, road type and
so on. I ignore all that and simply use the geometric distance.
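The approach can be sketched in a few lines of Python. This is not the
repository's =compute.c=: the dictionary-based graph, the toy distances, and the
use of Python's =heapq= with lazy deletion (stale queue entries are skipped
rather than updated in place) are all my assumptions, chosen for brevity.

#+BEGIN_SRC python
import heapq

def road_distances(adj, source):
    """Dijkstra's algorithm: shortest road distance from source to every reachable node.

    adj maps node -> list of (neighbor, edge_length). heapq has no decrease-key,
    so improved distances are pushed again and stale entries are skipped on pop.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue  # stale entry: u was already finalized with a shorter distance
        done.add(u)
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist

def most_inconvenient(adj, source, d_direct):
    """Return (node, coefficient) maximizing (d_road - d_direct) / d_direct.

    d_direct maps node -> straight-line distance from source.
    """
    best = None
    for node, d_road in road_distances(adj, source).items():
        dd = d_direct[node]
        if dd == 0:
            continue  # the starting point itself
        score = (d_road - dd) / dd
        if best is None or score > best[1]:
            best = (node, score)
    return best

# Made-up toy network (edge lengths in meters); A is the starting point.
adj = {
    "A": [("B", 100.0), ("C", 150.0)],
    "B": [("A", 100.0), ("D", 300.0)],
    "C": [("A", 150.0)],
    "D": [("B", 300.0)],
}
d_direct = {"A": 0.0, "B": 100.0, "C": 100.0, "D": 80.0}
print(most_inconvenient(adj, "A", d_direct))  # ('D', 4.0)
#+END_SRC

Here =D= wins: 400 m by road versus 80 m in a straight line gives
(400 - 80) / 80 = 4.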
* Implementation
Each step in the process lives in its own program. This simplifies
implementation and makes it easy to work on each piece separately.
** Data import
First I query OSM. This is done with the =query.pl= script. It takes in the
center point and the query radius. The query uses the [[http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL][OSM Overpass query
language]]. I use this simple query, filling in the center point and radius:
#+BEGIN_EXAMPLE
[out:json];
way
["highway"]
["highway" !~ "motorway|motorway_link" ]
["access" !~ "private" ]
["access" !~ "no" ]
(around:$rad,$lat,$lon);
(._;>;);
out;
#+END_EXAMPLE
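=query.pl= is Perl and its exact handling of arguments like =20miles= isn't
shown here, so the following is only a hypothetical Python helper that fills in
the same query. The Overpass =around= filter takes its radius in meters; the
=overpass_query= name and the miles-to-meters conversion are my assumptions, and
actually sending the query to an Overpass server is not shown.

#+BEGIN_SRC python
from string import Template

# The query text from above. Its $rad, $lat and $lon placeholders happen to
# match string.Template's syntax.
QUERY = Template("""[out:json];
way
  ["highway"]
  ["highway" !~ "motorway|motorway_link" ]
  ["access" !~ "private" ]
  ["access" !~ "no" ]
  (around:$rad,$lat,$lon);
(._;>;);
out;
""")

METERS_PER_MILE = 1609.344

def overpass_query(lat, lon, radius_miles):
    """Fill in the query; Overpass's `around` radius is in meters."""
    rad_m = radius_miles * METERS_PER_MILE
    return QUERY.substitute(rad=f"{rad_m:.0f}", lat=lat, lon=lon)

q = overpass_query(34.0690448, -118.292924, 20)
print(q)  # contains "(around:32187,34.0690448,-118.292924);"
#+END_SRC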
Sample invocation:
#+BEGIN_EXAMPLE
$ ./query.pl --center 34.0690448,-118.292924 --rad 20miles
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 .....
$ ls -lhrt *(om[1])
-rw-r--r-- 1 dima dima 81M Aug 14 00:44 query_34.0690448_-118.292924_20miles.json
#+END_EXAMPLE
** Data massaging
Now I need to take the OSM query results, and manipulate them into a form
readable by the Dijkstra's algorithm solver. This is done by the
=massage_input.pl= script. This script does nothing interesting, but it does it
inefficiently, so it's CPU- and RAM-hungry and takes a few minutes. Sample
invocation:
#+BEGIN_EXAMPLE
$ ./massage_input.pl query_34.0690448_-118.292924_20miles.json > query_34.0690448_-118.292924_20miles.net
#+END_EXAMPLE
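=massage_input.pl= and its =.net= output format aren't described here, so this is
only a Python sketch of the kind of transformation involved. It assumes
Overpass's JSON layout (an =elements= list holding nodes with =id=, =lat=, =lon=
and ways with a =nodes= list), treats every road as two-way (ignoring =oneway=
tags, which is my assumption), and uses a hypothetical =osm_to_graph= name.

#+BEGIN_SRC python
import json

def osm_to_graph(osm):
    """Turn Overpass JSON into (coords, adjacency) with dense integer node indices.

    coords[i] is (lat, lon) of node i; adjacency[i] is a sorted list of the
    indices of nodes joined to node i by a way segment.
    """
    index = {}   # OSM node id -> dense index
    coords = []
    for el in osm["elements"]:
        if el["type"] == "node":
            index[el["id"]] = len(coords)
            coords.append((el["lat"], el["lon"]))

    # Second pass, so the result doesn't depend on whether nodes or ways come first.
    adjacency = [set() for _ in coords]
    for el in osm["elements"]:
        if el["type"] != "way":
            continue
        nodes = el["nodes"]
        # Consecutive nodes along a way are connected by a road segment.
        for u, v in zip(nodes, nodes[1:]):
            if u in index and v in index and u != v:
                a, b = index[u], index[v]
                adjacency[a].add(b)
                adjacency[b].add(a)
    return coords, [sorted(s) for s in adjacency]

# Made-up three-node example: one residential way 10 -> 20 -> 30.
sample = json.loads("""{"elements": [
  {"type": "node", "id": 10, "lat": 34.0690, "lon": -118.2929},
  {"type": "node", "id": 20, "lat": 34.0700, "lon": -118.2929},
  {"type": "node", "id": 30, "lat": 34.0700, "lon": -118.2940},
  {"type": "way", "id": 1, "nodes": [10, 20, 30], "tags": {"highway": "residential"}}
]}""")
coords, adjacency = osm_to_graph(sample)
print(adjacency)  # [[1], [0, 2], [1]]
#+END_SRC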
*** Neighbor list representation
An implementation choice here was how to represent the neighbor list for a node.
I want the main computation (next section) to be able to query this very
quickly, I don't want the list to take much space, and I don't want to fragment
my memory with many small allocations. Thus I have a single contiguous array of
integers, =neighbor_pool=. Each node has a single integer index into this pool.
Starting at that index, =neighbor_pool= holds the indices of the nodes that
neighbor the node in question, and a special node index of -1 marks the end of
that node's neighbor list.
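A Python sketch of this layout, using the standard =array= module for one
contiguous buffer of C ints. The helper names =build_neighbor_pool= and
=neighbors=, and the =start= array holding each node's offset, are hypothetical;
the repository's actual code may differ.

#+BEGIN_SRC python
from array import array

def build_neighbor_pool(adjacency):
    """Flatten per-node neighbor lists into one contiguous int array.

    adjacency[i] lists the node indices adjacent to node i. Returns
    (pool, start): start[i] is the offset of node i's list in pool, and
    every list is terminated by -1.
    """
    pool = array("i")
    start = array("i")
    for nbrs in adjacency:
        start.append(len(pool))
        pool.extend(nbrs)
        pool.append(-1)  # end-of-list sentinel
    return pool, start

def neighbors(pool, start, node):
    """Yield the neighbors of `node` by scanning from its offset to the -1 sentinel."""
    k = start[node]
    while pool[k] != -1:
        yield pool[k]
        k += 1

# Tiny example graph with edges 0-1, 1-2, 1-3
pool, start = build_neighbor_pool([[1], [0, 2, 3], [1], [1]])
print(list(pool))                       # [1, -1, 0, 2, 3, -1, 1, -1, 1, -1]
print(list(neighbors(pool, start, 1)))  # [0, 2, 3]
#+END_SRC

Lookup costs one offset read plus a linear scan over adjacent memory, and the
whole graph lives in two allocations.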
** Inefficiency coefficient computation
I now feed the massaged data to Dijkstra's algorithm implemented in =compute.c=.
I need a priority queue where elements can be inserted, removed and updated.
Apparently most heap implementations don't have an 'update' mechanism, so it
took a little while to find a working one. I ended up using [[https://en.wikipedia.org/wiki/B-heap][phk's b-heap]]
implementation from the [[https://www.varnish-cache.org/trac/browser/lib/libvarnish/binary_heap.c][varnish source tree]]. It stores arbitrary pointers
(64-bit on my box); 32-bit indices into a pool would be more efficient, but this
is fast enough.
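To show what an "update" mechanism requires, here is a minimal Python sketch of
an indexed binary min-heap. It is a plain binary heap, not phk's cache-friendly
b-heap, and the class and method names are my own. The key ingredient is =pos=,
which records where each item sits in the heap array so its priority can be
changed in O(log n).

#+BEGIN_SRC python
class IndexedMinHeap:
    """Binary min-heap of (priority, item) supporting in-place priority updates."""

    def __init__(self):
        self.heap = []  # list of [priority, item]
        self.pos = {}   # item -> its index in self.heap

    def __len__(self):
        return len(self.heap)

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i][1]] = i
        self.pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] >= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.heap[c][0] < self.heap[smallest][0]:
                    smallest = c
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def insert_or_update(self, item, priority):
        """Insert item, or change its priority if it is already queued."""
        if item in self.pos:
            i = self.pos[item]
            old = self.heap[i][0]
            self.heap[i][0] = priority
            if priority < old:
                self._sift_up(i)
            else:
                self._sift_down(i)
        else:
            self.heap.append([priority, item])
            self.pos[item] = len(self.heap) - 1
            self._sift_up(len(self.heap) - 1)

    def pop(self):
        """Remove and return (priority, item) with the smallest priority."""
        self._swap(0, len(self.heap) - 1)
        priority, item = self.heap.pop()
        del self.pos[item]
        if self.heap:
            self._sift_down(0)
        return priority, item

h = IndexedMinHeap()
for item, prio in (("a", 5), ("b", 3), ("c", 8)):
    h.insert_or_update(item, prio)
h.insert_or_update("c", 1)  # update: decrease c's priority from 8 to 1
print([h.pop() for _ in range(len(h))])  # [(1, 'c'), (3, 'b'), (5, 'a')]
#+END_SRC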
Sample invocation:
#+BEGIN_EXAMPLE
$ ./compute < query_34.0690448_-118.292924_20miles.net > query_34.0690448_-118.292924_20miles.out
$ head -n 2 query_34.0690448_-118.292924_20miles.out
34.069046 -118.292923 0.000000 0.000000
34.070034 -118.292931 109.863564 109.863564
#+END_EXAMPLE
The output is all nodes, sorted by the road distance to the node. The columns
are lat, lon, d_road, d_direct, with distances in meters.
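The ranking step described under Results (prepend the coefficient, re-sort,
keep the top 50) can be sketched in Python as follows. The function name and the
choice to skip rows with =d_direct= of 0 are my assumptions; the two input lines
are the sample output shown above, where only the second row has a defined
coefficient (0, since its road and direct distances match).

#+BEGIN_SRC python
def rank_inconvenient(lines, top=50):
    """Rank rows of "lat lon d_road d_direct" by inefficiency, most inconvenient first.

    Returns up to `top` tuples (coef, lat, lon, d_road, d_direct). Rows with
    d_direct == 0 (the starting point) are skipped: the ratio is undefined there.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        lat, lon, d_road, d_direct = map(float, fields)
        if d_direct == 0:
            continue
        rows.append(((d_road - d_direct) / d_direct, lat, lon, d_road, d_direct))
    rows.sort(reverse=True)
    return rows[:top]

sample = """34.069046 -118.292923 0.000000 0.000000
34.070034 -118.292931 109.863564 109.863564"""
print(rank_inconvenient(sample.splitlines()))
#+END_SRC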
*** Distance from latitude/longitude pairs
One implementation note here is how to compute the distance between two
latitude/longitude pairs. The most direct way is to convert each
latitude/longitude pair into a unit vector, compute the dot product, take the
arccos and multiply by the radius of the Earth. This requires 9 trigonometric
operations and relies on the arccos of a number close to 1, which loses
precision when the two points are close together. One could instead compute the
arcsin of the magnitude of the cross product, but this requires even more
computation. I want something simpler, so I use small-angle approximations (all
angles in radians):
#+BEGIN_EXAMPLE
dist = Rearth * angle

cos(angle) = dot(v0,v1)
           = dot( (cos(lon0)*cos(lat0), sin(lon0)*cos(lat0), sin(lat0)),
                  (cos(lon1)*cos(lat1), sin(lon1)*cos(lat1), sin(lat1)) )
           = cos(lat0)*cos(lat1) * ( cos(lon0)*cos(lon1) + sin(lon0)*sin(lon1) ) + sin(lat0)*sin(lat1)
           = cos(lat0)*cos(lat1) * cos(diff_lon) + sin(lat0)*sin(lat1)

cos(diff_lon) ~ 1 - diff_lon^2/2, so

cos(angle) ~ cos(lat0)*cos(lat1) + sin(lat0)*sin(lat1) - diff_lon^2/2 * cos(lat0)*cos(lat1)
           = cos(diff_lat) - diff_lon^2/2 * cos(lat0)*cos(lat1)
           ~ 1 - diff_lat^2/2 - diff_lon^2/2 * cos(lat0)*cos(lat1)

cos(angle) ~ 1 - angle^2/2, so

angle^2 ~ diff_lat^2 + diff_lon^2 * cos(lat0)*cos(lat1)
angle   ~ sqrt(diff_lat^2 + diff_lon^2 * cos(lat0)*cos(lat1))
#+END_EXAMPLE
This is nice and simple. Is it sufficiently accurate? This Python script tests
it against the arccos method:
#+BEGIN_SRC python
import numpy as np

lat0, lon0 = 34.0690448, -118.292924  # 3rd/New Hampshire
lat1, lon1 = 33.93, -118.4314         # LAX
lat0, lon0, lat1, lon1 = [x * np.pi / 180.0 for x in (lat0, lon0, lat1, lon1)]
Rearth = 6371000  # mean Earth radius, meters

# Unit vectors (same component order as in the derivation above)
v0 = np.array((np.cos(lat0)*np.cos(lon0), np.cos(lat0)*np.sin(lon0), np.sin(lat0)))
v1 = np.array((np.cos(lat1)*np.cos(lon1), np.cos(lat1)*np.sin(lon1), np.sin(lat1)))

# Reference: arccos of the dot product of the unit vectors
dist_accurate = np.arccos(np.inner(v0, v1)) * Rearth
# The approximation derived above
dist_approx = np.sqrt((lat0 - lat1)**2 + (lon0 - lon1)**2 * np.cos(lat0)*np.cos(lat1)) * Rearth

print(dist_accurate)
print(dist_approx)
print(dist_accurate - dist_approx)
#+END_SRC
Between Koreatown and LAX there's quite a bit of difference in both latitude and
longitude. Both methods say the distance is about 20km, with a disagreement of
3mm. This is plenty good enough.
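To probe the short-range case too, here is a Python 3 sketch (my addition, not
part of the repository) that compares the approximation with the haversine
formula, a standard great-circle formula that stays well-conditioned for nearby
points, and with the arccos method. The second point, about 1 m north of the
start (0.000009 degrees of latitude), is made up for illustration; the arccos
column at that range is where any rounding trouble from an argument near 1 would
show up.

#+BEGIN_SRC python
import math

R_EARTH = 6371000.0  # mean Earth radius in meters, as in the script above

def approx_dist(lat0, lon0, lat1, lon1):
    """Small-angle approximation derived above. Inputs in degrees, output in meters."""
    lat0, lon0, lat1, lon1 = map(math.radians, (lat0, lon0, lat1, lon1))
    return R_EARTH * math.sqrt((lat0 - lat1)**2 +
                               (lon0 - lon1)**2 * math.cos(lat0) * math.cos(lat1))

def haversine_dist(lat0, lon0, lat1, lon1):
    """Great-circle distance via the haversine formula. Inputs in degrees."""
    lat0, lon0, lat1, lon1 = map(math.radians, (lat0, lon0, lat1, lon1))
    h = (math.sin((lat1 - lat0) / 2)**2 +
         math.cos(lat0) * math.cos(lat1) * math.sin((lon1 - lon0) / 2)**2)
    return 2 * R_EARTH * math.asin(math.sqrt(h))

def arccos_dist(lat0, lon0, lat1, lon1):
    """Arccos of the unit-vector dot product, in the simplified form from the derivation."""
    lat0, lon0, lat1, lon1 = map(math.radians, (lat0, lon0, lat1, lon1))
    dot = (math.cos(lat0) * math.cos(lat1) * math.cos(lon0 - lon1) +
           math.sin(lat0) * math.sin(lat1))
    return R_EARTH * math.acos(min(1.0, dot))  # clamp: rounding can push dot above 1

ktown = (34.0690448, -118.292924)
lax = (33.93, -118.4314)
near = (34.0690538, -118.292924)  # made-up point about 1 m north of ktown

for a, b in ((ktown, lax), (ktown, near)):
    print(approx_dist(*a, *b), haversine_dist(*a, *b), arccos_dist(*a, *b))
#+END_SRC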
* Results
I want to find the least convenient location from the intersection of New
Hampshire and 3rd Street in Los Angeles within 20 miles or so.
The output of =compute= is sorted by road distance from the start. I prepend the
coefficient of inconvenience, re-sort the list and take the 50 most inconvenient
locations by invoking
#+BEGIN_EXAMPLE