https://github.com/dkogan/culdesacs

Host: GitHub
URL: https://github.com/dkogan/culdesacs
Owner: dkogan
Created: 2015-08-13T03:25:19.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2015-08-16T08:10:45.000Z (almost 11 years ago)
Last Synced: 2026-02-11T02:25:37.892Z (5 months ago)
Language: C
Size: 137 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.org

README

#+OPTIONS: tex:dvipng

* Reference

This is all described better (with pictures!) in a blog post:

http://notes.secretsauce.net/notes/2015/08/16_least-convenient-location-in-los-angeles-from-koreatown.html

* Overview

Talking to a friend, a question came up about finding the point in LA's road
network that's most inconvenient to get to, with /inconvenient/ being a vague
notion describing a closed residential neighborhood full of dead ends; the
furthest of these dead ends would be most inconvenient indeed. This repository
attempts to answer that question.

I want /inconvenient/ to mean

#+BEGIN_QUOTE

Furthest to reach via the road network, but nearest as-the-crow-flies.

#+END_QUOTE

Note that this type of metric is not a universal one, but is relative to a
particular starting point. This makes sense, however: a location that's
inconvenient from one location could be very convenient from another.

This metric could be expressed in many ways. I keep it simple, and compute a
relative inefficiency coefficient:

=(d_road - d_direct) / d_direct=

Thus the goal is to find a location within a given radius of the starting point
that maximizes this relative inefficiency.
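
As a rough illustration of the coefficient, here is a minimal Python sketch. It
is not part of this repository: the function name =inefficiency= and the sample
distances are made up for the example, and returning 0 when =d_direct= is zero
(the starting point itself) is an assumption, not something the original code is
known to do.

#+BEGIN_SRC python
def inefficiency(d_road, d_direct):
    """Relative inefficiency: (d_road - d_direct) / d_direct."""
    if d_direct <= 0.0:
        # Assumption: the starting point has d_direct == 0, so treat it as
        # perfectly convenient instead of dividing by zero.
        return 0.0
    return (d_road - d_direct) / d_direct

# Hypothetical distances, in meters.
print(inefficiency(3000.0, 1000.0))  # 1 km away, but 3 km by road
print(inefficiency(1000.0, 1000.0))  # the road is a straight line
#+END_SRC

A location reachable by a perfectly straight road scores 0; the score grows
without bound as the road detour gets longer relative to the direct distance.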

* Approach

I use [[http://www.openstreetmap.org][OpenStreetMap]] for the road data. This is all aimed at bicycling, so I'm
looking at all roads except freeways and ones marked private. I /am/ looking at
footpaths, trails, etc.

Once I have the road network, I run [[https://en.wikipedia.org/wiki/Dijkstra's_algorithm][Dijkstra's Algorithm]] to compute the shortest
path from my starting point to every other point on the map. Then I can easily
compute the inefficiency for each such point, and pick the point with the
highest inefficiency. I use OSM nodes as the "points". It is possible that the
location I'm looking for is in between a pair of nodes, but the nodes will really
be close enough. Also, the "distance" between adjacent nodes can take into
account terrain type, elevation, road type and so on. I ignore all that, and
simply look at the distance.

* Implementation

Each step in the process lives in its own program. This simplifies
implementation and makes it easy to work on each piece separately.

** Data import

First I query OSM. This is done with the =query.pl= script. It takes in the
center point and the query radius. The query uses the [[http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL][OSM Overpass query
language]]. I use this simple query, filling in the center point and radius:

#+BEGIN_EXAMPLE

[out:json];
way
 ["highway"]
 ["highway" !~ "motorway|motorway_link" ]
 ["access" !~ "private" ]
 ["access" !~ "no" ]
 (around:$rad,$lat,$lon);
(._;>;);
out;

#+END_EXAMPLE

Sample invocation:

#+BEGIN_EXAMPLE

$ ./query.pl --center 34.0690448,-118.292924 --rad 20miles
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  .....

$ ls -lhrt *(om[1])
-rw-r--r-- 1 dima dima 81M Aug 14 00:44 query_34.0690448_-118.292924_20miles.json

#+END_EXAMPLE
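
For illustration, the query template above can be filled in mechanically. The
following is a minimal Python sketch, not the actual =query.pl=: the
=build_query= helper is hypothetical, and it assumes the radius is given in
miles and converted to meters, which is the unit the Overpass =around= filter
expects. How =query.pl= itself parses a value such as =20miles= is not shown
here.

#+BEGIN_SRC python
from string import Template

# Same query as above; string.Template happens to use the same $name syntax.
QUERY = Template("""[out:json];
way
 ["highway"]
 ["highway" !~ "motorway|motorway_link" ]
 ["access" !~ "private" ]
 ["access" !~ "no" ]
 (around:$rad,$lat,$lon);
(._;>;);
out;
""")

METERS_PER_MILE = 1609.344

def build_query(lat, lon, radius_miles):
    """Return the Overpass QL text for a center point and a radius in miles."""
    # The "around" filter takes its radius in meters.
    rad_m = radius_miles * METERS_PER_MILE
    return QUERY.substitute(rad=f"{rad_m:.0f}", lat=lat, lon=lon)

print(build_query(34.0690448, -118.292924, 20))
#+END_SRC

The sketch only builds the query text; sending it to an Overpass server and
saving the JSON reply is left out.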

** Data massaging

Now I need to take the OSM query results, and manipulate them into a form
readable by the Dijkstra's algorithm solver. This is done by the
=massage_input.pl= script. This script does nothing interesting, but it does it
inefficiently, so it's CPU- and RAM-hungry and takes a few minutes. Sample
invocation:

#+BEGIN_EXAMPLE

$ ./massage_input.pl query_34.0690448_-118.292924_20miles.json > query_34.0690448_-118.292924_20miles.net

#+END_EXAMPLE

*** Neighbor list representation

An implementation choice here was how to represent the neighbor list for a node.
I want the main computation (next section) to be able to query this very
quickly, and I don't want the list to take much space, and I don't want to
fragment my memory with many small allocations. Thus I have a single contiguous
array of integers =neighbor_pool=. Each node has a single integer index into
this pool. At this index the =neighbor_pool= contains a list of node indices
that are neighbors of the node in question. A special node index of -1 signifies
the end of the neighbor list for that node.
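
The layout described above can be sketched in a few lines of Python. This is an
illustration only, not the code in this repository: the names
=build_neighbor_pool=, =first_neighbor= and =neighbors_of= are hypothetical, and
the sketch assumes every road segment can be traversed in both directions.

#+BEGIN_SRC python
def build_neighbor_pool(n_nodes, edges):
    """Pack per-node neighbor lists into one flat, -1-terminated array."""
    adjacency = [[] for _ in range(n_nodes)]
    for a, b in edges:
        # Assumption: segments are two-way, so store both directions.
        adjacency[a].append(b)
        adjacency[b].append(a)

    neighbor_pool = []   # all neighbor lists, back to back
    first_neighbor = []  # first_neighbor[i]: where node i's list starts
    for neighbors in adjacency:
        first_neighbor.append(len(neighbor_pool))
        neighbor_pool.extend(neighbors)
        neighbor_pool.append(-1)  # end-of-list marker
    return neighbor_pool, first_neighbor

def neighbors_of(node, neighbor_pool, first_neighbor):
    """Yield the neighbors of a node by scanning until the -1 marker."""
    i = first_neighbor[node]
    while neighbor_pool[i] != -1:
        yield neighbor_pool[i]
        i += 1

# Tiny hypothetical network: node 1 is a junction with three branches.
pool, first = build_neighbor_pool(4, [(0, 1), (1, 2), (1, 3)])
print(pool)   # [1, -1, 0, 2, 3, -1, 1, -1, 1, -1]
print(first)  # [0, 2, 6, 8]
print(list(neighbors_of(1, pool, first)))  # [0, 2, 3]
#+END_SRC

Each node costs one integer for its index plus one terminator, and a lookup is a
linear scan over a contiguous run of the array.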

** Inefficiency coefficient computation

I now feed the massaged data to Dijkstra's algorithm implemented in =compute.c=.

I need a priority queue where elements can be inserted, removed and updated.
Apparently most heap implementations don't have an 'update' mechanism, so it
took a little while to find a working one. I ended up using [[https://en.wikipedia.org/wiki/B-heap][phk's b-heap]]
implementation from the [[https://www.varnish-cache.org/trac/browser/lib/libvarnish/binary_heap.c][varnish source tree]]. It stores arbitrary pointers
(64-bit on my box); 32-bit indices into a pool would be more efficient, but this
is fast enough.

Sample invocation:

#+BEGIN_EXAMPLE

$ ./compute < query_34.0690448_-118.292924_20miles.net > query_34.0690448_-118.292924_20miles.out

$ head -n 2 query_34.0690448_-118.292924_20miles.out
34.069046 -118.292923 0.000000 0.000000
34.070034 -118.292931 109.863564 109.863564

#+END_EXAMPLE

The output is all nodes, sorted by the road distance to the node. The columns
are lat,lon,d_road,d_direct.
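
To make the shortest-path step concrete, here is a minimal Python sketch of
Dijkstra's algorithm. It is not a port of =compute.c=. Python's =heapq= has no
'update' operation either, so instead of a b-heap with in-place updates this
sketch uses a different technique: it pushes a fresh entry whenever a distance
improves and skips stale entries when they are popped. The =dijkstra= function
and the toy graph are hypothetical.

#+BEGIN_SRC python
import heapq

def dijkstra(neighbors, start):
    """Return {node: shortest road distance from start}.

    neighbors[u] is a list of (v, edge_length) pairs.
    """
    d_road = {start: 0.0}
    done = set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry: u was already settled via a shorter path
        done.add(u)
        for v, length in neighbors[u]:
            nd = d + length
            if nd < d_road.get(v, float("inf")):
                d_road[v] = nd
                # No decrease-key in heapq: push a new entry instead.
                heapq.heappush(heap, (nd, v))
    return d_road

# Toy network with edge lengths in meters.
graph = {
    0: [(1, 100.0), (2, 500.0)],
    1: [(0, 100.0), (2, 150.0)],
    2: [(0, 500.0), (1, 150.0), (3, 50.0)],
    3: [(2, 50.0)],
}
print(dijkstra(graph, 0))  # {0: 0.0, 1: 100.0, 2: 250.0, 3: 300.0}
#+END_SRC

The trade-off is that the heap can hold several entries per node, which costs
memory that a heap with a true update operation avoids.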

*** Distance from latitude/longitude pairs

One implementation note here is how to compute the distance between two
latitude/longitude pairs. The most direct way is to convert each
latitude/longitude pair into a unit vector, compute the dot product, take the
arccos and multiply by the radius of the Earth. This requires 9 trigonometric
operations and relies on the arccos of a number close to 1, which is inaccurate.
One could instead compute the arcsin of the magnitude of the cross-product, but
this requires even more computation. I want something simpler:

#+BEGIN_EXAMPLE

dist = Rearth * angle

cos(angle) = dot(v0,v1) = dot( (cos(lon0)*cos(lat0), sin(lon0)*cos(lat0), sin(lat0)),
                               (cos(lon1)*cos(lat1), sin(lon1)*cos(lat1), sin(lat1)) ) =
           = cos(lat0)*cos(lat1) * ( cos(lon0)*cos(lon1) + sin(lon0)*sin(lon1) ) +
             sin(lat0)*sin(lat1) =
           = cos(lat0)*cos(lat1) * cos(diff_lon) + sin(lat0)*sin(lat1)

cos(diff_lon) ~ 1 - diff_lon^2/2 so

cos(angle) ~ cos(lat0)*cos(lat1) + sin(lat0)*sin(lat1) - diff_lon^2/2*cos(lat0)*cos(lat1) =
           = cos(diff_lat) - cos(lat0)*cos(lat1)*diff_lon^2/2 ~
           ~ 1 - diff_lat^2/2 - diff_lon^2/2*cos(lat0)*cos(lat1)

cos(angle) ~ 1 - angle^2/2, so

angle^2 ~ diff_lat^2 + diff_lon^2*cos(lat0)*cos(lat1)

angle ~ sqrt(diff_lat^2 + diff_lon^2 * cos(lat0)*cos(lat1))

#+END_EXAMPLE

This is nice and simple. Is it sufficiently accurate? This Python script tests
it:

#+BEGIN_SRC python

import numpy as np

lat0, lon0 = 34.0690448, -118.292924  # 3rd/New Hampshire
lat1, lon1 = 33.93, -118.4314         # LAX

# Convert degrees to radians
lat0, lon0, lat1, lon1 = [x * np.pi/180.0 for x in (lat0, lon0, lat1, lon1)]

Rearth = 6371000  # mean Earth radius, in meters

# Unit vectors pointing at each location
v0 = np.array((np.cos(lat0)*np.cos(lon0), np.cos(lat0)*np.sin(lon0), np.sin(lat0)))
v1 = np.array((np.cos(lat1)*np.cos(lon1), np.cos(lat1)*np.sin(lon1), np.sin(lat1)))

# Reference: arccos of the dot product of the unit vectors
dist_accurate = np.arccos(np.inner(v0, v1)) * Rearth

# Small-angle approximation derived above
dist_approx = np.sqrt( (lat0-lat1)**2 + (lon0-lon1)**2 * np.cos(lat0)*np.cos(lat1) ) * Rearth

print(dist_accurate)
print(dist_approx)
print(dist_accurate - dist_approx)

#+END_SRC

Between Koreatown and LAX there's quite a bit of difference in both latitude and
longitude. Both methods say the distance is about 20km, with a disagreement of
3mm. This is plenty good enough.

* Results

I want to find the least convenient location from the intersection of New
Hampshire and 3rd street in Los Angeles within 20 miles or so.

The output of =compute= is sorted by road distance from the start. I prepend the
coefficient of inconvenience, re-sort the list and take the 50 most inconvenient
locations by invoking

#+BEGIN_EXAMPLE