https://github.com/dkogan/ntsb-wrecks
Plane-crash locations from NTSB reports
https://github.com/dkogan/ntsb-wrecks
Last synced: about 2 months ago
JSON representation
Plane-crash locations from NTSB reports
- Host: GitHub
- URL: https://github.com/dkogan/ntsb-wrecks
- Owner: dkogan
- Created: 2019-01-07T04:59:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-02-10T09:09:41.000Z (over 7 years ago)
- Last Synced: 2026-02-11T02:25:37.190Z (4 months ago)
- Language: Perl
- Size: 9.11 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.org
Awesome Lists containing this project
README
* Synopsis
Check this out!
[[file:wrecks.png]]
Interactive map at
http://ntsb.secretsauce.net/map/
* Overview
This is a set of scripts to massage NTSB incident reports into a visualization
of crash sites. After an aviation incident occurs, the NTSB investigates each
event, and publishes a report. These often contain information about the crash
site location. Sometimes this exists explicitly: as a latitude, longitude pair.
At other times, the report lists the bearing and range to an "observing
facility". This set of tools extracts both of these into a visualize-able form.
** Caveats
The data isn't particularly accurate: most of the locations I know of are in the
right ballpark (within a few miles), but some are completely off. It is,
however, useful as a jumping-off point for further research.
*If you hike to the exact coordinates given, you're unlikely to find anything*
The data I'm visualizing here comes from the post-1982 incident database. A
database of older incidents exists also, but it doesn't appear to contain any
location information. Finally, while civilian incidents are well-covered by this
dataset, I don't know if military ones are covered as well.
* Details
We download some data, massage it, and produce a single data-store that contains
everything we care about. Below, I describe how I made this file, but a
pre-cooked copy is available in this repo: [[file:joint.vnl]] (plain text) or
[[file:joint.sqlite]] (sqlite database). This can be used to visualize specific
areas, as described [[Visualize-able data output][below]].
This all works /today/ (January 2019) on my up-to-date Debian/sid box. The URLs
and tools and such can change at any time, so interpret all this as a set of
loose guidelines. Make adjustments as required.
** NTSB incident database
The main website that describes the reports is this:
https://www.ntsb.gov/_layouts/ntsb.aviation/index.aspx
It points to a repository of databases:
https://app.ntsb.gov/avdata/
We grab the database of events we care about. Today the post-1982 data can be
obtained and extracted thusly:
#+BEGIN_EXAMPLE
$ wget https://app.ntsb.gov/avdata/Access/avall.zip
$ unzip avall.zip
#+END_EXAMPLE
We now have a Microsoft Access database file: =avall.mdb= with all the data. The
things I care about live in the =events= and =aircraft= tables. I
- extract those into an sqlite database
- query the columns I want as csv
- make the latitude and longitude fields numeric (i.e."360148N 1190347W" becomes
"36.03 -119.063055555556")
- convert to a form I can work with: [[http://github.com/dkogan/vnlog][vnlog]]
#+BEGIN_EXAMPLE
$ ( mdb-schema avall.mdb sqlite -T aircraft;
echo "BEGIN;";
mdb-export -I sqlite avall.mdb aircraft;
echo "COMMIT;";
) | sqlite3 data.sqlite
$ ( mdb-schema avall.mdb sqlite -T events;
echo "BEGIN;";
mdb-export -I sqlite avall.mdb events;
echo "COMMIT;";
) | sqlite3 data.sqlite
$ sqlite3 -header -csv data.sqlite \
'select events.ev_id,ev_year,acft_make,acft_model,acft_series,acft_serial_no,latitude,longitude,wx_obs_fac_id,wx_obs_dist,wx_obs_dir from events,aircraft where events.ev_id==aircraft.ev_id;' |
./csv-to-vnl.pl |
vnl-filter --perl \
--sub 'conv{return undef unless defined $_[0]; return $1 + ($2 + $3/60.)/60. if $_[0] =~ /^([0-9]+)([0-9][0-9])([0-9][0-9])[NE]$/i; return -($1 + ($2 + $3/60.)/60.) if $_[0] =~ /^([0-9]+)([0-9][0-9])([0-9][0-9])[SW]$/i; return undef;}' \
-p '!(latitude|longitude),latitude=conv(latitude),longitude=conv(longitude)' \
> data.vnl
#+END_EXAMPLE
Yes, =vnlog= isn't a real database, so lookups will be much slower than if I
continued with sqlite. But many things become simple, and I'm going to be doing
very few lookups, so this is ok.
** Airport codes
Alright. If we're getting the location in reference to some observing airport,
we need to know where each airport is. There're a number of lists on the
internet of airport codes. For this project I'd like to know the location of a
bunch of tiny, obscure airports, so most lists were insufficient. This list [[https://datahub.io/core/airport-codes][here]]
was sufficiently complete. The data is licensed under the Public Domain
Dedication and License (PDDL). I download the dataset, and convert it to a nice
vnlog containing just the data I need
#+BEGIN_EXAMPLE
$ wget https://datahub.io/core/airport-codes/r/airport-codes.csv
$ ./parse-airport-data.pl > airports.vnl
#+END_EXAMPLE
** Putting it all together
And then I add these location fields to the main dataset
#+BEGIN_EXAMPLE
$ vnl-join -a1 --vnl-sort - -j wx_obs_fac_id \
data.vnl \
<(< airports.vnl vnl-filter -p wx_obs_fac_id=code,lat_observing=lat,lon_observing=lon) \
> joint.vnl
#+END_EXAMPLE
I later decided to make a website that can select subsets of the data, so I
really did need fast lookups. So I can convert this vnlog to an sqlite db:
#+BEGIN_EXAMPLE
./vnl-to-sql.pl --real latitude,longitude,lat_observing,lon_observing \
--index latitude,longitude,lat_observing,lon_observing \
< joint.vnl | sqlite3 joint.sqlite
#+END_EXAMPLE
** Visualize-able data output
Now we have all the information, and we run another script to generate a GeoJSON
file that mapping tools can ingest. I cut it down to a small region because web
browsers are glacially slow. To see all the incidents in the San Gabriel
Mountains:
#+BEGIN_EXAMPLE
$ ./make-map.pl 34.08 -118.52 34.54 -117.38 joint.sqlite > wrecks.json
#+END_EXAMPLE
This file is visualized in that interactive map linked above.
* Acknowledgement
Huge thanks to Erik Price and Chris McKenzie for help in getting the interactive
map working. I'm not a web dev.
* License
All code Copyright 2019 Dima Kogan, released under the terms of the Lesser GNU
Public License (any version)