https://github.com/simonw/irma-scrapers

Screen scrapers relating to natural disasters. See their output in https://github.com/simonw/disaster-data/

Host: GitHub
URL: https://github.com/simonw/irma-scrapers
Owner: simonw
License: apache-2.0
Created: 2017-09-09T23:34:29.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2023-05-22T21:43:19.000Z (about 3 years ago)
Last Synced: 2024-10-18T07:54:17.693Z (over 1 year ago)
Topics: civic-hacking, git-scraping, irma-response, scraper, slack
Language: Python
Homepage:
Size: 63.5 KB
Stars: 11
Watchers: 4
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE


# irma-scrapers

Screen scrapers relating to Hurricane Irma. See their output in
https://github.com/simonw/disaster-data/

## Irma Response

The Irma Response project at https://www.irmaresponse.org/ is a team of
volunteers working together to make information available during and after the
storm. There is a huge amount of information out there, on many different
websites. The Irma API at https://irma-api.herokuapp.com/ is an attempt to
gather key information in one place, verify it and publish it in a reusable
way.

To aid this effort, I've built a collection of screen scrapers that pull data
from a number of different websites and APIs. That data is then stored in a
Git repository, providing a clear history of changes made to the various
sources that are being tracked.

Some of the scrapers also publish their findings to Slack in a format designed
to make it obvious when key events happen, such as new shelters being added or
removed from public listings.

## Tracking changes over time

A key goal of this screen scraping mechanism is to allow changes to the
underlying data sources to be tracked over time. This is achieved using git,
via the GitHub API. Each scraper pulls down data from a source (an API or a
website) and reformats that data into a sanitized JSON format. That JSON is
then written to the git repository. If the data has changed since the last
time the scraper ran, those changes will be captured by git and made available
in the commit log.
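The compare-then-commit step can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes the GitHub REST API's contents endpoint (`GET`/`PUT /repos/{owner}/{repo}/contents/{path}`), where a prior `GET` returns the file's base64 content and blob `sha`. The function names `serialize` and `build_contents_update` are hypothetical, and the HTTP calls themselves are left out so only the decision logic and request body are shown.

```python
import base64
import json
from typing import Any, Optional


def serialize(data: Any) -> str:
    """Render data as stable JSON so unchanged data produces identical text."""
    # sort_keys and a fixed indent keep key order and layout deterministic,
    # so git diffs only show real changes in the scraped data.
    return json.dumps(data, indent=2, sort_keys=True) + "\n"


def build_contents_update(
    data: Any,
    existing_b64: Optional[str],
    existing_sha: Optional[str],
    message: str,
    branch: str = "master",
) -> Optional[dict]:
    """Return a PUT body for the GitHub contents API, or None if nothing changed.

    existing_b64 and existing_sha come from a prior GET of the same path;
    pass None for both when the file does not exist yet.
    """
    new_text = serialize(data)
    if existing_b64 is not None:
        # The API returns base64 wrapped with newlines; b64decode discards
        # non-alphabet characters like these by default (validate=False).
        old_text = base64.b64decode(existing_b64).decode("utf-8")
        if old_text == new_text:
            return None  # No change: skip the commit entirely.
    body = {
        "message": message,
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    if existing_sha is not None:
        # Updating an existing file requires the sha of the blob being replaced.
        body["sha"] = existing_sha
    return body
```

Returning `None` when the serialized text is unchanged means a scraper can run on a schedule without producing empty commits; sorting keys matters because many sources return fields in an unstable order.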

Recent changes tracked by the scraper collection can be seen here:
https://github.com/simonw/disaster-data/commits/master

## Generating useful commit messages

The most complex code for most of the scrapers isn't in fetching the data:
it's in generating useful, human-readable commit messages that summarize the
underlying change. For example, here is a commit message generated by the
scraper that tracks the http://www.floridadisaster.org/shelters/summary.aspx
page:

    florida-shelters.json: 2 shelters added

    Added shelter: Atwater Elementary School (Sarasota County)
    Added shelter: DEBARY ELEMENTARY SCHOOL (Volusia County)
    Change detected on http://www.floridadisaster.org/shelters/summary.aspx

The full commit also shows the changes to the underlying JSON, but the
human-readable message provides enough information that people who are not
JSON-literate programmers can still derive value from the commit.

You can see that commit here:
https://github.com/simonw/disaster-data/commit/7919aeff0913ec26d1bea8dc
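A message in the format above can be produced by diffing the previous and current shelter listings. This is a minimal sketch, not the scraper's own code: it assumes each shelter is a dict with hypothetical `name` and `county` keys and treats the `(name, county)` pair as a shelter's identity. The helper names are also hypothetical.

```python
from typing import Dict, List, Tuple

Shelter = Dict[str, str]


def shelter_key(shelter: Shelter) -> Tuple[str, str]:
    # Hypothetical identity; a real source may expose a better unique ID.
    return (shelter["name"], shelter["county"])


def describe(shelter: Shelter) -> str:
    return f"{shelter['name']} ({shelter['county']} County)"


def commit_message(filename: str, url: str, old: List[Shelter], new: List[Shelter]) -> str:
    """Summarize shelter additions and removals as a human-readable commit message."""
    old_by_key = {shelter_key(s): s for s in old}
    new_by_key = {shelter_key(s): s for s in new}
    # Dicts keep insertion order, so shelters are listed in source order.
    added = [s for k, s in new_by_key.items() if k not in old_by_key]
    removed = [s for k, s in old_by_key.items() if k not in new_by_key]

    def count(n: int) -> str:
        return f"{n} shelter" + ("" if n == 1 else "s")

    summary = []
    if added:
        summary.append(f"{count(len(added))} added")
    if removed:
        summary.append(f"{count(len(removed))} removed")
    title = f"{filename}: {', '.join(summary) or 'no shelters added or removed'}"

    body = [f"Added shelter: {describe(s)}" for s in added]
    body += [f"Removed shelter: {describe(s)}" for s in removed]
    body.append(f"Change detected on {url}")
    # Git convention: a one-line summary, a blank line, then the details.
    return title + "\n\n" + "\n".join(body)
```

Called with an empty old listing and a new listing containing the two schools, `commit_message("florida-shelters.json", "http://www.floridadisaster.org/shelters/summary.aspx", [], new)` returns exactly the example message shown earlier. Keeping the summary on the first line means it is what appears in `git log --oneline` and in GitHub's commit list.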

## Publishing to Slack

The Irma Response team use Slack to co-ordinate their efforts. You can join
their Slack here: https://irma-response-slack.herokuapp.com/

Some of the scrapers publish the changes they detect in their data source to
Slack: each post includes the human-readable commit message directly in the
channel, along with a link to the commit generated for that change.
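One way to post such a message is a Slack incoming webhook, which accepts a JSON body with a `text` field and renders `<url|label>` as a link. This is a sketch under that assumption only; the project may post to Slack differently. The webhook URL is supplied by the caller, and the names `slack_payload` and `post_to_slack` are hypothetical.

```python
import json
import urllib.request
from typing import Dict


def slack_payload(message: str, commit_url: str) -> Dict[str, str]:
    """Build an incoming-webhook body: the commit message plus a link to the commit."""
    # Slack treats &, < and > as control characters in message text; escape
    # them so scraped shelter names cannot break the link markup.
    escaped = message.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return {"text": f"{escaped}\n<{commit_url}|View commit>"}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the payload separately from sending it keeps the formatting testable without network access; `post_to_slack` is the only part that needs a real webhook.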
