{"id":15433170,"url":"https://github.com/simonw/irma-scrapers","last_synced_at":"2025-04-19T17:53:04.907Z","repository":{"id":66508119,"uuid":"102991243","full_name":"simonw/irma-scrapers","owner":"simonw","description":"Screen scrapers relating to natural disasters. See their output in https://github.com/simonw/disaster-data/","archived":false,"fork":false,"pushed_at":"2023-05-22T21:43:19.000Z","size":65,"stargazers_count":11,"open_issues_count":3,"forks_count":6,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-10-18T07:54:17.693Z","etag":null,"topics":["civic-hacking","git-scraping","irma-response","scraper","slack"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-09T23:34:29.000Z","updated_at":"2022-12-01T17:48:36.000Z","dependencies_parsed_at":"2024-10-20T20:19:28.666Z","dependency_job_id":null,"html_url":"https://github.com/simonw/irma-scrapers","commit_stats":{"total_commits":73,"total_committers":3,"mean_commits":"24.333333333333332","dds":0.0273972602739726,"last_synced_commit":"a08af3d6c1ef012d0b1c95fb2a54f26ce87ff097"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Firma-scrapers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Firma-scrapers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%
2Firma-scrapers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonw%2Firma-scrapers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonw","download_url":"https://codeload.github.com/simonw/irma-scrapers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249753127,"owners_count":21320667,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["civic-hacking","git-scraping","irma-response","scraper","slack"],"created_at":"2024-10-01T18:32:17.593Z","updated_at":"2025-04-19T17:53:04.901Z","avatar_url":"https://github.com/simonw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# irma-scrapers\n\nScreen scrapers relating to Hurricane Irma. See their output in\nhttps://github.com/simonw/disaster-data/\n\n## Irma Response\n\nThe Irma Response project at https://www.irmaresponse.org/ is a team of\nvolunteers working together to make information available during and after the\nstorm. There is a huge amount of information out there, on many different\nwebsites. The Irma API at https://irma-api.herokuapp.com/ is an attempt to\ngather key information in one place, verify it and publish it in a reusable\nway.\n\nTo aid this effort, I've built a collection of screen scrapers that pull data\nfrom a number of different websites and APIs. 
That data is then stored in a\nGit repository, providing a clear history of changes made to the various\nsources that are being tracked.\n\nSome of the scrapers also publish their findings to Slack in a format designed\nto make it obvious when key events happen, such as new shelters being added or\nremoved from public listings.\n\n## Tracking changes over time\n\nA key goal of this screen scraping mechanism is to allow changes to the\nunderlying data sources to be tracked over time. This is achieved using git,\nvia the GitHub API. Each scraper pulls down data from a source (an API or a\nwebsite) and reformats that data into a sanitized JSON format. That JSON is\nthen written to the git repository. If the data has changed since the last\ntime the scraper ran, those changes will be captured by git and made available\nin the commit log.\n\nRecent changes tracked by the scraper collection can be seen here:\nhttps://github.com/simonw/disaster-data/commits/master\n\n## Generating useful commit messages\n\nThe most complex code for most of the scrapers isn't in fetching the data:\nit's in generating useful, human-readable commit messages that summarize the\nunderlying change. For example, here is a commit message generated by the\nscraper that tracks the http://www.floridadisaster.org/shelters/summary.aspx\npage:\n\n    florida-shelters.json: 2 shelters added\n\n    Added shelter: Atwater Elementary School (Sarasota County)\n    Added shelter: DEBARY ELEMENTARY SCHOOL (Volusia County)\n    Change detected on http://www.floridadisaster.org/shelters/summary.aspx\n\nThe full commit also shows the changes to the underlying JSON, but the\nhuman-readable message provides enough information that people who are not\nJSON-literate programmers can still derive value from the commit.\n\nhttps://github.com/simonw/disaster-data/commit/7919aeff0913ec26d1bea8dc\n\n## Publishing to Slack\n\nThe Irma Response team use Slack to co-ordinate their efforts. 
You can join\ntheir Slack here: https://irma-response-slack.herokuapp.com/\n\nSome of the scrapers publish detected changes in their data source to Slack,\nas links to the commits generated for each change. The human-readable message\nis posted directly to the channel.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonw%2Firma-scrapers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonw%2Firma-scrapers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonw%2Firma-scrapers/lists"}