https://github.com/strugee/https-crawler
Build a SQLite database of per-page HTTPS support
- Host: GitHub
- URL: https://github.com/strugee/https-crawler
- Owner: strugee
- License: lgpl-3.0
- Created: 2017-10-08T20:56:10.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-08-30T23:58:35.000Z (almost 4 years ago)
- Last Synced: 2025-10-10T08:41:02.042Z (9 months ago)
- Language: JavaScript
- Size: 68.4 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: COPYING
README
# https-crawler
Build a SQLite database of per-page HTTPS support
Originally inspired by https://securethe.news/ and built to crawl the University of Rochester's internal service pages, but it can be used for any website.
## Why?
I go to the University of Rochester. The University's pages are somewhat inconsistent in their support for HTTPS, which is unfortunate since I access internal services on `*.rochester.edu` all the time. So I built this to comprehensively evaluate their support for HTTPS.
This crawler is designed to create a comprehensive dataset that can be used for further analysis. It records HTTPS support per page, not per domain, for two reasons. First, different people are sometimes responsible for running different pages under the same (sub)domain, so HTTPS support varies from page to page. Second, administrators of other websites will often choose to protect only some pages, e.g. login pages, with HTTPS, which is a Bad Idea™, and you want to be able to find out about that.
I will probably also build out better analysis tools, eventually.
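To illustrate why per-page data matters, here is a minimal sketch of the kind of analysis it makes possible. It assumes a hypothetical record shape (`url`, `httpsOk`) and made-up example URLs; neither is taken from this crawler's actual database schema, and the helper names `summarizeByHost` and `mixedHosts` are invented for illustration.

```javascript
// Hypothetical per-page records. The field names and URLs are illustrative
// assumptions, not the crawler's real schema or real crawl results.
const pages = [
  { url: "http://example.edu/", httpsOk: true },
  { url: "http://example.edu/login", httpsOk: true },
  { url: "http://example.edu/~someone/notes.html", httpsOk: false },
  { url: "http://other.example.edu/", httpsOk: true },
];

// Group per-page results by hostname and count pages with and without HTTPS.
function summarizeByHost(records) {
  const hosts = new Map();
  for (const { url, httpsOk } of records) {
    const host = new URL(url).hostname;
    const entry = hosts.get(host) ?? { supported: 0, unsupported: 0 };
    if (httpsOk) entry.supported += 1;
    else entry.unsupported += 1;
    hosts.set(host, entry);
  }
  return hosts;
}

// Return hosts where some pages support HTTPS and others do not. A single
// per-domain check cannot detect this situation.
function mixedHosts(records) {
  return [...summarizeByHost(records)]
    .filter(([, counts]) => counts.supported > 0 && counts.unsupported > 0)
    .map(([host]) => host);
}

console.log(mixedHosts(pages));
```

With the sample records above, only `example.edu` is reported, because one of its pages lacks HTTPS while the others have it. A per-domain check that only probed the front page would have marked that host as fully supported.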
## Author
AJ Jordan
## License
GPL 3.0 or later