https://github.com/strugee/https-crawler

Build a SQLite database of per-page HTTPS support

Host: GitHub
URL: https://github.com/strugee/https-crawler
Owner: strugee
License: lgpl-3.0
Created: 2017-10-08T20:56:10.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2022-08-30T23:58:35.000Z (almost 4 years ago)
Last Synced: 2025-10-10T08:41:02.042Z (9 months ago)
Language: JavaScript
Size: 68.4 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: COPYING

README

# https-crawler

Build a SQLite database of per-page HTTPS support

Originally inspired by https://securethe.news/ and built to crawl the University of Rochester's internal service pages, but it can be used for any website.

## Why?

I go to the University of Rochester. The University's pages are somewhat inconsistent in their support for HTTPS, which is unfortunate since I access internal services on `*.rochester.edu` all the time. So I built this to comprehensively evaluate their support for HTTPS.

This crawler is designed to create a comprehensive dataset that can be used for further analysis. It records HTTPS support on a per-page basis, not per-domain, for two reasons. First, different people are sometimes responsible for running different pages under the same (sub)domain, so HTTPS support can vary within a single domain. Second, administrators of other websites will often choose to protect only certain pages, e.g. login pages, with HTTPS, which is a Bad Idea™, and per-page data lets you find out about that.
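
To illustrate why per-page data is useful, here is a minimal sketch of one analysis it makes possible: finding hosts where only some pages support HTTPS. The record shape (`url`, `https`), the sample data, and the `findMixedHosts` helper are all assumptions made up for this example; they are not the crawler's actual schema or API.

```javascript
// Hypothetical per-page records; the real crawler's database schema may differ.
const pages = [
  { url: 'http://example.edu/login', https: true },
  { url: 'http://example.edu/news', https: false },
  { url: 'http://lib.example.edu/', https: true },
];

// Group per-page results by hostname and return the hosts where
// some pages support HTTPS and others do not.
function findMixedHosts(records) {
  const byHost = new Map();
  for (const { url, https } of records) {
    const host = new URL(url).hostname;
    const counts = byHost.get(host) || { secure: 0, insecure: 0 };
    if (https) {
      counts.secure += 1;
    } else {
      counts.insecure += 1;
    }
    byHost.set(host, counts);
  }
  return [...byHost.entries()]
    .filter(([, counts]) => counts.secure > 0 && counts.insecure > 0)
    .map(([host]) => host);
}

console.log(findMixedHosts(pages));
```

With the sample data, only `example.edu` is reported, because its login page supports HTTPS while its news page does not. A per-domain check would have hidden that difference.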

I will probably also build out better analysis tools, eventually.

## Author

AJ Jordan

## License

GPL 3.0 or later
