https://github.com/innovativeinventor/mitmscrape
A tool for filtering the network resources that sites fetch, with an emphasis on JS-heavy sites, built on mitmproxy and Selenium. Mostly used to scrape Secretaries of State websites for raw election data.
- Host: GitHub
- URL: https://github.com/innovativeinventor/mitmscrape
- Owner: InnovativeInventor
- Created: 2020-11-24T22:02:22.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-02-28T05:18:38.000Z (over 5 years ago)
- Last Synced: 2025-01-09T07:20:47.868Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 6.99 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
## mitmscrape
A little utility designed to scrape and find resources fetched on a JS-heavy site.
## Setup
To set this up, download the ChromeDriver release that matches your installed version of Google Chrome/Chromium from https://chromedriver.chromium.org/downloads and unzip it.
Then rename the driver binary to `chromedriver`.
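A quick way to confirm the driver is where a script can find it is a small check like the one below. This is only a sketch: it assumes the renamed binary sits either in the working directory or somewhere on `PATH`, which the steps above do not specify, and `find_chromedriver` is a hypothetical helper, not part of mitmscrape.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_chromedriver(search_dir: str = ".") -> Optional[Path]:
    """Return the path to an executable named `chromedriver`, or None.

    Assumption: the driver lives in `search_dir` or on PATH.
    """
    # Prefer a driver placed directly in the given directory.
    local = Path(search_dir) / "chromedriver"
    if local.is_file() and os.access(local, os.X_OK):
        return local.resolve()
    # Otherwise fall back to whatever PATH resolves.
    on_path = shutil.which("chromedriver")
    return Path(on_path) if on_path else None


if __name__ == "__main__":
    found = find_chromedriver()
    print(found if found else "chromedriver not found; see the setup steps")
```

If the check reports nothing on Linux or macOS, the unzipped file may simply be missing its executable bit (`chmod +x chromedriver`).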
## Usage
Environment setup
```bash
poetry shell
```
Running
```bash
python3 mitmscrape.py [url] [recursion_depth]
```
Filtering URLs (requires `ripgrep`)
```bash
cat urls.list | rg "\.json"
```
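If `ripgrep` is not installed, the same filter can be approximated in Python. The sketch below assumes `urls.list` holds one URL per line (the file format is not documented here), and `filter_urls` and `filter_file` are illustrative names, not part of mitmscrape.

```python
import re
from typing import Iterable, List


def filter_urls(lines: Iterable[str], pattern: str = r"\.json") -> List[str]:
    """Return the stripped lines that contain a match for `pattern`."""
    regex = re.compile(pattern)
    # Like `rg`, match anywhere in the line rather than only at the end.
    return [line.strip() for line in lines if regex.search(line)]


def filter_file(path: str = "urls.list", pattern: str = r"\.json") -> List[str]:
    """Read `path` (assumed: one URL per line) and return the matching URLs."""
    with open(path, encoding="utf-8") as handle:
        return filter_urls(handle, pattern)
```

Calling `filter_file("urls.list")` after a scrape should give roughly the same lines as the `rg` command, though the two regex engines can differ on more complex patterns.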
## Example usage
```bash
python3 mitmscrape.py https://results.enr.clarityelections.com/GA/105369 2
cat urls.list | rg "\.json"
```