https://github.com/equalitie/ouicrawl
Extending Ouinet to utilize Webrecorder techniques for crawling and archiving entire websites.
- Host: GitHub
- URL: https://github.com/equalitie/ouicrawl
- Owner: equalitie
- License: gpl-3.0
- Created: 2022-05-10T20:43:36.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-13T00:41:02.000Z (about 4 years ago)
- Last Synced: 2025-02-23T00:44:03.082Z (over 1 year ago)
- Language: JavaScript
- Size: 14.6 KB
- Stars: 2
- Watchers: 8
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Ouicrawl
## Browser-based crawling through Ouinet
This repository contains a crawling driver for [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler) that lets crawls run through the Ouinet client.
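Browsertrix Crawler loads a driver module and calls it for each page it crawls. The sketch below assumes the CommonJS driver interface used by Browsertrix Crawler releases from around the time this repository was created: an async function that receives `{ data, page, crawler }` and, like the crawler's default driver, calls `crawler.loadPage(page, data)`. `ouinetDriver` is a hypothetical name, and this is not the contents of `ouinet-crawl.js`; it only illustrates the shape such a module takes.

```javascript
// Sketch of a Browsertrix Crawler driver module (assumed CommonJS interface).
// `ouinetDriver` is a hypothetical name; this is NOT the repo's ouinet-crawl.js.
// The crawler is assumed to call the exported function once per page with:
//   data    - the queued page entry (including its `url`)
//   page    - the Puppeteer page the browser will load it in
//   crawler - the running crawler instance
const ouinetDriver = async ({ data, page, crawler }) => {
  // Hypothetical: any Ouinet-specific per-page setup would go here.
  // Traffic is assumed to already flow through the proxy configured with
  // PROXY_HOST/PROXY_PORT, so this driver does not set a proxy itself.

  // Delegate navigation and capture to the crawler, as the default driver does.
  await crawler.loadPage(page, data);
};

module.exports = ouinetDriver;
```

Saved in the directory mounted into the container, a module like this would be passed to `--driver` the same way as `ouinet-crawl.js`.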
### Running a crawl
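Assuming the Ouinet client is running an HTTP/HTTPS proxy on `localhost:8077` and the crawler container uses Docker's host network mode to reach it, you can start a crawl of a site by running Browsertrix Crawler with `ouinet-crawl.js` from this repository as the driver and passing the site's address to `--url`. Run the command from a checkout of this repository. The checkout is mounted at `/config`, so the driver is available inside the container as `/config/ouinet-crawl.js`. `PROXY_HOST` and `PROXY_PORT` are set to the Ouinet proxy's address. Other Browsertrix Crawler flags can be added as needed.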
```sh
docker run --network host -v "$PWD/:/config" -i \
  -e PROXY_HOST=localhost -e PROXY_PORT=8077 \
  webrecorder/browsertrix-crawler:latest crawl \
  --driver /config/ouinet-crawl.js --url
```
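The crawl depends on the Ouinet proxy being reachable on port 8077. A small preflight check can confirm that something is listening there before the container is started. It is written in Node.js using only the built-in `net` module. `isProxyListening` is a hypothetical helper, not part of this repository or of Browsertrix Crawler. A successful TCP connection only shows that a listener exists, not that it is the Ouinet client.

```javascript
const net = require("net");

// Hypothetical helper: resolve true if a TCP connection to host:port succeeds
// within timeoutMs, false on refusal, error, or timeout. It does not speak
// HTTP, so it cannot tell whether the listener is actually Ouinet.
function isProxyListening(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok) => {
      socket.destroy(); // close the probe connection either way
      resolve(ok); // later calls are ignored once the promise settles
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

module.exports = { isProxyListening };
```

For example, `isProxyListening("127.0.0.1", 8077).then(console.log)` should log `true` while the Ouinet client is listening on that address. An explicit IPv4 address is used here because `localhost` may resolve to `::1` first on some systems.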