https://github.com/iansinnott/url-spider
- Host: GitHub
- URL: https://github.com/iansinnott/url-spider
- Owner: iansinnott
- Created: 2020-09-01T04:25:18.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-09-09T03:47:59.000Z (almost 6 years ago)
- Last Synced: 2025-06-04T23:19:20.878Z (about 1 year ago)
- Language: TypeScript
- Homepage:
- Size: 13.7 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.org
README
* URL Spider
A simple script that spiders all the URLs for a given domain _and_ its subdomains.
Example:
#+BEGIN_SRC shell
yarn start:run 'https://iansinnott.com'
#+END_SRC
This spiders all the URLs on my site as well as all URLs at =blog.iansinnott.com=, =lab.iansinnott.com=, etc. URLs pointing to external sites are skipped.
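The "same domain or subdomain" rule described above could be checked roughly as follows. This is only a sketch: =isInScope= is a hypothetical helper, not a function taken from this repository, and it assumes that scope is decided purely by hostname.

#+BEGIN_SRC typescript
// Hypothetical helper (not from the url-spider source): decides whether a
// discovered link belongs to the root domain or one of its subdomains.
function isInScope(candidate: string, root: string): boolean {
  let candidateHost: string;
  try {
    candidateHost = new URL(candidate).hostname;
  } catch {
    return false; // not a parseable absolute URL, so skip it
  }
  const rootHost = new URL(root).hostname;
  // Match the exact host, or any host that ends with ".<rootHost>".
  // The leading dot keeps "notiansinnott.com" from matching "iansinnott.com".
  return candidateHost === rootHost || candidateHost.endsWith("." + rootHost);
}
#+END_SRC

Under these assumptions, =isInScope("https://blog.iansinnott.com/post", "https://iansinnott.com")= is in scope, while a link to =https://example.com= is skipped.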
When the script finishes, it dumps all the collected information to a temp file. The file's location depends on your system's built-in =mktemp= utility.
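For illustration, here is one way a temp-file dump could look in Node. Note that this sketch swaps in Node's own =fs.mkdtempSync= rather than shelling out to =mktemp=, and the =dumpToTempFile= name, the =url-spider-= prefix, and the =urls.txt= file name are all assumptions, not the script's actual behaviour.

#+BEGIN_SRC typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: writes one URL per line into a fresh temp directory
// and returns the path of the file so the caller can print it.
function dumpToTempFile(urls: string[]): string {
  // mkdtempSync appends random characters to the prefix, similar in spirit
  // to what the mktemp utility does.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "url-spider-"));
  const file = path.join(dir, "urls.txt");
  fs.writeFileSync(file, urls.join("\n") + "\n", "utf8");
  return file;
}
#+END_SRC

As with =mktemp=, the resulting path depends on the platform's temp directory, so the helper returns it instead of assuming a fixed location.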
** Usage
#+BEGIN_SRC shell
yarn start:run
#+END_SRC
Will spider the URL you pass in (as in the example above) and all its subdomains.
* FIXME
This script does no stream processing. In other words, it will quite happily eat up all the JS heap memory if the site you're spidering has many URLs.
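One possible direction for this FIXME, sketched under assumptions: write each URL to disk as soon as it is discovered instead of collecting everything and dumping at the end. =createUrlSink= is a hypothetical name and does not exist in this repository. The set of seen URLs still lives in memory, so this reduces heap usage rather than bounding it, and the sketch ignores write backpressure for brevity.

#+BEGIN_SRC typescript
import * as fs from "fs";

// Hypothetical sink: appends newly seen URLs to a file one at a time.
function createUrlSink(filePath: string) {
  const seen = new Set<string>(); // needed for de-duplication; still grows with the site
  const out = fs.createWriteStream(filePath, { flags: "a" });
  return {
    // Returns true if the URL was new and has been queued for writing.
    add(url: string): boolean {
      if (seen.has(url)) return false;
      seen.add(url);
      out.write(url + "\n");
      return true;
    },
    // Flushes pending writes and closes the underlying file.
    close(): Promise<void> {
      return new Promise((resolve, reject) => {
        out.on("error", reject);
        out.end(() => resolve());
      });
    },
  };
}
#+END_SRC

A crawler loop would call =add= for each discovered link and only enqueue the link for fetching when =add= returns true, then call =close= once the queue is empty.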