https://github.com/iansinnott/url-spider


* URL Spider

A simple script that spiders all the URLs for a given domain _and_ its subdomains.

Example:

#+BEGIN_SRC shell
yarn start:run 'https://iansinnott.com'
#+END_SRC

This spiders all the URLs on my site as well as all URLs at =blog.iansinnott.com=, =lab.iansinnott.com=, etc. Links to external sites are skipped.
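
The domain-and-subdomain rule can be sketched as a hostname check. This is only an illustration under stated assumptions: =isSameSite= is a hypothetical helper name, not something taken from the script's source, and the real script may decide differently.

#+BEGIN_SRC javascript
// Hypothetical helper, not taken from the url-spider source.
// Returns true when `link` points at the root domain or one of its subdomains.
function isSameSite(link, rootUrl) {
  const rootHost = new URL(rootUrl).hostname;
  let host;
  try {
    // Relative links such as "/about" are resolved against the root URL.
    host = new URL(link, rootUrl).hostname;
  } catch {
    return false; // Unparseable links are treated as external and skipped.
  }
  // Match the exact host, or any host ending in ".<root host>".
  return host === rootHost || host.endsWith("." + rootHost);
}
#+END_SRC

Matching on ="." + rootHost= rather than on the bare host name keeps a look-alike domain such as =notiansinnott.com= from being treated as a subdomain.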

Once the script runs, it dumps all the information to a temp file. The file's location on your system depends on the built-in =mktemp= utility.

** Usage

#+BEGIN_SRC shell
yarn start:run
#+END_SRC

This spiders the given domain and all its subdomains.

* FIXME

This script does no stream processing. In other words, it will quite happily eat up all the JS heap memory if the site you're spidering has many URLs.
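
One possible direction, shown here only as a sketch: write each record to the output file as soon as it is found instead of collecting everything in memory first. The names =createUrlWriter= and =urls.ndjson= are hypothetical, and =fs.mkdtempSync= stands in for the =mktemp= call the script actually uses. A crawler would still need to remember which URLs it has already visited, so this reduces memory use rather than bounding it.

#+BEGIN_SRC javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sketch, not the script's current behavior: append each record
// to the file immediately so results never accumulate on the JS heap.
function createUrlWriter(filePath) {
  const fd = fs.openSync(filePath, "a");
  return {
    write(record) {
      // One JSON object per line, so the file can be read back line by line.
      fs.writeSync(fd, JSON.stringify(record) + "\n");
    },
    close() {
      fs.closeSync(fd);
    },
  };
}

// Usage: create a temp directory (assumed stand-in for mktemp) and write to it.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "url-spider-"));
const outFile = path.join(outDir, "urls.ndjson");
const writer = createUrlWriter(outFile);
writer.write({ url: "https://iansinnott.com/" });
writer.write({ url: "https://blog.iansinnott.com/" });
writer.close();
#+END_SRC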