https://github.com/lewisakura/spiderboi

A web crawling library written in TypeScript.

Host: GitHub
URL: https://github.com/lewisakura/spiderboi
Owner: lewisakura
Created: 2019-02-21T15:20:18.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2023-01-03T16:35:09.000Z (over 3 years ago)
Last Synced: 2025-04-11T10:00:32.644Z (about 1 year ago)
Topics: spider, typescript, typescript3, web-crawler, web-crawling, web-spider, webcrawler
Language: TypeScript
Size: 376 KB
Stars: 7
Watchers: 1
Forks: 1
Open Issues: 12
Metadata Files:
- Readme: README.md

# Spiderboi

[![NPM](https://nodei.co/npm/spiderboi.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/spiderboi/)

A web crawling library written in TypeScript.

# Example

```typescript
import Crawler from 'spiderboi';

async function run() {
    const crawler = new Crawler('https://google.com');

    // this gets the site's robots.txt so that the crawler can respect it
    await crawler.readyUp();

    const out = await crawler.crawl('/search/about');
    console.log(out);
}

run();

/**
 * above code should output:
 * [ 'https://google.com/search/about/',
 * 'https://google.com/search/about/',
 * 'https://google.com/#app-store',
 * 'https://google.com/#app-store',
 * 'https://google.com/#image-texts' ]
 *
 * unless of course google changes the /search/about page and ruins this example.
 */
```
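
The example mentions that `readyUp()` fetches `robots.txt` so the crawler can respect it. As a rough illustration of what respecting `robots.txt` can involve, here is a minimal sketch of a prefix-based check. It is not taken from spiderboi's source: `parseRobots`, `isAllowed` and `RobotsRules` are hypothetical names, only groups addressed to `User-agent: *` are considered, and wildcard patterns (`*`, `$`) are not handled.

```typescript
// Hypothetical helpers for illustration; these are NOT part of spiderboi's API.
interface RobotsRules {
    allow: string[];
    disallow: string[];
}

// Collect Allow/Disallow path prefixes from the groups that apply to every agent ("*").
function parseRobots(robotsTxt: string): RobotsRules {
    const rules: RobotsRules = { allow: [], disallow: [] };
    let appliesToUs = false;
    let lastWasAgent = false;

    for (const rawLine of robotsTxt.split(/\r?\n/)) {
        // drop trailing comments, then split "field: value"
        const hash = rawLine.indexOf('#');
        const line = (hash === -1 ? rawLine : rawLine.slice(0, hash)).trim();
        const sep = line.indexOf(':');
        if (sep === -1) continue;

        const field = line.slice(0, sep).trim().toLowerCase();
        const value = line.slice(sep + 1).trim();

        if (field === 'user-agent') {
            // consecutive User-agent lines belong to the same group
            appliesToUs = (lastWasAgent && appliesToUs) || value === '*';
            lastWasAgent = true;
            continue;
        }
        lastWasAgent = false;

        // an empty Disallow value means "nothing is disallowed", so it is skipped
        if (!appliesToUs || value === '') continue;
        if (field === 'allow') rules.allow.push(value);
        if (field === 'disallow') rules.disallow.push(value);
    }
    return rules;
}

// The longest matching prefix decides; Allow wins a tie. No match means allowed.
function isAllowed(rules: RobotsRules, path: string): boolean {
    const longest = (prefixes: string[]): number =>
        prefixes
            .filter(prefix => path.startsWith(prefix))
            .reduce((max, prefix) => Math.max(max, prefix.length), -1);
    return longest(rules.allow) >= longest(rules.disallow);
}
```

A crawler built this way would call `isAllowed(rules, path)` before requesting each path and skip any path for which it returns `false`.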
