https://github.com/sammwyy/craw

a website-crawler library for nodejs

Host: GitHub
URL: https://github.com/sammwyy/craw
Owner: sammwyy
Created: 2020-08-20T17:50:22.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-08-20T18:22:19.000Z (about 5 years ago)
Last Synced: 2025-01-17T14:54:38.792Z (9 months ago)
Topics: crawler, crawlers, html, javascript, library, node, nodejs, nodejs-module, npm, npm-module, parser, spider, website
Language: JavaScript
Homepage:
Size: 12.7 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# CRAW

A website crawler library for Node.js.

## Documentation

A concise summary of the library's API.

### Usage

```javascript
const craw = require('craw');

async function start () {
  // Crawl the page and print every extracted field as one object.
  const result = await craw("https://2lstudios.dev/");
  console.log(result.toJSON());
}

start().catch(console.error);
```
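The usage example crawls a single page. To crawl several pages, one option is to run the crawls concurrently and record each outcome, so that one failing URL does not reject the whole batch. The sketch below is not part of craw: `crawlAll` is a hypothetical helper that takes the crawl function as a parameter (anything that behaves like `craw(url)` above), which also lets it be exercised with a stub instead of the network.

```javascript
// Hypothetical helper (not part of craw): crawl several URLs concurrently and
// report each outcome. `crawlFn` should behave like `craw(url)`: an async
// function that resolves to a result object or rejects on failure.
async function crawlAll(crawlFn, urls) {
  // allSettled waits for every crawl, even if some of them reject.
  const outcomes = await Promise.allSettled(urls.map((url) => crawlFn(url)));
  return outcomes.map((outcome, i) => ({
    url: urls[i],
    ok: outcome.status === "fulfilled",
    result: outcome.status === "fulfilled" ? outcome.value : null,
    error: outcome.status === "rejected" ? String(outcome.reason) : null,
  }));
}

// Usage with craw (assumes the package is installed as in the example above):
// const craw = require('craw');
// crawlAll(craw, ["https://2lstudios.dev/", "https://example.com/"])
//   .then((rows) => rows.forEach((row) => console.log(row.url, row.ok ? row.result.toJSON() : row.error)));
```

Using `Promise.allSettled` instead of `Promise.all` is the design choice here: a single unreachable page becomes an `ok: false` row rather than aborting every other crawl.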

### result.getContent()

Get the text content of the website: all of its text (including paragraphs), the headings grouped by level, and the individual words.

Output:

```javascript
{
  text: "...", // String
  h1: [], // Array
  h2: [], // Array
  h3: [], // Array
  h4: [], // Array
  h5: [], // Array
  h6: [], // Array
  words: [] // Array
}
```
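As an illustration of working with this output, the sketch below counts the most frequent words. `topWords` and the sample object are hypothetical, and the code assumes `words` is an array of strings, which the README does not state explicitly.

```javascript
// Hypothetical helper: count the most frequent words in getContent() output.
// Assumes `content.words` is an array of strings (the README only says "Array").
function topWords(content, limit = 10) {
  const counts = new Map();
  for (const word of content.words) {
    const key = word.toLowerCase(); // treat "Craw" and "craw" as the same word
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort by descending count, then alphabetically so ties are deterministic.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Hypothetical sample shaped like getContent() output:
const contentSample = {
  text: "Craw crawls pages. Craw parses pages.",
  h1: [],
  words: ["Craw", "crawls", "pages", "Craw", "parses", "pages"],
};
console.log(topWords(contentSample, 2)); // [ [ 'craw', 2 ], [ 'pages', 2 ] ]
```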

### result.getFrames()

Get a list of the iframes on the website.

Output:

```javascript
[...] // Array
```

### result.getImports()

Get the resources the website imports, such as stylesheets, scripts, and the favicon.

Output:

```javascript
{
  scripts: [ // Array
    {
      integrity: "...", // String
      src: "...", // String
      async: ... // Boolean
    }
  ],
  styles: [ // Array
    {
      integrity: "...", // String
      href: "...", // String
      rel: "..." // String
    }
  ],
  favicon: {
    type: "...", // String
    href: "..." // String
  }
}
```
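Because each script and stylesheet entry carries an `integrity` field, this output can drive a quick Subresource Integrity check. The following is a minimal sketch with a hypothetical helper, `findMissingIntegrity`, and made-up sample data; it assumes a missing attribute is reported as an empty or absent value, which the README does not specify.

```javascript
// Hypothetical helper: list imported scripts and stylesheets that have no
// Subresource Integrity hash, using the shape documented for getImports().
// Assumes a missing integrity attribute is "", null, or undefined.
function findMissingIntegrity(imports) {
  const missing = [];
  for (const script of imports.scripts || []) {
    if (!script.integrity) missing.push({ kind: "script", url: script.src });
  }
  for (const style of imports.styles || []) {
    if (!style.integrity) missing.push({ kind: "style", url: style.href });
  }
  return missing;
}

// Hypothetical sample shaped like getImports() output:
const importsSample = {
  scripts: [
    { integrity: "sha384-abc", src: "https://cdn.example.com/a.js", async: true },
    { integrity: "", src: "/local.js", async: false },
  ],
  styles: [{ integrity: "", href: "/site.css", rel: "stylesheet" }],
  favicon: { type: "image/png", href: "/favicon.png" },
};
console.log(findMissingIntegrity(importsSample)); // lists /local.js and /site.css
```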

### result.getLinks()

Get a list of hyperlinks from the website.  

Output:

```javascript
[ // Array
  {
    url: "...", // String
    anchor: "...", // String
    rel: [ ... ] // Array of Strings
  }
]
```
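A common next step is separating links that stay on the crawled site from links that leave it. This sketch assumes `url` may be relative (the README does not say whether craw resolves it), so it resolves each one against the page URL with the standard `URL` class. `classifyLinks` and the sample links are hypothetical.

```javascript
// Hypothetical helper: split getLinks() output into internal and external
// links relative to the crawled page, and flag rel="nofollow" links.
// `pageUrl` is the URL that was passed to craw().
function classifyLinks(links, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const internal = [];
  const external = [];
  for (const link of links) {
    let resolved;
    try {
      // Resolves relative hrefs such as "/about" against the page URL.
      resolved = new URL(link.url, pageUrl);
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    const entry = {
      url: resolved.href,
      anchor: link.anchor,
      nofollow: (link.rel || []).includes("nofollow"),
    };
    (resolved.origin === origin ? internal : external).push(entry);
  }
  return { internal, external };
}

// Hypothetical sample shaped like getLinks() output:
const linksSample = [
  { url: "/about", anchor: "About", rel: [] },
  { url: "https://github.com/sammwyy/craw", anchor: "Source", rel: ["nofollow", "noopener"] },
];
const { internal, external } = classifyLinks(linksSample, "https://2lstudios.dev/");
console.log(internal.length, external.length); // 1 1
```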

### result.getMedia()

Get the multimedia elements of the website, such as images, audio, and video.

Output:

```javascript
{
  audios: [ // Array
    {
      src: "...", // String
      type: "..." // String
    }
  ],
  images: [ // Array
    {
      src: "...", // String
      alt: "...", // String
      loading: "..." // String
    }
  ],
  videos: [ ... ] // Array of strings
}
```
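For example, the `images` list can be used to spot images with no alternative text. `imagesMissingAlt` and the sample object are hypothetical, and the check assumes a missing `alt` attribute comes back as an empty or absent value.

```javascript
// Hypothetical helper: return the src of every image in getMedia() output
// that has no alternative text. Note that alt="" is valid for purely
// decorative images, so treat the result as a list to review, not a list of errors.
function imagesMissingAlt(media) {
  return (media.images || [])
    .filter((img) => !img.alt || img.alt.trim() === "")
    .map((img) => img.src);
}

// Hypothetical sample shaped like getMedia() output:
const mediaSample = {
  audios: [],
  images: [
    { src: "/logo.png", alt: "Site logo", loading: "lazy" },
    { src: "/banner.jpg", alt: "", loading: "eager" },
  ],
  videos: [],
};
console.log(imagesMissingAlt(mediaSample)); // [ '/banner.jpg' ]
```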

### result.getMeta()

Get the website's metadata (`<meta>` tags) as an object.

Output:

```javascript
{
  author: "...", // String
  viewport: "...", // String
  robots: "...", // String
  description: "...", // String
  keywords: [], // Array of strings
  image: "...", // String (Favicon)
  charset: "...", // String
  // ...plus any other metadata tag, such as Open Graph or Twitter tags
}
```
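The `robots` value can tell whether the page opts out of search indexing. The robots meta tag holds comma-separated directives, where `none` is equivalent to `noindex, nofollow`. The hypothetical `robotsDirectives` helper below is a sketch that assumes `robots` is the raw attribute string.

```javascript
// Hypothetical helper: interpret the robots value from getMeta() output.
// Directives are comma-separated and case-insensitive; "none" means
// "noindex, nofollow". A missing robots tag allows both.
function robotsDirectives(meta) {
  const directives = (meta.robots || "")
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .filter(Boolean);
  return {
    indexable: !directives.includes("noindex") && !directives.includes("none"),
    followLinks: !directives.includes("nofollow") && !directives.includes("none"),
  };
}

console.log(robotsDirectives({ robots: "NoIndex, follow" })); // { indexable: false, followLinks: true }
console.log(robotsDirectives({})); // { indexable: true, followLinks: true }
```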

### result.getTitle()

Get the title of the website.  

Output:

```javascript
"..." // String
```

### result.toJSON()

Run all of the functions above and combine their results into a single object.
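The following is a hypothetical sketch of the aggregation that `toJSON()` describes, not craw's actual implementation; the class name and the result keys are assumptions. It also shows a related JavaScript detail: `JSON.stringify` calls an object's `toJSON()` method when one exists, so, assuming craw's result exposes `toJSON()` as a regular method, `JSON.stringify(result)` should serialize the same combined object.

```javascript
// Hypothetical sketch of the toJSON() aggregation pattern (not craw's source).
// The key names "title", "links", and "meta" are assumptions.
class ResultSketch {
  constructor(data) {
    this.data = data; // raw parsed data, including fields we do not expose
  }
  getTitle() { return this.data.title; }
  getLinks() { return this.data.links; }
  getMeta() { return this.data.meta; }
  // Call every getter and collect the results in one object.
  toJSON() {
    return {
      title: this.getTitle(),
      links: this.getLinks(),
      meta: this.getMeta(),
    };
  }
}

const resultSketch = new ResultSketch({
  title: "Example",
  links: [{ url: "/about", anchor: "About", rel: [] }],
  meta: { description: "An example page" },
  html: "<html>...</html>", // internal state that toJSON() leaves out
});

// JSON.stringify calls toJSON() automatically, so only the aggregated
// fields are serialized and the raw `data` property is not.
console.log(JSON.stringify(resultSketch));
```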