https://github.com/sammwyy/craw
# CRAW
A website crawler library for Node.js.

## Documentation
A short, precise summary of the library's API.

### Usage
```javascript
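const craw = require('craw');

// craw() resolves to a result object exposing the methods documented below
async function start() {
  const result = await craw("https://2lstudios.dev/");
  console.log(result.toJSON());
}

// Report network or parsing errors instead of leaving the promise unhandled
start().catch(console.error);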
```
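The sketch below shows one way to crawl several pages concurrently. `crawlTitles` is a hypothetical helper, not part of craw: it relies only on the documented behaviour that `craw(url)` resolves to a result with `getTitle()`, and it takes the crawler function as a parameter so it can be tried with a stub. `Promise.allSettled` keeps one failing URL from rejecting the whole batch.

```javascript
// Hypothetical helper (not part of craw): crawls many URLs concurrently and
// collects each page's title, recording failures instead of throwing.
async function crawlTitles(urls, crawFn) {
  // The async arrow turns synchronous throws (and getTitle errors) into rejections.
  const settled = await Promise.allSettled(
    urls.map(async (url) => (await crawFn(url)).getTitle())
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { url: urls[i], title: outcome.value }
      : { url: urls[i], error: String(outcome.reason) }
  );
}
```

In real use you would pass the library itself, for example `crawlTitles(["https://2lstudios.dev/"], require('craw')).then(console.log);`.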

### result.getContent()
Get the text content of the website: the headings (`h1` to `h6`), the full text (including paragraphs), and the individual words.
Output:
```javascript
{
  text: "...", // String
  h1: [], // Array
  h2: [], // Array
  h3: [], // Array
  h4: [], // Array
  h5: [], // Array
  h6: [], // Array
  words: [] // Array
}
```
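As an example of working with this output, the hypothetical `topWords` helper below (not part of craw) ranks the most frequent entries of `words`. It assumes `words` holds one string per occurrence rather than a deduplicated list; that detail is not documented above.

```javascript
// Hypothetical helper (not part of craw): counts how often each word appears
// in the `words` array returned by result.getContent().
function topWords(content, limit = 10) {
  const counts = new Map();
  for (const raw of content.words) {
    // Normalise case so "Node" and "node" are counted together
    const word = String(raw).toLowerCase();
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Sort by descending count, breaking ties alphabetically for stable output
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```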

### result.getFrames()
Get a list of the iframes embedded in the website.
Output:
```javascript
[...] // Array
```

### result.getImports()
Get the resources imported by the website (stylesheets, the favicon, and scripts).
Output:
```javascript
{
  scripts: [ // Array
    {
      integrity: "...", // String
      src: "...", // String
      async: ... // Boolean
    }
  ],

  styles: [ // Array
    {
      integrity: "...", // String
      href: "...", // String
      rel: "..." // String
    }
  ],

  favicon: {
    type: "...", // String
    href: "..." // String
  }
}
```
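One possible use of `getImports()` is a quick Subresource Integrity audit. The hypothetical `findMissingIntegrity` helper below (not part of craw) lists scripts and stylesheets that carry no `integrity` value, assuming a missing attribute is reported as an empty string, `null` or `undefined`.

```javascript
// Hypothetical audit helper (not part of craw): lists imported scripts and
// stylesheets without a Subresource Integrity (`integrity`) value.
// Assumes a missing attribute comes back as "", null or undefined.
function findMissingIntegrity(imports) {
  const scripts = (imports.scripts || [])
    .filter((s) => !s.integrity)
    .map((s) => s.src);
  const styles = (imports.styles || [])
    .filter((s) => !s.integrity)
    .map((s) => s.href);
  return { scripts, styles };
}
```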

### result.getLinks()
Get a list of hyperlinks from the website.
Output:
```javascript
[ // Array
  {
    url: "...", // String
    anchor: "...", // String
    rel: [ ... ] // Array of Strings
  }
]
```
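Because each link carries its `url` and `rel` values, links can be grouped by origin. The hypothetical `classifyLinks` helper below (not part of craw) resolves each `url` against the crawled page with the standard `URL` class, so it works whether craw returns absolute or relative URLs (which is not documented), and flags links whose `rel` includes `nofollow`.

```javascript
// Hypothetical helper (not part of craw): splits the entries returned by
// result.getLinks() into same-origin and external links, flagging "nofollow".
function classifyLinks(links, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const internal = [];
  const external = [];
  for (const link of links) {
    let resolved;
    try {
      // Resolve relative hrefs (e.g. "/about") against the crawled page URL
      resolved = new URL(link.url, pageUrl);
    } catch {
      continue; // skip values that are not valid URLs
    }
    const entry = {
      url: resolved.href,
      anchor: link.anchor,
      nofollow: (link.rel || []).includes("nofollow"),
    };
    (resolved.origin === origin ? internal : external).push(entry);
  }
  return { internal, external };
}
```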

### result.getMedia()
Get the multimedia elements of the website (images, audio, and video).
Output:
```javascript
{
  audios: [ // Array
    {
      src: "...", // String
      type: "..." // String
    }
  ],
  images: [ // Array
    {
      src: "...", // String
      alt: "...", // String
      loading: "..." // String
    }
  ],
  videos: [ ... ] // Array of strings
}
```
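For instance, the `images` array can drive a basic accessibility check. The hypothetical `imagesWithoutAlt` helper below (not part of craw) returns the `src` of every image whose `alt` is missing or blank, assuming craw reports a missing `alt` as an empty string or `undefined`.

```javascript
// Hypothetical accessibility check (not part of craw): returns the `src` of
// every image in result.getMedia().images whose `alt` is missing or blank.
function imagesWithoutAlt(media) {
  return (media.images || [])
    .filter((img) => !img.alt || img.alt.trim() === "")
    .map((img) => img.src);
}
```

An empty `alt` is valid HTML for purely decorative images, so treat the output as a list to review rather than a list of errors.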

### result.getMeta()
Get the metadata tags of the website as an object.
Output:
```javascript
{
  author: "...", // String
  viewport: "...", // String
  robots: "...", // String
  description: "...", // String
  keywords: [], // Array of strings
  image: "...", // String (Favicon)
  charset: "...", // String
  // ...plus any other metadata tags, such as Open Graph or Twitter tags
}
```
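A common use of this object is checking for missing tags. The hypothetical `missingMeta` helper below (not part of craw) reports which requested keys are absent or empty; the default list of keys is an illustrative choice, and the exact key names craw uses for Open Graph or Twitter tags are not documented here.

```javascript
// Hypothetical SEO check (not part of craw): reports which of the given
// metadata keys are absent or empty in the object from result.getMeta().
function missingMeta(meta, required = ["description", "viewport", "charset"]) {
  return required.filter((key) => {
    const value = meta[key];
    if (Array.isArray(value)) return value.length === 0; // e.g. `keywords`
    return value === undefined || value === null || String(value).trim() === "";
  });
}
```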

### result.getTitle()
Get the title of the website.
Output:
```javascript
"..." // String
```

### result.toJSON()
Run all of the functions above and combine their results into a single object.
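Since the method is named `toJSON`, it follows the standard JavaScript convention: `JSON.stringify(result)` calls `result.toJSON()` automatically. The hypothetical `saveReport` helper below (not part of craw) relies on that to write a crawl report to disk with Node's built-in `fs` module; the report's exact keys are not documented above.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of craw): writes a crawl result to disk.
// JSON.stringify calls an object's toJSON() method automatically, so the
// result can be passed directly without calling result.toJSON() first.
function saveReport(result, filePath) {
  const json = JSON.stringify(result, null, 2); // pretty-print with 2 spaces
  fs.writeFileSync(filePath, json, "utf8");
  return json;
}
```

With a real crawl: `craw("https://2lstudios.dev/").then((result) => saveReport(result, "report.json"));`.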