https://github.com/sammwyy/craw
a website-crawler library for nodejs
crawler crawlers html javascript library node nodejs nodejs-module npm npm-module parser spider website
- Host: GitHub
- URL: https://github.com/sammwyy/craw
- Owner: sammwyy
- Created: 2020-08-20T17:50:22.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-08-20T18:22:19.000Z (over 4 years ago)
- Last Synced: 2024-12-24T20:36:37.052Z (about 1 month ago)
- Topics: crawler, crawlers, html, javascript, library, node, nodejs, nodejs-module, npm, npm-module, parser, spider, website
- Language: JavaScript
- Homepage:
- Size: 12.7 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
# CRAW
A website-crawler library for Node.js.

## Documentation
A concise summary of the library's API.

### Usage
```javascript
const craw = require('craw');

async function start () {
  const result = await craw("https://2lstudios.dev/");
  console.log(result.toJSON());
}

start();
```

### result.getContent()
Get the content of the website: headings, paragraphs, and all of the text in general.
Output:
```javascript
{
  text: "....", // String
  h1: [], // Array
  h2: [], // Array
  h3: [], // Array
  h4: [], // Array
  h5: [], // Array
  h6: [], // Array
  words: [] // Array
}
```

### result.getFrames()
Get a list of the iframes on the website.
Output:
```javascript
[...] // Array
```

### result.getImports()
Get the resources imported by the website, such as CSS, JavaScript, and the favicon.
Output:
```javascript
{
  scripts: [ // Array
    {
      integrity: "...", // String
      src: "...", // String
      async: ... // Boolean
    }
  ],
  styles: [ // Array
    {
      integrity: "...", // String
      href: "...", // String
      rel: "..." // String
    }
  ],
  favicon: {
    type: "...", // String
    href: "..." // String
  }
}
```

### result.getLinks()
Get a list of hyperlinks from the website.
Output:
```javascript
[ // Array
  {
    url: "...", // String
    anchor: "...", // String
    rel: [ ... ] // Array of Strings
  }
]
```

### result.getMedia()
Get the multimedia elements on the website, such as images, audio, and video.
Output:
```javascript
{
  audios: [ // Array
    {
      src: "...", // String
      type: "..." // String
    }
  ],
  images: [ // Array
    {
      src: "...", // String
      alt: "...", // String
      loading: "..." // String
    }
  ],
  videos: [ ... ] // Array of strings
}
```

### result.getMeta()
Get the metadata tags of the website as an object.
Output:
```javascript
{
  author: "...", // String
  viewport: "...", // String
  robots: "...", // String
  description: "...", // String
  keywords: [], // Array of strings
  image: "...", // String (Favicon)
  charset: "...", // String
  // ... any other metadata tag, such as OG or Twitter ...
}
```

### result.getTitle()
Get the title of the website.
Output:
```javascript
"..." // String
```

### result.toJSON()
Run all of the functions above and combine their results into a single object.
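
The README does not show the shape of the combined object, so the following is only a minimal sketch of the idea, not craw's actual implementation. `collectAll` and `fakeResult` are hypothetical names introduced for illustration, and the key names (`title`, `meta`, and so on) are assumptions; the real `toJSON()` output may be keyed differently. Only the getter names come from the documentation above.

```javascript
// Hypothetical helper (not part of craw): merge the documented getters into one object.
// The key names below are assumptions; craw's real toJSON() output may use different keys.
function collectAll(result) {
  return {
    title: result.getTitle(),
    meta: result.getMeta(),
    content: result.getContent(),
    links: result.getLinks(),
    media: result.getMedia(),
    imports: result.getImports(),
    frames: result.getFrames(),
  };
}

// Stand-in for a craw result so the sketch runs without network access.
// Only the method names come from the documentation; the data is made up.
const fakeResult = {
  getTitle: () => "Example",
  getMeta: () => ({ description: "An example page" }),
  getContent: () => ({ text: "Hello world", h1: ["Hello"], words: ["Hello", "world"] }),
  getLinks: () => [{ url: "https://example.com/", anchor: "Home", rel: [] }],
  getMedia: () => ({ audios: [], images: [], videos: [] }),
  getImports: () => ({ scripts: [], styles: [], favicon: null }),
  getFrames: () => [],
};

const combined = collectAll(fakeResult);
console.log(JSON.stringify(combined, null, 2));
```

With a real crawl, the object returned by `await craw(url)` would take the place of `fakeResult`, and calling `result.toJSON()` directly is likely the simpler route when every section is needed.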