https://github.com/gabceb/node-metainspector

An npm package for web scraping. It scrapes a given URL and returns its title, meta description, meta keywords, an array of all its links, all its images, and more. Inspired by the MetaInspector Ruby gem.

Host: GitHub
URL: https://github.com/gabceb/node-metainspector
Owner: gabceb
License: mit
Created: 2013-04-23T19:30:54.000Z (over 11 years ago)
Default Branch: master
Last Pushed: 2019-02-16T14:25:26.000Z (over 5 years ago)
Last Synced: 2024-04-15T00:18:45.829Z (7 months ago)
Language: JavaScript
Size: 151 KB
Stars: 129
Watchers: 6
Forks: 52
Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE

README

![status](https://secure.travis-ci.org/gabceb/node-metainspector.png?branch=master)

## Node-Metainspector

MetaInspector is an npm package for web scraping. You give it a URL, and it lets you easily get the page's title, links, images, description, keywords, and meta tags.

MetaInspector is inspired by the MetaInspector Ruby gem by [jaimeiniesta](https://github.com/jaimeiniesta/metainspector).

This version requires Node.js v6 or higher, because some dependencies use ES6 features. The 1.x.x versions are compatible with Node.js v0.x through v4 and should be used for older applications.

### Scraped data

```
client.url                  # URL of the page
client.scheme               # scheme of the page (http, https)
client.host                 # hostname of the page (e.g. markupvalidator.com, without the scheme)
client.rootUrl              # root URL (scheme + host, e.g. http://simple.com/)
client.title                # title of the page, as string
client.links                # array of strings, with every link found on the page as an absolute URL
client.author               # page author, as string
client.keywords             # keywords from meta tag, as array
client.charset              # page charset from meta tag, as string
client.description          # meta description, or the first long paragraph if no meta description is found
client.image                # most relevant image, if defined with og:image
client.images               # array of strings, with every img found on the page as an absolute URL
client.feeds                # RSS or Atom links from the metadata fields, as array
client.ogTitle              # Open Graph title
client.ogDescription        # Open Graph description
client.ogType               # Open Graph object type
client.ogUpdatedTime        # Open Graph updated time
client.ogLocale             # Open Graph locale (for languages)
```
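To make the relationship between `url`, `scheme`, `host` and `rootUrl` concrete, here is a minimal sketch that derives the same kind of values with Node's built-in `URL` class. The `describeUrl` helper is hypothetical and is not how the package computes these properties internally; it only illustrates what each field is expected to contain.

```javascript
// Sketch only: derives URL-related fields with Node's built-in WHATWG URL class.
// describeUrl is a hypothetical helper, not part of node-metainspector.
function describeUrl(rawUrl) {
  const parsed = new URL(rawUrl);
  return {
    url: parsed.href,                            // full URL of the page
    scheme: parsed.protocol.replace(/:$/, ''),   // "https:" becomes "https"
    host: parsed.hostname,                       // hostname without the scheme
    rootUrl: parsed.origin + '/',                // scheme + host, with a trailing slash
  };
}

console.log(describeUrl('http://simple.com/about?x=1'));
```

The trailing slash on `rootUrl` follows the `http://simple.com/` example in the list above.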

### Options

```
timeout      - time in ms that MetaInspector will wait for the URL to respond
maxRedirects - number of redirects MetaInspector will follow
limit        - maximum number of bytes MetaInspector will download when querying a site
```
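Since all three options are plain numbers, it can help to sanity-check an options object before passing it to the constructor. The sketch below is one way to do that. `checkOptions` is a hypothetical helper, and its rules (non-negative integers, no unknown keys) are assumptions rather than behaviour documented by the package.

```javascript
// Hypothetical helper: checks an options object before handing it to MetaInspector.
// Only the three option names come from the README; the validation rules are assumptions.
const KNOWN_OPTIONS = ['timeout', 'maxRedirects', 'limit'];

function checkOptions(options) {
  const problems = [];
  for (const [name, value] of Object.entries(options)) {
    if (!KNOWN_OPTIONS.includes(name)) {
      problems.push('unknown option: ' + name);
    } else if (!Number.isInteger(value) || value < 0) {
      problems.push(name + ' must be a non-negative integer, got ' + value);
    }
  }
  return problems; // an empty array means no problems were found
}

console.log(checkOptions({ timeout: 5000, maxRedirects: 3, limit: 512 * 1024 }));
console.log(checkOptions({ timeout: '5s', retries: 2 }));
```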

## Usage

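```javascript
var MetaInspector = require('node-metainspector');

// Wait at most 5000 ms for the URL to respond
var client = new MetaInspector("http://www.google.com", { timeout: 5000 });

// Runs once the page has been fetched; the scraped properties are available here
client.on("fetch", function(){
    console.log("Description: " + client.description);
    console.log("Links: " + client.links.join(","));
});

// Runs if the request fails
client.on("error", function(err){
    console.log(err);
});

// Start the request
client.fetch();
```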
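If you prefer promises over event handlers, the event-based flow above can be wrapped in a small adapter. This is a sketch under the assumption that the client behaves like a standard Node `EventEmitter` (so `once` is available) and emits `fetch` and `error` as shown above. `fetchAsPromise` and `FakeClient` are hypothetical names; `FakeClient` only stands in for a real `MetaInspector` instance so the example runs without network access.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: wraps any client that emits "fetch" and "error"
// and exposes a fetch() method, as the usage example does.
function fetchAsPromise(client) {
  return new Promise(function (resolve, reject) {
    client.once('fetch', function () { resolve(client); });
    client.once('error', reject);
    client.fetch();
  });
}

// Stand-in client so the sketch runs offline;
// a real MetaInspector instance would be passed in its place.
class FakeClient extends EventEmitter {
  fetch() {
    this.title = 'Example';
    this.emit('fetch');
  }
}

fetchAsPromise(new FakeClient()).then(function (client) {
  console.log('Title: ' + client.title);
});
```

With the real package, the call would look like `fetchAsPromise(new MetaInspector(url, options))`, again assuming the client supports `once`.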

## TO DO

Finish implementation of the properties below:

```
Add absolutify url function to return all urls as an absolute url
client.internal_links     	# array of strings, with every internal link found on the page as an absolute URL
client.external_links     	# array of strings, with every external link found on the page as an absolute URL
```
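Until these properties exist, something similar can be approximated in user code. The sketch below shows one possible approach using Node's built-in `URL` class. `absolutify` and `splitLinks` are hypothetical helpers, not the planned implementation, and treating "internal" as "same hostname as the page" is an assumption.

```javascript
// Hypothetical helpers sketching the TODO items; not part of the package.

// Resolve href against the page URL; return null if it cannot be parsed.
function absolutify(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;
  } catch (err) {
    return null;
  }
}

// Split links into internal and external, comparing hostnames (an assumption).
function splitLinks(hrefs, pageUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const internal = [];
  const external = [];
  for (const href of hrefs) {
    const absolute = absolutify(href, pageUrl);
    if (absolute === null) continue;
    const parsed = new URL(absolute);
    // Skip non-web links such as mailto: or javascript:
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') continue;
    (parsed.hostname === pageHost ? internal : external).push(absolute);
  }
  return { internal: internal, external: external };
}

console.log(splitLinks(['/about', 'https://example.org/x'], 'http://simple.com/blog/post'));
```

In practice, the array passed in could be `client.links` and the page URL `client.url`.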

## ZOMG Fork! Thank you!

You're welcome to fork this project and send pull requests. Just remember to include tests.

Copyright (c) 2009-2012 Gabriel Cebrian, released under the MIT license

[![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/gabceb/node-metainspector/trend.png)](https://bitdeli.com/free "Bitdeli Badge")