https://github.com/edsu/metaweb
get metadata for a web page
- Host: GitHub
- URL: https://github.com/edsu/metaweb
- Owner: edsu
- License: mit
- Created: 2017-11-18T17:27:20.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-07-19T09:12:53.000Z (almost 3 years ago)
- Last Synced: 2024-04-26T10:05:24.953Z (about 2 years ago)
- Language: JavaScript
- Homepage:
- Size: 181 KB
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# metaweb
[Build Status](https://travis-ci.org/edsu/metaweb)
*metaweb* extracts metadata for a web page. Only metadata for the page
itself is extracted, not metadata for items within the page. *metaweb* attempts
to extract common metadata from standard HTML, Twitter Cards and Facebook's
[Open Graph Protocol](http://opengraphprotocol.org/). It is not meant to be
perfect, or to adhere to any particular overarching standard, but just to scratch a
particular itch I had at the time. If you've got your own itch to scratch please
open an [issue](https://github.com/edsu/metaweb/issues).
The name *metaweb* pays homage to one of the more forward-looking startups of the
[same name](https://en.wikipedia.org/wiki/Metaweb), which created one of the first
community-driven entity databases on the web.
## Install
    npm install metaweb
## Command Line
When you install *metaweb* you will get a command line program:
```
% metaweb http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/
{
  "url": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "canonical": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "status": 200,
  "content_type": "text/html",
  "title": "NSA slides explain the PRISM data-collection program - The Washington Post",
  "description": "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities.",
  "image": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
}
```
Use the `--includeRaw` option to include all of the raw `meta` and `link`
content:
```
% metaweb http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/ --includeRaw
{
  "url": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "canonical": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "status": 200,
  "content_type": "text/html",
  "title": "NSA slides explain the PRISM data-collection program - The Washington Post",
  "description": "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities.",
  "image": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg",
  "raw": {
    "link": {
      "canonical": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "shorturl": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "stylesheet": [
        "http://css.wpdigital.net/wpost/css/combo?context=eidos&c=true&m=true&r=/2.0.0/reset.css&r=/2.0.0/structure.css&r=/2.0.0/header.css&r=/2.0.0/footer.css&r=/2.0.0/right-rail.css&r=/2.0.0/rules.css&r=/2.0.0/forms.css&r=/2.0.0/base.css&r=/2.0.0/flipper.css&r=/2.0.0/modules.css&r=/2.0.0/wsodEWA.css&r=/2.0.0/ads.css&r=/2.0.0/fonts/font_FranklinITCProBold.css",
        "http://css.wpdigital.net/wp-srv/graphics/css/pretty-comments.css",
        "http://css.wpdigital.net/wp-srv/graphics/css/staticbase-2.0.css",
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/css/prism.css"
      ]
    },
    "meta": {
      "twitter:title": [
        "NSA slides explain the PRISM data-collection program"
      ],
      "description": [
        "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities."
      ],
      "twitter:description": [
        "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities."
      ],
      "keywords": [
        "nsa, security, privacy, government data collection, nsa data collection, nsa prism program, prism data collection, prism program"
      ],
      "twitter:url": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "og:image": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
      ],
      "twitter:image": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
      ],
      "twitter:site": [
        "@postgraphics"
      ],
      "twitter:card": [
        "summary"
      ],
      "fb:app_id": [
        "41245586762"
      ],
      "og:site_name": [
        "The Washington Post"
      ]
    },
    "title": "NSA slides explain the PRISM data-collection program - The Washington Post"
  }
}
```
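In the output above, every key under `raw.link` and `raw.meta` maps to an array of values, since a page can repeat a tag. The following is a minimal sketch of reading values out of that structure. The `firstRaw` helper is hypothetical and not part of *metaweb*, and the sample object is a trimmed copy of the output shown above:

```javascript
// A trimmed copy of the metadata shown in the --includeRaw output above.
const metadata = {
  title: 'NSA slides explain the PRISM data-collection program - The Washington Post',
  raw: {
    link: {
      canonical: ['http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/']
    },
    meta: {
      'twitter:title': ['NSA slides explain the PRISM data-collection program'],
      'og:site_name': ['The Washington Post'],
      'twitter:card': ['summary']
    }
  }
}

// Hypothetical helper (not part of metaweb): return the first value found
// for any of the given names under raw.meta or raw.link, or null if none match.
function firstRaw (metadata, kind, names) {
  const table = (metadata.raw && metadata.raw[kind]) || {}
  for (const name of names) {
    const values = table[name]
    if (Array.isArray(values) && values.length > 0) return values[0]
  }
  return null
}

// Prefer the Open Graph title, fall back to the Twitter Card title.
console.log(firstRaw(metadata, 'meta', ['og:title', 'twitter:title']))
console.log(firstRaw(metadata, 'meta', ['og:site_name']))
console.log(firstRaw(metadata, 'link', ['canonical']))
```

Listing the names in order of preference makes the fallback explicit, which helps when a page supplies only some of the Open Graph or Twitter Card tags.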
## JavaScript
You will probably want to use *metaweb* as a library in your own
JavaScript applications:
```javascript
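const metaweb = require('metaweb')

const url = 'http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/'
metaweb.get(url).then((metadata) => {
  // do something with the metadata, e.g. print the page title
  console.log(metadata.title)
})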
```
If you would like to also get the raw `link` and `meta` content use the
`includeRaw` parameter:
```javascript
// includeRaw is passed as the second argument
metaweb.get(url, true)
```
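Since `metaweb.get` returns a promise, as in the examples above, several pages can be looked up concurrently. This is a rough sketch only: `fetchAll` and `fakeGet` are hypothetical names that are not part of *metaweb*, and the `get` function is passed in so the example runs without network access.

```javascript
// Hypothetical helper (not part of metaweb): fetch metadata for several URLs
// concurrently. `get` is any function that takes a URL and returns a promise.
async function fetchAll (get, urls) {
  const settled = await Promise.allSettled(urls.map((url) => get(url)))
  // Keep one entry per URL, recording either the metadata or the error.
  return settled.map((result, i) => (
    result.status === 'fulfilled'
      ? { url: urls[i], metadata: result.value }
      : { url: urls[i], error: String(result.reason) }
  ))
}

// Stand-in for metaweb.get so the sketch runs offline.
const fakeGet = (url) => url.includes('bad')
  ? Promise.reject(new Error('fetch failed'))
  : Promise.resolve({ url, title: 'Example' })

fetchAll(fakeGet, ['http://example.com/', 'http://example.com/bad'])
  .then((results) => console.log(JSON.stringify(results, null, 2)))
```

In a real program you would pass `(url) => metaweb.get(url)` in place of `fakeGet`. `Promise.allSettled` is used so that one failing page does not discard the results for the others.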