{"id":13725427,"url":"https://github.com/mozilla/page-metadata-parser","last_synced_at":"2025-05-07T20:32:23.472Z","repository":{"id":9314439,"uuid":"61582733","full_name":"mozilla/page-metadata-parser","owner":"mozilla","description":"DEPRECATED - A Javascript library for parsing metadata on a web page.","archived":true,"fork":false,"pushed_at":"2022-02-24T12:30:51.000Z","size":189,"stargazers_count":270,"open_issues_count":0,"forks_count":42,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-05-22T18:24:56.775Z","etag":null,"topics":["abandoned","unmaintained"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/page-metadata-parser","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mozilla.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-06-20T21:50:32.000Z","updated_at":"2024-03-01T19:11:57.000Z","dependencies_parsed_at":"2022-08-07T05:00:51.266Z","dependency_job_id":null,"html_url":"https://github.com/mozilla/page-metadata-parser","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Fpage-metadata-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Fpage-metadata-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Fpage-metadata-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mozilla%2Fpage-metadata-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mo
zilla","download_url":"https://codeload.github.com/mozilla/page-metadata-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224444086,"owners_count":17312126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abandoned","unmaintained"],"created_at":"2024-08-03T01:02:22.992Z","updated_at":"2024-11-14T15:31:18.536Z","avatar_url":"https://github.com/mozilla.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# Page Metadata Parser\nA Javascript library for parsing metadata in web pages.\n\n[![CircleCI](https://circleci.com/gh/mozilla/page-metadata-parser.svg?style=svg)](https://circleci.com/gh/mozilla/page-metadata-parser)\n\n[![Coverage Status](https://coveralls.io/repos/github/mozilla/page-metadata-parser/badge.svg?branch=master)](https://coveralls.io/github/mozilla/page-metadata-parser?branch=master)\n\n## Overview\n\n### Purpose\n\nThe purpose of this library is to be able to find a consistent set of metadata for any given web page.  Each individual kind of metadata has many rules which define how it may be located.  
For example, a description of a page could be found in any of the following DOM elements:\n\n    \u003cmeta name=\"description\" content=\"A page's description\"/\u003e\n\n    \u003cmeta property=\"og:description\" content=\"A page's description\" /\u003e\n\nBecause different web pages represent their metadata in any number of possible DOM elements, the Page Metadata Parser collects rules for different ways a given kind of metadata may be represented and abstracts them away from the caller.\n\nThe output of the metadata parser for the above example would be\n\n    {description: \"A page's description\"}\n\nregardless of which particular kind of description tag was used.\n\n### Supported schemas\n\nThis library employs parsers for the following formats:\n\n[opengraph](http://ogp.me/)\n\n[twitter](https://dev.twitter.com/cards/markup)\n\n[meta tags](https://developer.mozilla.org/en/docs/Web/HTML/Element/meta)\n\n### Requirements\n\nThis library is meant to be used either in the browser (embedded directly in a website or into a browser addon/extension) or on a server (node.js).\n\nThe parser depends only on the [Node URL library](https://nodejs.org/api/url.html) or the [Browser URL library](https://developer.mozilla.org/en-US/docs/Web/API/Document/URL). 
\n\nEach function expects to be passed a [Document](https://developer.mozilla.org/en-US/docs/Web/API/Document) object, which may be provided directly by a browser or, on the server, by a [Document](https://developer.mozilla.org/en-US/docs/Web/API/Document)-compatible implementation such as [domino](https://github.com/fgnass/domino).\n\n## Usage\n\n### Installation\n\n    npm install --save page-metadata-parser\n\n### Usage in the browser\n\nThe library can be built for direct deployment to a modern browser by running\n\n    npm run bundle\n\nand embedding the resulting JS file directly into a page like so:\n\n    \u003cscript src=\"page-metadata-parser.bundle.js\" type=\"text/javascript\"\u003e\u003c/script\u003e\n\n    \u003cscript\u003e\n\n      const metadata = metadataparser.getMetadata(window.document, window.location);\n\n      console.log(\"The page's title is \", metadata.title);\n\n    \u003c/script\u003e\n\n### Usage in node\n\nTo use the library in node, you must first construct a DOM API compatible object from an HTML string, for example:\n\n    const {getMetadata} = require('page-metadata-parser');\n    const domino = require('domino');\n\n    // In CommonJS modules, `await` is only valid inside an async function.\n    async function main() {\n      const url = 'https://github.com/mozilla/page-metadata-parser';\n      // The global fetch is available in Node 18 and later.\n      const response = await fetch(url);\n      const html = await response.text();\n      const doc = domino.createWindow(html).document;\n      const metadata = getMetadata(doc, url);\n      console.log(metadata);\n    }\n\n    main();\n\n## Metadata Rules\n\n### Rules\n\nA single rule tells the parser where in the DOM a specific piece of content may be found and how to extract it.  
\n\nFor instance, a rule to parse the title of a page found in a DOM tag like this:\n\n    \u003cmeta property=\"og:title\" content=\"Page Title\" /\u003e\n\nwould be represented with the following rule:\n\n    ['meta[property=\"og:title\"]', element =\u003e element.getAttribute('content')]\n\nA rule consists of two parts: a [query selector](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector) compatible string, which is used to look up the target element, and a callable, which receives an [element](https://developer.mozilla.org/en-US/docs/Web/API/Element) and returns the desired content from that element.\n\nMany rules together form a Rule Set.  This library will apply each rule to a page and choose the 'best' result.  The order in which rules are defined indicates their preference, with the first rule being the most preferred.  A Rule Set can be defined like so:\n\n    const titleRules = {\n      rules: [\n        ['meta[property=\"og:title\"]', element =\u003e element.getAttribute('content')],\n        ['title', element =\u003e element.text],\n      ]\n    };\n\nIn this case, the OpenGraph title will be preferred over the title tag.\n\nThis library includes many rules for each piece of metadata, which should allow it to find metadata consistently across many types of pages.  
This library is meant to be a community-driven effort, so if there is no rule to find a piece of information from a particular website, contributors are encouraged to add new rules!\n\n### Built-in Rule Sets \n\nThis library provides rule sets to find the following forms of metadata in a page:\n\nField | Description\n--- | ---\ndescription | A user displayable description for the page.\nicon | A URL which contains an icon for the page.\nimage | A URL which contains a preview image for the page.\nkeywords | The meta keywords for the page.\nprovider | A string representation of the sub and primary domains.\ntitle | A user displayable title for the page.\ntype | The type of content as defined by [opengraph](http://ogp.me/#types).\nurl | A canonical URL for the page.\n\nTo find a single piece of metadata within a page, pass a [Document](https://developer.mozilla.org/en-US/docs/Web/API/Document) object, the page URL, and an object mapping the field name to its rule set to getMetadata. It will apply each rule in that rule set, in order, until one finds a matching piece of information, and return it.\n\nExample:\n\n    const {getMetadata, metadataRuleSets} = require('page-metadata-parser');\n\n    const pageTitle = getMetadata(doc, url, {title: metadataRuleSets.title});\n\n\n### Extending a single rule\n\nTo add your own custom rule to an existing rule set, push it onto that rule set's rules array.\n\nExample:\n\n    const {getMetadata, metadataRuleSets} = require('page-metadata-parser');\n\n    const customDescriptionRuleSet = metadataRuleSets.description;\n\n    // Push the rule itself (a [selector, callable] pair), not an array wrapping it.\n    customDescriptionRuleSet.rules.push(\n      ['meta[name=\"customDescription\"]', element =\u003e element.getAttribute('content')]\n    );\n\n    const pageDescription = getMetadata(doc, url, {description: customDescriptionRuleSet});\n\n\n### Using all rules\n\nTo parse all of the available metadata on a page using all of the rule sets provided in this library, simply call getMetadata on the 
[Document](https://developer.mozilla.org/en-US/docs/Web/API/Document).\n\n    const {getMetadata, metadataRuleSets} = require('page-metadata-parser');\n\n    const pageMetadata = getMetadata(doc, url);\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozilla%2Fpage-metadata-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmozilla%2Fpage-metadata-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmozilla%2Fpage-metadata-parser/lists"}