{"id":14442678,"url":"https://github.com/wikimedia/html-metadata","last_synced_at":"2025-05-15T20:01:46.433Z","repository":{"id":24725135,"uuid":"28137225","full_name":"wikimedia/html-metadata","owner":"wikimedia","description":"MetaData html scraper and parser for Node.js (supports Promises and callback style)","archived":false,"fork":false,"pushed_at":"2025-03-07T14:59:17.000Z","size":475,"stargazers_count":171,"open_issues_count":12,"forks_count":43,"subscribers_count":27,"default_branch":"main","last_synced_at":"2025-04-01T00:34:01.425Z","etag":null,"topics":["javascript","metadata-extraction","metadata-extractor","node-module","nodejs","web-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wikimedia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-12-17T12:54:28.000Z","updated_at":"2025-03-07T14:59:17.000Z","dependencies_parsed_at":"2024-12-14T10:02:23.319Z","dependency_job_id":"191d2c66-13fa-481a-b5c7-5bf25bbe2e8b","html_url":"https://github.com/wikimedia/html-metadata","commit_stats":{"total_commits":108,"total_committers":22,"mean_commits":4.909090909090909,"dds":0.6018518518518519,"last_synced_commit":"3bedb1715e06d7c3f8ac1426ce2d804fe056a746"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wikimedia%2Fhtml-metadata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wikimedia%2Fhtml-metadata/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wikimedia%2Fhtml-metadata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wikimedia%2Fhtml-metadata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wikimedia","download_url":"https://codeload.github.com/wikimedia/html-metadata/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247761051,"owners_count":20991531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["javascript","metadata-extraction","metadata-extractor","node-module","nodejs","web-scraper","web-scraping"],"created_at":"2024-08-31T21:00:49.971Z","updated_at":"2025-04-08T01:35:50.126Z","avatar_url":"https://github.com/wikimedia.png","language":"JavaScript","readme":"html-metadata\n=============\n[![npm](https://img.shields.io/npm/v/html-metadata.svg)](https://www.npmjs.com/package/html-metadata)\n\u003e MetaData html scraper and parser for Node.js (supports Promises only. Callbacks were deprecated in 3.0.0)\n\nThe aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently it supports Schema.org microdata through a third-party library; has native implementations of BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and extracts some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags).\n\nSupport for RDFa, AGLS, and other as-yet-unheard-of metadata types is planned. 
Contributions and requests for other metadata types welcome!\n\n## Install\n\n\tnpm install html-metadata\n\n## Usage\n\n```js\nvar scrape = require('html-metadata');\n\nvar url = \"http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/\";\n\nscrape(url).then(function(metadata){\n\tconsole.log(metadata);\n});\n```\n\nThe scrape method used here invokes the parseAll() method, which runs all the available methods registered in metadataFunctions(). Each of these methods is also available for use separately, for example:\n\n```js\nvar cheerio = require('cheerio');\nvar parseDublinCore = require('html-metadata').parseDublinCore;\n\nvar url = \"http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/\";\n\nfetch(url).then(function(response){\n\t// response.body is a stream, so read the HTML as text first\n\treturn response.text();\n}).then(function(html){\n\tvar $ = cheerio.load(html);\n\treturn parseDublinCore($);\n}).then(function(metadata){\n\tconsole.log(metadata);\n});\n```\n\nOptions dictionary:\n\nYou can also pass an [options dictionary](https://developer.mozilla.org/en-US/docs/Web/API/RequestInit) as the first argument containing extra parameters. Some websites require the user-agent or cookies to be set in order to get the response. This is identical to the RequestInit dictionary except that it should also contain the requested URL as part of the dictionary. 
\n\n```js\nvar scrape = require('html-metadata');\n\nvar options = {\n\turl: \"http://example.com\",\n\theaders: {\n\t\t'User-Agent': 'webscraper'\n\t}\n};\n\nscrape(options).then(function(metadata){\n\tconsole.log(metadata);\n});\n```\n\nThe method parseGeneral obtains the following general metadata:\n\n```html\n\u003clink rel=\"apple-touch-icon\" href=\"\" sizes=\"\" type=\"\"\u003e\n\u003clink rel=\"icon\" href=\"\" sizes=\"\" type=\"\"\u003e\n\u003cmeta name=\"author\" content=\"\"\u003e\n\u003clink rel=\"author\" href=\"\"\u003e\n\u003clink rel=\"canonical\" href=\"\"\u003e\n\u003cmeta name=\"description\" content=\"\"\u003e\n\u003clink rel=\"publisher\" href=\"\"\u003e\n\u003cmeta name=\"robots\" content=\"\"\u003e\n\u003clink rel=\"shortlink\" href=\"\"\u003e\n\u003ctitle\u003e\u003c/title\u003e\n\u003chtml lang=\"en\"\u003e\n\u003chtml dir=\"rtl\"\u003e\n```\n\n## Tests\n\n```npm test``` runs the Mocha tests\n\n```npm run-script coverage``` runs the tests and reports code coverage\n\n## Contributing\n\nContributions welcome! All contributions should use [Promises](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise) instead of callbacks.\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwikimedia%2Fhtml-metadata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwikimedia%2Fhtml-metadata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwikimedia%2Fhtml-metadata/lists"}