{"id":13565326,"url":"https://github.com/microlinkhq/html-get","last_synced_at":"2026-04-02T16:48:30.062Z","repository":{"id":34885214,"uuid":"138704984","full_name":"microlinkhq/html-get","owner":"microlinkhq","description":"Get the HTML from any website, using prerendering when necessary.","archived":false,"fork":false,"pushed_at":"2025-04-12T09:45:35.000Z","size":2382,"stargazers_count":91,"open_issues_count":1,"forks_count":13,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-12T10:37:28.421Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microlinkhq.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-06-26T08:02:31.000Z","updated_at":"2025-04-12T09:45:38.000Z","dependencies_parsed_at":"2023-01-15T09:55:46.415Z","dependency_job_id":"9faba4ce-53c7-450d-8404-7eefa676ab59","html_url":"https://github.com/microlinkhq/html-get","commit_stats":{"total_commits":629,"total_committers":5,"mean_commits":125.8,"dds":"0.20190779014308424","last_synced_commit":"b138c05463fa3f3c9c056c9f1cb7b072debe1a81"},"previous_names":[],"tags_count":296,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microlinkhq%2Fhtml-get","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microlinkhq%2Fhtml-get/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microlinkhq%2Fhtml-get/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microlinkhq%2Fhtml-get/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microlinkhq","download_url":"https://codeload.github.com/microlinkhq/html-get/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248557372,"owners_count":21124158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:44.685Z","updated_at":"2026-04-02T16:48:30.015Z","avatar_url":"https://github.com/microlinkhq.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/microlinkhq/cdn/raw/master/dist/logo/banner.png#gh-light-mode-only\" alt=\"microlink logo\"\u003e\n  \u003cimg src=\"https://github.com/microlinkhq/cdn/raw/master/dist/logo/banner-dark.png#gh-dark-mode-only\" alt=\"microlink logo\"\u003e\n  \u003cbr\u003e\n  \u003cbr\u003e\n\u003c/div\u003e\n\n![Last version](https://img.shields.io/github/tag/microlinkhq/html-get.svg?style=flat-square)\n[![Coverage Status](https://img.shields.io/coveralls/microlinkhq/html-get.svg?style=flat-square)](https://coveralls.io/github/microlinkhq/html-get)\n[![NPM Status](https://img.shields.io/npm/dm/html-get.svg?style=flat-square)](https://www.npmjs.org/package/html-get)\n\n\u003e Get the HTML from any website, fine-tuned for correction \u0026 speed.\n\n## Features\n\n- Get HTML markup for any URL, including images, video, audio, or pdf.\n- Block ads tracker or any 
non-necessary network subrequest.\n- Handle unreachable or timeout URLs gracefully.\n- Ensure HTML markup is appropriately encoded.\n\n**html-get** takes advantage of [puppeteer](https://github.com/GoogleChrome/puppeteer) headless technology when needed, such as for client-side apps that need to be prerendered.\n\n## Install\n\n```bash\n$ npm install browserless puppeteer html-get --save\n```\n\n## Usage\n\n```js\nconst createBrowserless = require('browserless')\nconst getHTML = require('html-get')\n\n// Spawn Chromium process once\nconst browserlessFactory = createBrowserless()\n\n// Kill the process when Node.js exits\nprocess.on('exit', () =\u003e {\n  console.log('closing resources!')\n  browserlessFactory.close()\n})\n\nconst getContent = async url =\u003e {\n  // create a browser context inside Chromium process\n  const browserContext = browserlessFactory.createContext()\n  const getBrowserless = () =\u003e browserContext\n  const result = await getHTML(url, { getBrowserless })\n  // close the browser context after it's used\n  const browser = await getBrowserless()\n  await browser.destroyContext()\n  return result\n}\n\ngetContent('https://example.com')\n  .then(content =\u003e {\n    console.log(content)\n    process.exit()\n  })\n  .catch(error =\u003e {\n    console.error(error)\n    process.exit(1)\n  })\n```\n\n### Command Line\n\n```\n$ npx html-get https://example.com\n```\n\n## API\n\n### getHTML(url, [options])\n\n#### url\n\n*Required*\u003cbr\u003e\nType: `string`\n\nThe target URL for getting the HTML markup.\n\n#### options\n\n##### encoding\n\nType: `string`\u003cbr\u003e\nDefault: `'utf-8'`\n\nIt ensures the HTML markup is encoded using the provided encoding.\n\nThe value will be passed to [`html-encode`](https://github.com/kikobeats/html-encode).\n\n##### getBrowserless\n\n*Required*\u003cbr\u003e\nType: `function`\n\nA function that should return a [browserless](https://browserless.js.org/) instance used to interact with puppeteer.\n\n##### 
getMode\n\nType: `function`\n\nIt determines the strategy to use based on the `url`, the possible values being `'fetch'` or `'prerender'`.\n\n##### getTemporalFile\n\nType: `function`\n\nIt creates a temporary file.\n\n##### gotOpts\n\nType: `object`\n\nIt passes a configuration object to [got](https://www.npmjs.com/package/got) under the `'fetch'` strategy.\n\n##### headers\n\nType: `object`\n\nRequest headers that will be passed to the fetch/prerender process.\n\n##### mutool\n\nType: `function`|`boolean`\u003cbr\u003e\nDefault: `source code`\n\nIt returns a function that executes the [mutool](https://mupdf.com/) binary for turning PDF files into HTML markup.\n\nIt can be explicitly disabled by passing `false`.\n\n##### prerender\n\nType: `boolean`|`string`\u003cbr\u003e\nDefault: `'auto'`\n\nExplicitly enable or disable prerendering as the mechanism for getting the HTML markup.\n\nThe value `auto` means that html-get internally uses a list of websites that don't need prerendering by default. 
This list is used to speed up the process, using `fetch` mode for these websites.\n\nSee the [getMode parameter](#getMode) to learn more.\n\n##### puppeteerOpts\n\nType: `object`\n\nIt passes a configuration object to [puppeteer](https://www.npmjs.com/package/puppeteer) under the `'prerender'` strategy.\n\n##### rewriteUrls\n\nType: `boolean`\u003cbr\u003e\nDefault: `false`\n\nWhen `true`, relative CSS/HTML URLs present in the HTML markup are rewritten as absolute URLs.\n\n##### rewriteHtml\n\nType: `boolean`\u003cbr\u003e\nDefault: `false`\n\nWhen `true`, it rewrites some common mistakes related to HTML meta tags.\n\n##### serializeHtml\n\nIt determines how the HTML should be serialized before returning.\n\nBy default, it's serialized as `$ =\u003e ({ html: $.html() })`.\n\n## License\n\n**html-get** © [Microlink](https://microlink.io), released under the [MIT](https://github.com/microlinkhq/html-get/blob/master/LICENSE.md) License.\u003cbr\u003e\nAuthored and maintained by [Kiko Beats](https://kikobeats.com) with help from [contributors](https://github.com/microlinkhq/html-get/contributors).\n\n\u003e [microlink.io](https://microlink.io) · GitHub [microlinkhq](https://github.com/microlinkhq) · X [@microlinkhq](https://x.com/microlinkhq)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrolinkhq%2Fhtml-get","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrolinkhq%2Fhtml-get","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrolinkhq%2Fhtml-get/lists"}