{"id":13526942,"url":"https://github.com/andrejewski/himalaya","last_synced_at":"2025-05-14T03:03:18.924Z","repository":{"id":31942267,"uuid":"35511933","full_name":"andrejewski/himalaya","owner":"andrejewski","description":"JavaScript HTML to JSON Parser","archived":false,"fork":false,"pushed_at":"2025-04-04T07:11:31.000Z","size":1235,"stargazers_count":926,"open_issues_count":35,"forks_count":130,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-14T04:58:27.578Z","etag":null,"topics":["himalaya","html","javascript","json","parser"],"latest_commit_sha":null,"homepage":"http://andrejewski.github.io/himalaya","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrejewski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-05-12T20:49:06.000Z","updated_at":"2025-04-04T07:11:35.000Z","dependencies_parsed_at":"2023-10-20T17:09:34.683Z","dependency_job_id":"565c8200-7a5b-4e7a-ac23-199a3dc794be","html_url":"https://github.com/andrejewski/himalaya","commit_stats":{"total_commits":84,"total_committers":4,"mean_commits":21.0,"dds":"0.22619047619047616","last_synced_commit":"f0b870011b84da362c863dc914157f30d4a603ac"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhimalaya","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhimalaya/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhi
malaya/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhimalaya/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrejewski","download_url":"https://codeload.github.com/andrejewski/himalaya/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254059474,"owners_count":22007767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["himalaya","html","javascript","json","parser"],"created_at":"2024-08-01T06:01:38.153Z","updated_at":"2025-05-14T03:03:18.864Z","avatar_url":"https://github.com/andrejewski.png","language":"JavaScript","readme":"# Himalaya\n\n\u003e Parse HTML into JSON\n\n[![npm](https://img.shields.io/npm/v/himalaya.svg)](https://www.npmjs.com/package/himalaya)\n![Build Status](https://github.com/andrejewski/himalaya/actions/workflows/ci.yml/badge.svg)\n[![Coverage Status](https://coveralls.io/repos/github/andrejewski/himalaya/badge.svg?branch=master)](https://coveralls.io/github/andrejewski/himalaya?branch=master)\n\n[Try online 🚀](http://andrejewski.github.io/himalaya)\n|\n[Read the specification 📖](https://github.com/andrejewski/himalaya/blob/master/text/ast-spec-v1.md)\n\n## Usage\n\n### Node\n\n```bash\nnpm install himalaya\n```\n\n```js\nimport fs from 'fs'\nimport { parse } from 'himalaya'\nconst html = fs.readFileSync('/webpage.html', { encoding: 'utf8' })\nconst json = parse(html)\nconsole.log('👉', json)\n```\n\n### Browser\n\nDownload 
[himalaya.js](https://github.com/andrejewski/himalaya/blob/master/docs/dist/himalaya.js) and put it in a `\u003cscript\u003e` tag. Himalaya will be accessible from `window.himalaya`.\n\n```js\nconst html = '\u003cdiv\u003eHello world\u003c/div\u003e'\nconst json = window.himalaya.parse(html)\nconsole.log('👉', json)\n```\n\nHimalaya bundles well with Browserify and Webpack.\n\n## Example Input/Output\n\n```html\n\u003cdiv class=\"post post-featured\"\u003e\n  \u003cp\u003eHimalaya parsed me...\u003c/p\u003e\n  \u003c!-- ...and I liked it. --\u003e\n\u003c/div\u003e\n```\n\n```js\n;[\n  {\n    type: 'element',\n    tagName: 'div',\n    attributes: [\n      {\n        key: 'class',\n        value: 'post post-featured',\n      },\n    ],\n    children: [\n      {\n        type: 'element',\n        tagName: 'p',\n        attributes: [],\n        children: [\n          {\n            type: 'text',\n            content: 'Himalaya parsed me...',\n          },\n        ],\n      },\n      {\n        type: 'comment',\n        content: ' ...and I liked it. ',\n      },\n    ],\n  },\n]\n```\n\n_Note:_ In this example, text nodes consisting of whitespace are not shown for readability.\n\n## Features\n\n### Synchronous\n\nHimalaya transforms HTML into JSON, that's it. 
Himalaya is synchronous and does not require any complicated callbacks.\n\n### Handles Weirdness\n\nHimalaya handles a lot of HTML's fringe cases, like:\n\n- Closes unclosed tags `\u003cp\u003e\u003cb\u003e...\u003c/p\u003e`\n- Ignores extra closing tags `\u003cspan\u003e...\u003c/b\u003e\u003c/span\u003e`\n- Properly handles void tags like `\u003cmeta\u003e` and `\u003cimg\u003e`\n- Properly handles self-closing tags like `\u003cinput/\u003e`\n- Handles `\u003c!doctype\u003e` and `\u003c!-- comments --\u003e`\n- Does not parse the contents of `\u003cscript\u003e`, `\u003cstyle\u003e`, and HTML5 `\u003ctemplate\u003e` tags\n\n### Preserves Whitespace\n\nHimalaya does not cut corners and returns an accurate representation of the HTML supplied. To remove whitespace, post-process the JSON; check out [an example script](https://gist.github.com/andrejewski/773487d4f4a46b16865405d7b74eabf9).\n\n### Line, column, and index positions\n\nHimalaya can include the start and end positions of nodes in the parse output.\nTo enable this, you can pass `parse` the `parseDefaults` extended with `includePositions: true`:\n\n```js\nimport { parse, parseDefaults } from 'himalaya'\nparse('\u003cimg\u003e', { ...parseDefaults, includePositions: true })\n/* =\u003e\n[\n  {\n    \"type\": \"element\",\n    \"tagName\": \"img\",\n    \"attributes\": [],\n    \"children\": [],\n    \"position\": {\n      \"start\": {\n        \"index\": 0,\n        \"line\": 0,\n        \"column\": 0\n      },\n      \"end\": {\n        \"index\": 5,\n        \"line\": 0,\n        \"column\": 5\n      }\n    }\n  }\n]\n*/\n```\n\n## Going back to HTML\n\nHimalaya provides a `stringify` method. 
The following example parses the HTML to JSON, then stringifies the JSON back into HTML.\n\n```js\nimport fs from 'fs'\nimport { parse, stringify } from 'himalaya'\n\nconst html = fs.readFileSync('/webpage.html', { encoding: 'utf8' })\nconst json = parse(html)\nfs.writeFileSync('/webpage.html', stringify(json))\n```\n\n## Why \"Himalaya\"?\n\n[First, my friends weren't helpful.](https://twitter.com/compooter/status/597908517132042240) Except Josh, Josh had my back.\n\nWhile I was testing the parser, I threw a download of my Twitter homepage in and got a giant JSON blob out. My code editor Sublime Text has a mini-map and looking at it sideways the data looked like a never-ending mountain range. Also, \"himalaya\" has H, M, L in it.\n","funding_links":[],"categories":["Repository","JavaScript"],"sub_categories":["Parsing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrejewski%2Fhimalaya","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrejewski%2Fhimalaya","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrejewski%2Fhimalaya/lists"}