{"id":34963529,"url":"https://github.com/lemonadejs/html-to-json","last_synced_at":"2026-05-15T16:01:58.344Z","repository":{"id":325917014,"uuid":"1103024031","full_name":"lemonadejs/html-to-json","owner":"lemonadejs","description":"Convert an HTML string to a general JSON format.","archived":false,"fork":false,"pushed_at":"2025-11-24T14:57:03.000Z","size":45,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-27T09:35:28.223Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lemonadejs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-24T10:33:51.000Z","updated_at":"2025-11-24T14:57:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lemonadejs/html-to-json","commit_stats":null,"previous_names":["lemonadejs/html-to-json"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/lemonadejs/html-to-json","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemonadejs%2Fhtml-to-json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemonadejs%2Fhtml-to-json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemonadejs%2Fhtml-to-json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemonadej
s%2Fhtml-to-json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lemonadejs","download_url":"https://codeload.github.com/lemonadejs/html-to-json/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemonadejs%2Fhtml-to-json/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33071582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T11:35:32.926Z","status":"ssl_error","status_checked_at":"2026-05-15T11:35:31.362Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-26T22:51:11.510Z","updated_at":"2026-05-15T16:01:58.338Z","avatar_url":"https://github.com/lemonadejs.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HTML/XML to JSON Converter\n\n\u003e A lightweight, zero-dependency library for bidirectional conversion between HTML/XML and JSON\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Tests](https://img.shields.io/badge/tests-58%20passing-brightgreen.svg)]()\n\nTransform HTML/XML markup into clean JSON trees and render them back to markup with full fidelity. 
Perfect for parsing, manipulating, and generating HTML/XML programmatically.\n\n## Features\n\n- **Zero Dependencies** - Pure JavaScript, no external libraries required\n- **TypeScript Support** - Fully typed with comprehensive type definitions\n- **Bidirectional** - Parse HTML/XML to JSON and render JSON back to HTML/XML\n- **High Fidelity** - Preserves structure, attributes, text nodes, and comments\n- **Lightweight** - Minimal footprint, fast parsing\n- **Flexible** - Works with HTML and XML, supports namespaces\n- **Sanitization Ready** - Built-in option to ignore unwanted tags (script, style, etc.)\n- **Pretty Printing** - Optional formatted output with customizable indentation\n- **Well Tested** - 58 comprehensive tests covering all features\n\n## Installation\n\n```bash\nnpm install @lemonadejs/html-to-json\n```\n\n## Import Options\n\nYou can import both functions from the main package:\n\n```javascript\n// Recommended: Import both from main package\nimport { parser, render } from '@lemonadejs/html-to-json';\n```\n\n## TypeScript Usage\n\nThe library includes comprehensive type definitions:\n\n```typescript\nimport { parser, render, type Node, type ParserOptions, type RenderOptions } from '@lemonadejs/html-to-json';\n\n// Fully typed parser with options\nconst options: ParserOptions = { ignore: ['script', 'style'] };\nconst tree: Node | undefined = parser('\u003cdiv\u003eHello\u003c/div\u003e', options);\n\n// Fully typed renderer with options\nconst renderOpts: RenderOptions = { pretty: true, indent: '  ' };\nconst html: string = render(tree, renderOpts);\n```\n\n## Quick Start\n\n### Parse HTML/XML to JSON\n\n```javascript\nimport { parser } from '@lemonadejs/html-to-json';\n\nconst html = '\u003cdiv class=\"card\"\u003e\u003ch1\u003eTitle\u003c/h1\u003e\u003cp\u003eContent\u003c/p\u003e\u003c/div\u003e';\nconst tree = parser(html);\n\nconsole.log(JSON.stringify(tree, null, 2));\n```\n\n**Output:**\n```json\n{\n  \"type\": \"div\",\n  \"props\": [\n    { 
\"name\": \"class\", \"value\": \"card\" }\n  ],\n  \"children\": [\n    {\n      \"type\": \"h1\",\n      \"children\": [\n        {\n          \"type\": \"#text\",\n          \"props\": [{ \"name\": \"textContent\", \"value\": \"Title\" }]\n        }\n      ]\n    },\n    {\n      \"type\": \"p\",\n      \"children\": [\n        {\n          \"type\": \"#text\",\n          \"props\": [{ \"name\": \"textContent\", \"value\": \"Content\" }]\n        }\n      ]\n    }\n  ]\n}\n```\n\n### Render JSON back to HTML/XML\n\n```javascript\nimport { parser, render } from '@lemonadejs/html-to-json';\n\nconst tree = parser('\u003cdiv class=\"greeting\"\u003eHello World\u003c/div\u003e');\nconst html = render(tree);\n\nconsole.log(html);\n// Output: \u003cdiv class=\"greeting\"\u003eHello World\u003c/div\u003e\n```\n\n### Pretty Printing\n\n```javascript\nimport { render } from '@lemonadejs/html-to-json';\n\nconst tree = {\n  type: 'article',\n  props: [{ name: 'class', value: 'post' }],\n  children: [\n    {\n      type: 'h2',\n      children: [\n        { type: '#text', props: [{ name: 'textContent', value: 'Article Title' }] }\n      ]\n    },\n    {\n      type: 'p',\n      children: [\n        { type: '#text', props: [{ name: 'textContent', value: 'Article content here.' 
}] }\n      ]\n    }\n  ]\n};\n\nconst html = render(tree, { pretty: true, indent: '  ' });\n\nconsole.log(html);\n```\n\n**Output:**\n```html\n\u003carticle class=\"post\"\u003e\n  \u003ch2\u003e\n    Article Title\n  \u003c/h2\u003e\n  \u003cp\u003e\n    Article content here.\n  \u003c/p\u003e\n\u003c/article\u003e\n```\n\n## 📖 API Reference\n\n### `parser(html, options)`\n\nParses an HTML or XML string into a JSON tree structure.\n\n**Parameters:**\n- `html` (string) - The HTML or XML string to parse\n- `options` (Object, optional) - Parser options\n\n**Options:**\n\n| Option   | Type     | Default | Description                                    |\n|----------|----------|---------|------------------------------------------------|\n| `ignore` | string[] | `[]`    | Array of tag names to ignore during parsing    |\n\n**Returns:** `Object` - JSON tree representation\n\n**Examples:**\n\n```javascript\n// Basic parsing\nconst tree = parser('\u003cdiv id=\"app\"\u003eHello\u003c/div\u003e');\n\n// Ignore script and style tags\nconst clean = parser(html, { ignore: ['script', 'style'] });\n\n// Case-insensitive tag matching (separate name: `tree` is already declared above)\nconst filtered = parser('\u003cdiv\u003e\u003cSCRIPT\u003ebad\u003c/SCRIPT\u003e\u003c/div\u003e', { ignore: ['script'] });\n```\n\n### `render(tree, options)`\n\nRenders a JSON tree back into HTML or XML markup.\n\n**Parameters:**\n- `tree` (Object|Array) - The JSON tree to render\n- `options` (Object, optional) - Rendering options\n\n**Options:**\n\n| Option            | Type     | Default    | Description                                          |\n|-------------------|----------|------------|------------------------------------------------------|\n| `pretty`          | boolean  | `false`    | Format output with newlines and indentation          |\n| `indent`          | string   | `'  '`     | Indentation string (used when `pretty` is `true`)    |\n| `selfClosingTags` | string[] | See below* | Override default void elements list                  |\n| 
`xmlMode`         | boolean  | `false`    | Self-close all empty elements using `\u003ctag /\u003e` syntax |\n\n*Default self-closing tags: `area`, `base`, `br`, `col`, `embed`, `hr`, `img`, `input`, `link`, `meta`, `source`, `track`, `wbr`\n\n**Returns:** `string` - Rendered HTML/XML markup\n\n**Examples:**\n\n```javascript\n// Basic rendering\nconst html = render(tree);\n\n// Pretty printing\nconst formatted = render(tree, { pretty: true });\n\n// Custom indentation\nconst tabbed = render(tree, { pretty: true, indent: '\\t' });\n\n// XML mode\nconst xml = render(tree, { xmlMode: true });\n\n// Custom self-closing tags\nconst custom = render(tree, {\n  selfClosingTags: ['br', 'hr', 'img', 'custom-element']\n});\n```\n\n## 🎯 JSON Tree Structure\n\n### Element Node\n```json\n{\n  \"type\": \"tagName\",\n  \"props\": [\n    { \"name\": \"attributeName\", \"value\": \"attributeValue\" }\n  ],\n  \"children\": [...]\n}\n```\n\n### Text Node\n```json\n{\n  \"type\": \"#text\",\n  \"props\": [\n    { \"name\": \"textContent\", \"value\": \"text content here\" }\n  ]\n}\n```\n\n### Comment Node\n```json\n{\n  \"type\": \"#comments\",\n  \"props\": [\n    { \"name\": \"text\", \"value\": \" comment text \" }\n  ]\n}\n```\n\n### Template Wrapper (Multiple Root Elements)\n```json\n{\n  \"type\": \"template\",\n  \"children\": [\n    { \"type\": \"div\", ... },\n    { \"type\": \"span\", ... 
}\n  ]\n}\n```\n\n## 📦 TypeScript Types\n\nThe library exports the following TypeScript types:\n\n### Core Types\n- **`Node`** - Union type for all possible node types (ElementNode | TextNode | CommentNode | TemplateNode)\n- **`ElementNode`** - HTML/XML element with type, props, and children\n- **`TextNode`** - Text content node with `type: '#text'`\n- **`CommentNode`** - Comment node with `type: '#comments'`\n- **`TemplateNode`** - Wrapper for multiple root elements with `type: 'template'`\n- **`NodeProp`** - Property object with name and value\n\n### Options Types\n- **`ParserOptions`** - Options for the parser function\n- **`RenderOptions`** - Options for the render function\n\n```typescript\nimport type {\n  Node,\n  ElementNode,\n  TextNode,\n  CommentNode,\n  TemplateNode,\n  NodeProp,\n  ParserOptions,\n  RenderOptions\n} from '@lemonadejs/html-to-json';\n```\n\n## 💡 Use Cases\n\n### 1. HTML Sanitization\n\n```javascript\nimport { parser, render } from '@lemonadejs/html-to-json';\n\n// Remove potentially dangerous tags using the ignore option\nfunction sanitizeHTML(html) {\n  const tree = parser(html, {\n    ignore: ['script', 'style', 'iframe', 'object', 'embed']\n  });\n  return render(tree);\n}\n\nconst dirty = '\u003cdiv\u003eHello\u003cscript\u003ealert(\"xss\")\u003c/script\u003e\u003cstyle\u003ebad{}\u003c/style\u003eWorld\u003c/div\u003e';\nconst clean = sanitizeHTML(dirty);\nconsole.log(clean); // \u003cdiv\u003eHelloWorld\u003c/div\u003e\n```\n\n### 2. 
HTML Transformation\n\n```javascript\n// Add class to all divs\nfunction addClassToAllDivs(tree, className) {\n  if (tree.type === 'div') {\n    if (!tree.props) tree.props = [];\n    const classAttr = tree.props.find(p =\u003e p.name === 'class');\n    if (classAttr) {\n      classAttr.value += ` ${className}`;\n    } else {\n      tree.props.push({ name: 'class', value: className });\n    }\n  }\n\n  if (tree.children) {\n    tree.children.forEach(child =\u003e addClassToAllDivs(child, className));\n  }\n\n  return tree;\n}\n\nconst html = '\u003cdiv\u003e\u003cdiv\u003eNested\u003c/div\u003e\u003c/div\u003e';\nconst tree = parser(html);\naddClassToAllDivs(tree, 'highlight');\nconsole.log(render(tree));\n// \u003cdiv class=\"highlight\"\u003e\u003cdiv class=\"highlight\"\u003eNested\u003c/div\u003e\u003c/div\u003e\n```\n\n### 3. XML Processing\n\n```javascript\n// Parse and extract data from XML\nconst xml = `\n\u003ccatalog\u003e\n  \u003cbook isbn=\"978-0-123456-78-9\"\u003e\n    \u003ctitle\u003eSample Book\u003c/title\u003e\n    \u003cauthor\u003eJohn Doe\u003c/author\u003e\n    \u003cprice\u003e29.99\u003c/price\u003e\n  \u003c/book\u003e\n\u003c/catalog\u003e`;\n\nconst tree = parser(xml);\n\nfunction extractBooks(node) {\n  if (node.type === 'book') {\n    const isbn = node.props?.find(p =\u003e p.name === 'isbn')?.value;\n    const title = node.children?.find(c =\u003e c.type === 'title')\n      ?.children?.[0]?.props?.[0]?.value;\n    const author = node.children?.find(c =\u003e c.type === 'author')\n      ?.children?.[0]?.props?.[0]?.value;\n\n    return { isbn, title, author };\n  }\n\n  if (node.children) {\n    return node.children.map(extractBooks).filter(Boolean).flat();\n  }\n\n  return [];\n}\n\nconst books = extractBooks(tree);\nconsole.log(books);\n// [{ isbn: '978-0-123456-78-9', title: 'Sample Book', author: 'John Doe' }]\n```\n\n### 4. 
Complex HTML with Inline CSS\n\n```javascript\nconst complexHTML = `\n\u003cdiv style=\"padding: 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);\"\u003e\n  \u003ch1 style=\"color: white; margin: 0;\"\u003eWelcome\u003c/h1\u003e\n  \u003cp style=\"color: rgba(255,255,255,0.9);\"\u003eBeautiful styled content\u003c/p\u003e\n\u003c/div\u003e`;\n\nconst tree = parser(complexHTML);\nconst rendered = render(tree, { pretty: true });\n\nconsole.log(rendered);\n// Perfectly preserves all inline CSS with gradients, rgba colors, etc.\n```\n\n## 🔍 Advanced Features\n\n### XML Namespaces Support\n\n```javascript\nconst xml = '\u003croot xmlns:custom=\"http://example.com\"\u003e\u003ccustom:element\u003eValue\u003c/custom:element\u003e\u003c/root\u003e';\nconst tree = parser(xml);\nconst output = render(tree);\n// Preserves namespace colons in tag names\n```\n\n### Self-Closing Tags\n\n```javascript\nconst html = '\u003cdiv\u003e\u003cbr /\u003e\u003cimg src=\"test.jpg\" /\u003e\u003cinput type=\"text\" /\u003e\u003c/div\u003e';\nconst tree = parser(html);\nconst output = render(tree);\n// Properly handles void elements\n```\n\n### Comments Preservation\n\n```javascript\nconst html = '\u003cdiv\u003e\u003c!-- Important comment --\u003e\u003cspan\u003eContent\u003c/span\u003e\u003c/div\u003e';\nconst tree = parser(html);\nconst output = render(tree);\n// Comments are preserved in the output\n```\n\n### Multiple Root Elements\n\n```javascript\nconst html = '\u003cdiv\u003eFirst\u003c/div\u003e\u003cspan\u003eSecond\u003c/span\u003e';\nconst tree = parser(html);\n// Returns: { type: 'template', children: [...] 
}\n```\n\n## 🧪 Testing\n\nRun the comprehensive test suite:\n\n```bash\nnpm test\n```\n\n**Test Coverage:**\n- ✅ Basic HTML elements (div, span, nested structures)\n- ✅ Self-closing tags (br, img, input, hr, meta, link)\n- ✅ Attributes (single, multiple, special characters, quotes)\n- ✅ Text content with escaping\n- ✅ HTML comments\n- ✅ XML documents with namespaces\n- ✅ Complex real-world examples (forms, navigation, tables)\n- ✅ Edge cases (empty input, whitespace, consecutive tags)\n- ✅ Parser behavior (no parent references, unclosed tags)\n- ✅ Parser options (ignore tags - script, style, nested, case-insensitive)\n- ✅ Renderer options (pretty printing, XML mode)\n- ✅ Complex HTML with extensive inline CSS (11,000+ characters)\n\n**58 tests passing** • 1 skipped\n\n## ⚡ Performance\n\nThe parser is designed for speed and efficiency:\n\n- **Streaming parser** - Single-pass character-by-character parsing\n- **No regex in main loop** - Only simple character matching\n- **Minimal allocations** - Reuses objects where possible\n- **Stack-based** - Efficient memory usage for deeply nested structures\n\nTypical performance:\n- Small HTML (\u003c 1KB): \u003c 1ms\n- Medium HTML (10KB): ~5ms\n- Large HTML (100KB+): ~50ms\n- Complex HTML with CSS (11KB): ~10ms\n\n## ⚠️ Known Limitations\n\n1. **HTML Entities**: Not decoded during parsing. They are stored as-is and escaped on render.\n   - Input: `\u003cp\u003e\u0026amp;\u003c/p\u003e` → Stored: `\"\u0026amp;\"` → Output: `\u003cp\u003e\u0026amp;amp;\u003c/p\u003e`\n   - **Workaround**: Use raw characters instead of entities in source\n\n2. **Whitespace**: Fully preserved in text nodes, no normalization applied.\n\n3. **Doctype**: `\u003c!DOCTYPE html\u003e` declarations are parsed as text nodes, not special nodes.\n\n4. **CDATA**: `\u003c![CDATA[...]]\u003e` sections are not specially handled.\n\n5. **Processing Instructions**: `\u003c?xml ...?\u003e` are not parsed.\n\n6. 
**Error Reporting**: Parser is lenient and produces a tree even for malformed HTML. No detailed error messages.\n\n7. **Attribute Order**: May differ from source in rendered output.\n\n8. **Quotes**: Renderer always uses double quotes for attributes.\n\n## 🤝 Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/lemonadejs/html-to-json.git\ncd html-to-json\n\n# Install dependencies\nnpm install\n\n# Run tests\nnpm test\n\n# Run tests in watch mode\nnpm test -- --watch\n```\n\n## 📄 License\n\nMIT © [Jspreadsheet Team](https://github.com/lemonadejs)\n\n## 🔗 Links\n\n- **Repository**: https://github.com/lemonadejs/html-to-json\n- **NPM Package**: https://www.npmjs.com/package/@lemonadejs/html-to-json\n- **Issues**: https://github.com/lemonadejs/html-to-json/issues\n- **Documentation**: https://github.com/lemonadejs/html-to-json#readme\n\n## 🙏 Acknowledgments\n\nBuilt with ❤️ by the [Jspreadsheet Team](https://jspreadsheet.com/)\n\n---\n\n**Star this repo** ⭐ if you find it useful!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemonadejs%2Fhtml-to-json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flemonadejs%2Fhtml-to-json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemonadejs%2Fhtml-to-json/lists"}