{"id":26610481,"url":"https://github.com/iwillspeak/teasel","last_synced_at":"2025-03-24T01:48:23.809Z","repository":{"id":66263492,"uuid":"450854054","full_name":"iwillspeak/Teasel","owner":"iwillspeak","description":"Teasing HTML Elements from Text","archived":false,"fork":false,"pushed_at":"2025-02-25T12:24:07.000Z","size":772,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-19T22:12:36.622Z","etag":null,"topics":["html","parser"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iwillspeak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-22T15:19:28.000Z","updated_at":"2025-02-25T12:24:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"64e44898-164d-4d08-92ee-db0b1e683b20","html_url":"https://github.com/iwillspeak/Teasel","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iwillspeak%2FTeasel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iwillspeak%2FTeasel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iwillspeak%2FTeasel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iwillspeak%2FTeasel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iwillspeak","download_url":"https://codeload.github.com/iwillspeak/Teasel/tar.gz/refs/heads/main"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245195914,"owners_count":20575937,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","parser"],"created_at":"2025-03-24T01:48:23.260Z","updated_at":"2025-03-24T01:48:23.803Z","avatar_url":"https://github.com/iwillspeak.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Teasel\n\n\u003e Teasing HTML elements from plain text\n\n![Logo](assets/logo.png)\n\nTeasel is an HTML syntax tree parser written in TypeScript. Teasel aims to be\na fast and reliable full-fidelity parser for HTML linters and refactoring tools.\n\n## Key Features\n\n * **Full-fidelity tree** - Every byte in the input text will be represented\n   somewhere in the output syntax tree, in the order it appeared in the source text.\n * **Fault-tolerant parser** - All input texts produce an output tree, and a\n   set of errors. The closer the input is to a standards-compliant HTML document,\n   the fewer error diagnostics are produced.\n * **Syntax, not Semantic** - Teasel parses HTML as a _syntax_ tree. The end\n   result is not an HTML DOM. 
This means that all the warts of the original\n   document are available to dig into; ideal for linters.\n\n## Docs and Getting Started\n\nTo get started, install Teasel [from GitHub packages][pkg]:\n\n```\n$ npm install @iwillspeak/teasel@0.3.0\n```\n\nOnce installed, you can parse any string containing HTML into a syntax tree:\n\n```typescript\nimport {Parser} from '@iwillspeak/teasel/lib/parse/Parser.js';\n\nconst result = Parser.parseDocument('\u003chtml\u003e\u003cp\u003eHello World');\n```\n\nCheck out the [`teasel` docs][pkg-teasel] for where to go next.\n\n## Repo Structure\n\nThis repository contains three main packages:\n\n  * [`teasel`][pkg-teasel] - The main parser library. This is the package\n    you want to reference as a consumer.\n  * [`pyracantha`][pkg-pyracantha] - The language-agnostic low-level syntax\n    tree library used by `teasel` to represent parsed documents.\n  * [`teasel-cli`][pkg-teasel-cli] - A command-line tool to test parsing\n    HTML documents with teasel.\n\n## 🐲 TODO 🐲:\n\n * [x] Handle attributes on opening tags\n * [x] Better error recovery when `expect` fails.\n   * [x] Tolerate and warn on some malformed whitespace. e.g.: `\u003c p\u003e`.\n   * [x] Malformed attribute lists synchronise on `\u003e`.\n * [x] Node cache should cache nodes in the green tree builder.\n   * [x] Node cache interface and implementation.\n   * [x] Parser should accept optional cache.\n * [x] Handle closing of outer tags correctly. e.g.: `\u003cp\u003e\u003ci\u003ehello\u003c/p\u003e`.\n * [x] Handle closing of non-nesting siblings. e.g.: `\u003cli\u003ea\u003cli\u003eb`.\n * [x] Handling for implicit self closing of 'void' elements `\u003chr\u003e` etc.\n * [x] Support for esoteric DOCTYPEs e.g. `SYSTEM 'about:legacy-compat'`.\n * [x] Document and fragment parse APIs.\n * [x] Syntax builder / factory API for creating and updating nodes.\n * [x] Handling of raw text elements. e.g. 
`script`, and `style`.\n * [ ] Support for character references. e.g. `\u0026amp;`.\n * [ ] HTML / XML crossover\n   * [ ] Support for *processing instructions*, e.g. `\u003c?xml version=\"1.0\"\u003e`.\n   * [ ] Support for `CDATA` values / tokens.\n\n\n [pkg]: https://github.com/iwillspeak/Teasel/packages/1313956\n [pkg-teasel]: packages/teasel/README.md\n [pkg-teasel-cli]: packages/teasel-cli/README.md\n [pkg-pyracantha]: packages/pyracantha/README.md","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiwillspeak%2Fteasel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiwillspeak%2Fteasel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiwillspeak%2Fteasel/lists"}