{"id":26009235,"url":"https://github.com/rotatef/cl-html5-parser","last_synced_at":"2025-03-05T22:06:52.268Z","repository":{"id":4750255,"uuid":"5900081","full_name":"rotatef/cl-html5-parser","owner":"rotatef","description":"HTML5 parser for Common Lisp","archived":false,"fork":false,"pushed_at":"2019-08-15T12:10:55.000Z","size":327,"stargazers_count":52,"open_issues_count":11,"forks_count":13,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-03-26T18:28:07.673Z","etag":null,"topics":["common-lisp","html5-parser"],"latest_commit_sha":null,"homepage":null,"language":"Common Lisp","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"GoogleDeveloperGroups/devfest-site","license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rotatef.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-09-21T10:45:06.000Z","updated_at":"2023-08-31T15:25:58.000Z","dependencies_parsed_at":"2022-08-26T07:10:41.191Z","dependency_job_id":null,"html_url":"https://github.com/rotatef/cl-html5-parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotatef%2Fcl-html5-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotatef%2Fcl-html5-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotatef%2Fcl-html5-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rotatef%2Fcl-html5-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rotatef","download_url":"https://codeload.github.com/rotatef/cl-html5-parser/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242111445,"owners_count":20073433,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["common-lisp","html5-parser"],"created_at":"2025-03-05T22:02:01.665Z","updated_at":"2025-03-05T22:06:52.262Z","avatar_url":"https://github.com/rotatef.png","language":"Common Lisp","readme":"cl-html5-parser: HTML5 parser for Common Lisp\n=============================================\n\n## Abstract\n\ncl-html5-parser is a HTML5 parser for Common Lisp with the following features:\n\n* It is a port of the Python library [html5lib](http://code.google.com/p/html5lib/).\n* It passes all relevant tests from html5lib.\n* It is not tied to a specific DOM implementation.\n\n\n## Requirements\n\n* SBCL or ECL.\n* CL-PPCRE and FLEXI-STREAMS.\n\nMight work with CLISP, ABCL and Clozure CL, but many of the tests don't pass there.\n\n\n## Usage\n\n\n### Parsing\n\nParsing functions are in the package HTML5-PARSER.\n\n```\nparse-html5 source \u0026key encoding strictp dom\n    =\u003e document, errors\n```\n\nParse an HTML document from source. Source can be a string, a pathname\nor a stream. When parsing from a stream encoding detection is not\nsupported, encoding must be supplied via the encoding keyword\nparameter.\n\nWhen strictp is true, parsing stops on first error.\n\nReturns two values. The primary value is the document node. The\nsecondary value is a list of errors found during parsing. The format\nof this list is subject to change.\n\nThe type of document depends on the dom parameter. 
By default it's an\ninstance of cl-html5-parser's own DOM implementation. See the DOM\nsection below for more information.\n\n```\nparse-html5-fragment source \u0026key container encoding strictp dom\n    =\u003e document-fragment, errors\n```\n\nParses a fragment of HTML. Container sets the context and defaults to\n\"div\". Returns a document-fragment node. For the other parameters see\n`PARSE-HTML5`.\n\n\n### Example\n```common-lisp\n(html5-parser:parse-html5-fragment \"Parse \u003ci\u003esome\u003c/i\u003e HTML\" :dom :xmls)\n==\u003e (\"Parse \" (\"i\" NIL \"some\") \" HTML\")\n```\n\n### The DOM\n\nParsing HTML5 is not possible without a\n[DOM](http://en.wikipedia.org/wiki/Document_Object_Model). cl-html5-parser\ndefines a minimal DOM implementation for this task. Functions for\ntraversing documents are exported by the HTML5-PARSER package.\n\nAlternatively, the parser can be instructed to convert the document\ninto other DOM implementations using the dom parameter. The conversion\nis done by simply calling the generic function\ntransform-html5-dom. Support for other DOM implementations can be\nadded by defining new methods for this generic function. The dom\nparameter is either a symbol or a list where the car is a symbol and\nthe rest are keyword arguments. Below are the currently supported target\ntypes.\n\n\n### Namespace of elements and attributes\n\nThe HTML5 syntax has no support for namespaces; however, the standard\ndefines special rules to set the expected namespace for SVG and MathML\nelements and the following attributes: `xlink:actuate`,\n`xlink:arcrole`, `xlink:href`, `xlink:role`, `xlink:show`,\n`xlink:title`, `xlink:type`, `xml:base`, `xml:lang`, `xml:space`,\n`xmlns`, `xmlns:xlink`. Please note that this only applies to SVG and\nMathML elements. 
Attributes of HTML elements will never get a\nnamespace.\n\n#### Examples\n\n```html\n\u003chtml xml:lang='en'\u003e\u003csvg xml:lang='en'\u003e\u003c/svg\u003e\u003c/html\u003e\n```\n\n* Element `html` with namespace `http://www.w3.org/1999/xhtml`\n* Attribute with name `xml:lang` (no prefix)\n* Element `svg` with namespace `http://www.w3.org/2000/svg`\n* Attribute with prefix `xml`, local name `lang`, namespace `http://www.w3.org/XML/1998/namespace`\n\n```common-lisp\n(html5-parser:parse-html5 \"\u003c!doctype html\u003e\u003chtml xml:lang='en' xml@lang='en'\u003e\" :dom :xmls-ns)\n==\u003e\n((\"html\" . \"http://www.w3.org/1999/xhtml\")\n ((\"xmlU00003Alang\" \"en\") (\"xmlU000040lang\" \"en\")) (\"head\" NIL) (\"body\" NIL))\n```\n\nOn an HTML element, `xml:lang` and `xml@lang` are just attributes with\nunusual characters in their names. In the HTML DOM these names are kept\nas is, but when converting to XML they are escaped to ensure that the XML\nis valid. This escaping can be reversed with\n`HTML5-PARSER:XML-UNESCAPE-NAME`.\n\n```common-lisp\n(html5-parser:parse-html5 \"\u003c!doctype html\u003e\u003csvg xml:lang='en' xml@lang='en' xlink:href='#' xlink:to='#'\u003e\u003c/svg\u003e\" :dom :xmls-ns)\n==\u003e\n((\"html\" . \"http://www.w3.org/1999/xhtml\") NIL (\"head\" NIL)\n (\"body\" NIL\n  ((\"svg\" . \"http://www.w3.org/2000/svg\")\n   ((\"xml:lang\" \"en\") (\"xmlU000040lang\" \"en\") (\"xlink:href\" \"#\")\n    (\"xmlns:xlink\" \"http://www.w3.org/1999/xlink\") (\"xlinkU00003Ato\" \"#\")))))\n```\n\nIn this case `xml:lang` and `xmlns:xlink` are among the\nattributes with a known namespace when used on SVG and MathML\nelements. However, `xlink:to` is not in the list, even though it's defined in\nthe xlink standard.\n\n### :XMLS or (:XMLS \u0026key namespace comments)\n\nConverts a node into a simple\n[XMLS](http://common-lisp.net/project/xmls/)-like list structure.\nIf the node is a document fragment, a list of XMLS nodes is returned. 
In\nall other cases a single XMLS node is returned.\n\nIf the namespace argument is true, tag names are conses of name and\nnamespace URI.\n\nBy default comments are stripped. If the comments argument is true,\ncomments are returned as (:COMMENT NIL \"comment text\"). This is an extension\nof the XMLS format.\n\n\n### :CXML\n\nConverts to the [Closure XML Parser](http://common-lisp.net/project/cxml/)\nDOM implementation. In order to use this you must load/depend on\nthe system cl-html5-parser-cxml.\n\n\n## License\n\nThis library is available under the\n[GNU Lesser General Public License v3.0](http://www.gnu.org/licenses/lgpl.html).\n","funding_links":[],"categories":["REPLs ##","Interfaces to other package managers"],"sub_categories":["Isomorphic web frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frotatef%2Fcl-html5-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frotatef%2Fcl-html5-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frotatef%2Fcl-html5-parser/lists"}