{"id":13759333,"url":"https://github.com/jdesgats/lua-lolhtml","last_synced_at":"2026-01-05T02:08:14.894Z","repository":{"id":142335535,"uuid":"234720733","full_name":"jdesgats/lua-lolhtml","owner":"jdesgats","description":"Lua binding for the lol-HTML rewriter/parser","archived":false,"fork":false,"pushed_at":"2020-11-14T12:45:21.000Z","size":33,"stargazers_count":15,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-16T16:37:04.030Z","etag":null,"topics":["html","lua","parsing"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jdesgats.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-01-18T10:54:38.000Z","updated_at":"2024-05-31T16:13:56.000Z","dependencies_parsed_at":"2023-04-11T19:02:18.734Z","dependency_job_id":null,"html_url":"https://github.com/jdesgats/lua-lolhtml","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdesgats%2Flua-lolhtml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdesgats%2Flua-lolhtml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdesgats%2Flua-lolhtml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jdesgats%2Flua-lolhtml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jdesgats","download_url":"https://codeload.github.com/jdesgats/lua-lolhtml/tar.gz/refs/heads/master","host
":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253397021,"owners_count":21901976,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","lua","parsing"],"created_at":"2024-08-03T13:00:51.003Z","updated_at":"2026-01-05T02:08:14.863Z","avatar_url":"https://github.com/jdesgats.png","language":"C","readme":"Lua binding for lol-html\n========================\n\nThis library is a Lua binding for [lol-html][lolhtml], a *Low output latency\nstreaming HTML parser/rewriter with CSS selector-based API*.\n\nIt can be used to either extract data from HTML documents or rewrite them\non-the-fly.\n\nInstallation\n------------\n\nYou need a functional setup of Rust and Cargo to be able to build this module.\nPlease refer to the [Rust website][rust-install] or install it with your\ndistribution's package manager.\n\n### Luarocks (version \u003e= 3.0 required)\n\nYou can install this module with Luarocks:\n\n```\nluarocks install https://raw.githubusercontent.com/jdesgats/lua-lolhtml/master/rockspecs/lolhtml-dev-1.rockspec\n```\n\n### Manual build\n\nFirst, be sure to clone this repository with its submodules. Then the provided\nMakefile should be able to build the module.\n\n```\ngit clone --recursive https://github.com/jdesgats/lua-lolhtml.git\ncd lua-lolhtml\nmake\n```\n\nRunning the tests requires [my fork][telescope] of Telescope:\n\n```\nluarocks install https://raw.githubusercontent.com/jdesgats/telescope/master/rockspecs/telescope-scm-1.rockspec\ntsc spec/lolhtml.lua\n```\n\nQuick start\n-----------\n\nThe workflow is usually:\n\n1. 
Create a [*rewriter builder*](#rewriterbuilder-objects) object:\n   ```lua\n   local lolhtml = require \"lolhtml\"\n   local my_builder = lolhtml.new_rewriter_builder()\n   ```\n2. Attach callbacks to it with the logic to transform your documents:\n   ```lua\n   my_builder:add_element_content_handlers {\n     selector = lolhtml.new_selector(\"h1\"),\n     element_handler = function(el) el:set_attribute(\"class\", \"title\") end\n   }\n   ```\n3. Use the previous builder to create [*rewriter*](#rewriter-objects) objects,\n   one for each HTML page you want to work on:\n   ```lua\n   local my_rewriter = lolhtml.new_rewriter {\n     builder = my_builder,\n     sink = function(s) print(s) end,\n   }\n   ```\n4. Feed the rewriter with the actual HTML stream:\n   ```lua\n   for l in io.stdin:lines() do\n     my_rewriter:write(l)\n   end\n   my_rewriter:close()\n   ```\n\nThe `examples` directory contains a port of the original Rust examples from\nlol-html. You can run them by feeding an HTML page as input:\n\n```sh\ncurl -NL https://git.io/JeOSZ | lua examples/defer_scripts.lua\n```\n\nStatus\n------\n\n**ALPHA VERSION**\n\nThis binding is not finished yet. Even though the test coverage is quite good,\nthe tests pass, and Valgrind is not complaining, bugs might still be present.\n\nAlso, the API is not frozen and might change. Here is a non-exhaustive list\nof things that I am still considering:\n\n* API naming: stay close to the original names, or choose shorter ones\n* Selectors: should they be exposed at all? or compiled and cached transparently\n* Some data could be exposed as attributes rather than methods, is it better?\n* Tables vs. lots of arguments for some functions\n* Error handling: when to raise errors, when to return `nil, err`\n\nReference\n---------\n\nThis library tries to stay close to the original API, while being more Lua-ish\nwhen appropriate. 
In particular it should not panic (as in triggering\n`SIGABRT`); any such case would be considered a bug.\n\n### Top-level objects\n\nObject constructors:\n\n* `lolhtml.new_selector`: see [`Selector`](#selector-objects)\n* `lolhtml.new_rewriter_builder`: see [`RewriterBuilder`](#rewriterbuilder-objects)\n* `lolhtml.new_rewriter`: see [`Rewriter`](#rewriter-objects)\n\nConstants:\n\n* `lolhtml.CONTINUE`\n* `lolhtml.STOP`\n\n### Selector objects\n\nA Selector object represents a parsed CSS selector that can be used to build\nrewriter builders.\n\nSelector objects don't have any methods or attributes. They are exposed only\nfor garbage collection purposes (and also as an optimization if you need to\nreuse the same selector in multiple builders).\n\n#### `lolhtml.new_selector(sel: string) =\u003e Selector | nil, err`\n\nBuilds a new [`Selector`](#selector-objects) object out of the given string.\nReturns `nil, err` in case of syntax error.\n\n### RewriterBuilder objects\n\n`RewriterBuilder` objects encapsulate the rewriting logic. They are usually\ncreated at program startup and are used to instantiate many `Rewriter` objects.\n\nAll callback functions are called with a single argument whose type depends on\nthe type of callback. This argument should not outlive the callback, and any\nattempt to keep a reference to it and use it later will result in an error.\n\nThese functions can return:\n\n* `lolhtml.CONTINUE`: instructs the parser to continue processing the HTML\n  stream\n* `lolhtml.STOP`: causes the parser to stop immediately; the `write()` or\n  `close()` methods of the rewriter will return an error code\n* *nothing*: same as `lolhtml.CONTINUE`\n\nIf a callback raises an error, it will also cause the rewriter to stop\nimmediately. 
The error object or message will be returned as the error by the\n`write()` or `close()` methods of the rewriter.\n\n#### `lolhtml.new_rewriter_builder() =\u003e RewriterBuilder`\n\nCreates a new `RewriterBuilder` object.\n\n#### `RewriterBuilder:add_document_content_handlers(callbacks) =\u003e self`\n\nAdds new document-level content handlers. This function might be called\nmultiple times to add multiple handlers.\n\nThe `callbacks` parameter must be a table with callbacks for different types\nof events. The possible fields are:\n\n* `doctype_handler`: called after parsing the Document Type declaration with\n  a [`Doctype`](#doctype-objects) object.\n* `comment_handler`: called whenever a comment is parsed with a\n  [`Comment`](#comment-objects) object.\n* `text_handler`: called when text nodes are parsed with a\n  [`TextChunk`](#textchunk-objects) object.\n* `doc_end_handler`: called at the end of the document with a\n  [`DocumentEnd`](#documentend-objects) object.\n\nAll of the fields are optional. Calling a callback has a cost, so leave out any\ncallback you don't need.\n\n#### `RewriterBuilder:add_element_content_handlers(callbacks) =\u003e self`\n\nAdds new element content handlers associated with a selector. This function\nmight be called multiple times to add multiple handlers for different\nselectors.\n\nThe `callbacks` parameter must be a table with the selector and the callbacks\nfor different types of events. The possible fields are:\n\n* `selector`: the [CSS selector](#selector-objects) to call the callbacks on\n  (required)\n* `comment_handler`: called whenever a comment is parsed with a\n  [`Comment`](#comment-objects) object.\n* `text_handler`: called when text nodes are parsed with a\n  [`TextChunk`](#textchunk-objects) object.\n* `element_handler`: called when an element is parsed with a\n  [`Element`](#element-objects) object.\n\nAll of the fields are optional (except `selector`). 
Calling a callback has a\ncost so leave out any callback you don't need.\n\n\n### Rewriter objects\n\nA Rewriter object processes a single HTML document and is instantiated with\na [`RewriterBuilder`](#rewriterbuilder-objects) object.\n\nEach rewriter has an associated `sink`, which is a function called to output\nthe rewritten HTML.\n\n#### `lolhtml.new_rewriter(options) =\u003e Rewriter | nil, err`\n\nCreates a new rewriter object. The `options` argument must be a table. The\nfollowing fields are allowed:\n\n* `builder`: a `RewriterBuilder` object (required)\n* `encoding`: the text encoding for the HTML stream. Can be a label for any of\n  the web-compatible encodings with an exception for `UTF-16LE`, `UTF-16BE`,\n  `ISO-2022-JP` and `replacement` (these non-ASCII-compatible encodings are\n  not supported). (optional, default is `\"utf-8\"`)\n* `preallocated_parsing_buffer_size`: Specifies the number of bytes that should\n  be preallocated on HtmlRewriter instantiation for the internal parsing\n  buffer. See [lol-html documentation][lolhtml-memory] for details. (optional,\n  default is 1024)\n* `max_allowed_memory_usage`: Sets a hard limit in bytes on memory consumption\n  of a Rewriter instance. See [lol-html documentation][lolhtml-memory] for\n  details. (optional, default is `SIZE_MAX`)\n* `strict`: boolean, if set to true the rewriter bails out if it encounters\n   markup that drives the HTML parser into an ambiguous state. See\n  [lol-html documentation][lolhtml-strict] for details. (optional, default is\n  `false`)\n\nReturns the new Rewriter on success, or `nil` and an error message on failure.\n\n#### `Rewriter:write(s) =\u003e self | nil, err`\n\nWrites an HTML chunk to the rewriter. Returns the rewriter itself on success, or `nil`\nand an error message on failure. 
Failure happens if (incomplete list):\n\n* A callback or a sink raises an error\n* A previous invocation returned an error\n* Called after `close`\n\n#### `Rewriter:close(s) =\u003e self | nil, err`\n\nFinalizes the rewriting process. Should be called once the last chunk of the\ninput is written. Returns the rewriter itself on success, or `nil` and an\nerror message on failure. Failure happens if (incomplete list):\n\n* A callback or a sink raises an error\n* A previous invocation returned an error\n* Called more than once\n\n\n### Doctype objects\n\n#### `Doctype:get_name() =\u003e string|nil`\n#### `Doctype:get_id() =\u003e string|nil`\n#### `Doctype:get_system_id() =\u003e string|nil`\n\n### Comment objects\n\n#### `Comment:get_text() =\u003e string`\n#### `Comment:set_text(string) =\u003e self|nil, err`\n#### `Comment:before(string, is_html) =\u003e self|nil, err`\n#### `Comment:after(string, is_html) =\u003e self|nil, err`\n#### `Comment:replace(string, is_html) =\u003e self|nil, err`\n#### `Comment:remove() =\u003e self|nil, err`\n#### `Comment:is_removed() =\u003e boolean`\n\n### TextChunk objects\n\n#### `TextChunk:get_text() =\u003e string`\n#### `TextChunk:is_last_in_text_node() =\u003e boolean`\n#### `TextChunk:before(string, is_html) =\u003e self|nil, err`\n#### `TextChunk:after(string, is_html) =\u003e self|nil, err`\n#### `TextChunk:replace(string, is_html) =\u003e self|nil, err`\n#### `TextChunk:remove() =\u003e self|nil, err`\n#### `TextChunk:is_removed() =\u003e boolean`\n\n### Element objects\n\n#### `Element:get_tag_name() =\u003e string`\n#### `Element:get_namespace_uri() =\u003e string`\n#### `Element:get_attribute(name) =\u003e string|nil`\n#### `Element:has_attribute(name) =\u003e boolean`\n#### `Element:set_attribute(name, value) =\u003e self|nil, err`\n#### `Element:remove_attribute(name) =\u003e self|nil, err`\n#### `Element:attributes() =\u003e iterator`\n\nReturns a Lua iterator triplet so the following construction is 
valid:\n\n```lua\nfor attr_name, value in element:attributes() do\n  ...\nend\n```\n\n#### `Element:before(string, is_html) =\u003e self|nil, err`\n#### `Element:after(string, is_html) =\u003e self|nil, err`\n#### `Element:prepend(string, is_html) =\u003e self|nil, err`\n#### `Element:append(string, is_html) =\u003e self|nil, err`\n#### `Element:set_inner_content(string, is_html) =\u003e self|nil, err`\n#### `Element:replace(string, is_html) =\u003e self|nil, err`\n#### `Element:remove() =\u003e self|nil, err`\n#### `Element:remove_and_keep_content() =\u003e self|nil, err`\n#### `Element:is_removed() =\u003e boolean`\n\n### DocumentEnd objects\n\n#### `DocumentEnd:append(string, is_html) =\u003e self|nil, err`\n\n\n[lolhtml]: https://github.com/cloudflare/lol-html\n[lolhtml-memory]: https://docs.rs/lol_html/0.1.0/lol_html/struct.MemorySettings.html\n[lolhtml-strict]: https://docs.rs/lol_html/0.1.0/lol_html/struct.Settings.html#structfield.stricti\n[rust-install]: https://www.rust-lang.org/tools/install\n[telescope]: https://github.com/jdesgats/telescope\n","funding_links":[],"categories":["C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdesgats%2Flua-lolhtml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjdesgats%2Flua-lolhtml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjdesgats%2Flua-lolhtml/lists"}