{"id":13513421,"url":"https://github.com/serpapi/nokolexbor","last_synced_at":"2026-04-02T12:46:59.403Z","repository":{"id":64777053,"uuid":"571801927","full_name":"serpapi/nokolexbor","owner":"serpapi","description":"High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.","archived":false,"fork":false,"pushed_at":"2024-12-31T06:42:20.000Z","size":673,"stargazers_count":340,"open_issues_count":1,"forks_count":6,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-05-11T14:47:47.915Z","etag":null,"topics":["c-extension","css","html5","parser","ruby","serpapi","web-scraping","xpath"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/serpapi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-28T23:05:38.000Z","updated_at":"2025-05-03T12:31:48.000Z","dependencies_parsed_at":"2024-01-31T05:08:37.262Z","dependency_job_id":"a11b1955-6f06-4377-9284-448510d11c2d","html_url":"https://github.com/serpapi/nokolexbor","commit_stats":{"total_commits":167,"total_committers":2,"mean_commits":83.5,"dds":"0.017964071856287456","last_synced_commit":"5bf9e5f13c2afa9585ec8184f93aa1d0c3c55b8c"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fnokolexbor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fnokolexbor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/serpapi%2Fnokolexbor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serpapi%2Fnokolexbor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/serpapi","download_url":"https://codeload.github.com/serpapi/nokolexbor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253938510,"owners_count":21987404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-extension","css","html5","parser","ruby","serpapi","web-scraping","xpath"],"created_at":"2024-08-01T05:00:24.723Z","updated_at":"2026-04-02T12:46:59.391Z","avatar_url":"https://github.com/serpapi.png","language":"C","readme":"# Nokolexbor\n\n[![CI](https://github.com/serpapi/nokolexbor/actions/workflows/ci.yml/badge.svg)](https://github.com/serpapi/nokolexbor/actions/workflows/ci.yml)\n\nNokolexbor is a drop-in replacement for Nokogiri. It's 4.7x faster at parsing HTML and up to 1352x faster at CSS selectors.\n\nIt's a performance-focused HTML5 parser for Ruby based on [Lexbor](https://github.com/lexbor/lexbor/). It supports both CSS selectors and XPath. Nokolexbor's API is designed to be 1:1 compatible as much as possible with [Nokogiri's API](https://github.com/sparklemotion/nokogiri).\n\n## Requirements\n\nNokolexbor is shipped with pre-compiled gems on most common platforms:\n* Linux: `x86_64` and `aarch64`\n* macOS: `x86_64` and `arm64`\n* Windows: `ucrt64`\n\nIf you are on a supported platform, just jump to the [Installation](#installation) section. 
Otherwise, you need to install CMake to compile C extensions:\n\n### macOS\n\n```\nbrew install cmake\n```\n\n### Linux (Debian, Ubuntu, etc.)\n\n```\nsudo apt-get install cmake\n```\n\n## Installation\n\nAdd to your Gemfile:\n\n```ruby\ngem 'nokolexbor'\n```\n\nThen, run `bundle install`.\n\nOr, install the gem directly:\n\n```\ngem install nokolexbor\n```\n\n## Quick start\n\n```ruby\nrequire 'nokolexbor'\nrequire 'open-uri'\n\n# Parse HTML document\ndoc = Nokolexbor::HTML(URI.open('https://github.com/serpapi/nokolexbor'))\n\n# Search for nodes by css\ndoc.css('#readme h1', 'article h2', 'p[dir=auto]').each do |node|\n  puts node.content\nend\n\n# Search for text nodes by css\ndoc.css('#readme p \u003e ::text').each do |text|\n  puts text.content\nend\n\n# Search for nodes by xpath\ndoc.xpath('//div[@id=\"readme\"]//h1', '//article//h2').each do |node|\n  puts node.content\nend\n```\n\n## Features\n* Nokogiri-compatible APIs.\n* High performance HTML parsing, DOM manipulation and CSS selectors engine.\n* XPath search engine (ported from libxml2).\n* Text nodes CSS selector support: `::text`.\n\n## Searching methods overview\n* `css` and `at_css`\n  * Based on Lexbor.\n  * Only accepts CSS selectors, doesn't support mixed syntax like `div#abc /text()`.\n  * To select text nodes, use pseudo element `::text`. e.g. `div#abc \u003e ::text`.\n  * Performance is much higher than libxml2 based methods.\n* `xpath` and `at_xpath`\n  * Based on libxml2.\n  * Only accepts XPath syntax.\n  * Works in the same way as Nokogiri's `xpath` and `at_xpath`.\n* `nokogiri_css` and `nokogiri_at_css` (requires Nokogiri installed)\n  * Based on libxml2.\n  * Accept mixed syntax like `div#abc /text()`.\n  * Works in the same way as Nokogiri's `css` and `at_css`.\n\n## Different behaviors from Nokogiri\n* For selector `:nth-of-type(n)`, `n` is not affected by prior filter. 
For example, suppose we want to select the 3rd `div` after excluding class `a` and class `b`, which is the last `div` in the following HTML:\n  ```\n  \u003cbody\u003e\n    \u003cdiv\u003e\u003c/div\u003e\n    \u003cdiv class=\"a\"\u003e\u003c/div\u003e\n    \u003cdiv class=\"b\"\u003e\u003c/div\u003e\n    \u003cdiv\u003e\u003c/div\u003e\n    \u003cdiv\u003e\u003c/div\u003e\n  \u003c/body\u003e\n  ```\n  In Nokogiri, the selector should be `div:not(.a):not(.b):nth-of-type(3)`.\n\n  In Nokolexbor, as in browsers, `:not` does not change the position of the last `div`, so the selector has to be `div:not(.a):not(.b):nth-of-type(5)`, which defeats the purpose of filtering.\n\n## Benchmarks\n\nBenchmarks of parsing a Google search result page (367 KB) and finding nodes using CSS selectors and XPath.\n\nCPU: AMD Ryzen 5 5600 (Ubuntu 20.04 on Windows 10 WSL 2).\n\nRun with: `ruby bench/bench.rb`\n\n|            | Nokolexbor (iters/s) | Nokogiri (iters/s) | Diff |\n| ---------- | ------------- | ------------ | --------------- |\n| parsing    | 994.8         | 211.8        | 4.70x faster    |\n| at_css     | 202963.7      | 150.1        | 1352.33x faster |\n| css        | 9787.9        | 150.0        | 65.27x faster   |\n| at_xpath   | 153.2         | 154.6        | same-ish        |\n| xpath      | 153.2         | 154.3        | same-ish        |\n\n\u003cdetails\u003e\n\u003csummary\u003eRaw data\u003c/summary\u003e\n\n```\nWarming up --------------------------------------\nNokolexbor parse (367 KB)\n                       100.000  i/100ms\nNokogiri parse (367 KB)\n                        20.000  i/100ms\nCalculating -------------------------------------\nNokolexbor parse (367 KB)\n                        994.773  (± 0.9%) i/s -     19.900k in  20.006124s\nNokogiri parse (367 KB)\n                        211.793  (±12.3%) i/s -      4.180k in  20.093299s\n\nComparison:\nNokolexbor parse (367 KB):      994.8 i/s\nNokogiri parse (367 KB):      211.8 i/s - 4.70x  (± 0.00) 
slower\n\nWarming up --------------------------------------\n   Nokolexbor at_css    20.195k i/100ms\n     Nokogiri at_css    15.000  i/100ms\nCalculating -------------------------------------\n   Nokolexbor at_css    202.964k (± 0.7%) i/s -      4.059M in  20.000626s\n     Nokogiri at_css    150.084  (± 0.7%) i/s -      3.015k in  20.089207s\n\nComparison:\n   Nokolexbor at_css:   202963.7 i/s\n     Nokogiri at_css:      150.1 i/s - 1352.33x  (± 0.00) slower\n\nWarming up --------------------------------------\n      Nokolexbor css   977.000  i/100ms\n        Nokogiri css    15.000  i/100ms\nCalculating -------------------------------------\n      Nokolexbor css      9.788k (± 0.4%) i/s -    196.377k in  20.063658s\n        Nokogiri css    149.956  (± 0.7%) i/s -      3.000k in  20.006363s\n\nComparison:\n      Nokolexbor css:     9787.9 i/s\n        Nokogiri css:      150.0 i/s - 65.27x  (± 0.00) slower\n\nWarming up --------------------------------------\n Nokolexbor at_xpath    15.000  i/100ms\n   Nokogiri at_xpath    15.000  i/100ms\nCalculating -------------------------------------\n Nokolexbor at_xpath    153.190  (± 0.7%) i/s -      3.075k in  20.073628s\n   Nokogiri at_xpath    154.588  (± 0.6%) i/s -      3.105k in  20.086664s\n\nComparison:\n   Nokogiri at_xpath:      154.6 i/s\n Nokolexbor at_xpath:      153.2 i/s - same-ish: difference falls within error\n\nWarming up --------------------------------------\n    Nokolexbor xpath    15.000  i/100ms\n      Nokogiri xpath    15.000  i/100ms\nCalculating -------------------------------------\n    Nokolexbor xpath    153.159  (± 0.7%) i/s -      3.075k in  20.077580s\n      Nokogiri xpath    154.322  (± 1.3%) i/s -      3.090k in  20.026288s\n\nComparison:\n      Nokogiri xpath:      154.3 i/s\n    Nokolexbor xpath:      153.2 i/s - same-ish: difference falls within error\n```\n\u003c/details\u003e\n","funding_links":[],"categories":["C","HTML/XML 
Parsing"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserpapi%2Fnokolexbor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fserpapi%2Fnokolexbor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserpapi%2Fnokolexbor/lists"}