{"id":15389507,"url":"https://github.com/jgarber623/nokogiri-html-ext","last_synced_at":"2025-08-02T01:43:03.764Z","repository":{"id":38976080,"uuid":"506780892","full_name":"jgarber623/nokogiri-html-ext","owner":"jgarber623","description":"A Ruby gem extending Nokogiri with several useful HTML-centric features.","archived":false,"fork":false,"pushed_at":"2024-03-01T03:27:57.000Z","size":99,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T23:18:26.952Z","etag":null,"topics":["html-parser","nokogiri","ruby","rubygems"],"latest_commit_sha":null,"homepage":"https://rubygems.org/gems/nokogiri-html-ext","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jgarber623.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-23T20:23:23.000Z","updated_at":"2023-12-07T15:51:35.000Z","dependencies_parsed_at":"2023-11-29T05:27:15.188Z","dependency_job_id":"82d1169d-0c78-4ed8-a852-13a7eb0b2d38","html_url":"https://github.com/jgarber623/nokogiri-html-ext","commit_stats":{"total_commits":69,"total_committers":2,"mean_commits":34.5,"dds":0.05797101449275366,"last_synced_commit":"ea7a3a46d19ad765be6c9babb1d4f757ef31e35e"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/jgarber623/nokogiri-html-ext","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgarber623%2Fnokogiri-html-ext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgarber623%2Fnokogi
ri-html-ext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgarber623%2Fnokogiri-html-ext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgarber623%2Fnokogiri-html-ext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jgarber623","download_url":"https://codeload.github.com/jgarber623/nokogiri-html-ext/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jgarber623%2Fnokogiri-html-ext/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268326460,"owners_count":24232478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html-parser","nokogiri","ruby","rubygems"],"created_at":"2024-10-01T15:01:58.208Z","updated_at":"2025-08-02T01:43:03.523Z","avatar_url":"https://github.com/jgarber623.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# nokogiri-html-ext\n\n**A Ruby gem extending [Nokogiri](https://nokogiri.org) with several useful HTML-centric 
features.**\n\n[![Gem](https://img.shields.io/gem/v/nokogiri-html-ext.svg?logo=rubygems\u0026style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)\n[![Downloads](https://img.shields.io/gem/dt/nokogiri-html-ext.svg?logo=rubygems\u0026style=for-the-badge)](https://rubygems.org/gems/nokogiri-html-ext)\n[![Build](https://img.shields.io/github/actions/workflow/status/jgarber623/nokogiri-html-ext/ci.yml?branch=main\u0026logo=github\u0026style=for-the-badge)](https://github.com/jgarber623/nokogiri-html-ext/actions/workflows/ci.yml)\n\n## Key features\n\n- Resolves all relative URLs in a Nokogiri-parsed HTML document.\n- Adds helpers for getting and setting a document's `\u003cbase\u003e` element's `href` attribute.\n- Supports Ruby 2.7 and newer.\n\n## Getting Started\n\nBefore installing and using nokogiri-html-ext, you'll want to have [Ruby](https://www.ruby-lang.org) 2.7 (or newer) installed. Using a Ruby version management tool like [rbenv](https://github.com/rbenv/rbenv), [chruby](https://github.com/postmodern/chruby), or [rvm](https://github.com/rvm/rvm) is recommended.\n\nnokogiri-html-ext is developed using Ruby 2.7.8 and is tested against additional Ruby versions using [GitHub Actions](https://github.com/jgarber623/nokogiri-html-ext/actions).\n\n## Installation\n\nAdd nokogiri-html-ext to your project's `Gemfile` and run `bundle install`:\n\n```ruby\nsource \"https://rubygems.org\"\n\ngem \"nokogiri-html-ext\"\n```\n\n## Usage\n\n### `base_href`\n\nnokogiri-html-ext provides two helper methods for getting and setting a document's `\u003cbase\u003e` element's `href` attribute. 
The first, `base_href`, retrieves the element's `href` attribute value if it exists.\n\n```ruby\nrequire \"nokogiri/html-ext\"\n\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003cbody\u003eHello, world!\u003c/body\u003e\u003c/html\u003e))\n\ndoc.base_href\n#=\u003e nil\n\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003chead\u003e\u003cbase target=\"_top\"\u003e\u003cbody\u003eHello, world!\u003c/body\u003e\u003c/html\u003e))\n\ndoc.base_href\n#=\u003e nil\n\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003chead\u003e\u003cbase href=\"/foo\"\u003e\u003cbody\u003eHello, world!\u003c/body\u003e\u003c/html\u003e))\n\ndoc.base_href\n#=\u003e \"/foo\"\n```\n\nThe `base_href=` method allows you to manipulate the document's `\u003cbase\u003e` element.\n\n```ruby\nrequire \"nokogiri/html-ext\"\n\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003cbody\u003eHello, world!\u003c/body\u003e\u003c/html\u003e))\n\ndoc.base_href = \"/foo\"\n#=\u003e \"/foo\"\n\ndoc.at_css(\"base\").to_s\n#=\u003e \"\u003cbase href=\\\"/foo\\\"\u003e\"\n\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003chead\u003e\u003cbase href=\"/foo\"\u003e\u003cbody\u003eHello, world!\u003c/body\u003e\u003c/html\u003e))\n\ndoc.base_href = \"/bar\"\n#=\u003e \"/bar\"\n\ndoc.at_css(\"base\").to_s\n#=\u003e \"\u003cbase href=\\\"/bar\\\"\u003e\"\n```\n\n### `resolve_relative_urls!`\n\nnokogiri-html-ext will resolve a document's relative URLs against a provided source URL. The source URL _should_ be an absolute URL (e.g. `https://jgarber.example`) representing the location of the document being parsed. 
The source URL _may_ be any `String` (or any Ruby object that responds to `#to_s`).\n\nnokogiri-html-ext takes advantage of [the `Nokogiri::XML::Document.parse` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/xml/document.rb#L48)'s second positional argument to set the parsed document's URL. Nokogiri's source code is _very_ complex, but in short: [the `Nokogiri::HTML` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html.rb#L7-L8) is an alias for [the `Nokogiri::HTML4` method](https://github.com/sparklemotion/nokogiri/blob/main/lib/nokogiri/html4.rb#L10-L12), which eventually winds its way to the aforementioned `Nokogiri::XML::Document.parse` method. _Phew._ 🥵\n\nURL resolution uses Ruby's built-in URL parsing and normalizing capabilities. Absolute URLs will remain unmodified.\n\n**Note:** If the document's markup includes a `\u003cbase\u003e` element whose `href` attribute is an absolute URL, _that_ URL will take precedence when performing URL resolution.\n\nAn abbreviated example:\n\n```ruby\nrequire \"nokogiri/html-ext\"\n\nmarkup = \u003c\u003c-HTML\n  \u003chtml\u003e\n  \u003cbody\u003e\n    \u003ca href=\"/home\"\u003eHome\u003c/a\u003e\n    \u003cimg src=\"/foo.png\" srcset=\"../bar.png 720w\"\u003e\n  \u003c/body\u003e\n  \u003c/html\u003e\nHTML\n\ndoc = Nokogiri::HTML(markup, \"https://jgarber.example\")\n\ndoc.url\n#=\u003e \"https://jgarber.example\"\n\ndoc.base_href\n#=\u003e nil\n\ndoc.base_href = \"/foo/bar/biz\"\n#=\u003e \"/foo/bar/biz\"\n\ndoc.resolve_relative_urls!\n\ndoc.at_css(\"base\")[\"href\"]\n#=\u003e \"https://jgarber.example/foo/bar/biz\"\n\ndoc.at_css(\"a\")[\"href\"]\n#=\u003e \"https://jgarber.example/home\"\n\ndoc.at_css(\"img\").to_s\n#=\u003e \"\u003cimg src=\\\"https://jgarber.example/foo.png\\\" srcset=\\\"https://jgarber.example/foo/bar.png 720w\\\"\u003e\"\n```\n\n### `resolve_relative_url`\n\nYou may also resolve an arbitrary `String` representing a relative URL against the 
document's URL (or `\u003cbase\u003e` element's `href` attribute value):\n\n```ruby\ndoc = Nokogiri::HTML(%(\u003chtml\u003e\u003cbase href=\"/foo/bar\"\u003e\u003c/html\u003e), \"https://jgarber.example\")\n\ndoc.resolve_relative_url(\"biz/baz\")\n#=\u003e \"https://jgarber.example/foo/biz/baz\"\n```\n\n## Acknowledgments\n\nnokogiri-html-ext wouldn't exist without the [Nokogiri](https://nokogiri.org) project and its [community](https://github.com/sparklemotion/nokogiri).\n\nnokogiri-html-ext is written and maintained by [Jason Garber](https://sixtwothree.org).\n\n## License\n\nnokogiri-html-ext is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjgarber623%2Fnokogiri-html-ext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjgarber623%2Fnokogiri-html-ext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjgarber623%2Fnokogiri-html-ext/lists"}