{"id":16571854,"url":"https://github.com/hrbrmstr/htmlunitjars","last_synced_at":"2025-08-22T09:33:55.511Z","repository":{"id":141238328,"uuid":"161996871","full_name":"hrbrmstr/htmlunitjars","owner":"hrbrmstr","description":"☕️ Java Archive Wrapper Supporting the 'htmlunit' Package","archived":false,"fork":false,"pushed_at":"2025-04-12T17:26:31.000Z","size":52269,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-12T18:31:09.472Z","etag":null,"topics":["htmlunit","r","r-cyber","rjava","rstats","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-16T12:03:42.000Z","updated_at":"2025-04-12T17:26:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"9ad34b10-2356-42b0-a01b-d6500ee7f089","html_url":"https://github.com/hrbrmstr/htmlunitjars","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fhtmlunitjars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fhtmlunitjars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fhtmlunitjars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fhtmlunitjars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/hrbrmstr","download_url":"https://codeload.github.com/hrbrmstr/htmlunitjars/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248710438,"owners_count":21149188,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["htmlunit","r","r-cyber","rjava","rstats","web-scraping"],"created_at":"2024-10-11T21:25:28.617Z","updated_at":"2025-04-13T11:51:24.815Z","avatar_url":"https://github.com/hrbrmstr.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: \n  rmarkdown::github_document:\n    df_print: kable\n---\n```{r pkg-knitr-opts, include=FALSE}\nhrbrpkghelpr::global_opts()\n```\n\n```{r badges, results='asis', echo=FALSE, cache=FALSE}\nhrbrpkghelpr::stinking_badges()\n```\n\n```{r description, results='asis', echo=FALSE, cache=FALSE}\nhrbrpkghelpr::yank_title_and_description()\n```\n\n\u003e_`HtmlUnit` is a \"GUI-Less browser for Java programs\". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your \"normal\" browser._\n\u003e\n\u003e_It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer depending on the configuration used._\n\u003e\n\u003e_It is typically used for testing purposes or to retrieve information from web sites._\n\u003e\n\u003e_`HtmlUnit` is not a generic unit testing framework. 
It is specifically a way to simulate a browser._\n    \n## What's Inside The Tin\n\nEverything necessary to use the HtmlUnit library directly via `rJava`.\n\n`HtmlUnit` Library JavaDoc: \u003chttps://htmlunit.sourceforge.net/apidocs/index.html\u003e\n\n## Installation\n\n```{r install-ex, results='asis', echo=FALSE, cache=FALSE}\nhrbrpkghelpr::install_block()\n```\n\n## Usage\n\n```{r lib, cache=FALSE}\nlibrary(htmlunitjars)\n\n# current version\npackageVersion(\"htmlunitjars\")\n\n```\n\n### Give It A Go\n\n`xml2::read_html()` cannot execute JavaScript, so the traditional approach won't work:\n\n```{r go1}\nlibrary(rvest)\n\ntest_url \u003c- \"https://hrbrmstr.github.io/htmlunitjars/index.html\"\n\ndoc \u003c- read_html(test_url)\n\nhtml_table(doc)\n```\n\n☹️\n\nWe _can_ do this with the classes from `HtmlUnit` provided by this JAR wrapper package:\n\n```{r go2}\nlibrary(htmlunitjars)\n```\n\nTell `HtmlUnit` to work like Chrome:\n\n```{r go3}\nbrowsers \u003c- J(\"com.gargoylesoftware.htmlunit.BrowserVersion\")\n\nwc \u003c- new(J(\"com.gargoylesoftware.htmlunit.WebClient\"), browsers$CHROME)\n```\n\nTell it to wait for JavaScript to execute and not throw exceptions on page resource errors:\n\n```{r go4}\ninvisible(wc$waitForBackgroundJavaScriptStartingBefore(.jlong(2000L)))\n\nwc_opts \u003c- wc$getOptions()\nwc_opts$setThrowExceptionOnFailingStatusCode(FALSE)\nwc_opts$setThrowExceptionOnScriptError(FALSE)\n```\n\nNow, access the site again and get the table:\n\n```{r go5}\npg \u003c- wc$getPage(test_url)\n\ndoc \u003c- read_html(pg$asXml())\n\nhtml_table(doc)\n```\n\nNo need for Selenium or Splash!\n\nThe ultimate goal is to have an `htmlunit` package that provides a nicer API than needing to know how to work with `rJava` directly.\n\n## htmlunitjars Metrics\n\n```{r cloc, 
echo=FALSE}\ncloc::cloc_pkg_md()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fhtmlunitjars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fhtmlunitjars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fhtmlunitjars/lists"}