{"id":13858159,"url":"https://github.com/hrbrmstr/decapitated","last_synced_at":"2025-07-13T23:31:25.881Z","repository":{"id":141238100,"uuid":"90038424","full_name":"hrbrmstr/decapitated","owner":"hrbrmstr","description":"Headless 'Chrome' Orchestration in R","archived":true,"fork":false,"pushed_at":"2019-07-31T15:41:04.000Z","size":725,"stargazers_count":65,"open_issues_count":7,"forks_count":3,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-11-22T16:39:15.097Z","etag":null,"topics":["headless-chrome","javascript","r","r-cyber","rstats","web-scraping"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-05-02T13:46:25.000Z","updated_at":"2024-01-04T16:13:42.000Z","dependencies_parsed_at":"2024-02-09T02:09:03.271Z","dependency_job_id":"c1911e5b-38e1-4bb7-9951-5f2c80839cb7","html_url":"https://github.com/hrbrmstr/decapitated","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hrbrmstr/decapitated","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fdecapitated","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fdecapitated/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fdecapitated/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fdecapitated/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/hrbrmstr","download_url":"https://codeload.github.com/hrbrmstr/decapitated/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fdecapitated/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265220340,"owners_count":23729795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["headless-chrome","javascript","r","r-cyber","rstats","web-scraping"],"created_at":"2024-08-05T03:01:58.735Z","updated_at":"2025-07-13T23:31:20.872Z","avatar_url":"https://github.com/hrbrmstr.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: rmarkdown::github_document\n---\n\n# decapitated\n\nHeadless 'Chrome' Orchestration\n\n## Description\n\nThe 'Chrome' browser \u003chttps://www.google.com/chrome/\u003e has a headless mode\nwhich can be instrumented programmatically. 
Tools are provided to perform headless\n'Chrome' instrumentation on the command-line, including retrieving the JavaScript-executed web page, PDF output or a screenshot of a URL.\n\n## IMPORTANT\n\nYou'll need to set the environment variable `HEADLESS_CHROME` to the path of a Chrome or Chromium binary to use this package. It's best to store this value in `~/.Renviron`.\n\nIf this value is not set, a location heuristic runs when the package loads and looks\nfor the following, depending on the operating system:\n\n- Windows (32-bit): `C:/Program Files/Google/Chrome/Application/chrome.exe`\n- Windows (64-bit): `C:/Program Files (x86)/Google/Chrome/Application/chrome.exe`\n- macOS: `/Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome`\n- Linux: `/usr/bin/google-chrome`\n\nIf a verification test fails, you will be notified.\n\n**It is HIGHLY recommended** that you use `decapitated::download_chromium()` to install\na standalone version of Chromium for your platform to use with this package.\n\n## Working around headless Chrome \u0026 OS security restrictions\n\nSecurity restrictions on various operating systems and OS configurations can cause\nheadless Chrome execution to fail. As a result, headless Chrome operations should\nuse a dedicated directory for `decapitated` package operations, which you can pass\nin as `work_dir`. If `work_dir` is `NULL`, a `.rdecapdata` directory is\ncreated in your home directory and used for Chrome's data, crash dump and utility\ndirectories.\n\nIn testing on various macOS 10.13 systems, `tempdir()` did not always meet these\nrequirements, because Chrome sets unusual file attributes during some of its\nfile operations.\n\nIf you pass in a `work_dir`, it must not violate OS security restrictions;\notherwise headless Chrome will not function.\n\n## Helping it \"always work\"\n\nThe three core functions have a `prime` parameter. 
In testing (again, especially on macOS),\nI noticed that the first one or two requests to a URL often resulted in an empty `\u003cbody\u003e`\nresponse. I don't use Chrome as my primary browser anymore, so I'm not sure whether that is a factor,\nbut requests after the first one or two do return content. The `prime`\nparameter lets you specify `TRUE`, `FALSE` or a numeric value that will issue the\nURL retrieval multiple times before returning a result (or generating a PDF or PNG).\nThis is a workaround until there is more granular control over the command-line execution of headless\nChrome.\n\n## What's in the tin?\n\nThe following functions are implemented:\n\n### CLI-based ops\n\n- `download_chromium`:\tDownload a standalone version of Chromium (recommended)\n- `chrome_dump_pdf`:\t\"Print\" to PDF\n- `chrome_read_html`:\tRead a URL via headless Chrome and return the raw or rendered '\u003cbody\u003e' 'innerHTML' DOM elements\n- `chrome_shot`:\tCapture a screenshot\n- `chrome_version`:\tGet the Chrome version\n- `get_chrome_env`:\tGet the environment variable 'HEADLESS_CHROME'\n- `set_chrome_env`:\tSet the environment variable 'HEADLESS_CHROME'\n\n### `gepetto`-based ops\n\nHelpers to install and manage gepetto:\n\n- `install_gepetto`:\tInstall gepetto\n- `start_gepetto`:\tStart gepetto\n- `stop_gepetto`:\tStop gepetto\n\nAPI interface functions:\n\n- `gepetto`:\tCreate a connection to a Gepetto API server\n- `gep_active`:\tTest whether the gepetto server is active\n- `gep_debug`:\tGet \"debug-level\" information from a running gepetto server\n- `gep_render_har`:\tRender a page in a JavaScript context and serialize to HAR\n- `gep_render_html`:\tRender a page in a JavaScript context and serialize to HTML\n- `gep_render_magick`:\tRender a page in a JavaScript context and take a screenshot\n- `gep_render_pdf`:\tRender a page in a JavaScript context and render to PDF\n\nMore information on `gepetto` is forthcoming, but you can take a sneak peek 
[here](https://gitlab.com/hrbrmstr/gepetto).\n\n## Installation\n\n```{r eval=FALSE}\ndevtools::install_github(\"hrbrmstr/decapitated\")\n```\n\n```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}\noptions(width=120)\n```\n\n## Usage\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\nlibrary(decapitated)\n\n# current version\npackageVersion(\"decapitated\")\n\n# Chrome version\nchrome_version()\n\n# read a URL via headless Chrome and return the page's body innerHTML\nchrome_read_html(\"http://httpbin.org/\")\n```\n\n```{r eval=FALSE, message=FALSE, warning=FALSE, error=FALSE}\n# \"print\" the page to PDF\nchrome_dump_pdf(\"http://httpbin.org/\")\n```\n\n```{r message=FALSE, warning=FALSE, error=FALSE, eval=FALSE}\n# capture a screenshot of the page\nchrome_shot(\"http://httpbin.org/\")\n\n##   format width height colorspace filesize\n## 1    PNG  1600   1200       sRGB   215680\n```\n\n![screenshot.png](screenshot.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fdecapitated","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fdecapitated","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fdecapitated/lists"}