https://github.com/gesistsa/webbotparser
:mag: R package to parse search engine results
browser-extension rstats rstats-package search-engine
- Host: GitHub
- URL: https://github.com/gesistsa/webbotparser
- Owner: gesistsa
- License: other
- Created: 2023-03-16T09:54:15.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-12-02T14:18:42.000Z (over 1 year ago)
- Last Synced: 2025-05-03T10:02:23.743Z (about 1 year ago)
- Topics: browser-extension, rstats, rstats-package, search-engine
- Language: HTML
- Homepage: https://gesistsa.github.io/webbotparseR/
- Size: 41.1 MB
- Stars: 8
- Watchers: 3
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# webbotparseR 
webbotparseR parses search engine results that were scraped with the [WebBot](https://github.com/gesiscss/WebBot) browser extension. A similar Python library is [also available](https://github.com/gesiscss/WebBot-tutorials).
## Installation
You can install the development version of webbotparseR like so:
``` r
remotes::install_github("schochastics/webbotparseR")
```
The package contains an example HTML file from a Google search on climate change.
```{r ex_file}
library(webbotparseR)
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")
```
Such search results can be parsed with the function `parse_search_results()`. The parameter `engine` specifies both the search engine and the search type.
```{r parse}
output <- parse_search_results(path = ex_file, engine = "google text")
output
```
Note that images are always returned base64-encoded.
```{r image}
output$image[1]
```
The function `base64_to_img()` can be used to decode the image and save it in an appropriate format.
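The arguments of `base64_to_img()` are not shown here, so the following is only a rough sketch of what decoding a base64-encoded image involves. It uses `jsonlite::base64_dec()`, which is an assumption and not necessarily what webbotparseR does internally. The helper `decode_base64_image()` is hypothetical and not part of the package.

```r
library(jsonlite)

# Hypothetical helper, not part of webbotparseR: decode a base64 string
# (optionally prefixed with a data URI header) and write the bytes to a file.
decode_base64_image <- function(x, path) {
  # Drop a leading "data:image/...;base64," header if one is present
  payload <- sub("^data:[^,]*;base64,", "", x)
  bytes <- jsonlite::base64_dec(payload)
  writeBin(bytes, path)
  invisible(path)
}

# Round trip, with the 8-byte PNG signature standing in for a real image
png_sig <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
encoded <- paste0("data:image/png;base64,", jsonlite::base64_enc(png_sig))
out <- decode_base64_image(encoded, tempfile(fileext = ".png"))
decoded <- readBin(out, what = "raw", n = 8)
```

With real output, something like `decode_base64_image(output$image[1], "result.png")` might work in the same way, assuming the column holds a plain base64 string or a data URI. In practice, `base64_to_img()` is the function to prefer.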