https://github.com/gesistsa/webbotparser

:mag: R package to parse search engine results

browser-extension rstats rstats-package search-engine


Host: GitHub
URL: https://github.com/gesistsa/webbotparser
Owner: gesistsa
License: other
Created: 2023-03-16T09:54:15.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-12-02T14:18:42.000Z (over 1 year ago)
Last Synced: 2025-05-03T10:02:23.743Z (about 1 year ago)
Topics: browser-extension, rstats, rstats-package, search-engine
Language: HTML
Homepage: https://gesistsa.github.io/webbotparseR/
Size: 41.1 MB
Stars: 8
Watchers: 3
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE


README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# webbotparseR  

[![Codecov test coverage](https://codecov.io/gh/schochastics/webbotparseR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gesistsa/webbotparseR?branch=main)

[![R-CMD-check](https://github.com/schochastics/webbotparseR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/webbotparseR/actions/workflows/R-CMD-check.yaml)

webbotparseR parses search engine results that were scraped with the [WebBot](https://github.com/gesiscss/WebBot) browser extension. A similar Python library is [also available](https://github.com/gesiscss/WebBot-tutorials).

## Installation

You can install the development version of webbotparseR like so:

``` r
# install.packages("remotes")
remotes::install_github("gesistsa/webbotparseR")
```

The package ships with an example HTML file from a Google search on climate change.

```{r ex_file}
library(webbotparseR)
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")
```

Such search results can be parsed with the function `parse_search_results()`. The parameter `engine` specifies both the search engine and the search type.

```{r parse}
output <- parse_search_results(path = ex_file, engine = "google text")
output
```
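When many files need to be parsed, it can be convenient to derive the `engine` string from the file name rather than typing it by hand. The following is only a sketch: `guess_engine()` is a hypothetical helper, not part of webbotparseR, and it assumes that file names follow the pattern of the bundled example (`<host>_<query>_<type>_<date>_<time>.html`) and that the query itself contains no underscores.

```r
# Hypothetical helper, not part of webbotparseR: guess the `engine` string
# from a file name shaped like "<host>_<query>_<type>_<date>_<time>.html".
# Assumption: the query part contains no underscores.
guess_engine <- function(path) {
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  # "www.google.com" -> "google": drop a leading "www." and everything
  # from the next dot onwards.
  host <- sub("^www\\.", "", parts[1])
  host <- sub("\\..*$", "", host)
  # Combine host and search type in the "<engine> <type>" form used above.
  paste(host, parts[3])
}

engine <- guess_engine("www.google.com_climatechange_text_2023-03-16_08_16_11.html")
engine
```

The result could then be passed as `engine` to `parse_search_results()`, provided the guessed value is one the package actually supports.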

Note that images are always returned base64-encoded.

```{r image}
output$image[1]
```

The function `base64_to_img()` can be used to decode the image and save it in an appropriate format.
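
To illustrate what decoding involves, here is a minimal sketch that does not use `base64_to_img()` itself, since its arguments are not shown here. It relies on `jsonlite::base64_dec()` instead. The helper `decode_base64_image()` is hypothetical, and the optional `data:image/...;base64,` prefix it strips is an assumption about how the strings might look.

```r
library(jsonlite)

# Hypothetical helper, not part of webbotparseR: decode a base64 string
# and write the resulting bytes to `path`.
decode_base64_image <- function(x, path) {
  # Assumption: the string may start with a data URI header such as
  # "data:image/png;base64,". Strip it if present.
  payload <- sub("^data:[^,]*;base64,", "", x)
  bytes <- jsonlite::base64_dec(payload)
  writeBin(bytes, path)
  invisible(path)
}

# Round trip on a few made-up bytes standing in for real image data.
fake_image <- as.raw(c(0x89, 0x50, 0x4e, 0x47))
encoded <- paste0("data:image/png;base64,", jsonlite::base64_enc(fake_image))
out_path <- decode_base64_image(encoded, tempfile(fileext = ".png"))
decoded <- readBin(out_path, what = "raw", n = 10)
```

In practice, the file extension should match the actual image format, which the sketch does not try to detect.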
