An open API service indexing awesome lists of open source software.

https://github.com/tylerlittlefield/glassdoor-scraper

:door: Scrape Glassdoor reviews with rvest
https://github.com/tylerlittlefield/glassdoor-scraper

rstats rvest scraper scraping-websites

Last synced: 7 months ago
JSON representation

:door: Scrape Glassdoor reviews with rvest

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

# glassdoor-scraper

A demonstration of scraping glassdoor reviews using `rvest`. Note that the underlying functions rely on xpath's that I copied by simply clicking what I wanted and inspecting the element. These will probably change over time and consequently, the scripts will fail. As of `r Sys.Date()`, it seems to work pretty well.

```{r}
source("R/scrape.R")

# example urls, we'll go with Google
tesla_url <- "https://www.glassdoor.com/Reviews/Tesla-Reviews-E43129"
apple_url <- "https://www.glassdoor.com/Reviews/Apple-Reviews-E1138"
google_url <- "https://www.glassdoor.com/Reviews/Google-Reviews-E9079"

# loop through n pages
pages <- 1:5
out <- lapply(pages, function(page) {
Sys.sleep(1)
try_scrape_reviews(google_url, page)
})

# filter for stuff we successfully extracted
reviews <- bind_rows(Filter(Negate(is.null), out), .id = "page")

# remove any duplicates, parse the review time
reviews %>%
distinct() %>%
mutate(
review_time = clean_review_datetime(review_time_raw),
page = as.numeric(page)
) %>%
select(
page,
review_id,
review_time_raw,
review_time,
review_title,
employee_role,
employee_history,
employeer_pros,
employeer_cons,
employeer_rating,
work_life_balance,
culture_values,
career_opportunities,
compensation_and_benefits,
senior_management
) %>%
glimpse()
```

## Session Info

```{r}
sessioninfo::session_info()
```