https://github.com/tylerlittlefield/glassdoor-scraper

:door: Scrape Glassdoor reviews with rvest

rstats rvest scraper scraping-websites


Host: GitHub
URL: https://github.com/tylerlittlefield/glassdoor-scraper
Owner: tylerlittlefield
Created: 2020-08-08T02:39:35.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-08-09T20:06:38.000Z (about 5 years ago)
Last Synced: 2025-01-08T08:16:44.831Z (9 months ago)
Topics: rstats, rvest, scraper, scraping-websites
Language: R
Homepage:
Size: 16.6 KB
Stars: 3
Watchers: 2
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.Rmd

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# glassdoor-scraper

A demonstration of scraping Glassdoor reviews using `rvest`. Note that the underlying functions rely on XPath expressions that I copied by clicking the element I wanted and inspecting it. Glassdoor's markup will probably change over time, and when it does the scripts will fail. As of `r Sys.Date()`, it seems to work pretty well.
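To make the XPath approach concrete, here is a minimal sketch of pulling fields out of a page with `rvest`. The HTML snippet, class names, and XPath expressions are made up for illustration; they are not Glassdoor's real markup, and they are not the expressions used in `R/scrape.R`.

```r
library(rvest)

# Hypothetical markup standing in for a review page (an assumption, not
# Glassdoor's actual structure).
page <- read_html('
  <div class="review" id="r1"><h2>Great place</h2><p class="pros">Smart people</p></div>
  <div class="review" id="r2"><h2>Long hours</h2><p class="pros">Good pay</p></div>
')

# Select nodes with an XPath expression, then extract their text.
titles <- page %>%
  html_nodes(xpath = "//div[@class='review']/h2") %>%
  html_text(trim = TRUE)

pros <- page %>%
  html_nodes(xpath = "//div[@class='review']/p[@class='pros']") %>%
  html_text(trim = TRUE)

titles
#> [1] "Great place" "Long hours"
```

If the site renames a class or moves an element, an expression like these stops matching and returns an empty result, which is why hard-coded XPaths are fragile.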

```{r}
source("R/scrape.R")

# dplyr supplies bind_rows(), distinct(), mutate(), select(), glimpse() and %>%;
# attached here in case R/scrape.R does not already load it
library(dplyr)

# example urls, we'll go with Google
tesla_url <- "https://www.glassdoor.com/Reviews/Tesla-Reviews-E43129"
apple_url <- "https://www.glassdoor.com/Reviews/Apple-Reviews-E1138"
google_url <- "https://www.glassdoor.com/Reviews/Google-Reviews-E9079"

# loop through n pages, pausing 1 second between requests
pages <- 1:5
out <- lapply(pages, function(page) {
  Sys.sleep(1)
  try_scrape_reviews(google_url, page)
})

# name each result by its page number so the `page` column stays correct
# even if some pages fail and are dropped below
names(out) <- pages

# filter for stuff we successfully extracted
reviews <- bind_rows(Filter(Negate(is.null), out), .id = "page")

# remove any duplicates, parse the review time
reviews %>%
  distinct() %>%
  mutate(
    review_time = clean_review_datetime(review_time_raw),
    page = as.numeric(page)
  ) %>%
  select(
    page,
    review_id,
    review_time_raw,
    review_time,
    review_title,
    employee_role,
    employee_history,
    employeer_pros,
    employeer_cons,
    employeer_rating,
    work_life_balance,
    culture_values,
    career_opportunities,
    compensation_and_benefits,
    senior_management
  ) %>%
  glimpse()
```
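The loop above filters out `NULL` results, which suggests that `try_scrape_reviews()` returns `NULL` when a page fails. Its actual definition lives in `R/scrape.R` and is not shown here, so the following is only a sketch of that pattern under that assumption. `fake_scrape()` and `try_fake_scrape()` are hypothetical stand-ins, not functions from this repository.

```r
library(dplyr)

# Hypothetical page scraper that fails on page 2 to mimic a request or
# parsing error.
fake_scrape <- function(page) {
  if (page == 2) stop("page could not be parsed")
  data.frame(review_id = paste0("p", page, "_", 1:2), stringsAsFactors = FALSE)
}

# Wrapper that turns any error into NULL so the loop keeps going.
try_fake_scrape <- function(page) {
  tryCatch(fake_scrape(page), error = function(e) NULL)
}

pages <- 1:3
out <- lapply(pages, try_fake_scrape)
names(out) <- pages  # bind_rows() uses list names for the .id column

# Drop the failed pages, then stack the rest into one data frame.
reviews <- bind_rows(Filter(Negate(is.null), out), .id = "page")

unique(reviews$page)
#> [1] "1" "3"
```

Naming the list before filtering matters: without names, `bind_rows()` fills the `.id` column with list positions, so rows from page 3 would be labelled as page 2 once the failed page is dropped.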

## Session Info

```{r}
sessioninfo::session_info()
```
