https://github.com/dmuth/splunk-glassdoor

Splunk app to graph Glassdoor reviews of companies

# Splunking Glassdoor Reviews

This project is based largely on the work I did for
my Splunk Yelp project
not too long ago. The impetus for it came about when a local tech company pinged
me about the possibility of working for them, and I wanted to see what other people
thought of their company. Thus, Splunk Glassdoor was born!

This app will tell you the following:

- Average rating and number of ratings over time
- Recent pros, cons, and advice to management
- Tag cloud of words from pros, cons, and advice to management

In real life, I've used this app to check out potential employers.

This app uses Splunk Lab, an open-source
app I built to effortlessly run Splunk in a Docker container.

# Screenshots

- Facebook Glassdoor Reviews
- Netflix Glassdoor Reviews
- QVC Glassdoor Reviews

## Requirements

- Docker

## Running The App

- `SPLUNK_START_ARGS=--accept-license bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-glassdoor/master/go.sh ) ./urls.txt`
- The file `urls.txt` should contain one URL per line, and each URL should be a business's review page from Glassdoor.
- Since some businesses can have thousands of reviews, this script will pick up where it left off if interrupted.
- This grabs the HTML from review pages, uses Beautiful Soup to parse the reviews, and then exports them to the `logs/` directory. I looked into using Glassdoor's API, but when I went to the signup page, it was a broken page that was mostly blank. So I tried 🤷.
- The script is single-threaded but reasonably efficient (and I don't want to DoS Glassdoor's website). I've clocked downloads at 5,000 reviews in a little over 8 minutes, or about 600 reviews a minute.
- Go to https://localhost:8000/, log in with the password you set, and you'll see the Glassdoor Reviews Dashboard.
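
The "pick up where it left off" behavior can be pictured roughly like this. It is a minimal sketch, not the project's actual code: the function names (`read_urls`, `output_path`, `download_all`), the one-file-per-URL layout under `logs/`, and the caller-supplied `fetch` function are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def read_urls(path):
    """Return the non-blank lines of a urls.txt-style file."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def output_path(logs_dir, url):
    """Map a URL to a stable file name (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(logs_dir) / f"{digest}.json"


def download_all(urls, logs_dir, fetch):
    """Fetch each URL once, skipping any whose output already exists.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns a list of review dicts.
    Returns the list of URLs actually fetched during this run.
    """
    Path(logs_dir).mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = output_path(logs_dir, url)
        if target.exists():
            continue  # already downloaded in an earlier run
        reviews = fetch(url)
        # Write to a temp file first, then rename, so an interrupted run
        # never leaves a half-written file that looks finished.
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps(reviews))
        tmp.replace(target)
        fetched.append(url)
    return fetched
```

Running `download_all` a second time with the same `logs/` directory fetches nothing, which is the property that makes an interrupted run cheap to restart.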

## Troubleshooting

- Q: Dashboards show `Search is waiting for input...`
- A: You need to select a venue in the dropdown! If no items are in the dropdown, that means no data was ingested. Did you run the command to download some Glassdoor reviews?

## Development

Mostly for my benefit, these are the scripts that I use to make my life easier:

- `./bin/build.sh` - Build the Python and Splunk Docker containers
- `./bin/push.sh` - Upload the Docker containers to Docker Hub
- `./bin/devel.sh` - Build and run the Splunk Docker container with an interactive shell
- `./bin/run-download-reviews.sh` - Run the script to download reviews directly
- `./bin/stop.sh` - Stop the Splunk container
- `./bin/clean.sh` - Stop Splunk, and remove the data and logs

## Credits

I'd like to thank Splunk, for having such a kick-ass data
analytics platform, and the operational excellence which it embodies.

Also:
- This text to ASCII art generator, for the logo I used in the script.

## Bugs

- Excessive CPU Usage
    - In Docker on OS X, if you have thousands and thousands of files, Splunk persistently uses around 70% of the CPU. Not good. I think it's more a Docker thing than a Splunk thing, but I could write a fix as follows:
        - Download reviews to a SQLite database with SQLAlchemy
        - When downloads are done, dump all reviews for that business to a single JSON file in the `logs/` directory
    - Workaround for now: Run `index=main earliest=-10y | stats count` and when the number of events stops going up, stop Splunk, remove the contents of the `logs/` directory, and restart Splunk.
- Sometimes you'll see a yellow exclamation point with the text "Field 'words' does not exist in the data" on the Advice Tag Cloud. The underlying search appears to be executing normally, so I am still trying to sort this one out.
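
The SQLite staging idea listed under Excessive CPU Usage could look roughly like the following. This is a sketch under assumptions, not code from this repo: it uses Python's built-in `sqlite3` module instead of SQLAlchemy to stay dependency-free, and the table layout, column names, and function names are hypothetical. Writing one JSON object per line is also a choice made here for illustration.

```python
import json
import sqlite3
from pathlib import Path


def open_db(path):
    """Open (or create) the staging database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        " business TEXT NOT NULL,"
        " review_id TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " PRIMARY KEY (business, review_id))"
    )
    return conn


def save_review(conn, business, review_id, review):
    """Store one review; saving the same ID again overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO reviews (business, review_id, body)"
        " VALUES (?, ?, ?)",
        (business, review_id, json.dumps(review)),
    )
    conn.commit()


def dump_business(conn, business, logs_dir):
    """Write every stored review for one business to a single file.

    Assumes `business` is already a filesystem-safe slug.
    Returns the path written and the number of reviews in it.
    """
    rows = conn.execute(
        "SELECT body FROM reviews WHERE business = ? ORDER BY review_id",
        (business,),
    ).fetchall()
    Path(logs_dir).mkdir(parents=True, exist_ok=True)
    target = Path(logs_dir) / f"{business}.json"
    with target.open("w", encoding="utf-8") as handle:
        for (body,) in rows:
            handle.write(body + "\n")  # one JSON object per line
    return target, len(rows)
```

With this shape, `logs/` ends up holding one file per business rather than thousands of small files, which is the point of the proposed fix.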

## Copyright

Splunk is copyright by Splunk. Apps within Splunk Lab are copyright their creators
and made available under their respective licenses.

## Contact

- Email me
- Twitter
- Facebook