https://github.com/dmuth/splunk-glassdoor
Splunk app to graph Glassdoor reviews of companies
- Host: GitHub
- URL: https://github.com/dmuth/splunk-glassdoor
- Owner: dmuth
- License: gpl-3.0
- Created: 2019-06-22T16:02:57.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2023-05-22T22:16:38.000Z (about 3 years ago)
- Last Synced: 2025-04-12T14:51:26.905Z (about 1 year ago)
- Language: Shell
- Size: 1.27 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# Splunking Glassdoor Reviews
This project is based largely on the work I did for
my Splunk Yelp project
not too long ago. The impetus for it came about when a local tech company pinged
me about the possibility of working for them, and I wanted to see what other people
thought of their company. Thus, Splunk Glassdoor was born!
This app will tell you the following:
- Avg ratings/number of ratings over time
- Recent pros, cons, and advice to management
- Tag cloud of words from pros, cons, and advice to management
In real life, I've used this app to check out potential employers.
This app uses Splunk Lab, an open-source
app I built to effortlessly run Splunk in a Docker container.
# Screenshots
## Requirements
- Docker
## Running The App
- `SPLUNK_START_ARGS=--accept-license bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-glassdoor/master/go.sh ) ./urls.txt`
- The file `urls.txt` should contain one URL per line, and each URL should be a business's review page from Glassdoor.
- Since some businesses can have thousands of reviews, this script will pick up where it left off if interrupted.
- This grabs the HTML from review pages, uses Beautiful Soup to parse the reviews, and then exports them to the `logs/` directory. I looked into using Glassdoor's API, but when I went to the signup page, it was a broken page that was mostly blank. So I tried 🤷.
- The script is single-threaded (I don't want to DoS Glassdoor's website) but reasonably efficient: I've clocked downloads of 5,000 reviews in a little over 8 minutes, or about 600 reviews a minute.
- Go to https://localhost:8000/, log in with the password you set, and you'll see the Glassdoor Reviews Dashboard.
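The "pick up where it left off" behavior can be pictured with a small sketch. This is not the code in this repo, just one way to get resumable downloads. The state file and the `load_done`, `run`, and `fetch` names are hypothetical, and progress is tracked per URL here for simplicity.

```python
import json
from pathlib import Path


def load_urls(path):
    """Read one URL per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_done(state_file):
    """Return the set of URLs already recorded as finished."""
    state_file = Path(state_file)
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def run(urls, state_file, fetch):
    """Fetch each URL not yet marked done, saving progress after each one.

    `fetch` is a hypothetical callable that downloads and parses one URL.
    If it raises (or the process is killed), the state file still lists
    everything that finished, so the next run skips those URLs.
    """
    state_file = Path(state_file)
    done = load_done(state_file)
    for url in urls:
        if url in done:
            continue  # already downloaded on an earlier run
        fetch(url)
        done.add(url)
        state_file.write_text(json.dumps(sorted(done)))
    return done
```

Calling `run(load_urls("urls.txt"), "state.json", fetch)` a second time after an interruption would only fetch the URLs that did not finish the first time.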
## Troubleshooting
- Q: Dashboards show ` Search is waiting for input...`
- A: You need to select a venue in the dropdown! If no items are in the dropdown, that means no data was ingested. Did you run the command to download some Glassdoor reviews?
## Development
Mostly for my benefit, these are the scripts that I use to make my life easier:
- `./bin/build.sh` - Build the Python and Splunk Docker containers
- `./bin/push.sh` - Upload the Docker containers to Docker Hub
- `./bin/devel.sh` - Build and run the Splunk Docker container with an interactive shell
- `./bin/run-download-reviews.sh` - Run the script to download reviews directly
- `./bin/stop.sh` - Stop the Splunk container
- `./bin/clean.sh` - Stop Splunk, and remove the data and logs
## Credits
I'd like to thank Splunk, for having such a kick-ass data
analytics platform, and the operational excellence which it embodies.
Also:
- This text to ASCII art generator, for the logo I used in the script.
## Bugs
- Excessive CPU Usage
- In Docker on OS/X, if you have thousands and thousands of files, Splunk persistently uses like 70% of the CPU. Not good. I think it's more a Docker thing than a Splunk thing, but I could write a workaround as follows:
- Download reviews to a SQLite database with SQLAlchemy
- When downloads are done, dump all reviews for that business to a single JSON file in the logs/ directory
- Workaround: Run `index=main earliest=-10y | stats count` and when the number of events stops going up, stop Splunk, remove the contents of the `logs/` directory, and restart Splunk.
- Sometimes you'll see a yellow exclamation point with the text "Field 'words' does not exist in the data" on the Advice Tag Cloud. The underlying search appears to be executing normally, so I am still trying to sort this one out.
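The SQLite idea under Excessive CPU Usage has not been written yet, but a rough sketch might look like the following. It uses Python's built-in `sqlite3` module instead of SQLAlchemy so that it runs with no extra packages. The table layout, the function names, and the one-JSON-object-per-line output format are all assumptions for illustration.

```python
import json
import sqlite3
from pathlib import Path


def init_db(conn):
    """Create the (hypothetical) reviews table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "review_id TEXT PRIMARY KEY, company TEXT, body TEXT)"
    )


def save_review(conn, review_id, company, review):
    """Store one review as JSON text; re-saving the same ID overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO reviews (review_id, company, body) "
        "VALUES (?, ?, ?)",
        (review_id, company, json.dumps(review)),
    )
    conn.commit()


def dump_company(conn, company, logs_dir):
    """Write all stored reviews for one company to a single file.

    Each line of the output file is one JSON object. `company` is used
    as the file name, so it is assumed to be filesystem-safe.
    """
    logs_dir = Path(logs_dir)
    logs_dir.mkdir(parents=True, exist_ok=True)
    out_path = logs_dir / f"{company}.json"
    rows = conn.execute(
        "SELECT body FROM reviews WHERE company = ? ORDER BY review_id",
        (company,),
    )
    with out_path.open("w") as out:
        for (body,) in rows:
            out.write(body + "\n")
    return out_path
```

With this approach Splunk would only have to watch one file per business in `logs/`, rather than thousands of small ones.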
## Copyright
Splunk is copyright by Splunk. Apps within Splunk Lab are copyright their creators,
and made available under the respective license.
## Contact


