https://github.com/trflorian/series-heatmap
Scraper and heatmap plotter for episode ratings of series on IMDB
episodes heatmap imdb matplotlib-pyplot python ratings scraper selenium-webdriver series webscraping
- Host: GitHub
- URL: https://github.com/trflorian/series-heatmap
- Owner: trflorian
- License: mit
- Created: 2022-11-26T14:32:29.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2022-12-21T22:02:17.000Z (about 2 years ago)
- Last Synced: 2024-04-22T18:42:55.123Z (9 months ago)
- Topics: episodes, heatmap, imdb, matplotlib-pyplot, python, ratings, scraper, selenium-webdriver, series, webscraping
- Language: Python
- Homepage:
- Size: 5.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# IMDB Series Rating Scraper
## Introduction
This tool scrapes the website https://www.imdb.com for ratings of individual episodes of a series.
A csv file is generated to cache the ratings.
Using matplotlib, the tool then generates a heatmap representation of all episodes in the series.
Because this tool relies on scraping the HTML tree of the IMDb page, it might break at any time.
Feel free to message me if the scraper no longer works, or create a pull request with adjusted XPaths.

| ![](examples/img/Dark.png) | ![](examples/img/Breaking_Bad.png) |
|---------|-----|
| ![](examples/img/Game_Of_Thrones.png) | ![](examples/img/NCIS_Naval_Criminal_Investigative_Service.png) |

## Examples
### Data output
The following table shows data that is generated by the scraper for the first season of **Breaking Bad**.
For the full data output see `examples/data/Breaking Bad.csv`.

| season | episode | name | rating |
|--------|---------|-------------------------------|--------|
| 1 | 1 | Pilot | 9.0 |
| 1 | 2 | Cat's in the Bag... | 8.6 |
| 1 | 3 | ...And the Bag's in the River | 8.7 |
| 1 | 4 | Cancer Man | 8.2 |
| 1 | 5 | Gray Matter | 8.3 |
| 1 | 6 | Crazy Handful of Nothin' | 9.3 |
| 1 | 7 | A No-Rough-Stuff-Type Deal | 8.8 |

### Heatmap output
The following image shows an example of the heatmap that can be generated.
Heatmaps of some example series can be found under `examples/img/`.

![](examples/img/Breaking_Bad.png)
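
To illustrate how the cached ratings map onto a heatmap, here is a minimal sketch that arranges the rows from the data output table above into a seasons by episodes grid. It is not the code from `heatmap.py`. The function names `build_grid` and `plot_grid` are hypothetical, and the CSV layout (comma-separated, with a `season,episode,name,rating` header) is an assumption based on the table columns. The grid is built with the standard library only; the plotting helper imports matplotlib and numpy lazily.

```python
import csv
import io

# Season 1 of Breaking Bad, copied from the data output table above.
# Assumption: the cached CSV uses these column names as its header row.
SAMPLE_CSV = """season,episode,name,rating
1,1,Pilot,9.0
1,2,Cat's in the Bag...,8.6
1,3,...And the Bag's in the River,8.7
1,4,Cancer Man,8.2
1,5,Gray Matter,8.3
1,6,Crazy Handful of Nothin',9.3
1,7,A No-Rough-Stuff-Type Deal,8.8
"""


def build_grid(csv_text):
    """Return a seasons x episodes grid of ratings; missing cells are None."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n_seasons = max(int(r["season"]) for r in rows)
    n_episodes = max(int(r["episode"]) for r in rows)
    grid = [[None] * n_episodes for _ in range(n_seasons)]
    for r in rows:
        # Seasons and episodes are 1-based in the data, 0-based in the grid.
        grid[int(r["season"]) - 1][int(r["episode"]) - 1] = float(r["rating"])
    return grid


def plot_grid(grid, title):
    """Draw the grid as a heatmap (requires matplotlib and numpy)."""
    import matplotlib.pyplot as plt
    import numpy as np

    # None becomes NaN so that seasons with fewer episodes show empty cells.
    data = np.array([[np.nan if v is None else v for v in row] for row in grid])
    fig, ax = plt.subplots()
    image = ax.imshow(data, cmap="RdYlGn", vmin=0, vmax=10)
    ax.set_xlabel("episode")
    ax.set_ylabel("season")
    ax.set_title(title)
    fig.colorbar(image, ax=ax, label="rating")
    return fig


grid = build_grid(SAMPLE_CSV)
print(grid)
```

Calling `plot_grid(grid, "Breaking Bad")` returns a matplotlib figure that can be shown or saved. Padding shorter seasons with empty cells keeps the grid rectangular, which `imshow` requires.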
## Quickstart
### Dependencies
- Python version `Python 3.9.13`
- Python packages: see `requirements.txt`

### Setup
1. Clone this repository

| **HTTPS** | `$ git clone https://github.com/trflorian/imdb-scraper-heatmap.git` |
| ---|---|
| **SSH** | `$ git clone git@github.com:trflorian/imdb-scraper-heatmap.git` |

3. (Optional) Create a virtual environment for this project.
4. Install the required Python packages in your Python environment: `$ python -m pip install -r requirements.txt`
5. Run `$ python scraper.py` to scrape the IMDB website for a specific series.
6. Run `$ python heatmap.py` to create a plot for the scraped series.

### Usage
```
$ python .\examples\heatmap.py --help
usage: heatmap.py [-h] [-s] [-d] [-o] [-n NAME]

optional arguments:
  -h, --help            show this help message and exit
  -s, --show            show the heatmap plot instead of saving it
  -d, --dark            use dark mode for the plot style
  -o, --override        override existing plots, only used if show flag is not set
  -n NAME, --name NAME  name of the series, if not set the whole data directory will be scanned
```

## Development
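
For anyone extending the command line interface, the options listed in the `--help` output could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help text, not the actual contents of `heatmap.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Declare the flags shown in the help output of heatmap.py."""
    parser = argparse.ArgumentParser(prog="heatmap.py")
    parser.add_argument("-s", "--show", action="store_true",
                        help="show the heatmap plot instead of saving it")
    parser.add_argument("-d", "--dark", action="store_true",
                        help="use dark mode for the plot style")
    parser.add_argument("-o", "--override", action="store_true",
                        help="override existing plots, only used if show flag is not set")
    parser.add_argument("-n", "--name",
                        help="name of the series, if not set the whole data directory will be scanned")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--dark", "-n", "Breaking Bad"])
print(args.show, args.dark, args.override, args.name)
```

The boolean flags default to `False` and `--name` defaults to `None`, which matches the described behaviour of scanning the whole data directory when no name is given.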
### Upload to PyPI
```
python -m build
python -m twine upload --skip-existing dist/*
```