https://github.com/santhoshse7en/imdb-scrapy
A fun project made using Scrapy. The included spiders can extract movies, TV series, and TV movies filtered by year and title type. More features are planned.
- Host: GitHub
- URL: https://github.com/santhoshse7en/imdb-scrapy
- Owner: santhoshse7en
- License: mit
- Created: 2020-06-03T08:27:57.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-03T09:24:44.000Z (almost 6 years ago)
- Last Synced: 2025-05-31T00:47:26.178Z (about 1 year ago)
- Topics: beautifulsoup4, imdb, imdb-dataset, imdb-scraper, scrapy, scrapy-crawler, scrapy-framework, scrapy-spider
- Language: Python
- Size: 97.7 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## IMDb_Scraper
A fun project made using Scrapy. The included spiders can extract movies, TV series, and TV movies filtered by year and title type. More features are planned.
## Run
### Create and activate virtual env
**Python3**
```shell
python3 -m venv venv
. ./venv/bin/activate
```
**Anaconda**
```shell
conda create --name venv
conda activate venv
```
### Dependencies
* Scrapy
### Extracted information
IMDb Scraper extracts the following attributes from IMDb. For reference, see the example [JSON](https://github.com/santhoshse7en/IMDb_Scraper/blob/master/example/sample.json) and [CSV](https://github.com/santhoshse7en/IMDb_Scraper/blob/master/example/sample.csv) files produced by IMDb Scraper.
* Movie Name
* Movie ID
* Movie URL
* Poster
* Year
* Genre
* RunTime
* Certificate
* Rating
* MetaScore
* Plot
* Votes
* Gross
* Director
* Director ID
* Director URL
* Cast
* Cast ID
* Cast URL
### Install dependencies
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install the dependencies:
```shell
pip install -r requirements.txt
pip install scrapy
```
**Anaconda**
```shell
conda install scrapy -y
```
### Title types
`title_type` is the main parameter: it selects which kind of title to scrape, and it is combined with the release `year` to narrow the results. Supported values:
* feature
* tv_series
* tv_movie
* tv_episode
* tv_special
* tv_miniseries
* documentary
* video_game
* short
* video
* tv_short
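As a rough illustration of how a title type and a year might be combined into one search request, here is a minimal sketch. The endpoint, the `release_date` range parameter, and the helper name `build_search_url` are assumptions made for this example; the spider in this repository may build its URLs differently.

```python
from urllib.parse import urlencode

# Title types listed in this README.
TITLE_TYPES = {
    "feature", "tv_series", "tv_movie", "tv_episode", "tv_special",
    "tv_miniseries", "documentary", "video_game", "short", "video",
    "tv_short",
}

# Assumption: IMDb's title search page; not confirmed as the spider's start URL.
SEARCH_URL = "https://www.imdb.com/search/title/"


def build_search_url(title_type: str, year: int) -> str:
    """Return a search URL for one title type and one release year."""
    if title_type not in TITLE_TYPES:
        raise ValueError(f"unknown title_type: {title_type!r}")
    query = {
        "title_type": title_type,
        # Assumption: the year is expressed as a full-year release-date range.
        "release_date": f"{year}-01-01,{year}-12-31",
    }
    # Keep the comma in the date range unescaped for readability.
    return f"{SEARCH_URL}?{urlencode(query, safe=',')}"


if __name__ == "__main__":
    print(build_search_url("feature", 2019))
```

Validating the value up front means a typo such as `tv-series` fails immediately instead of producing an empty crawl.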
### Usage
```shell
scrapy crawl imdb_year -a title_type=feature -a year=2019
```
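To crawl several years in one go, the command above can be generated from Python. The sketch below only uses the `imdb_year` spider name and the `-a`/`-o` options shown in this README; the helper name `crawl_command` and the output file naming scheme are hypothetical.

```python
import shlex
import subprocess


def crawl_command(title_type, year, output=None):
    """Build the argument list for one `scrapy crawl imdb_year` run."""
    cmd = [
        "scrapy", "crawl", "imdb_year",
        "-a", f"title_type={title_type}",
        "-a", f"year={year}",
    ]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def crawl_years(title_type, years, run=False):
    """Print one command per year and, if run is True, execute it."""
    for year in years:
        # Hypothetical naming scheme: one CSV file per title type and year.
        cmd = crawl_command(title_type, year, f"{title_type}_{year}.csv")
        print(shlex.join(cmd))
        if run:
            # Must be started from the Scrapy project directory.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands.
    crawl_years("feature", range(2017, 2020))
```

Passing the arguments as a list avoids shell quoting problems; `shlex.join` (Python 3.8+) is only used to print a copy-pasteable form.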
**Save the output as a file**
```shell
scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.csv
scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.json
```
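Once a crawl has written `output.json`, a quick sanity check is to load it and see which fields were actually filled. Scrapy's JSON feed export is a single JSON array of objects; the sketch below makes no assumption about the key names and simply reports whatever keys it finds. The helper names are hypothetical.

```python
import json
from collections import Counter


def load_items(path):
    """Load a Scrapy JSON feed export (a JSON array of item objects)."""
    with open(path, encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


def field_coverage(items):
    """Count how many items carry a non-empty value for each field."""
    counts = Counter()
    for item in items:
        for key, value in item.items():
            if value not in (None, "", []):
                counts[key] += 1
    return counts


def summarize(path):
    """Print the item count and the per-field coverage for one export."""
    items = load_items(path)
    print(f"{len(items)} items")
    for field, count in sorted(field_coverage(items).items()):
        print(f"{field}: {count}")
```

Calling `summarize("output.json")` after the crawl shows at a glance whether optional attributes such as gross or metascore are missing for many titles.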