https://github.com/santhoshse7en/imdb-scrapy

A fun projects made using Scrapy. The Spiders included in this are able to extract Movie, TV-Series, TV-Movies based on year and title type. A lot more to come ahead
https://github.com/santhoshse7en/imdb-scrapy

beautifulsoup4 imdb imdb-dataset imdb-scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 12 months ago
JSON representation

A fun projects made using Scrapy. The Spiders included in this are able to extract Movie, TV-Series, TV-Movies based on year and title type. A lot more to come ahead

Host: GitHub
URL: https://github.com/santhoshse7en/imdb-scrapy
Owner: santhoshse7en
License: mit
Created: 2020-06-03T08:27:57.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-06-03T09:24:44.000Z (about 6 years ago)
Last Synced: 2025-05-31T00:47:26.178Z (about 1 year ago)
Topics: beautifulsoup4, imdb, imdb-dataset, imdb-scraper, scrapy, scrapy-crawler, scrapy-framework, scrapy-spider
Language: Python
Size: 97.7 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

## IMDb_Scraper
A fun projects made using Scrapy. The Spiders included in this are able to extract Movie, TV-Series, TV-Movies based on year and title type. A lot more to come features ahead

## Run

### Create and activate virtual env

**Python3**

```python

>> python3 -m venv venv
>> . ./venv/bin/activate

```

**Anaconda**

```python

>> conda create --name venv
>> conda activate venv

```

### Dependencies

* Scrapy

### Extracted information

IMDb Scraper extracts the following attributes from IMDb websites. Also, have a look at an examplary [json](https://github.com/santhoshse7en/IMDb_Scraper/blob/master/example/sample.json) and [CSV](https://github.com/santhoshse7en/IMDb_Scraper/blob/master/example/sample.csv) file extracted by IMDb Scraper.

* Movie Name
* Movie ID
* Movie URL
* Poster
* Year
* Genre
* RunTime
* Certificate
* Rating
* MetaScore
* Plot
* Votes
* Gross
* Director
* Director ID
* Director URL
* Cast
* Cast ID
* Cast URL

### Install dependencies

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install following

```python

>> pip install -r requirements.txt
>> pip install scrapy

```

**Anaconda**

```python

>> conda install scrapy -y

```

### TitleType was the main parameter to different title alongside release year to sort the release

* feature
* tv_series
* tv_movie
* tv_episode
* tv_special
* tv_miniseries
* documentary
* video_game
* short
* video
* tv_short

### Usage

```python

>> scrapy crawl imdb_year -a title_type=feature -a year=2019

```

**Save the output as a file**

```python

>> scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.csv

>> scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.json

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/santhoshse7en/imdb-scrapy

Awesome Lists containing this project

README