https://github.com/evan-buss/imdb-web-scraper

IMDB web scraper using Scrapy framework. Flask server for data visualization

Host: GitHub
URL: https://github.com/evan-buss/imdb-web-scraper
Owner: evan-buss
Created: 2019-03-28T00:56:46.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2023-05-01T20:33:35.000Z (almost 2 years ago)
Last Synced: 2024-12-19T09:46:25.809Z (2 months ago)
Language: Python
Homepage:
Size: 13.2 MB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: readme.md

README

# IMDB Web Scraper

Scrapy is a Python framework for scraping data and crawling websites. I created these crawlers to learn Scrapy and improve my Python skills.

## Project

This repository contains various Scrapy demo spiders.
- Quotes Scraper
- IMDB movies scraper
- Books Scraper

It also contains a simple Flask HTTP server for viewing the data scraped by the spiders.

The spiders save their data to an SQLite3 database. The website queries data from
the database.
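
As a rough illustration of how a spider's items could end up in SQLite, here is a minimal sketch shaped like a Scrapy item pipeline (a plain class with `open_spider`, `process_item`, and `close_spider` methods). The class name `SQLitePipeline`, the `movies` table, and its `title`/`year` columns are assumptions made for this example and may not match the repository's actual code or schema.

```python
import sqlite3


class SQLitePipeline:
    """Sketch of a Scrapy-style item pipeline that stores items in SQLite."""

    def __init__(self, db_path=":memory:"):
        # An in-memory database keeps the example self-contained;
        # a real project would point this at a file on disk.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: connect and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS movies (title TEXT PRIMARY KEY, year INTEGER)"
        )

    def process_item(self, item, spider):
        # Insert or update one scraped item, then pass it along unchanged.
        self.conn.execute(
            "INSERT OR REPLACE INTO movies (title, year) VALUES (?, ?)",
            (item["title"], item["year"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()


# Drive the pipeline by hand with a hypothetical item (no crawl needed).
pipeline = SQLitePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "Example Movie", "year": 2019}, spider=None)
rows = pipeline.conn.execute("SELECT title, year FROM movies").fetchall()
print(rows)
```

In a Scrapy project, a class like this would typically be enabled through the `ITEM_PIPELINES` setting, after which the website side only needs read access to the same database file.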

# Setup
*I recommend using virtualenv to isolate your project dependencies*

- Install virtualenv
  - `pip3 install --user virtualenv`
  - *For a system-wide install, use `sudo -H pip3 install virtualenv` instead*
- Create a new virtual environment with virtualenv
  - `virtualenv env`
- Activate the virtual environment
  - `source env/bin/activate`
- Install the package dependencies
  - `pip install -r requirements.txt`
    - Scrapy + Dependencies (Spiders)
    - Flask + Dependencies (Server)

## Running the Scrapy Spiders

- Run a spider using the name defined within the class
  - `scrapy crawl movies`
  - Current list of available spiders:
    - `movies`
      - Scrapes IMDB movie data and saves it to an SQLite3 database
    - `quotes`
    - `books`

- Run Scrapy interactively to test HTML selectors
  - `scrapy shell [url]`
  - You can then execute selections
    - Example: `response.css('div.summary::text').get()`

## Running the Flask Server

- Set the shell environment variables (Windows `cmd` syntax shown; on macOS/Linux use `export` instead of `set`)
  - `set FLASK_APP=server`
  - `set FLASK_ENV=development`
- Start the server
  - `flask run`
- Site pages
  - `/movies`
    - List of all movies contained in the database
    - Supports title search and pagination
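
Title search and pagination over an SQLite table can be expressed with `LIKE` plus `LIMIT`/`OFFSET`. The sketch below is one way to do it, not necessarily how the `/movies` page is implemented: the function `search_movies`, the page size, the `movies` table layout, and the sample titles are all assumptions for illustration.

```python
import sqlite3

PAGE_SIZE = 2  # hypothetical page size, kept small for the demo


def search_movies(conn, query="", page=1, page_size=PAGE_SIZE):
    """Return one page of titles containing `query` (pages start at 1)."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT title FROM movies WHERE title LIKE ? "
        "ORDER BY title LIMIT ? OFFSET ?",
        (f"%{query}%", page_size, offset),
    ).fetchall()


# Populate an in-memory database with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT)")
conn.executemany(
    "INSERT INTO movies (title) VALUES (?)",
    [("Alien",), ("Aliens",), ("Blade Runner",), ("Alien 3",)],
)

first_page = search_movies(conn, "alien", page=1)
second_page = search_movies(conn, "alien", page=2)
print(first_page, second_page)
```

Passing the search term as a bound parameter rather than formatting it into the SQL string keeps user input from being interpreted as SQL.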

## Simple Flask Site to View and Search Scraped Data
![homepage](https://github.com/evan-buss/imdb-web-scraper/blob/master/screenshot/Screen%20Shot%202019-10-07%20at%2020.57.41.png)