Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/evan-buss/imdb-web-scraper
IMDB web scraper using Scrapy framework. Flask server for data visualization
- Host: GitHub
- URL: https://github.com/evan-buss/imdb-web-scraper
- Owner: evan-buss
- Created: 2019-03-28T00:56:46.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-08T00:44:19.000Z (almost 2 years ago)
- Last Synced: 2023-03-03T11:34:48.181Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 13.2 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: readme.md
README
# IMDB Web Scraper
Scrapy is a Python framework for scraping data and crawling websites. I created these crawlers to learn Scrapy and improve my Python skills.
## Project
This repository contains various Scrapy demo spiders.
- Quotes Scraper
- IMDB movies scraper
- Books Scraper

It also contains a simple HTTP server to view the scraped data from the spiders.
The spiders save their data to an SQLite3 database. The website queries data from
the database.

# Setup
*I recommend using virtualenv to isolate your project dependencies*

- Install virtualenv
  - `sudo pip3 install --user virtualenv`
  - *May have to use `sudo -H` with newer versions*
- Create a new virtual environment with virtualenv
  - `virtualenv env`
- Activate the virtual environment
  - `source env/bin/activate`
- Install the package dependencies
  - `pip install -r requirements.txt`
  - Scrapy + Dependencies (Spiders)
  - Flask + Dependencies (Server)

## Running the Scrapy Spiders
- Run a spider using the name defined within the class
  - `scrapy crawl movies`
- Current List of Available Spiders:
  - `movies`
    - Scrapes IMDB movie data and saves it to an SQLite3 database
  - `quotes`
  - `books`
- Run Scrapy interactively to test HTML selectors
  - `scrapy shell [url]`
  - You can then execute selections
    - Ex) `response.css('div.summary::text').get()`

## Running the Flask Server
- Set the shell environment variables
  - `set FLASK_APP=server` (in bash or zsh, use `export FLASK_APP=server`)
  - `set FLASK_ENV=development` (in bash or zsh, use `export FLASK_ENV=development`)
- Start the server
  - `flask run`
- Site Pages
  - /movies
    - List of all movies contained in database
    - Supports title search and pagination
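
The title search and pagination described for `/movies` could be backed by a query along these lines. This is a minimal sketch using only the standard-library `sqlite3` module, not the repository's actual code: the `movies` table, its `title` and `year` columns, and the `create_schema` and `search_movies` helpers are all assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical movies table (the real schema may differ)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
    )


def search_movies(conn, title="", page=1, per_page=20):
    """Return one page of (title, year) rows whose title contains `title`.

    SQLite's LIKE is case-insensitive for ASCII characters by default.
    Note that % and _ inside `title` act as wildcards in this sketch.
    """
    page = max(page, 1)  # treat page numbers below 1 as the first page
    offset = (page - 1) * per_page
    cursor = conn.execute(
        "SELECT title, year FROM movies "
        "WHERE title LIKE ? ORDER BY title LIMIT ? OFFSET ?",
        (f"%{title}%", per_page, offset),  # bound parameters, no string building
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    create_schema(conn)
    conn.executemany(
        "INSERT INTO movies (title, year) VALUES (?, ?)",
        [("Alien", 1979), ("Aliens", 1986), ("Heat", 1995)],
    )
    print(search_movies(conn, "alien"))
    print(search_movies(conn, "", page=2, per_page=2))
```

A Flask view could read `title` and `page` from the request's query string and pass them to a helper like this. Binding the values as parameters keeps user input out of the SQL text.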
## Simple Flask Site to View and Search Scraped Data
![homepage](https://github.com/evan-buss/imdb-web-scraper/blob/master/screenshot/Screen%20Shot%202019-10-07%20at%2020.57.41.png)