https://github.com/cack195/imdb_web_scraping

Data Extraction using beautiful soup

Host: GitHub
URL: https://github.com/cack195/imdb_web_scraping
Owner: cack195
Created: 2024-01-04T16:19:02.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-01-10T14:24:44.000Z (almost 2 years ago)
Last Synced: 2025-01-16T02:28:35.530Z (11 months ago)
Language: Python
Size: 240 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# IMDb Movie Scraper and Database Manager

This project scrapes IMDb's top movies for various genres, fetches their reviews, and stores the data in a SQLite database.

## Features

- **Movie Data Scraping:** Utilizes BeautifulSoup to extract movie information and reviews from IMDb.
- **Database Management:** Stores scraped data in a SQLite database with genre-specific tables.
- **Error Handling:** Handles duplicates and potential errors during data insertion.
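
The scraping feature can be illustrated with a minimal sketch. It parses an inline HTML snippet rather than a live page, and the markup, class names, and the `parse_movies` helper are all hypothetical; they are not taken from this project's `main.py`, and IMDb's real pages use different markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; IMDb's real pages use
# different tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="movie"><a class="title" href="/title/example-1/">Example Movie One</a>
      <span class="rating">8.7</span></li>
  <li class="movie"><a class="title" href="/title/example-2/">Example Movie Two</a>
      <span class="rating">8.1</span></li>
</ul>
"""


def parse_movies(html):
    """Return a list of dicts (title, path, rating), one per movie item."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.find_all("li", class_="movie"):
        link = item.find("a", class_="title")
        rating = item.find("span", class_="rating")
        if link is None:
            continue  # skip items that do not match the expected layout
        movies.append({
            "title": link.get_text(strip=True),
            "path": link.get("href"),
            "rating": float(rating.get_text(strip=True)) if rating else None,
        })
    return movies


if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie)
```

Checking for `None` before reading a tag keeps one malformed entry from aborting the whole run.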

## Setup Steps

1. **Clone the Repository**

```bash
git clone https://github.com/cack195/imdb_web_scraping.git
cd imdb_web_scraping
```

2. **Environment Setup**

- Ensure Python 3 is installed.
- Install required dependencies:

```bash
pip install -r requirements.txt
```

3. **Database Configuration**

- The default database name is `movies.db`. Change the name in `database_manager.py` if required.

4. **Run the Scraper**

```bash
python main.py
```
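
After a run, one way to check what was stored is to list the tables in the database and count their rows. The sketch below uses only the standard-library `sqlite3` module; the `list_tables` helper is hypothetical and not part of this project, and the table names it reports depend on what the scraper actually created.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names cannot be bound as SQL parameters, so they are quoted
    # as identifiers instead (doubling any embedded quote characters).
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

To use it against the default database, open a connection with `sqlite3.connect("movies.db")` and pass it to `list_tables`. Note that `sqlite3.connect` creates an empty file if the path does not exist yet, so run it from the directory where the scraper wrote its output.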

## Project Structure

- **`main.py`:** Contains the main logic for scraping IMDb and storing data.
- **`database_manager.py`:** Manages SQLite database creation and data insertion.
- **`requirements.txt`:** Lists project dependencies.
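
As a rough illustration of the database side, the following sketch shows one way to keep a table per genre and skip duplicate rows. It is not the code in `database_manager.py`: the `table_for` and `insert_movie` helpers, the `genre_` table prefix, and the `title`/`year` columns are all assumptions made for the example.

```python
import sqlite3


def table_for(genre):
    """Map a genre label to a safe table name (letters, digits, underscores)."""
    name = "".join(ch if ch.isalnum() else "_" for ch in genre.lower())
    return f"genre_{name}"


def insert_movie(conn, genre, title, year):
    """Insert one movie into its genre table; return True if a row was added."""
    table = table_for(genre)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "year INTEGER, "
        "UNIQUE (title, year))"
    )
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f'INSERT INTO "{table}" (title, year) VALUES (?, ?)',
                (title, year),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # same (title, year) already stored: skip it


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(insert_movie(conn, "Sci-Fi", "Example Movie", 2001))
    print(insert_movie(conn, "Sci-Fi", "Example Movie", 2001))
    conn.close()
```

Letting a `UNIQUE` constraint reject repeats means the duplicate check happens inside the database, so re-running the scraper does not need a separate lookup before each insert.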

## Viewing the Database Diagram

- To view the database diagram, open `draw.io` and import the `Database_Diagram` file from the project root directory.
