https://github.com/cack195/imdb_web_scraping
Data Extraction using beautiful soup
- Host: GitHub
- URL: https://github.com/cack195/imdb_web_scraping
- Owner: cack195
- Created: 2024-01-04T16:19:02.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-10T14:24:44.000Z (about 1 year ago)
- Last Synced: 2024-11-15T14:19:33.596Z (2 months ago)
- Language: Python
- Size: 240 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# IMDb Movie Scraper and Database Manager
This project scrapes IMDb's top movies for various genres, fetches their reviews, and stores the data in a SQLite database.
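As a rough illustration of the scraping step, the sketch below parses a small HTML snippet with BeautifulSoup. The markup, the class names (`movie`, `title`, `rating`), and the `parse_movies` helper are hypothetical and are not taken from `main.py`; IMDb's real pages use different markup that changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; IMDb's real HTML differs.
SAMPLE_HTML = """
<ul>
  <li class="movie">
    <a class="title" href="/title/tt0000001/">Example Movie One</a>
    <span class="rating">8.7</span>
  </li>
  <li class="movie">
    <a class="title" href="/title/tt0000002/">Example Movie Two</a>
    <span class="rating">7.9</span>
  </li>
</ul>
"""


def parse_movies(html):
    """Return a list of dicts with title, url and rating for each movie item."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.find_all("li", class_="movie"):
        link = item.find("a", class_="title")
        rating = item.find("span", class_="rating")
        if link is None:
            # Skip entries that do not match the expected layout.
            continue
        movies.append(
            {
                "title": link.get_text(strip=True),
                "url": link["href"],
                "rating": float(rating.get_text(strip=True)) if rating else None,
            }
        )
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie["title"], movie["rating"])
```

In a real run, the HTML would come from an HTTP response rather than a string constant, and the selectors would need to match the live page.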
## Features
- **Movie Data Scraping:** Utilizes BeautifulSoup to extract movie information and reviews from IMDb.
- **Database Management:** Stores scraped data in a SQLite database with genre-specific tables.
- **Error Handling:** Handles duplicates and potential errors during data insertion.
## Setup Steps
1. **Clone the Repository**
```bash
git clone https://github.com/cack195/imdb_web_scraping.git
cd imdb_web_scraping
```
2. **Environment Setup**
- Ensure you have Python installed (preferably Python 3.x).
- Install required dependencies:
```bash
pip install -r requirements.txt
```
3. **Database Configuration**
- The default database name is `movies.db`. Change the name in `database_manager.py` if required.
4. **Run the Scraper**
```bash
python main.py
```
## Project Structure
- **`main.py`:** Contains the main logic for scraping IMDb and storing data.
- **`database_manager.py`:** Manages SQLite database creation and data insertion.
- **`requirements.txt`:** Lists project dependencies.
## Viewing Database Diagram
- To view the database diagram, open `draw.io` and import the `Database_Diagram` file from the project root directory.
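
For readers without draw.io at hand, the sketch below shows one way genre-specific tables and duplicate handling could look using Python's built-in `sqlite3` module. The column names, the `UNIQUE` constraint on `title`, and the `table_name_for` and `save_movies` helpers are assumptions made for illustration; they are not the contents of `database_manager.py`, and the actual schema is the one in the diagram.

```python
import sqlite3


def table_name_for(genre):
    """Build a safe table name from a genre label (hypothetical naming scheme)."""
    # Table names cannot be bound as SQL parameters, so replace every
    # character that is not a letter or digit with an underscore.
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in genre.lower())
    return f"movies_{cleaned}"


def save_movies(conn, genre, movies):
    """Insert movies into the genre's table and return how many rows were added."""
    table = table_name_for(genre)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL UNIQUE, "
        "review TEXT)"
    )
    inserted = 0
    for movie in movies:
        try:
            conn.execute(
                f"INSERT INTO {table} (title, review) VALUES (?, ?)",
                (movie["title"], movie.get("review")),
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # The UNIQUE constraint rejected a duplicate title; skip it.
            continue
    conn.commit()
    return inserted


# An in-memory database keeps the demo from touching movies.db.
conn = sqlite3.connect(":memory:")
sample = [
    {"title": "Example Movie One", "review": "Great."},
    {"title": "Example Movie One", "review": "Duplicate entry."},
    {"title": "Example Movie Two"},
]
added = save_movies(conn, "Sci-Fi", sample)
print(f"Inserted {added} rows into {table_name_for('Sci-Fi')}")
```

Catching `sqlite3.IntegrityError` per row lets one duplicate be skipped without aborting the rest of the batch; `INSERT OR IGNORE` would be an alternative with similar effect.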