https://github.com/rahulvictor12/the-movie-database-data-scrapper

A Python web scraper that collects movie data from The Movie Database (TMDB). It uses `requests`, `BeautifulSoup`, and `pandas` to extract titles, ratings, genres, and cast details from multiple pages. The data is structured into DataFrames and saved as a CSV, perfect for analysis or integration into projects.

Host: GitHub
URL: https://github.com/rahulvictor12/the-movie-database-data-scrapper
Owner: rahulvictor12
Created: 2024-12-25T11:59:25.000Z (about 1 month ago)
Default Branch: main
Last Pushed: 2024-12-25T12:12:12.000Z (about 1 month ago)
Last Synced: 2024-12-25T13:18:44.624Z (about 1 month ago)
Topics: beautifulsoup, colab-notebook, dataframes, numpy, pandas, python, requests, testing, webscraping
Language: Jupyter Notebook
Homepage:
Size: 146 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# The Movie Database (TMDB) Movie Data Scraper

## Description
This project is a Python-based web scraper designed to extract movie-related information from [The Movie Database (TMDB)](https://www.themoviedb.org). Using libraries like `requests` and `BeautifulSoup`, it collects data such as movie titles, ratings, genres, and cast details. The extracted data is organized into structured formats using Pandas and exported to a CSV file for further analysis.

## Features
- **Web Scraping**: Extracts movie details from multiple pages of the TMDB website.
- **Data Storage**: Combines data into Pandas DataFrames and exports as CSV.
- **Error Handling**: Implements robust mechanisms for handling request failures.
- **Reusable Functions**: Includes modular user-defined functions for easy extensibility.
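
The README does not say how request failures are handled. One common pattern is retrying with exponential backoff, sketched below. `fetch_with_retries` and its parameters are hypothetical names, not taken from the repository, and the function takes any zero-argument callable so it can be tried without network access.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` up to `retries` times; return None if every attempt fails."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # with requests, narrow this to requests.RequestException
            print(f"Attempt {attempt + 1} of {retries} failed: {exc}")
            if attempt < retries - 1:
                # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
                sleep(backoff_seconds * 2 ** attempt)
    return None
```

With `requests`, the callable could be something like `lambda: requests.get(url, headers=headers, timeout=10)`, where `url` and `headers` are whatever the scraper already uses.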

## Prerequisites
Ensure you have the following installed:
- Python 3.7+
- Pip (Python package manager)

## Setup Instructions

1. Clone this repository:
```bash
git clone https://github.com/rahulvictor12/the-movie-database-data-scrapper
cd the-movie-database-data-scrapper
```

2. Install the required Python libraries:
```bash
pip install -r requirements.txt
```

3. Run the script:
```bash
python main.py
```

## Usage

### 1. Scrape Data
The script fetches data from the first 6 pages of TMDB and combines the results into a single CSV file.

### 2. Modify Parameters
You can customize the number of pages to scrape or adjust headers by editing the `main.py` script.
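
How these settings look inside `main.py` is not shown in this README. The following is a minimal sketch, assuming the page count and headers are module-level constants and that listing pages follow a `?page=N` query pattern. `BASE_URL`, `NUM_PAGES`, `HEADERS`, and `build_page_urls` are hypothetical names, and the `/movie` listing path is an assumption.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical settings; the real names in main.py may differ.
BASE_URL = "https://www.themoviedb.org/movie"  # assumed listing path
NUM_PAGES = 6  # the README says the first 6 pages are scraped
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tmdb-scraper-example)"}


def build_page_urls(base_url: str = BASE_URL, num_pages: int = NUM_PAGES) -> List[str]:
    """Return one listing URL per page, numbered from 1."""
    urls = []
    for page in range(1, num_pages + 1):
        query = urlencode({"page": page})
        urls.append(f"{base_url}?{query}")
    return urls
```

Under these assumptions, changing `NUM_PAGES` is enough to scrape more or fewer pages, and `HEADERS` would be passed to each request.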

### 3. Output
The combined movie data is saved as `Combined_Data.csv` in the project directory.
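
The combining step can be sketched with `pandas.concat` and `DataFrame.to_csv`. The per-page frames below are placeholder data, and `combine_pages` is a hypothetical helper name rather than the repository's own function.

```python
from typing import List

import pandas as pd


def combine_pages(frames: List[pd.DataFrame], output_path: str = "Combined_Data.csv") -> pd.DataFrame:
    """Stack per-page DataFrames and write the result to a single CSV file."""
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)  # index=False omits the row index column
    return combined


# Placeholder rows standing in for two scraped pages.
page_1 = pd.DataFrame([{"Title": "Movie A", "Rating": 8.1, "Genres": "Drama", "Cast": "Actor One"}])
page_2 = pd.DataFrame([{"Title": "Movie B", "Rating": 7.4, "Genres": "Comedy", "Cast": "Actor Two"}])
```

Calling `combine_pages([page_1, page_2])` would write both rows to `Combined_Data.csv` in the current directory.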

## Outputs
- **CSV File**: Contains the following columns:
  - Title
  - Rating
  - Genres
  - Cast

Example output:

| Title | Rating | Genres | Cast |
|--------------------|--------|----------------|-------------------|
| The Shawshank... | 9.3 | Drama, Crime | Tim Robbins, ... |
| The Godfather | 9.2 | Drama, Crime | Marlon Brando,... |
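
Because genres are stored as one comma-separated string per movie, per-genre analysis needs that column split first. Here is a small sketch, assuming the column names from the example table. The inline CSV is placeholder data, not real scraper output.

```python
import io

import pandas as pd

# Placeholder CSV text in the same shape as Combined_Data.csv.
csv_text = """Title,Rating,Genres,Cast
Movie A,8.1,"Drama, Crime","Actor One, Actor Two"
Movie B,7.4,"Drama, Comedy","Actor Three"
"""

df = pd.read_csv(io.StringIO(csv_text))

# Split "Drama, Crime" into ["Drama", "Crime"], then give each genre its own row.
genres = df.assign(Genre=df["Genres"].str.split(", ")).explode("Genre")

# Mean rating and movie count per genre.
summary = genres.groupby("Genre")["Rating"].agg(["mean", "count"])
```

To run this against real output, replace `io.StringIO(csv_text)` with the path to `Combined_Data.csv`.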

## Built With
- Python
- Requests
- BeautifulSoup
- Pandas

## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository.
2. Create a feature branch.
3. Submit a pull request.

## License
This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments
- [The Movie Database (TMDB)](https://www.themoviedb.org) for providing the data.