https://github.com/jerinpious/movie-recommendation-system
A content-based movie recommendation system built using Python. The system processes movie data, extracts relevant features, and provides recommendations based on user preferences
- Host: GitHub
- URL: https://github.com/jerinpious/movie-recommendation-system
- Owner: jerinpious
- Created: 2024-11-28T02:20:59.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-29T06:45:37.000Z (about 2 months ago)
- Last Synced: 2024-11-29T07:22:42.828Z (about 2 months ago)
- Topics: content-based-recommendation, data-analysis, jupyter-notebook, machine-learning, pandas, python, streamlit
- Language: Jupyter Notebook
- Homepage:
- Size: 10.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Movie Recommendation System
This project is a **content-based movie recommendation system** built using Python. The system processes movie data, extracts relevant features, and provides recommendations based on user preferences. It uses the **TMDb 5000 Movies** and **TMDb 5000 Credits** datasets for analysis and prediction.
- Data Source: [Kaggle TMDb 5000 Dataset](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata)

---

## Screenshots
![Project_image](img/ss_1.png)
![Project_image](img/ss_2.png)
## Features
- **Data Merging**: Combines information from two datasets (`tmdb_5000_movies.csv` and `tmdb_5000_credits.csv`) to create a unified database.
- **Feature Engineering**: Extracts relevant features like genres, keywords, cast, and crew.
- **Content-Based Recommendations**: Recommends movies based on shared features, such as:
  - Genre similarity
  - Common keywords
  - Top cast members
  - Shared directors
- **Optimized Data Preprocessing**:
  - Handles null values and removes duplicates.
  - Converts JSON-like data (genres, keywords, cast) into readable and usable formats.
- **Customized Casting**: Limits the cast feature to the top 3 main actors.

---
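The data merging and preprocessing features listed above can be sketched in a few lines of pandas. The two small DataFrames are hypothetical stand-ins for `tmdb_5000_movies.csv` and `tmdb_5000_credits.csv` (a subset of their columns with made-up rows), and the helper name `build_base_table` is illustrative, not taken from the project.

```python
import pandas as pd

# Hypothetical toy rows standing in for the two Kaggle CSV files.
movies = pd.DataFrame({
    "title": ["Avatar", "Spectre", "Spectre", "Untitled"],
    "overview": ["Overview A", "Overview B", "Overview B", None],
    "genres": ['[{"id": 28, "name": "Action"}]'] * 4,
    "keywords": ['[{"id": 1, "name": "space"}]'] * 4,
    "popularity": [3.0, 2.0, 2.0, 1.0],  # extra column that gets dropped below
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Avatar", "Spectre", "Untitled"],
    "cast": ['[{"name": "Actor A"}]'] * 3,
    "crew": ['[{"job": "Director", "name": "Director X"}]'] * 3,
})

COLUMNS = ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]


def build_base_table(movies, credits):
    # Join the two tables on their shared title column (inner join by default).
    df = movies.merge(credits, on="title")
    # Keep only the columns the recommender needs.
    df = df[COLUMNS]
    # Drop rows with a missing value in any of those columns.
    df = df.dropna()
    # Remove exact duplicate rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)


base = build_base_table(movies, credits)
print(base[["movie_id", "title"]])
```

Here the duplicated "Spectre" row collapses to one entry and "Untitled" is dropped because its overview is missing.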
## Dataset Details
### 1. TMDb 5000 Movies Dataset
- **Source**: Kaggle
- **Features**:
  - Title, Budget, Revenue
  - Genres, Keywords
  - Popularity, Vote Count, Vote Average
  - Overview, Homepage, Release Date

### 2. TMDb 5000 Credits Dataset
- **Source**: Kaggle
- **Features**:
  - Movie ID, Title
  - Cast (detailed information about actors)
  - Crew (including directors, writers, etc.)

---
## Data Preprocessing Steps
1. **Merging Datasets**: Merged the `movies` and `credits` datasets on their shared `title` column.
2. **Selected Features**: Kept only the most relevant columns:
   - `movie_id`, `title`, `overview`, `genres`, `keywords`, `cast`, `crew`
3. **Null Handling**:
   - Dropped rows with missing values in critical columns.
4. **Duplicate Removal**:
   - Removed repeated entries from the data.
5. **Data Transformation**:
   - Parsed the JSON-like strings in the `genres`, `keywords`, `cast`, and `crew` columns into usable lists.
6. **Feature Customization**:
   - Limited the cast feature to the top 3 actors using the `top3()` function.
   - Extracted directors using the `fetch_director()` function.

---
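The transformation and customization steps above could look like the helpers below. `top3()` and `fetch_director()` are the names the project uses, but the bodies here are a sketch rather than the project's code, and `convert()` is a hypothetical name for the generic genres/keywords parser. The sketch assumes each cell holds a JSON array of objects with a `name` key (as in the TMDb 5000 CSV files) and that the cast array is already in billing order.

```python
import json


def convert(raw):
    """Parse a JSON-like genres/keywords cell into a list of names."""
    return [item["name"] for item in json.loads(raw)]


def top3(raw):
    """Keep the names of the first three cast entries (assumed to be billing order)."""
    return [member["name"] for member in json.loads(raw)[:3]]


def fetch_director(raw):
    """Return the names of crew members whose job is 'Director'."""
    return [person["name"] for person in json.loads(raw) if person.get("job") == "Director"]


# Toy cells shaped like the TMDb columns (values are illustrative).
genres = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
cast = '[{"name": "Actor A"}, {"name": "Actor B"}, {"name": "Actor C"}, {"name": "Actor D"}]'
crew = '[{"job": "Director", "name": "Director X"}, {"job": "Producer", "name": "Producer Y"}]'

print(convert(genres))       # ['Action', 'Science Fiction']
print(top3(cast))            # ['Actor A', 'Actor B', 'Actor C']
print(fetch_director(crew))  # ['Director X']
```

With pandas, each helper can be applied column-wise, for example `df["genres"] = df["genres"].apply(convert)` or `df["crew"] = df["crew"].apply(fetch_director)`.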
## Project Workflow
1. **Exploratory Data Analysis (EDA)**:
   - Analyzed data distribution and cleaned missing values.
2. **Feature Engineering**:
   - Processed `genres` and `keywords` into lists for similarity computation.
3. **Modeling**:
   - Used cosine similarity to compute pairwise movie similarities.
   - Built a recommendation engine based on the computed similarity matrix.
4. **Integration**:
   - Created functions to fetch recommendations for a given movie title.

---
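The modeling and integration steps above can be illustrated with a dependency-free sketch. The README does not say how movies are vectorized, so this assumes a plain bag-of-words over each movie's genres, keywords, top cast, and director, with spaces removed inside multi-word names so that "Science Fiction" stays a single token. The catalog, the function names (`build_similarity`, `recommend`), and `k` are hypothetical; `recommend` is not the project's `get_recommendations`. scikit-learn's `CountVectorizer` combined with `sklearn.metrics.pairwise.cosine_similarity` is a common library alternative.

```python
import math
from collections import Counter


def to_tokens(features):
    """Lowercase each feature and remove spaces so multi-word names stay one token."""
    return [feature.replace(" ", "").lower() for feature in features]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_similarity(catalog):
    """Return the titles and an n x n matrix of pairwise cosine similarities."""
    titles = list(catalog)
    vectors = [Counter(to_tokens(catalog[title])) for title in titles]
    matrix = [[cosine(u, v) for v in vectors] for u in vectors]
    return titles, matrix


def recommend(title, titles, matrix, k=5):
    """Return up to k other titles, most similar first."""
    i = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: matrix[i][j], reverse=True)
    return [titles[j] for j in ranked if j != i][:k]


# Hand-written toy features (genres, keyword, lead actor, director), not taken from the dataset.
catalog = {
    "Avatar": ["Action", "Science Fiction", "space war", "Sam Worthington", "James Cameron"],
    "Aliens": ["Action", "Science Fiction", "space", "Sigourney Weaver", "James Cameron"],
    "Titanic": ["Drama", "Romance", "shipwreck", "Kate Winslet", "James Cameron"],
    "Notting Hill": ["Comedy", "Romance", "bookshop", "Julia Roberts", "Roger Michell"],
}

titles, matrix = build_similarity(catalog)
print(recommend("Avatar", titles, matrix, k=2))  # ['Aliens', 'Titanic']
```

"Aliens" shares three of Avatar's five tokens (similarity 0.6), "Titanic" shares only the director (0.2), and "Notting Hill" shares none.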
## Installation and Setup
1. Clone the repository:
   ```bash
   git clone https://github.com/jerinpious/movie-recommendation-system.git
   cd movie-recommendation-system
   ```
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Add the datasets:
   - Place `tmdb_5000_movies.csv` and `tmdb_5000_credits.csv` inside the `data/` directory.
4. Run the project:
   ```bash
   python main.py
   ```

---
## Usage
- **Input**: Provide a movie title.
- **Output**: A list of recommended movies with high similarity based on:
  - Shared genres
  - Common keywords
  - Similar cast or crew

Example:
```python
recommendations = get_recommendations("Avatar")
print(recommendations)
```

---
## Future Enhancements
- Implement **hybrid recommendations** by integrating collaborative filtering.
- Develop a **web interface** using Flask or Streamlit for user interaction.

---
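For the hybrid-recommendation idea listed above, one common design is a weighted blend of the content-based score and a collaborative-filtering score. The sketch below assumes both scores are already computed per candidate title and scaled to [0, 1]; the function name `hybrid_rank`, the `alpha` weight, and the example scores are all hypothetical.

```python
def hybrid_rank(content_scores, cf_scores, alpha=0.5, k=5):
    """Rank titles by alpha * content score + (1 - alpha) * collaborative score.

    A title missing from one of the score dicts contributes 0.0 for that signal.
    """
    candidates = set(content_scores) | set(cf_scores)
    blended = {
        title: alpha * content_scores.get(title, 0.0) + (1 - alpha) * cf_scores.get(title, 0.0)
        for title in candidates
    }
    return sorted(blended, key=blended.get, reverse=True)[:k]


# Hypothetical scores for candidates similar to "Avatar".
content = {"Aliens": 0.6, "Titanic": 0.2}
collaborative = {"Titanic": 0.9, "Notting Hill": 0.4}

print(hybrid_rank(content, collaborative, alpha=0.5))  # ['Titanic', 'Aliens', 'Notting Hill']
print(hybrid_rank(content, collaborative, alpha=0.9))  # ['Aliens', 'Titanic', 'Notting Hill']
```

Setting `alpha=1.0` ranks purely by the content score, so the blend can be tuned gradually from the current behavior.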
## Acknowledgments
- Data Source: [Kaggle TMDb 5000 Dataset](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata)
- Inspiration: Building a robust content-based recommendation system.