https://github.com/ramchaik/cinebrain
CineBrain uses ML & NLP to analyze movies and recommend similar ones based on user preferences through cosine similarity.
cosine-similarity flask htmx kaggle machine-learning movie-recomendation-system nlp nltk numpy pandas sklearn tailwind tmdb tmdb-api unsupervised-learning vectorization
- Host: GitHub
- URL: https://github.com/ramchaik/cinebrain
- Owner: ramchaik
- License: mit
- Created: 2024-08-16T09:57:13.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-16T18:38:49.000Z (4 months ago)
- Last Synced: 2024-11-10T04:13:54.150Z (about 1 month ago)
- Topics: cosine-similarity, flask, htmx, kaggle, machine-learning, movie-recomendation-system, nlp, nltk, numpy, pandas, sklearn, tailwind, tmdb, tmdb-api, unsupervised-learning, vectorization
- Language: Jupyter Notebook
- Homepage:
- Size: 8.95 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🎬 CineBrain: Movie Recommender System
CineBrain is a movie recommender system that suggests similar movies based on the movie a user selects. It uses the TMDB 5000 Movie Dataset and ranks candidates by cosine similarity over text features built with natural language processing techniques.
## 🚀 Demo
https://github.com/user-attachments/assets/0a67e42f-faa6-4059-9cc0-a00dcebe53fb
## 📚 Table of Contents
- [Project Overview](#-project-overview)
- [Dataset](#-dataset)
- [Model Architecture](#-model-architecture)
- [Key Features](#-key-features)
- [Directory Structure](#-directory-structure)
- [API Setup](#-api-setup)
- [Installation](#-installation)
- [Usage](#-usage)
- [Technologies Used](#-technologies-used)
- [Future Improvements](#-future-improvements)
- [Contributing](#-contributing)
- [License](#-license)

## 🌟 Project Overview
CineBrain is designed to enhance the movie discovery experience by leveraging machine learning and natural language processing techniques. The system analyzes various aspects of movies, including overview, cast, crew, genres, and keywords, to generate meaningful recommendations.
## 📊 Dataset
The project uses the [TMDB 5000 Movie Dataset](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata) from Kaggle, which includes metadata on approximately 5,000 movies from The Movie Database (TMDb).
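Several columns in this dataset (for example genres, keywords, cast, and crew) are stored as JSON-encoded strings rather than plain values. The sketch below shows one way such a cell could be turned into a list of names. The sample string and the helper name `names_from_json` are illustrative assumptions, not code taken from the notebook.

```python
import json
from typing import List, Optional

# Hypothetical sample of a JSON-encoded `genres` cell; real values come from
# tmdb_5000_movies.csv and may contain more fields.
SAMPLE_GENRES = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'


def names_from_json(cell: str, limit: Optional[int] = None) -> List[str]:
    """Return the `name` field of each object in a JSON-encoded list.

    `limit` optionally keeps only the first few names, e.g. the leading cast.
    """
    names = [item["name"] for item in json.loads(cell)]
    return names if limit is None else names[:limit]


print(names_from_json(SAMPLE_GENRES))
print(names_from_json(SAMPLE_GENRES, limit=1))
```

With pandas, a helper like this could be applied column-wise via `DataFrame[column].apply(...)`.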
## 🧠 Model Architecture
The recommendation model follows these key steps:
1. **Data Loading and Preprocessing:**
   - Load the dataset into a pandas DataFrame
   - Handle missing values and drop null entries
   - Convert JSON strings to tags for cast and crew information
2. **Feature Engineering:**
   - Tokenize movie overviews
   - Create a cumulative tag for each movie, combining information from overview, crew, cast, genres, and keywords
3. **Vectorization:**
   - Utilize CountVectorizer from scikit-learn
   - Limit the vocabulary (vector dimension) to 5,000 terms and remove English stop words
   - Apply stemming using PorterStemmer from NLTK so that inflected forms of a word collapse to a single token
4. **Similarity Calculation:**
   - Compute cosine similarity between movie vectors
5. **Recommendation Generation:**
   - Implement a recommendation method that calculates cosine similarity between the selected movie and all other movies in the dataset
   - Return the top 5 movies with the highest similarity scores

## ✨ Key Features
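The features in this section rest on the vectorization and cosine-similarity steps listed under Model Architecture. As a rough, dependency-free illustration of those two steps, the sketch below builds word-count vectors and compares them. It is not the project's implementation: the README describes scikit-learn's CountVectorizer and NLTK's PorterStemmer, whereas this version uses `collections.Counter`, a tiny hand-written stop-word list, and no stemming. The function names and sample tags are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; it only stands in for the English
# stop-word removal described in the README.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is"}


def count_vector(tags: str) -> Counter:
    """Count each non-stop-word token in a lowercased, whitespace-split tag string."""
    return Counter(word for word in tags.lower().split() if word not in STOP_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tag strings for two movies that share one token ("action").
space = count_vector("space alien war action")
heist = count_vector("heist crime action thriller")
print(cosine_similarity(space, heist))
```

A score of 1.0 means two tag vectors point in the same direction, and 0.0 means they share no terms.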
- Select a movie from the list to receive top 5 movie recommendations
- User-friendly interface built with Flask, HTMX, and Tailwind CSS
- Efficient data processing and similarity calculation for quick recommendations

## 📁 Directory Structure
```sh
cinebrain/
│
├── app/ # Flask application directory
│
├── data/ # Data directory
│ ├── processed_df.pkl
│ ├── tmdb_5000_credits.csv
│ ├── tmdb_5000_movies.csv
│ └── similarity.npy
│
├── recommender_system.ipynb # Jupyter notebook for model development
├── run.py # Script to run the Flask application
├── .env # Environment file for API key
└── README.md
```

## 🔑 API Setup
This project uses The Movie Database (TMDb) API to fetch additional movie information. To use the API:
1. Create an account on [The Movie Database](https://www.themoviedb.org/)
2. Go to your account settings and navigate to the API section
3. Request an API key for developer use
4. Create a `.env` file in the root directory and add your key:

```sh
TMDB_API_KEY=your_api_key_here
```

⚠️ Keep your API key confidential and never share it publicly.
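The README does not show how the application reads this key. As a minimal sketch, assuming `TMDB_API_KEY` has already been exported into the process environment (for example by a `.env` loader), the key could be read and validated as follows. The function name `load_tmdb_api_key` is hypothetical.

```python
import os


def load_tmdb_api_key() -> str:
    """Return TMDB_API_KEY from the environment, or raise a clear error if it is unset."""
    key = os.environ.get("TMDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("TMDB_API_KEY is not set; add it to .env or export it")
    return key
```

Failing at startup with an explicit message tends to be easier to debug than a rejected API request later on.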
## 🛠️ Installation
1. Clone the repository:
```sh
git clone https://github.com/ramchaik/cinebrain.git
cd cinebrain
```

2. Create and activate a virtual environment:
```sh
python -m venv venv
source venv/bin/activate # On Windows, use venv\Scripts\activate
```

3. Install the required dependencies:
```sh
pip install -r requirements.txt
```

## 🖥️ Usage
1. Run the Flask application:
```sh
python run.py
```

2. Open a web browser and navigate to `http://localhost:5000`
3. Select a movie from the list to receive recommendations
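Selecting a movie amounts to a lookup in a precomputed similarity matrix (presumably what `data/similarity.npy` stores). The sketch below illustrates that lookup with plain Python lists. The function name `recommend`, the placeholder titles, and the scores are assumptions made for illustration, not data or code from the project.

```python
from typing import List, Sequence

# Hypothetical titles and a symmetric similarity matrix; row i holds the
# similarity of movie i to every movie, including itself.
TITLES = ["Movie A", "Movie B", "Movie C", "Movie D"]
SIMILARITY = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.6],
    [0.5, 0.2, 0.6, 1.0],
]


def recommend(
    title: str,
    titles: Sequence[str],
    similarity: Sequence[Sequence[float]],
    top_n: int = 5,
) -> List[str]:
    """Return up to `top_n` titles most similar to `title`, excluding the title itself."""
    index = list(titles).index(title)  # raises ValueError for an unknown title
    scores = similarity[index]
    others = (i for i in range(len(titles)) if i != index)
    ranked = sorted(others, key=lambda i: scores[i], reverse=True)
    return [titles[i] for i in ranked[:top_n]]


print(recommend("Movie A", TITLES, SIMILARITY, top_n=2))
```

Because the matrix is computed ahead of time, each request only needs one row lookup and a sort, which keeps the response fast.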
## 🛠️ Technologies Used
- Python
- pandas
- scikit-learn
- NLTK
- Flask
- HTMX
- Tailwind CSS

## 🔮 Future Improvements
- Implement user authentication and personalized recommendations
- Integrate real-time data updates from TMDb API
- Enhance the user interface with movie posters and additional details
- Develop a mobile application for on-the-go recommendations

## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is open source and available under the [MIT License](LICENSE).