https://github.com/gabrielmazzotta/nlp-clustering--movie-similarity-from-plot-summaries
A Python-based movie recommendation system leveraging NLP and clustering techniques. This project includes data processing, vectorization of plot summaries, and the implementation of recommendation algorithms to suggest similar movies based on user input.
clustering cosine-similarity hierarchical-clustering kmeans lemmatization nlp recommendation-engine scikit-learn similarity-score spacy tokenization
- Host: GitHub
- URL: https://github.com/gabrielmazzotta/nlp-clustering--movie-similarity-from-plot-summaries
- Owner: GabrielMazzotta
- Created: 2024-09-16T20:59:33.000Z (17 days ago)
- Default Branch: main
- Last Pushed: 2024-09-16T21:07:49.000Z (17 days ago)
- Last Synced: 2024-09-26T20:23:07.217Z (7 days ago)
- Topics: clustering, cosine-similarity, hierarchical-clustering, kmeans, lemmatization, nlp, recommendation-engine, scikit-learn, similarity-score, spacy, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 1.21 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NLP & Clustering - Movie Similarity from Plot Summaries
## Project Description
Natural Language Processing (NLP) is a field in which data scientists develop algorithms that make sense of the conversational language humans use. In this project, I use NLP to measure the degree of similarity between movies based on their plot summaries from IMDb and Wikipedia.
## Dataset
The dataset contains the titles of the top 100 movies on [IMDb](https://www.imdb.com/) as well as each movie's plot summary from both IMDb and [Wikipedia](https://en.wikipedia.org/).
## Objective
To find, for a given movie, the top 3 most similar movies within the same cluster.
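As a rough illustration of this objective, the sketch below vectorizes plot summaries with TF-IDF, groups them with KMeans, and ranks the other movies in the same cluster by cosine similarity. It is not the notebook's actual code: the titles, plots, the helper name `top_similar_in_cluster`, and the parameter values (`n_clusters=2`, `seed=42`) are all hypothetical stand-ins chosen for a small runnable example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the real titles and plot summaries.
titles = ["Movie A", "Movie B", "Movie C", "Movie D",
          "Movie E", "Movie F", "Movie G", "Movie H"]
plots = [
    "A pilot leads a rebel fleet through space to destroy an enemy station.",
    "A young pilot joins rebels in space and attacks an imperial station.",
    "Astronauts travel through space to find a new planet for the fleet.",
    "A stranded astronaut survives alone on a planet far out in space.",
    "An aging mafia boss hands his crime family over to his reluctant son.",
    "A mafia son expands the crime family while betraying old allies.",
    "A young gangster rises through a crime family and turns informant.",
    "Two detectives hunt a crime boss through a corrupt city.",
]


def top_similar_in_cluster(title, titles, plots, n_clusters=2, top_n=3, seed=42):
    """Return up to top_n (title, score) pairs from the query movie's cluster."""
    # Turn each plot into a TF-IDF vector, ignoring common English stop words.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(plots)

    # Assign every movie to one of n_clusters groups.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(vectors)

    # Pairwise cosine similarity between all plot vectors.
    similarity = cosine_similarity(vectors)

    query = titles.index(title)
    # Keep only other movies that share the query movie's cluster label.
    candidates = [j for j in range(len(titles))
                  if j != query and labels[j] == labels[query]]
    # Rank the candidates from most to least similar.
    candidates.sort(key=lambda j: similarity[query, j], reverse=True)
    return [(titles[j], float(similarity[query, j])) for j in candidates[:top_n]]


for name, score in top_similar_in_cluster("Movie A", titles, plots):
    print(f"{name}: {score:.3f}")
```

Restricting the ranking to a single cluster means a movie in a small cluster may get fewer than three suggestions, so the function returns "up to" `top_n` results.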
## Tools and Libraries
- Tokenization and Lemmatization (spaCy)
- TF-IDF (scikit-learn)
- KMeans
- Cosine Similarity / Similarity Score
- Hierarchical Clustering (SciPy)
- Seaborn / Matplotlib
- Pandas
- NumPy
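To show how a few of these tools can fit together, here is a minimal sketch of hierarchical clustering over TF-IDF vectors with SciPy. It is an assumption about how the pieces might be combined, not a copy of the notebook: the toy titles and plots, the helper name `build_linkage`, and the choice of average linkage over cosine distance are all illustrative.

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the real titles and plot summaries.
sample_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
sample_plots = [
    "A pilot leads a rebel fleet through space to destroy an enemy station.",
    "A young pilot joins rebels in space and attacks an imperial station.",
    "An aging mafia boss hands his crime family over to his reluctant son.",
    "A mafia son expands the crime family while betraying old allies.",
]


def build_linkage(plots):
    """Return a SciPy linkage matrix built from cosine distances between plots."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(plots)
    # pdist needs a dense array; it returns the condensed distance vector
    # that linkage expects (cosine distance = 1 - cosine similarity).
    distances = pdist(vectors.toarray(), metric="cosine")
    # Average linkage is an illustrative choice; other methods also work.
    return linkage(distances, method="average")


merges = build_linkage(sample_plots)
# no_plot=True returns the leaf ordering without drawing a figure.
tree = dendrogram(merges, labels=sample_titles, no_plot=True)
print(tree["ivl"])
```

Dropping `no_plot=True` draws the dendrogram with Matplotlib, where movies that merge lower in the tree have more similar plots.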