https://github.com/nikoshet/pyspark-movie-similarities
Using Spark In Python For Movie Similarities With Jaccard Index
jaccard-index movie-similarities pyspark spark
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/nikoshet/pyspark-movie-similarities
- Owner: nikoshet
- License: mit
- Created: 2020-09-23T18:17:04.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-11-09T10:04:00.000Z (over 4 years ago)
- Last Synced: 2025-01-03T23:22:41.372Z (6 months ago)
- Topics: jaccard-index, movie-similarities, pyspark, spark
- Language: Python
- Homepage:
- Size: 863 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Use Of PySpark For Movie Similarities With Jaccard Index
## Dataset
The dataset is the MovieLens 100K Dataset, available [here](https://grouplens.org/datasets/movielens/). It contains 100,000 ratings from about 1,000 users on about 1,700 movies and was released in April 1998. The files the app needs are included in this repository under changed names.
## Requirements
- PySpark
## Example Usage
To find movies similar to 'Star Wars (1977)', which has movie ID 50 in the dataset:
```
spark-submit movie-similarites.py 50
```
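To make the similarity measure concrete, here is a minimal plain-Python sketch of the Jaccard index applied to movie ratings. It is not the repository's Spark code. It assumes each movie is represented by the set of users who rated it, and that rating lines follow the tab-separated MovieLens 100K layout (user ID, movie ID, rating, timestamp). The helper names `parse_rating`, `users_by_movie`, `jaccard`, and `most_similar` are hypothetical, and the sample ratings are made up for illustration.

```python
from collections import defaultdict


def parse_rating(line):
    """Parse one tab-separated rating line: user ID, movie ID, rating, timestamp."""
    user_id, movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
    return int(user_id), int(movie_id), int(rating)


def users_by_movie(lines):
    """Map each movie ID to the set of users who rated it (an assumed representation)."""
    raters = defaultdict(set)
    for line in lines:
        user_id, movie_id, _rating = parse_rating(line)
        raters[movie_id].add(user_id)
    return raters


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union.

    Defined here as 0.0 when both sets are empty.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(raters, target_id, top_n=10):
    """Return up to top_n (score, movie_id) pairs, highest score first."""
    target_users = raters[target_id]
    scores = [
        (jaccard(target_users, users), movie_id)
        for movie_id, users in raters.items()
        if movie_id != target_id
    ]
    return sorted(scores, reverse=True)[:top_n]


# Toy ratings, made up for illustration (not taken from the dataset).
sample = [
    "1\t50\t5\t0",
    "2\t50\t4\t0",
    "3\t50\t5\t0",
    "1\t172\t5\t0",
    "2\t172\t4\t0",
    "3\t1\t3\t0",
    "4\t1\t4\t0",
]
raters = users_by_movie(sample)
print(most_similar(raters, 50))
```

In this toy data, movie 172 shares two of movie 50's three raters (score 2/3), while movie 1 shares one rater out of a union of four (score 0.25), so movie 172 ranks first. A Spark version would typically distribute the same set construction and pairwise comparison across the cluster.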