https://github.com/nikoshet/pyspark-movie-similarities
Using Spark In Python For Movie Similarities With Jaccard Index
jaccard-index movie-similarities pyspark spark
- Host: GitHub
- URL: https://github.com/nikoshet/pyspark-movie-similarities
- Owner: nikoshet
- License: mit
- Created: 2020-09-23T18:17:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-09T10:04:00.000Z (over 4 years ago)
- Last Synced: 2024-11-09T09:44:29.668Z (3 months ago)
- Topics: jaccard-index, movie-similarities, pyspark, spark
- Language: Python
- Homepage:
- Size: 863 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Use Of PySpark For Movie Similarities With Jaccard Index
## Dataset
The dataset is the MovieLens 100K Dataset, which can be found [here](https://grouplens.org/datasets/movielens/). It includes 100,000 ratings from 1000 users on 1700 movies and was released in April 1998. The files the app needs are included in this repository under changed names.

## Requirements
- PySpark

## Example Usage
To find movies similar to 'Star Wars (1977)', pass its MovieLens 100K movie ID (50) as the argument:
```
spark-submit movie-similarites.py 50
```
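The README does not say exactly how the Jaccard index is applied. This sketch assumes a common formulation: each movie is treated as the set of users who rated it, and two movies A and B are scored by J(A, B) = |A ∩ B| / |A ∪ B|.

The sketch is plain Python with no Spark, so it runs anywhere. It uses a few invented lines in the MovieLens 100K `u.data` layout (user ID, movie ID, rating and timestamp, tab-separated). Movie 50 stands in for the Star Wars example; the other IDs and ratings are made up. The function names and the `min_rating` filter are hypothetical and are not taken from `movie-similarites.py`.

```python
from collections import defaultdict

# Invented sample lines in the MovieLens 100K u.data layout:
# user_id <TAB> movie_id <TAB> rating <TAB> timestamp
SAMPLE = [
    "1\t50\t5\t881250949",
    "2\t50\t4\t881250950",
    "3\t50\t5\t881250951",
    "1\t172\t5\t881250952",
    "2\t172\t4\t881250953",
    "3\t181\t4\t881250954",
    "4\t181\t3\t881250955",
    "4\t99\t2\t881250956",
]


def parse_ratings(lines):
    """Yield (user_id, movie_id, rating) tuples from u.data-style lines."""
    for line in lines:
        user_id, movie_id, rating, _timestamp = line.strip().split("\t")
        yield int(user_id), int(movie_id), float(rating)


def users_by_movie(ratings, min_rating=0.0):
    """Map each movie ID to the set of users who rated it at least min_rating.

    min_rating is a hypothetical knob: the README does not say whether
    low ratings are excluded before computing similarities.
    """
    raters = defaultdict(set)
    for user_id, movie_id, rating in ratings:
        if rating >= min_rating:
            raters[movie_id].add(user_id)
    return dict(raters)


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target_id, raters, top_n=10):
    """Return up to top_n (movie_id, score) pairs, highest Jaccard score first."""
    target = raters.get(target_id, set())
    scores = [
        (movie_id, jaccard(target, users))
        for movie_id, users in raters.items()
        if movie_id != target_id
    ]
    # Sort by descending score, breaking ties by ascending movie ID.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


if __name__ == "__main__":
    raters = users_by_movie(parse_ratings(SAMPLE))
    top = most_similar(50, raters, top_n=3)
    print([(movie_id, round(score, 3)) for movie_id, score in top])
    # → [(172, 0.667), (181, 0.25), (99, 0.0)]
```

In Spark, the same idea would typically be expressed in three steps:

1. Key the ratings by user.
2. Self-join them to form the pairs of movies that share a rater.
3. Divide each pair's shared-rater count by the size of the union of their rater sets.

The README does not state whether `movie-similarites.py` follows exactly these steps.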