https://github.com/fpopic/master-thesis

(Class) Master's thesis source code. "A Distributed Recommender System on Apache Spark"

apache-spark blocks collaborative-filtering distributed-computing machine-learning-algorithms matrix-multiplication mllib recommendation-engine recommender-system similarity-matrix similarity-measures sparse-matrix

Host: GitHub
URL: https://github.com/fpopic/master-thesis
Owner: fpopic
License: mit
Created: 2017-03-02T16:20:34.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-07-09T23:36:34.000Z (over 8 years ago)
Last Synced: 2025-01-10T19:42:16.726Z (9 months ago)
Topics: apache-spark, blocks, collaborative-filtering, distributed-computing, machine-learning-algorithms, matrix-multiplication, mllib, recommendation-engine, recommender-system, similarity-matrix, similarity-measures, sparse-matrix
Language: Scala
Homepage:
Size: 91.8 KB
Stars: 1
Watchers: 3
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Abstract

The result of this thesis is a distributed recommender system based on item-item collaborative filtering.
The recommendation algorithm builds an item-item similarity matrix from collaboratively collected user-item interaction data for all users in the system.
It supports several similarity measures, including vector normalisation of the rows of the matrix.
It also supports three different distributed matrix multiplication algorithms.
The recommender system itself is written in Scala on top of Apache Spark.
The data pre-processing scripts, by contrast, are written in C++ and run in a single-node environment.
Tests and performance evaluation of the implemented algorithm were run on a Cloudera cluster using a real dataset obtained from a particular case study.
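
To make the idea of an item-item similarity matrix more concrete, here is a minimal single-machine sketch in plain Scala. It is not the thesis code and does not use Spark: the object `ItemItemSketch`, its functions, and the toy data are hypothetical names chosen for illustration. It assumes cosine similarity as the measure, computed by normalising each item's interaction vector to unit length and taking dot products, which is one common way such a matrix reduces to a matrix multiplication.

```scala
// Illustrative sketch only: names and data are assumptions, not taken from the thesis.
object ItemItemSketch {
  // Sparse vector: userId -> interaction strength.
  type SparseVec = Map[Int, Double]

  // Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged.
  def normalise(v: SparseVec): SparseVec = {
    val norm = math.sqrt(v.values.map(x => x * x).sum)
    if (norm == 0.0) v else v.map { case (i, x) => i -> x / norm }
  }

  // Sparse dot product, iterating over the smaller of the two vectors.
  def dot(a: SparseVec, b: SparseVec): Double = {
    val (small, large) = if (a.size <= b.size) (a, b) else (b, a)
    small.iterator.map { case (i, x) => x * large.getOrElse(i, 0.0) }.sum
  }

  // items: itemId -> interaction vector over users.
  // Returns the non-zero cosine similarities for every ordered pair of distinct items.
  def cosineSimilarities(items: Map[Int, SparseVec]): Map[(Int, Int), Double] = {
    val unit = items.toSeq.map { case (id, v) => id -> normalise(v) }
    val pairs = for {
      (i, vi) <- unit
      (j, vj) <- unit
      if i != j
    } yield (i, j) -> dot(vi, vj)
    pairs.filter { case (_, s) => s > 0.0 }.toMap
  }

  // Score each unseen item by summing its similarity to the items the user has seen,
  // then keep the k highest-scoring items.
  def recommend(seen: Set[Int], sims: Map[(Int, Int), Double], k: Int): Seq[(Int, Double)] =
    sims.toSeq
      .collect { case ((i, j), s) if seen(i) && !seen(j) => j -> s }
      .groupBy { case (j, _) => j }
      .map { case (j, scored) => j -> scored.map(_._2).sum }
      .toSeq
      .sortBy { case (_, score) => -score }
      .take(k)

  def main(args: Array[String]): Unit = {
    // Toy data: three items, users 10 to 12.
    val items: Map[Int, SparseVec] = Map(
      1 -> Map(10 -> 1.0, 11 -> 1.0),
      2 -> Map(10 -> 1.0, 11 -> 1.0, 12 -> 1.0),
      3 -> Map(12 -> 1.0)
    )
    val sims = cosineSimilarities(items)
    println(recommend(Set(1), sims, k = 2))
  }
}
```

In a distributed setting the all-pairs loop above would be replaced by a multiplication of the normalised interaction matrix with its transpose, which is where the choice of distributed matrix multiplication algorithm matters.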

# Running

```
$ spark-submit \
  --class hr.fer.ztel.thesis.Main \
  --master yarn --deploy-mode cluster \
  ...
```
