{"id":20556586,"url":"https://github.com/billgewrgoulas/recommendation-systems","last_synced_at":"2026-05-06T04:03:53.801Z","repository":{"id":158211322,"uuid":"471670543","full_name":"billgewrgoulas/Recommendation-Systems","owner":"billgewrgoulas","description":"Algorithms for joke rating prediction using the joke data-set from Kaggle.","archived":false,"fork":false,"pushed_at":"2022-04-02T17:42:10.000Z","size":10211,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-30T12:57:08.213Z","etag":null,"topics":["algorithm","clustering","collaborative-filtering","machine-learning","numpy","pandas","recommender-system","scikit-learn","scypi"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/billgewrgoulas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-19T11:06:27.000Z","updated_at":"2022-03-27T21:33:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"e1ccc68f-0485-43c6-92fa-1dc8d0e63e74","html_url":"https://github.com/billgewrgoulas/Recommendation-Systems","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/billgewrgoulas/Recommendation-Systems","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billgewrgoulas%2FRecommendation-Systems","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billgewrgoulas%2FRecommendation-Systems/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/billgewrgoulas%2FRecommendation-Systems/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billgewrgoulas%2FRecommendation-Systems/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/billgewrgoulas","download_url":"https://codeload.github.com/billgewrgoulas/Recommendation-Systems/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billgewrgoulas%2FRecommendation-Systems/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32677933,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T02:33:58.958Z","status":"ssl_error","status_checked_at":"2026-05-06T02:33:39.611Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","clustering","collaborative-filtering","machine-learning","numpy","pandas","recommender-system","scikit-learn","scypi"],"created_at":"2024-11-16T03:29:57.577Z","updated_at":"2026-05-06T04:03:53.793Z","avatar_url":"https://github.com/billgewrgoulas.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## Introduction\n\nThe purpose of this project is to implement various algorithms to predict the \"rating\" a user would give to a joke they haven't already rated, 
using the joke data set from Kaggle. The main methods that will be featured are \u003cb\u003eCollaborative Filtering\u003c/b\u003e, \u003cb\u003eSingular Value Decomposition\u003c/b\u003e and some \u003cb\u003eCluster Based\u003c/b\u003e algorithms using K-means.\n\n## Step 1 - Pre-Processing\n\nInitially, our data will look something like this:\n\n\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p1.png'\u003e\n\nIn order to apply the algorithms, we will have to create the rating matrix. Each row will represent the rating vector of one user, with as many components as there are jokes. Similarly, each column will represent the rating vector of one joke. This will result in a matrix with many missing values, since not all users have rated every joke. We will fill the NaN values with 0 so that we can use sparse matrices for efficiency. In addition, we will use the ids of the jokes and users as indices for convenience.\n\nLastly, to test the algorithms, we will keep the first 10000 rows of the initial matrix as test data and use the rest as train data. The goal will be to predict the rating for each user - joke pair (u, j) in the test data using the train data.\n\n\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p6.png'\u003e\n\n## Step 2 - Defining Similarity\n\nNow that we have assigned a 1xN vector to each user, we need a way to determine how similar two users are.\nTo achieve this we will use \u003cb\u003eCosine similarity\u003c/b\u003e, which measures the cosine of the angle between two vectors.\n\n\u003cp align=\"center\"\u003e\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p7.png'\u003e\u003c/p\u003e\n\n\n\n## Step 3 - Evaluation\n\nTo measure the accuracy of each algorithm, we will use the \u003cb\u003eRoot Mean Squared Error\u003c/b\u003e. 
In addition, we will implement two simple algorithms based on mean ratings that will act as baselines.\n\n\u003cp align=\"center\"\u003e\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p2.png'\u003e\u003c/p\u003e\n\n\n## Collaborative Filtering Methods\n\n### User Collaborative Filtering\n\n\u003cp align=\"center\"\u003e\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p3.png'\u003e\u003c/p\u003e\n\n\n * Get the vector of u and find the indices of the users that rated j\n * Compute the similarities between user u and the users that rated j\n * Keep the k most similar users and their scores\n * Predict the score using the above formula\n\n### Item Collaborative Filtering - (same as UCF but it works with the joke vectors)\n\n\u003cp align=\"center\"\u003e\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p4.png'\u003e\u003c/p\u003e\n\n * Get the vector of j and find the indices of the jokes that were rated by user u\n * Compute the similarities between joke j and the jokes that were rated by u\n * Keep the k most similar jokes and their scores\n * Predict the rating using the above formula\n\n\n### Improved UCF\n\n * In this case we will predict the deviations from the mean and use the Pearson correlation coefficient for the similarity.\n\n\u003cp align=\"center\"\u003e\u003cimg src='https://github.com/billgewrgoulas/Recommendation-Systems/blob/main/gif/p5.png'\u003e\u003c/p\u003e\n\n## Singular Value Decomposition\n\n * Apply SVD and keep the k largest singular values in order to produce the prediction matrix.\n\n## Cluster-Based Methods\n\nIn this case, we will combine K-means with some of the previous algorithms. We will apply K-means on the rating matrix, using the user rating vectors for the clustering, but first we have to determine the right number of clusters. 
To achieve this, we will plot the Silhouette Coefficient together with the SSE and use the combined plot to find the best number of clusters.\n\n### Cluster-Based JA (CB-JA)\n\n * Find the cluster of u and the users in that cluster\n * Get the ratings of j in the cluster\n * Return the mean rating of j in the cluster, or the overall mean of j\n\n### Cluster-Based UCF (CB-UCF)\n\nThe idea here is to run UCF on the vectors in the cluster of the user u.\n\n * Find the cluster of u and the users in that cluster\n * Get the sparse submatrix that contains only the rating vectors of the users in the cluster\n * Run UCF on the submatrix\n\n### Cluster-Based UA (CB-UA)\n\nFor this method we will use K-means to cluster the jokes using their text. To achieve this we will assign a vector to each joke using the \u003cb\u003eTF-IDF Vectorizer\u003c/b\u003e.\n\n * Find the cluster of j and all jokes in that cluster\n * Return the mean rating that user u gave to the jokes in the cluster\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbillgewrgoulas%2Frecommendation-systems","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbillgewrgoulas%2Frecommendation-systems","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbillgewrgoulas%2Frecommendation-systems/lists"}