Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/szapp/mangoleaf
Live web application demonstrating personalized recommendations for books and mangas implemented using collaborative filtering based recommender systems
- Host: GitHub
- URL: https://github.com/szapp/mangoleaf
- Owner: szapp
- License: mit
- Created: 2024-07-31T14:52:10.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-08T15:19:26.000Z (5 months ago)
- Last Synced: 2024-08-09T09:45:21.603Z (5 months ago)
- Topics: data-science, item-based-collaborative-filtering, machine-learning, recommendation, recommender, user-based-collaborative-filtering
- Language: Python
- Homepage: https://mangoleaf-dev.streamlit.app
- Size: 1.42 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![streamlit](https://img.shields.io/badge/streamlit-deployed-4c1?logo=streamlit&logoColor=white)](https://mangoleaf-dev.streamlit.app)
Welcome to MANGOLEAF, your ultimate guide to discovering the best books and manga tailored to your tastes.
Whether you're a seasoned reader or just starting out, MANGOLEAF provides personalized recommendations to help you find your next favorite read.

Personal recommendations for books and manga, implemented using popularity-based as well as user-based and item-based collaborative filtering recommender systems.
---
## Project
The goal of this project was to familiarize ourselves with different recommender systems and to develop them within a limited time frame of 2.5 weeks, working toward a clearly defined deliverable using agile methods.
The recommender systems include popularity-based recommendations, item-based collaborative filtering, and user-based collaborative filtering.

The deliverable is a functional web app, available to anyone, that includes user profiles for personalized recommendations.
For the sake of demonstration, the datasets are limited to around 2000 items (around 1500 books and 500 manga), and the personalized recommendations are updated only at fixed intervals (every 24 hours).

To prevent spam and abuse in this demo project, user ratings are reset and user profiles are deleted every five days.
To offset this limitation, users can export and download their ratings as a CSV file at any time.

## Authors
[![Contributors](https://contrib.rocks/image?repo=szapp/Mangoleaf)](https://github.com/szapp/Mangoleaf/graphs/contributors)
## Recommender implementation
We trained and evaluated different recommenders for both the book and the manga datasets. Below, *user* refers to an individual, *item* refers to either a book or a manga, and *rating* is a user's score for a given user-item combination.
1. **Popularity recommender**:
   The ratings of all users are queried from the database, grouped by item, and aggregated by their average and count.
   Among the items that meet a minimum number of ratings, those with the highest average rating are selected as the most popular.
   Ordered by their average rating, they make up the popularity recommendation.
2. **Item-based collaborative filtering recommender**:
   A collaborative filtering model is trained on the item ratings and their similarity matrix.
   A k-nearest neighbors (k-NN) inspired algorithm that takes baseline ratings into account achieved the highest accuracy during model validation.
   For each item, the nearest neighbors are determined.
   These neighbors make up the item-based "you might also like" recommendation.
3. **User-based collaborative filtering recommender**:
   Here, another baseline k-NN model is trained on the user ratings and their similarity matrix.
   For each user, the missing ratings (items the user has not rated yet) form the test set for which ratings are predicted.
   The items with the highest predicted ratings make up the user-based, personalized recommendation.

Each of the recommendations was subsequently filtered to remove items that a (logged-in) user has already rated, so that the user interface displays only new, meaningful reading suggestions.
## Key learning
- Project planning and collaborative working using agile methods
- Balancing limited time against a working product
- Working with different datasets and bringing them into a consistent format
- Deploying a Streamlit app online
- Implementing and maintaining a PostgreSQL database
- Implementing user authentication with salted and hashed passwords, and handling cropped, base64-encoded user pictures
- Automated scheduling with GitHub Actions workflows

## Languages, tools, and libraries
- scikit-surprise
- streamlit
- pandas
- SQLAlchemy
- bcrypt
- pillow
- PostgreSQL

See [requirements.txt](requirements.txt) for all Python packages used.
## Schedule
The project was implemented following a carefully planned schedule of two and a half weeks.
Implementation followed agile methods, including daily stand-ups, iterative implementation of minimal working examples, and weekly sprints/milestones.

![schedule](https://github.com/user-attachments/assets/13da011d-cd98-4512-b429-06d3ed1d9869)
## Database schema
The database structure is separated into static tables, dynamic tables, and semi-dynamic tables, for both books and manga.
- The static tables (left and right: `books` and `mangas`) hold the book and manga datasets and are read-only.
- The dynamic tables (center: `users`, `user_data`, and `*_ratings`) are modified through user interactions.
- The semi-dynamic tables (bottom row: `*_popular`, `*_item_based`, `*_user_based`) are updated by scheduled GitHub Actions and are otherwise read-only.

![schema](https://github.com/user-attachments/assets/88afc170-d81d-47fa-93ee-cc61c1a38908)
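The split into static, dynamic, and semi-dynamic tables can be illustrated with a minimal sketch. It assumes SQLite (Python's built-in `sqlite3`) as a stand-in for PostgreSQL. All table and column names are hypothetical simplifications (`item_id`, `rating`, `position`, and so on) and are not taken from the project's `schema.sql`. Static tables are only read, a scheduled job rebuilds the semi-dynamic `books_popular` table from the ratings, and the periodic reset clears only the dynamic tables.

```python
# Sketch of the static / dynamic / semi-dynamic split.
# Assumptions: SQLite (stdlib sqlite3) stands in for PostgreSQL, and all table
# and column names are hypothetical simplifications of the project's schema.
import sqlite3

SCHEMA = """
CREATE TABLE books (item_id INTEGER PRIMARY KEY, title TEXT NOT NULL);  -- static
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT UNIQUE);  -- dynamic
CREATE TABLE books_ratings (                                            -- dynamic
    user_id INTEGER REFERENCES users (user_id),
    item_id INTEGER REFERENCES books (item_id),
    rating REAL NOT NULL,
    PRIMARY KEY (user_id, item_id)
);
CREATE TABLE books_popular (position INTEGER PRIMARY KEY, item_id INTEGER);  -- semi-dynamic
"""


def refresh_popular(conn, min_count=2):
    """Scheduled job: rebuild the semi-dynamic popularity table from the ratings."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM books_popular")
        rows = conn.execute(
            "SELECT item_id FROM books_ratings GROUP BY item_id "
            "HAVING COUNT(*) >= ? ORDER BY AVG(rating) DESC, item_id",
            (min_count,),
        ).fetchall()
        conn.executemany(
            "INSERT INTO books_popular (position, item_id) VALUES (?, ?)",
            [(pos, item_id) for pos, (item_id,) in enumerate(rows, start=1)],
        )


def reset_dynamic(conn):
    """Periodic reset: clear user data while keeping the static item tables."""
    with conn:
        # Delete child rows first so this also works with foreign keys enforced
        conn.execute("DELETE FROM books_ratings")
        conn.execute("DELETE FROM users")
```

In this sketch, `reset_dynamic` leaves `books` and `books_popular` untouched. The app therefore keeps showing the last computed popularity list until the next `refresh_popular` run.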
## Repository structure
The repository contains the exploratory data analysis, the implementation of the recommenders, the database schema and SQL operations, and the code of the Streamlit web application. The core code of the project is organized into a Python package `mangoleaf`.
```
├── mangoleaf/ <- Source code of the Python package
│ │
│ ├── connection.py <- Connection and interface with the database
│ ├── query.py
│ │
│ ├── authentication.py <- Authentication functions for the user accounts
│ │
│ ├── frontend.py <- Functions for frontend components
│ │
│ └── recommend.py <- Functions to predict the recommendations
│
├── notebooks/ <- Jupyter notebooks with EDA and initial recommenders
│
├── requirements.txt <- Dependencies for reproducing the environment
│
├── .streamlit/ <- Streamlit configuration
│
├── Home.py <- Pages, CSS, and images for the Streamlit app
├── pages/
├── style/
├── images/
│
├── schema.sql <- SQL scripts for creating and truncating the database structure
├── reset_dynamic_tables.sql
│
├── create_schema.py <- Python scripts to create, update, and reset the database
├── reset_database.py
├── update_database.py
│
└── .github/workflows/ <- Scheduled GitHub Action workflows to update/reset the database
```

## Data sources
The datasets behind the recommendations were adapted from:
- https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset
- https://www.kaggle.com/datasets/dbdmobile/myanimelist-dataset

The repository [MaxYurch/MANGOLEAF-APP](https://github.com/MaxYurch/MANGOLEAF-APP) is a related implementation.