https://github.com/zenwor/equilibrium

🗞️ Article Management System

cosine-similarity crud data-science machine-learning python pytorch tfidf tfidf-vectorizer

📰 Equilibrium

Equilibrium is an "Article Management System", created as a small project for the "Scripting Languages" course (Faculty of Sciences, University of Novi Sad).

ℹ️ The dataset used in the final model can be found here.

📥 Installation


Installation is simple:


  1. Clone the repository: git clone https://github.com/LukaNedimovic/equilibrium


  2. Install the dependencies: pip install -r requirements.txt

🔥 Motivation


  • Research into machine learning models for building text-based recommendation systems.

  • Creating a console application with a "modern" GUI (multiple input boxes on screen at the same time; selection and movement using arrow keys).



⚙️ Features

Equilibrium is a slightly-more-complex CRUD application: a user can create an account, log in, create an article, delete it, interact with it (like / dislike / save), search for articles by keyword, and get a recommendation for an article similar to the one they are currently reading.

An administrator account is created in advance and has full control over the platform: it can delete any article, view keyword statistics, and so on.

🤖 Machine Learning Model

The main motivation behind this project was to learn more about how some machine learning concepts work, especially embeddings.

I tried implementing 3 different models:


  1. Article x Tag Model - based on the premise that similar articles share more of the same tags

  2. Collaborative Filtering Model - suggests articles by trying to predict a user's rating from other users' ratings

  3. TF-IDF - the standard method, combined with cosine similarity, which gave the best performance (while also being the simplest of the three models)

I wanted to create an NN model that would eventually perform well, but noticed the following:


  1. I don't have enough data to build a well-performing NN (at my current knowledge level, at the very least)

  2. It's better to use a simpler model when possible

  3. A hybrid model was possible and would be the most "modern" choice, but it would require more time to implement and train

TF-IDF performed extremely well on the given dataset, was quick to train, and was easy to implement. My wish to learn more about NN models was also fulfilled by building the two "less good" models, so it balanced out nicely.
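
The TF-IDF plus cosine similarity idea can be sketched in a few lines of pure Python. This is only an illustration, not the project's actual code: the tokenizer, the smoothed IDF formula, the sample articles, and the names `tokenize`, `tfidf_vectors`, `cosine_similarity` and `recommend` are all assumptions made for this example (the repository tags suggest a library vectorizer was used instead).

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase, split on non-alphanumerics.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term present in every document keeps a positive weight.
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current_index, docs, top_k=1):
    """Return indices of the top_k documents most similar to docs[current_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[current_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != current_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_k]]


# Made-up sample articles, for illustration only.
articles = [
    "PyTorch makes training neural networks straightforward",
    "Training neural networks with PyTorch and embeddings",
    "A recipe for slow-cooked tomato soup",
]
print(recommend(0, articles))
```

With these sample texts, the first article shares several terms with the second and none with the third, so the second one is the recommendation. Recomputing all vectors on every call keeps the sketch short; a real application would likely compute them once and cache them.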

🔍 What can be improved?

I believe that this project is good, especially for a "first semester" one. However, some meaningful changes can (and hopefully will) be made:


  1. Path generation - a function should be implemented to generate the paths, or some other interesting workaround should be found. It should be platform-agnostic, too.

  2. Generalized prompt rendering - it would be fun to create a module that renders these prompts dynamically, like a miniature version of HTML. It could also be useful for programming newbies, who would then be able to create very nice console UIs.
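
For the first item, one possible direction is a small helper built on the standard library's `pathlib`, which picks the correct separator for the operating system. This is a sketch under assumptions, not code from the project: the helper names `project_path` and `ensure_parent`, the use of the current working directory as the root, and the example directory names are all hypothetical.

```python
from pathlib import Path

# Assumption: paths are resolved relative to the directory the app is started from.
PROJECT_ROOT = Path.cwd()


def project_path(*parts, root=None):
    """Join path segments under the project root using the OS-specific separator."""
    base = Path(root) if root is not None else PROJECT_ROOT
    return base.joinpath(*parts)


def ensure_parent(path):
    """Create the parent directory of `path` if it is missing, then return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Hypothetical usage: the directory and file names below are made up.
article_file = project_path("data", "articles", "1.json")
print(article_file)
```

Because every segment is passed separately instead of being glued together with "/" or "\\", the same call works unchanged on Windows, Linux and macOS.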