https://github.com/dafevara/propius
Propius allows for extracting similar items over a big data volume by using correlation between items over sparse data structures which use less space and memory.
big-data bigdata knn machine-learning python recommender-system sparse sparse-matrices
- Host: GitHub
- URL: https://github.com/dafevara/propius
- Owner: dafevara
- License: apache-2.0
- Created: 2021-06-08T15:24:40.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2025-08-08T06:17:10.000Z (6 months ago)
- Last Synced: 2025-08-08T08:23:38.992Z (6 months ago)
- Topics: big-data, bigdata, knn, machine-learning, python, recommender-system, sparse, sparse-matrices
- Language: Python
- Homepage:
- Size: 170 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Propius
Propius is Latin for *closer*. It is a simple tool for uncovering similar items in a dataset. In terms of distance, similar items tend to be closer together, hence the name _propius_.
Its main feature is extracting similar items from large volumes of data by computing correlations between items over sparse data structures, which take less time and memory.
It has two main components. First, the similarity model, which finds correlations between items based on how they occur together across the data. Second, once model training is complete, Propius stores the similarities so they can be retrieved later via a REST API, which makes it possible to integrate them transparently into other systems (e.g. recommender systems).
# How does it work?
Propius takes advantage of the [SciPy Sparse Module](https://docs.scipy.org/doc/scipy/reference/sparse.html) to build a matrix of correlation coefficients that models similarities between items as distances between vectors in an _i_-dimensional space, where _i_ represents an item.
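To make the idea concrete, here is a minimal sketch of computing an item-to-item correlation matrix from a sparse interaction matrix with SciPy. It is not Propius's actual code: the function name `item_correlation`, the rows-as-sessions layout, and the choice of Pearson correlation are all assumptions made for illustration.

```python
import numpy as np
from scipy import sparse


def item_correlation(interactions):
    """Pearson correlation between the columns (items) of a sparse matrix.

    Hypothetical helper: rows are assumed to be sessions/users and
    columns are assumed to be items.
    """
    interactions = sparse.csr_matrix(interactions, dtype=np.float64)
    n_rows = interactions.shape[0]

    # Column sums and the item-item Gram matrix are computed on sparse data,
    # so the (rows x items) input is never densified.
    col_sums = np.asarray(interactions.sum(axis=0)).ravel()
    gram = (interactions.T @ interactions).toarray()

    # Unnormalised covariance; the 1/(n-1) factor cancels in the ratio below.
    cov = gram - np.outer(col_sums, col_sums) / n_rows
    std = np.sqrt(np.diag(cov))

    # Items with a constant column have zero variance and yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(std, std)


# Toy data: 4 sessions x 3 items, 1 = the item appeared in the session.
dense = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
])
corr = item_correlation(sparse.csr_matrix(dense))
```

Note that the resulting items-by-items matrix is dense in this sketch, so it only suits catalogs where that matrix fits in memory.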
Using this correlation coefficient matrix, Propius calculates the k nearest neighbors (kNN) of each unique item and stores each similarity score in a local database (SQLite). Similar items can then be retrieved without keeping the correlation matrix in memory, which also returns results faster.
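The store-then-retrieve step could look roughly like the sketch below, which uses only NumPy and the standard `sqlite3` module. The table name `similarity`, its columns, and the helper functions are hypothetical and are not taken from the Propius source.

```python
import sqlite3

import numpy as np


def top_k_neighbors(corr, k):
    """Map each item index to its k most correlated other items."""
    neighbors = {}
    for i, row in enumerate(np.asarray(corr, dtype=float)):
        order = np.argsort(row)[::-1]  # highest score first
        pairs = [(int(j), float(row[j])) for j in order if j != i]
        neighbors[i] = pairs[:k]
    return neighbors


def store_neighbors(conn, neighbors):
    """Persist the neighbor lists in a hypothetical `similarity` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarity ("
        "item INTEGER, neighbor INTEGER, score REAL, "
        "PRIMARY KEY (item, neighbor))"
    )
    rows = [(i, j, s) for i, pairs in neighbors.items() for j, s in pairs]
    conn.executemany("INSERT OR REPLACE INTO similarity VALUES (?, ?, ?)", rows)
    conn.commit()


def similar_items(conn, item, limit=10):
    """Look up stored neighbors without touching the correlation matrix."""
    cur = conn.execute(
        "SELECT neighbor, score FROM similarity "
        "WHERE item = ? ORDER BY score DESC LIMIT ?",
        (item, limit),
    )
    return cur.fetchall()


# Toy correlation matrix for three items.
corr = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, -0.3],
    [0.1, -0.3, 1.0],
])
conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
store_neighbors(conn, top_k_neighbors(corr, k=2))
result = similar_items(conn, 0)
```

Once the neighbors are stored, the correlation matrix can be discarded; each lookup is a single indexed query on the primary key.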