https://github.com/dafevara/propius

Propius allows for extracting similar items over a big data volume by using correlation between items over sparse data structures which use less space and memory.
big-data bigdata knn machine-learning python recommender-system sparse sparse-matrices

Host: GitHub
URL: https://github.com/dafevara/propius
Owner: dafevara
License: apache-2.0
Created: 2021-06-08T15:24:40.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2025-08-08T06:17:10.000Z (6 months ago)
Last Synced: 2025-08-08T08:23:38.992Z (6 months ago)
Topics: big-data, bigdata, knn, machine-learning, python, recommender-system, sparse, sparse-matrices
Language: Python
Homepage:
Size: 170 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

# Propius

Propius is Latin for *closer*. Propius is a simple tool for uncovering similar items in a dataset. In terms of distance, similar items tend to be closer together, hence the name _propius_.

Its main feature is extracting similar items from large volumes of data by computing correlations between items over sparse data structures, which take less time and memory.

It has two main components. The first is the similarity model, which finds correlations between items based on how they occur together across the data. Second, once model training is complete, Propius stores the similarities so they can be retrieved later through a REST API, which allows transparent integration with other systems (e.g. recommender systems).

# How does it work?

Propius takes advantage of the [SciPy Sparse Module](https://docs.scipy.org/doc/scipy/reference/sparse.html) to build a matrix of correlation coefficients that models similarities between items as distances between vectors in an _i_-dimensional space, where _i_ represents an item.
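To make the idea concrete, here is a minimal sketch of how item-to-item Pearson correlations can be computed from a sparse co-occurrence matrix with SciPy. It is not Propius's actual code: the function name `item_correlations`, the toy `baskets` data and the rows-as-observations layout are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse


def item_correlations(interactions):
    """Pearson correlation between the columns (items) of a sparse matrix.

    Hypothetical helper: rows are observations (e.g. baskets), columns are items.
    """
    X = sparse.csr_matrix(interactions, dtype=np.float64)
    n_rows = X.shape[0]

    # Column sums and the item-by-item product are computed on the sparse input,
    # so the observations-by-items matrix is never densified.
    col_sum = np.asarray(X.sum(axis=0)).ravel()
    gram = (X.T @ X).toarray()  # note: the items-by-items result is dense here

    # cov(x, y) = E[xy] - E[x]E[y]
    cov = gram / n_rows - np.outer(col_sum, col_sum) / n_rows**2
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    denom = np.outer(std, std)

    # Items that never vary have zero variance; report 0 instead of NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(denom > 0, cov / denom, 0.0)
    return corr


# Toy data: 4 baskets (rows) and 3 items (columns); 1 means the item is present.
rows = [0, 0, 1, 1, 2, 3, 3]
cols = [0, 1, 0, 1, 2, 0, 2]
baskets = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 3))

corr = item_correlations(baskets)
print(np.round(corr, 3))
```

Only the input stays sparse in this sketch. For a large catalogue the dense items-by-items result would itself need to be computed in blocks or thresholded.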

Using this matrix of correlation coefficients, Propius calculates the kNN (k nearest neighbors) for each unique item and stores each similarity score in a local database (SQLite). Similar items can then be retrieved without keeping the correlation matrix in memory, and results are returned faster.
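The store-then-look-up step can be sketched as follows, using Python's built-in `sqlite3` module. The table name `similarities`, its columns, and the helpers `store_top_k` and `similar_items` are hypothetical and may differ from the schema Propius actually uses.

```python
import sqlite3

import numpy as np


def store_top_k(corr, item_ids, k, conn):
    """Keep the k most correlated neighbors of each item in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similarities "
        "(item TEXT, neighbor TEXT, score REAL)"
    )
    rows = []
    for i, item in enumerate(item_ids):
        scores = np.asarray(corr[i], dtype=np.float64).copy()
        scores[i] = -np.inf  # an item is not its own neighbor
        top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
        rows.extend((item, item_ids[j], float(scores[j])) for j in top)
    conn.executemany("INSERT INTO similarities VALUES (?, ?, ?)", rows)
    conn.commit()


def similar_items(conn, item, limit=5):
    """Read neighbors back from the database; no matrix is needed in memory."""
    cur = conn.execute(
        "SELECT neighbor, score FROM similarities "
        "WHERE item = ? ORDER BY score DESC LIMIT ?",
        (item, limit),
    )
    return cur.fetchall()


# Toy correlation matrix for three illustrative items.
item_ids = ["apple", "pear", "soap"]
corr = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, -0.3],
    [0.1, -0.3, 1.0],
])

conn = sqlite3.connect(":memory:")  # a file path would persist the scores
store_top_k(corr, item_ids, k=2, conn=conn)
print(similar_items(conn, "apple"))  # [('pear', 0.8), ('soap', 0.1)]
```

Once the table is filled, the correlation matrix can be discarded: each lookup is a single indexed-style query, which is what makes serving the results through an API cheap.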
