https://github.com/cmccomb/smartcore_vs_linfa

Benchmarking the top ML crates available for Rust 🦀

Host: GitHub
URL: https://github.com/cmccomb/smartcore_vs_linfa
Owner: cmccomb
Created: 2021-12-29T19:49:35.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-08-19T01:54:22.000Z (almost 4 years ago)
Last Synced: 2025-01-20T07:12:47.725Z (over 1 year ago)
Topics: benchmark, linfa, machine-learning, rust, rust-lang, smartcore
Language: HTML
Homepage: https://cmccomb.com/smartcore_vs_linfa/
Size: 10.2 MB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

## About

[`linfa`](https://rust-ml.github.io/linfa/) and [`smartcore`](https://smartcorelib.org/) have emerged as the two leading `scikit-learn`-style machine learning frameworks for Rust. Both provide many of the algorithms that form the backbone of machine learning analysis. This repository compares the training time of algorithms in these two frameworks. The table below lists the algorithms each library provides and which of them are benchmarked here:

| Algorithm                     | Smartcore v0.4.2 | Linfa v0.7.1 | Benchmarked here? |
|:------------------------------|:-----------------|:-------------|:------------------|
| Linear Regression             | ✓                | ✓            | ✓                 |
| Ridge Regression              | ✓                |              |                   |
| LASSO Regression              | ✓                |              |                   |
| Decision Tree Regression      | ✓                |              |                   |
| Random Forest Regression      | ✓                |              |                   |
| Support Vector Regression     | ✓                | ✓            | ✓                 |
| KNN Regression                | ✓                |              |                   |
| Elastic Net Regression        | ✓                | ✓            | ✓                 |
| Partial Least Squares         |                  | ✓            |                   |
| Logistic Regression           | ✓                | ✓            | ✓                 |
| Decision Tree Classification  | ✓                | ✓            | ✓                 |
| Random Forest Classification  | ✓                |              |                   |
| Support Vector Classification | ✓                | ✓            | ✓                 |
| KNN Classification            | ✓                |              |                   |
| Gaussian Naive Bayes          | ✓                | ✓            | ✓                 |
| K-Means                       | ✓                | ✓            | ✓                 |
| DBSCAN                        | ✓                | ✓            | ✓                 |
| Hierarchical Clustering       |                  | ✓            |                   |
| Approximated DBSCAN           |                  | ✓            |                   |
| Gaussian Mixture Model        |                  | ✓            |                   |
| PCA                           | ✓                | ✓            | ✓                 |
| ICA                           |                  | ✓            |                   |
| SVD                           | ✓                |              |                   |
| t-SNE                         |                  | ✓            |                   |
| Diffusion Mapping             |                  | ✓            |                   |

The full report is available [here](criterion/report/index.html), but summary violin plots are provided below.
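Criterion builds those plots from many repeated timing measurements per benchmark. As a rough illustration of what a violin plot summarises, here is a deliberately simplified, std-only sketch: it times a closure repeatedly and returns the sorted samples. This is not Criterion's method (Criterion also performs warm-up and statistical analysis), and `time_samples` and `median` are hypothetical helper names.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical helper: run `f` `n_samples` times, timing each call,
// and return the per-call durations sorted ascending.
fn time_samples<F: FnMut()>(mut f: F, n_samples: usize) -> Vec<Duration> {
    let mut samples: Vec<Duration> = (0..n_samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples
}

// Median of a non-empty sorted slice (upper median for even lengths).
fn median(sorted: &[Duration]) -> Duration {
    sorted[sorted.len() / 2]
}

fn main() {
    // Stand-in "training" workload; black_box stops the compiler from optimizing it away.
    let data: Vec<f64> = (0..10_000).map(|i| i as f64).collect();
    let samples = time_samples(
        || {
            let _ = black_box(data.iter().map(|v| v * v).sum::<f64>());
        },
        50,
    );
    println!(
        "min = {:?}, median = {:?}, max = {:?}",
        samples[0],
        median(&samples),
        samples[samples.len() - 1]
    );
}
```

A violin plot of such samples shows their whole distribution rather than a single average, which makes outliers and noisy runs visible.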

## Considerations Besides Execution Time

Over the course of creating this benchmark study, a few additional differences between the libraries emerged.

### Documentation

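The documentation for `smartcore` is somewhat more consistent across algorithms. This may be because it is maintained as a single crate, whereas `linfa` distributes its algorithms across separate sub-crates (e.g., `linfa-linear`, `linfa-clustering`).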

### Dependencies

While `linfa` requires a BLAS/LAPACK backend (either `openblas`, `netlib`, or `intel-mkl`), `smartcore` does not. This allows `linfa` to take advantage of optimized linear-algebra routines, but it limits portability, since the native library must be available on the target system.

### Dataset Determinism

Benchmark datasets are generated from deterministic random seeds, one for each pairing of a [`TestSize`](https://docs.rs/smartcore_vs_linfa/latest/smartcore_vs_linfa/enum.TestSize.html) with a scenario (`regression`, `classification`, or `clustering`). As a result, the cached helpers (`xy_regression`, `xy_classification`, and `x_unsupervised`) produce reproducible inputs across runs, so the `smartcore` and `linfa` comparisons are made on the same data.
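The seeding principle (one fixed seed per size/scenario pair) can be illustrated with a minimal, dependency-free sketch. It assumes hypothetical `TestSize` and `Scenario` enums (the real `TestSize` variants may differ), an arbitrary `seed_for` mapping, and a hand-rolled SplitMix64 generator standing in for whatever RNG the crate actually uses.

```rust
// Hypothetical stand-ins: the real crate's `TestSize` variants may differ.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum TestSize {
    Small,
    Medium,
    Large,
}

// One scenario per cached helper (xy_regression, xy_classification, x_unsupervised).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Scenario {
    Regression,
    Classification,
    Clustering,
}

// Arbitrary but fixed mapping from (size, scenario) to a distinct seed.
fn seed_for(size: TestSize, scenario: Scenario) -> u64 {
    let s = match size {
        TestSize::Small => 1,
        TestSize::Medium => 2,
        TestSize::Large => 3,
    };
    let c = match scenario {
        Scenario::Regression => 10,
        Scenario::Classification => 20,
        Scenario::Clustering => 30,
    };
    s + c
}

// SplitMix64: a tiny PRNG whose entire output stream is determined by its seed.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    // Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Row-major `rows x cols` feature matrix for the given (size, scenario) pair.
fn features(size: TestSize, scenario: Scenario, rows: usize, cols: usize) -> Vec<f64> {
    let mut rng = SplitMix64(seed_for(size, scenario));
    (0..rows * cols).map(|_| rng.next_f64()).collect()
}

fn main() {
    let first = features(TestSize::Small, Scenario::Regression, 4, 3);
    let second = features(TestSize::Small, Scenario::Regression, 4, 3);
    // Same (size, scenario) pair: identical data on every call and every run.
    assert_eq!(first, second);
    println!("{:?}", &first[..3]);
}
```

Because the seed depends only on the pair, both libraries can call the same helper and receive bit-identical inputs without sharing any mutable RNG state.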

## Results

### Regression

#### [Linear Regression](criterion/Linear%20Regression/report/index.html)

_No customization needed to equate algorithms._

![](criterion/Linear%20Regression/report/violin.svg)

#### [Elastic Net](criterion/Elastic%20Net/report/index.html)

![](criterion/Elastic%20Net/report/violin.svg)

#### [Support Vector Regression](criterion/Support%20Vector%20Regression/report/index.html)

![](criterion/Support%20Vector%20Regression/report/violin.svg)

### Classification

#### [Logistic Regression](criterion/Logistic%20Regression/report/index.html)

The `smartcore` implementation has no parameters, but the `linfa` settings were modified to align it with `smartcore` defaults:

- Gradient tolerance set to `1e-8`

- Maximum number of iterations set to `1000`

![](criterion/Logistic%20Regression/report/violin.svg)
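To make the two stopping settings concrete, here is a hedged, from-scratch sketch of logistic regression trained by plain batch gradient descent that stops when the gradient norm drops below a tolerance or when an iteration cap is reached. It is not `linfa`'s solver (which is not plain gradient descent); the function `fit_logistic`, the learning rate, and the toy data are illustrative assumptions.

```rust
// Logistic (sigmoid) function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

// Hypothetical helper: logistic regression via batch gradient descent on the mean
// log-loss. Stops once the gradient's Euclidean norm is below `grad_tol`, or after
// `max_iter` updates. Returns (weights with the bias last, updates performed).
fn fit_logistic(
    x: &[Vec<f64>],
    y: &[f64],
    learning_rate: f64,
    grad_tol: f64,
    max_iter: usize,
) -> (Vec<f64>, usize) {
    let n = x.len() as f64;
    let d = x[0].len();
    let mut w = vec![0.0; d + 1];
    for iter in 0..max_iter {
        // Gradient of the mean log-loss with respect to the weights and bias.
        let mut grad = vec![0.0; d + 1];
        for (xi, &yi) in x.iter().zip(y) {
            let z = xi.iter().zip(&w).map(|(a, b)| a * b).sum::<f64>() + w[d];
            let err = sigmoid(z) - yi;
            for (g, xij) in grad.iter_mut().zip(xi) {
                *g += err * *xij / n;
            }
            grad[d] += err / n;
        }
        // Gradient-tolerance stopping criterion.
        let norm = grad.iter().map(|g| g * g).sum::<f64>().sqrt();
        if norm < grad_tol {
            return (w, iter);
        }
        for (wj, gj) in w.iter_mut().zip(&grad) {
            *wj -= learning_rate * *gj;
        }
    }
    // Iteration cap reached.
    (w, max_iter)
}

fn main() {
    // Toy, non-separable 1-D data so that a finite optimum exists.
    let x = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    let y = vec![0.0, 1.0, 0.0, 1.0];
    // Settings mirroring the benchmark: tolerance 1e-8, at most 1000 iterations.
    let (w, iters) = fit_logistic(&x, &y, 1.0, 1e-8, 1000);
    println!("weights = {:?}, iterations = {}", w, iters);
}
```

Whichever criterion fires first ends training, so a tight tolerance like `1e-8` can make the iteration cap the deciding factor in training time.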

#### [Decision Tree](criterion/Decision%20Tree%20Classification/report/index.html)

![](criterion/Decision%20Tree%20Classification/report/violin.svg)

#### [Gaussian Naive Bayes](criterion/Gaussian%20Naive%20Bayes/report/index.html)

![](criterion/Gaussian%20Naive%20Bayes/report/violin.svg)

#### [Support Vector Classification](criterion/Support%20Vector%20Classification/report/index.html)

![](criterion/Support%20Vector%20Classification/report/violin.svg)

### Clustering

#### [K-Means](criterion/K-Means%20Clustering/report/index.html)

Because the two implementations use different convergence criteria, both were capped at the same low maximum number of iterations, and `linfa` was limited to a single run:

- Max iterations set to `10`

- Number of runs set to `1`

![](criterion/K-Means%20Clustering/report/violin.svg)
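A minimal single-run sketch of Lloyd's k-means shows why the iteration cap puts both libraries on equal footing: whatever convergence test an implementation uses, it performs at most `max_iter` assignment/update rounds. The function name, the first-`k`-points initialisation, and the "no label changed" stopping rule are assumptions for illustration, not either library's actual code.

```rust
// Squared Euclidean distance.
fn dist2(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Hypothetical helper: one run of Lloyd's k-means, capped at `max_iter` rounds.
// Initial centroids are simply the first `k` points (an illustrative choice).
// Stops early once no point changes cluster. Returns (centroids, labels, rounds run).
fn kmeans_single_run(
    points: &[Vec<f64>],
    k: usize,
    max_iter: usize,
) -> (Vec<Vec<f64>>, Vec<usize>, usize) {
    assert!(k >= 1 && k <= points.len() && max_iter >= 1);
    let dim = points[0].len();
    let mut centroids: Vec<Vec<f64>> = points[..k].to_vec();
    let mut labels = vec![usize::MAX; points.len()]; // "unassigned"
    let mut rounds = 0;
    while rounds < max_iter {
        rounds += 1;
        // Assignment step: each point goes to its nearest centroid.
        let mut changed = false;
        for (label, p) in labels.iter_mut().zip(points) {
            let nearest = (0..k)
                .min_by(|&a, &b| {
                    dist2(p, &centroids[a])
                        .partial_cmp(&dist2(p, &centroids[b]))
                        .unwrap()
                })
                .unwrap();
            if *label != nearest {
                *label = nearest;
                changed = true;
            }
        }
        if !changed {
            break; // converged before hitting the cap
        }
        // Update step: move each centroid to the mean of its points
        // (an empty cluster keeps its previous centroid).
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (p, &l) in points.iter().zip(&labels) {
            counts[l] += 1;
            for (s, v) in sums[l].iter_mut().zip(p) {
                *s += *v;
            }
        }
        for ((centroid, sum), &count) in centroids.iter_mut().zip(&sums).zip(&counts) {
            if count > 0 {
                for (cv, s) in centroid.iter_mut().zip(sum) {
                    *cv = *s / count as f64;
                }
            }
        }
    }
    (centroids, labels, rounds)
}

fn main() {
    // Two well-separated groups of 2-D points.
    let points = vec![
        vec![0.0, 0.0], vec![0.0, 1.0], vec![10.0, 10.0],
        vec![10.0, 11.0], vec![1.0, 0.0], vec![11.0, 10.0],
    ];
    // Mirrors the benchmark settings: at most 10 iterations, a single run.
    let (centroids, labels, rounds) = kmeans_single_run(&points, 2, 10);
    println!("centroids = {:?}\nlabels = {:?}\nrounds = {}", centroids, labels, rounds);
}
```

Restricting `linfa` to one run matters because a multi-run configuration repeats this whole procedure and keeps the best result, which would multiply its training time relative to a single-run implementation.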

#### [DBSCAN](criterion/DBSCAN%20Clustering/report/index.html)

![](criterion/DBSCAN%20Clustering/report/violin.svg)

### Dimensionality Reduction

#### [PCA](criterion/PCA/report/index.html)

![](criterion/PCA/report/violin.svg)
