https://github.com/cmccomb/smartcore_vs_linfa
Benchmarking the top ML crates available for Rust 🦀
- Host: GitHub
- URL: https://github.com/cmccomb/smartcore_vs_linfa
- Owner: cmccomb
- Created: 2021-12-29T19:49:35.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-19T01:54:22.000Z (almost 4 years ago)
- Last Synced: 2025-01-20T07:12:47.725Z (over 1 year ago)
- Topics: benchmark, linfa, machine-learning, rust, rust-lang, smartcore
- Language: HTML
- Homepage: https://cmccomb.com/smartcore_vs_linfa/
- Size: 10.2 MB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## About
[`linfa`](https://rust-ml.github.io/linfa/) and [`smartcore`](https://smartcorelib.org/) have emerged as the two leading `scikit-learn`-like machine learning frameworks for Rust. Both implement many of the algorithms that form the backbone of machine learning analysis. This repository compares the training time of algorithms in the two frameworks. The table below lists which algorithms each framework provides and which of them are benchmarked here:

| Algorithm | Smartcore v0.4.2 | Linfa v0.7.1 | Benchmarked here? |
|:------------------------------|:-----------------|:-------------|:------------------|
| Linear Regression | ✓ | ✓ | ✓ |
| Ridge Regression | ✓ | | |
| LASSO Regression | ✓ | | |
| Decision Tree Regression | ✓ | | |
| Random Forest Regression | ✓ | | |
| Support Vector Regression | ✓ | ✓ | ✓ |
| KNN Regression | ✓ | | |
| Elastic Net Regression | ✓ | ✓ | ✓ |
| Partial Least Squares | | ✓ | |
| Logistic Regression | ✓ | ✓ | ✓ |
| Decision Tree Classification | ✓ | ✓ | ✓ |
| Random Forest Classification | ✓ | | |
| Support Vector Classification | ✓ | ✓ | ✓ |
| KNN Classification | ✓ | | |
| Gaussian Naive Bayes | ✓ | ✓ | ✓ |
| K-Means | ✓ | ✓ | ✓ |
| DBSCAN | ✓ | ✓ | ✓ |
| Hierarchical Clustering | | ✓ | |
| Approximated DBSCAN | | ✓ | |
| Gaussian Mixture Model | | ✓ | |
| PCA | ✓ | ✓ | ✓ |
| ICA | | ✓ | |
| SVD | ✓ | | |
| t-SNE | | ✓ | |
| Diffusion Mapping | | ✓ | |

The full Criterion report is available [here](criterion/report/index.html); summary violin plots are provided below.
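The report is produced by Criterion.rs, which times each training run many times and summarizes the resulting distribution of timings. As a rough, std-only illustration of that idea (not the repository's actual harness), the sketch below times a hypothetical training function `train_slope`, a one-feature least-squares fit standing in for a real model, over repeated samples and reports the median. Criterion additionally handles warm-up, outlier detection, and statistical comparison, none of which is reproduced here.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a model's training routine: fits the slope `w`
/// of y = w * x by least squares through the origin.
fn train_slope(x: &[f64], y: &[f64]) -> f64 {
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let sxx: f64 = x.iter().map(|a| a * a).sum();
    sxy / sxx
}

/// Times `f` once per sample and returns the individual durations sorted
/// ascending, so order statistics such as the median are easy to read off.
fn time_samples<F: FnMut()>(samples: usize, mut f: F) -> Vec<Duration> {
    let mut times: Vec<Duration> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    times.sort();
    times
}

fn main() {
    let x: Vec<f64> = (1..=1000).map(|i| i as f64).collect();
    let y: Vec<f64> = x.iter().map(|v| 2.0 * v).collect();

    let mut slope = 0.0;
    // `black_box` discourages the optimizer from skipping the work being timed.
    let times = time_samples(50, || {
        slope = black_box(train_slope(&x, &y));
    });
    let median = times[times.len() / 2];
    println!("slope = {slope}, median training time = {median:?}");
}
```

Reporting a distribution (or its median) rather than a single timing is what makes violin plots meaningful: one run can be distorted by caching or scheduling noise.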
## Considerations Besides Execution Time
In the course of creating this benchmark study, a few additional differences between the libraries emerged.
### Documentation
The documentation for `smartcore` is somewhat more consistent across algorithms. This may be because `smartcore` is maintained as a single crate, whereas `linfa` is split into several sub-crates (such as `linfa-linear` and `linfa-clustering`).
### Dependencies
Some `linfa` algorithms can use an external BLAS/LAPACK backend (`openblas`, `netlib`, or `intel-mkl`), whereas `smartcore` does not depend on one. A BLAS backend lets `linfa` take advantage of additional optimization, but it limits portability.
### Dataset Determinism
Benchmark datasets are generated from deterministic random seeds, one for each pairing of a [`TestSize`](https://docs.rs/smartcore_vs_linfa/latest/smartcore_vs_linfa/enum.TestSize.html) with a scenario (`regression`, `classification`, or `clustering`). As a result, the cached helpers (`xy_regression`, `xy_classification`, and `x_unsupervised`) return identical inputs on every run, so `smartcore` and `linfa` are always benchmarked on the same data.
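A minimal sketch of deriving one fixed seed per (size, scenario) pair follows. Everything in it is hypothetical: the `TestSize` variants and sample counts, the `Scenario` enum, `seed_for`, and `make_regression` are illustrative names, not the crate's API, and the SplitMix64 generator is a std-only stand-in for whatever seeded RNG the repository actually uses. Caching is omitted; the point is only that the same pair always yields the same data.

```rust
/// Hypothetical dataset sizes; the real `TestSize` enum's variants may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TestSize {
    Small,
    Medium,
    Large,
}

impl TestSize {
    fn n_samples(self) -> usize {
        match self {
            TestSize::Small => 10,
            TestSize::Medium => 100,
            TestSize::Large => 1000,
        }
    }
}

/// Hypothetical benchmark scenarios.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scenario {
    Regression,
    Classification,
    Clustering,
}

/// Derive a fixed seed from the (size, scenario) pair: the same pair always
/// gets the same seed, and different pairs get different seeds.
fn seed_for(size: TestSize, scenario: Scenario) -> u64 {
    0x5EED_0000 + (size as u64) * 16 + scenario as u64
}

/// SplitMix64, a small deterministic PRNG (stand-in for a seeded RNG crate).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: (x, y) pairs with y = 3x plus small uniform noise.
fn make_regression(size: TestSize) -> (Vec<f64>, Vec<f64>) {
    let mut rng = SplitMix64(seed_for(size, Scenario::Regression));
    let x: Vec<f64> = (0..size.n_samples()).map(|_| rng.next_f64()).collect();
    let y: Vec<f64> = x
        .iter()
        .map(|&xi| 3.0 * xi + 0.1 * (rng.next_f64() - 0.5))
        .collect();
    (x, y)
}

fn main() {
    // Two calls with the same size reproduce exactly the same dataset.
    let first = make_regression(TestSize::Small);
    let second = make_regression(TestSize::Small);
    assert_eq!(first, second);
    println!("first x values: {:?}", &first.0[..3]);
}
```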
## Results
### Regression
#### [Linear Regression](criterion/Linear%20Regression/report/index.html)
_No customization was needed to make the two implementations equivalent._

#### [Elastic Net](criterion/Elastic%20Net/report/index.html)

#### [Support Vector Regression](criterion/Support%20Vector%20Regression/report/index.html)

### Classification
#### [Logistic Regression](criterion/Logistic%20Regression/report/index.html)
The `smartcore` implementation exposes no tuning parameters, so the `linfa` settings were modified to match `smartcore`'s defaults:
- Gradient tolerance set to `1e-8`
- Maximum number of iterations set to `1000`
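To make the two settings concrete, here is a minimal sketch of how a gradient tolerance and an iteration cap jointly decide when training stops. It fits a one-feature logistic regression with plain gradient descent; that optimizer, the learning rate `lr`, and the function name `fit_logistic` are assumptions for illustration only, not what either library does internally.

```rust
/// Logistic (sigmoid) function.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Fit weight `w` and bias `b` by gradient descent on the mean negative
/// log-likelihood. Stops when the gradient's Euclidean norm falls below
/// `grad_tol` or after `max_iter` iterations, whichever comes first.
/// Returns (w, b, iterations_used).
fn fit_logistic(x: &[f64], y: &[f64], grad_tol: f64, max_iter: usize, lr: f64) -> (f64, f64, usize) {
    let n = x.len() as f64;
    let (mut w, mut b) = (0.0_f64, 0.0_f64);
    for iter in 0..max_iter {
        let (mut gw, mut gb) = (0.0_f64, 0.0_f64);
        for (&xi, &yi) in x.iter().zip(y) {
            let err = sigmoid(w * xi + b) - yi;
            gw += err * xi / n;
            gb += err / n;
        }
        // Gradient-tolerance stopping rule.
        if (gw * gw + gb * gb).sqrt() < grad_tol {
            return (w, b, iter);
        }
        w -= lr * gw;
        b -= lr * gb;
    }
    // Iteration cap reached before the tolerance was met.
    (w, b, max_iter)
}

fn main() {
    // Non-separable toy data, so a finite optimum exists.
    let x = [-2.0, -1.0, 1.0, 2.0];
    let y = [0.0, 1.0, 0.0, 1.0];
    // Tolerance 1e-8 and at most 1000 iterations, as in the list above.
    let (w, b, iters) = fit_logistic(&x, &y, 1e-8, 1000, 1.0);
    println!("w = {w:.4}, b = {b:.4}, iterations = {iters}");
}
```

If one library stops on a looser tolerance or a smaller cap, it simply does less work, so equalizing both settings keeps the timing comparison fair.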

#### [Decision Tree](criterion/Decision%20Tree%20Classification/report/index.html)

#### [Gaussian Naive Bayes](criterion/Gaussian%20Naive%20Bayes/report/index.html)

#### [Support Vector Classification](criterion/Support%20Vector%20Classification/report/index.html)

### Clustering
#### [K-Means](criterion/K-Means%20Clustering/report/index.html)
Because the two implementations use different convergence criteria, the maximum number of iterations was set to the same low value in both, and `linfa` was restricted to a single run (otherwise it repeats the algorithm from several initializations and keeps the best result):
- Max iterations set to `10`
- Number of runs set to `1`
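The effect of these two settings can be sketched with a single run of Lloyd's k-means on 1-D data. The function `kmeans_1d`, the fixed initial centroids, and the "no centroid moved" stopping rule are assumptions for illustration; neither library is claimed to use exactly this rule. With a low iteration cap, whichever convergence test each library applies matters less, because both are cut off at the same point.

```rust
/// One run of Lloyd's k-means on 1-D data, capped at `max_iter` iterations.
/// Initial centroids are passed in explicitly so the run is deterministic.
/// Returns (centroids, iterations_performed).
fn kmeans_1d(data: &[f64], init: &[f64], max_iter: usize) -> (Vec<f64>, usize) {
    let mut centroids = init.to_vec();
    for iter in 0..max_iter {
        // Assignment step: accumulate the points nearest each centroid.
        let mut sums = vec![0.0_f64; centroids.len()];
        let mut counts = vec![0_usize; centroids.len()];
        for &p in data {
            let nearest = (0..centroids.len())
                .min_by(|&a, &b| {
                    (p - centroids[a]).abs().partial_cmp(&(p - centroids[b]).abs()).unwrap()
                })
                .unwrap();
            sums[nearest] += p;
            counts[nearest] += 1;
        }
        // Update step: move each centroid to its cluster mean (empty clusters stay put).
        let updated: Vec<f64> = centroids
            .iter()
            .enumerate()
            .map(|(k, &c)| if counts[k] > 0 { sums[k] / counts[k] as f64 } else { c })
            .collect();
        // Convergence check used in this sketch: no centroid moved.
        if updated == centroids {
            return (centroids, iter + 1);
        }
        centroids = updated;
    }
    (centroids, max_iter)
}

fn main() {
    let data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0];
    // A single run from one fixed initialization, capped at 10 iterations.
    let (centroids, iters) = kmeans_1d(&data, &[0.0, 5.0], 10);
    println!("centroids = {centroids:?} after {iters} iteration(s)");
}
```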

#### [DBSCAN](criterion/DBSCAN%20Clustering/report/index.html)

### Dimensionality Reduction
#### [PCA](criterion/PCA/report/index.html)
