https://github.com/maxgfr/regressio
Zero-dependency TypeScript regression, classification & statistics library. OLS, Ridge, Lasso, Elastic Net, Logistic, KNN, Neural Network + diagnostics + preprocessing. Optional Rust/WASM engine.
bun knn lasso linear-regression logistic-regression machine-learning neural-network ols regression ridge-regression statistics typescript wasm zero-dependencies
- Host: GitHub
- URL: https://github.com/maxgfr/regressio
- Owner: maxgfr
- Created: 2026-04-04T19:29:19.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-04T19:43:32.000Z (3 months ago)
- Last Synced: 2026-04-04T22:02:46.401Z (3 months ago)
- Topics: bun, knn, lasso, linear-regression, logistic-regression, machine-learning, neural-network, ols, regression, ridge-regression, statistics, typescript, wasm, zero-dependencies
- Language: TypeScript
- Size: 114 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
# regressio
Zero-dependency TypeScript regression, classification & statistics library with full statistical outputs, diagnostics, and preprocessing. Ships with an optional Rust/WASM engine for accelerated linear algebra.
## Install
```bash
bun add regressio
# or
npm install regressio
# or
pnpm add regressio
```
## Quick Start
```typescript
import { LinearRegression } from 'regressio';
const model = new LinearRegression();
model.fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]);
console.log(model.coefficients); // ≈ [1.99]
console.log(model.intercept); // ≈ 0.05
console.log(model.predict([6])); // ≈ [11.99]
console.log(model.summary()); // R-style formatted summary table
```
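For a single feature, the least-squares fit has a closed form, which makes it easy to cross-check a fitted model by hand. The sketch below is independent of regressio (the `simpleOls` helper is a hypothetical name, not part of the library) and reuses the Quick Start data:

```typescript
// Closed-form simple linear regression: one feature plus an intercept.
// Hypothetical helper for sanity checks; not part of the regressio API.
function simpleOls(x: number[], y: number[]): { slope: number; intercept: number } {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0; // sum of cross-deviations
  let sxx = 0; // sum of squared x-deviations
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

const check = simpleOls([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]);
console.log(check.slope.toFixed(2), check.intercept.toFixed(2)); // 1.99 0.05
```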
## Models
### Regression
| Model | Class | What it does |
|-------|-------|--------------|
| **OLS** | `LinearRegression` | Fits a linear relationship between features and target using Ordinary Least Squares solved via QR decomposition. The foundational regression method. |
| **Polynomial** | `PolynomialRegression` | Fits non-linear curves by expanding a single feature into polynomial terms (x, x², x³, ...) then applying OLS. |
| **Ridge (L2)** | `RidgeRegression` | Adds an L2 penalty (sum of squared coefficients) to OLS to handle multicollinearity and prevent overfitting. Shrinks coefficients toward zero but never exactly to zero. |
| **Lasso (L1)** | `LassoRegression` | Adds an L1 penalty (sum of absolute coefficients) via coordinate descent. Forces some coefficients to exactly zero, performing automatic feature selection. |
| **Elastic Net** | `ElasticNet` | Combines L1 and L2 penalties. Balances Lasso's feature selection with Ridge's stability for correlated features. |
| **WLS** | `WeightedRegression` | Weighted Least Squares. Assigns different importance to each observation. Useful when some data points are more reliable than others. |
| **Robust** | `RobustRegression` | Resistant to outliers. Uses Iteratively Reweighted Least Squares (IRLS) with Huber or Tukey bisquare M-estimators to downweight extreme values. |
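
The difference between the L1 and L2 penalties is easiest to see for a single standardized feature. The sketch below assumes the objectives `(1/2n)·‖y − x·b‖² + α·|b|` for Lasso and `(1/2n)·‖y − x·b‖² + (α/2)·b²` for Ridge, with the feature scaled so that `xᵀx/n = 1`; regressio's exact parametrization of `alpha` may differ, and the helper names are hypothetical.

```typescript
// Soft-thresholding: the closed-form coordinate descent update for an L1 penalty.
// rho is the unpenalized coefficient (xᵀy / n for a standardized feature).
// Any rho inside [-alpha, alpha] maps to exactly 0.
function softThreshold(rho: number, alpha: number): number {
  if (rho > alpha) return rho - alpha;
  if (rho < -alpha) return rho + alpha;
  return 0;
}

// Ridge shrinkage under the assumed objective: scales the coefficient toward
// zero but never reaches zero for a nonzero rho.
function ridgeShrink(rho: number, alpha: number): number {
  return rho / (1 + alpha);
}

const rhos = [2.0, 0.3, -0.05];
const lassoCoefs = rhos.map((r) => softThreshold(r, 0.5));
const ridgeCoefs = rhos.map((r) => ridgeShrink(r, 0.5));
console.log(lassoCoefs); // [1.5, 0, 0]
console.log(ridgeCoefs); // all nonzero, each scaled by 1 / 1.5
```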
### Classification
| Model | Class | What it does |
|-------|-------|--------------|
| **Logistic** | `LogisticRegression` | Binary classification (0/1). Models the probability of class membership using a sigmoid function, fitted via Newton-Raphson/IRLS. |
| **Multiclass Logistic** | `MulticlassLogisticRegression` | Extends logistic regression to K classes using softmax. Fitted via gradient descent on the cross-entropy loss. |
| **K-Nearest Neighbors** | `KNearestNeighbors` | Non-parametric method. Predicts by majority vote (classification) or mean (regression) of the k closest training points. Supports Euclidean and Manhattan distances. |
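
As a rough illustration of the majority-vote rule described for K-Nearest Neighbors, here is a minimal standalone sketch using Euclidean distance. `knnClassify` is a hypothetical helper, not the library's implementation, and its tie-breaking rule (smallest label wins) is an assumption.

```typescript
// Minimal k-nearest-neighbors classifier (Euclidean distance, majority vote).
// Assumes non-negative integer labels, k >= 1, and a non-empty training set.
function knnClassify(
  trainX: number[][],
  trainY: number[],
  query: number[],
  k: number,
): number {
  // Distance from the query to every training point, nearest first
  const neighbors = trainX
    .map((row, i) => ({
      label: trainY[i],
      dist: Math.sqrt(row.reduce((s, v, j) => s + (v - query[j]) ** 2, 0)),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count votes per label among the k nearest points
  const votes: Record<number, number> = {};
  for (const { label } of neighbors) {
    votes[label] = (votes[label] ?? 0) + 1;
  }

  // Pick the label with the most votes; ties go to the smallest label
  let best = neighbors[0].label;
  let bestCount = 0;
  for (const key of Object.keys(votes)) {
    const label = Number(key);
    if (votes[label] > bestCount) {
      best = label;
      bestCount = votes[label];
    }
  }
  return best;
}

const trainX = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]];
const trainY = [0, 0, 0, 1, 1, 1];
console.log(knnClassify(trainX, trainY, [0.5, 0.5], 3)); // 0
console.log(knnClassify(trainX, trainY, [5.5, 5.5], 3)); // 1
```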
### Neural Network
| Model | Class | What it does |
|-------|-------|--------------|
| **Feedforward NN** | `NeuralNetwork` | Multi-layer perceptron with backpropagation. Configurable hidden layers, activations (relu, sigmoid, tanh, softmax), and learning rate. Supports both regression and classification tasks. |
### Usage
```typescript
import {
LinearRegression,
PolynomialRegression,
RidgeRegression,
LassoRegression,
ElasticNet,
WeightedRegression,
RobustRegression,
LogisticRegression,
MulticlassLogisticRegression,
KNearestNeighbors,
NeuralNetwork,
} from 'regressio';
// X, y, weights, and Xnew below are placeholders for your own data.
// --- Regression ---
// OLS: multiple regression
const ols = new LinearRegression();
ols.fit([[1, 2], [2, 1], [3, 4], [4, 3]], [8.1, 6.9, 18.2, 16.8]);
// Polynomial: fit a cubic curve
const poly = new PolynomialRegression({ degree: 3 });
poly.fit([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]);
// Ridge: regularized regression for correlated features
const ridge = new RidgeRegression({ alpha: 0.5 });
ridge.fit(X, y);
// Lasso: automatic feature selection
const lasso = new LassoRegression({ alpha: 0.1 });
lasso.fit(X, y);
// Some coefficients will be exactly 0
// Elastic Net: mix of L1 and L2
const enet = new ElasticNet({ alpha: 0.1, l1Ratio: 0.5 });
enet.fit(X, y);
// Weighted Least Squares: different reliability per observation
const wls = new WeightedRegression();
wls.fit(X, y, weights);
// Robust: resistant to outliers
const robust = new RobustRegression({ method: 'huber' });
robust.fit(X, y);
// --- Classification ---
// Binary logistic regression
const logit = new LogisticRegression();
logit.fit(X, y); // y must be 0/1
logit.predictProbability(Xnew); // [0.12, 0.87, ...]
// Multiclass logistic regression (softmax)
const multi = new MulticlassLogisticRegression({ learningRate: 0.05 });
multi.fit(X, y); // y = 0, 1, 2, ...
multi.predictProbability(Xnew); // [[0.7, 0.2, 0.1], ...]
// K-Nearest Neighbors (classification or regression)
const knn = new KNearestNeighbors({ k: 5, mode: 'classification' });
knn.fit(X, y);
knn.predict(Xnew);
// --- Neural Network ---
// Regression with a neural network
const nn = new NeuralNetwork({
layers: [
{ units: 16, activation: 'relu' },
{ units: 8, activation: 'relu' },
],
learningRate: 0.01,
epochs: 200,
task: 'regression',
});
nn.fit(X, y);
nn.predict(Xnew);
// Classification with a neural network
const clf = new NeuralNetwork({
layers: [{ units: 10, activation: 'sigmoid' }],
learningRate: 0.1,
epochs: 100,
task: 'classification',
});
clf.fit(X, y); // y = 0, 1, 2, ...
clf.predict(Xnew);
```
## Statistical Outputs
Every linear model (OLS, Ridge, Lasso, Elastic Net, WLS, Robust, Polynomial) provides `statistics()` and `summary()`:
```typescript
const stats = model.statistics();
// {
// rSquared, -- proportion of variance explained (0 to 1)
// adjustedRSquared, -- R² penalized for number of predictors
// standardErrors, -- uncertainty of each coefficient estimate
// tStatistics, -- coefficient / standard error for each predictor
// pValues, -- probability of a t-statistic at least this extreme under H0 (no effect)
// confidenceIntervals, -- 95% confidence range for each coefficient
// fStatistic, -- overall model significance test
// fPValue, -- p-value for the F-test
// residualStandardError, -- estimated standard deviation of residuals
// aic, -- Akaike Information Criterion (lower = better fit/complexity trade-off)
// bic, -- Bayesian Information Criterion (stronger complexity penalty than AIC)
// degreesOfFreedom, -- n - k (observations minus parameters)
// nObservations, -- number of data points
// }
console.log(model.summary()); // example layout; the numbers shown are illustrative
// Coefficients:
// Estimate Std. Error t value Pr(>|t|)
// (Intercept) 0.0600 0.1200 0.50 0.6300
// x1 2.0200 0.0400 50.20 0.0000 ***
// ---
// Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
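To make the `rSquared` and `adjustedRSquared` definitions concrete, the following standalone sketch computes both from observed and fitted values. It uses the textbook formulas, assumes a model with an intercept, and does not call regressio; `rSquaredStats` is a hypothetical name. The fitted values come from the least-squares line `y = 0.05 + 1.99x` for the Quick Start data.

```typescript
// R² and adjusted R² from observed values, fitted values, and predictor count.
// Assumes the model includes an intercept (hence n - nPredictors - 1).
function rSquaredStats(y: number[], yHat: number[], nPredictors: number) {
  const n = y.length;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  const ssRes = y.reduce((s, v, i) => s + (v - yHat[i]) ** 2, 0); // residual sum of squares
  const ssTot = y.reduce((s, v) => s + (v - meanY) ** 2, 0); // total sum of squares
  const rSquared = 1 - ssRes / ssTot;
  const adjustedRSquared = 1 - ((1 - rSquared) * (n - 1)) / (n - nPredictors - 1);
  return { rSquared, adjustedRSquared };
}

const fit = rSquaredStats(
  [2.1, 3.9, 6.2, 7.8, 10.1],
  [2.04, 4.03, 6.02, 8.01, 10.0],
  1,
);
console.log(fit.rSquared.toFixed(4)); // 0.9973
console.log(fit.adjustedRSquared.toFixed(4)); // 0.9964
```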
Binary logistic regression provides classification metrics:
```typescript
const stats = logit.statistics();
// { accuracy, precision, recall, f1Score, confusionMatrix,
// pseudoRSquared, logLikelihood, aic, bic }
```
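The classification metrics above all derive from the four confusion-matrix counts. A minimal sketch, assuming labels are 0/1 and a `[[TN, FP], [FN, TP]]` matrix layout (the layout regressio actually returns is not stated here, and `binaryMetrics` is a hypothetical helper):

```typescript
// Accuracy, precision, recall, and F1 from true and predicted 0/1 labels.
// The confusion matrix layout [[TN, FP], [FN, TP]] is an assumption.
function binaryMetrics(yTrue: number[], yPred: number[]) {
  let tp = 0;
  let fp = 0;
  let tn = 0;
  let fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === 1 && yPred[i] === 1) tp++;
    else if (yTrue[i] === 0 && yPred[i] === 1) fp++;
    else if (yTrue[i] === 0 && yPred[i] === 0) tn++;
    else fn++;
  }
  const precision = tp / (tp + fp); // of predicted positives, how many are correct
  const recall = tp / (tp + fn); // of actual positives, how many were found
  return {
    accuracy: (tp + tn) / yTrue.length,
    precision,
    recall,
    f1Score: (2 * precision * recall) / (precision + recall),
    confusionMatrix: [[tn, fp], [fn, tp]],
  };
}

const metrics = binaryMetrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]);
console.log(metrics.accuracy, metrics.precision, metrics.recall, metrics.f1Score);
// 0.75 0.75 0.75 0.75
```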
Multiclass logistic regression provides per-class metrics:
```typescript
const stats = multi.statistics();
// { accuracy, precision (per class), recall (per class),
// nClasses, logLikelihood }
```
## Diagnostics
Functions to validate model assumptions and detect problems.
| Function | What it does |
|----------|--------------|
| `residualDiagnostics(X, y, yHat)` | Returns raw residuals, studentized residuals, Cook's distance, and leverage for each observation. |
| `studentizedResiduals(X, y, yHat)` | Residuals scaled by their estimated standard deviation. Absolute values above roughly 2 to 3 suggest outliers. |
| `cooksDistance(X, y, yHat)` | Measures how much each observation influences the fitted model. Values > 4/n flag influential points. |
| `leverage(X)` | Hat matrix diagonal. Measures how far each observation's features are from the center. High leverage = unusual feature values. |
| `durbinWatson(residuals)` | Tests for autocorrelation in residuals. Returns statistic in [0,4]: ~2 = no autocorrelation, <2 = positive, >2 = negative. Critical for time series. |
| `breuschPagan(X, residuals)` | Tests for heteroscedasticity (non-constant variance). Low p-value = variance depends on X, meaning standard errors are unreliable. |
| `shapiroWilk(data)` | Tests whether data follows a normal distribution. Low p-value = non-normal. Important because p-values and CIs assume normal residuals. |
| `vif(X)` | Variance Inflation Factor for each feature. VIF > 10 signals multicollinearity (features are too correlated). |
| `correlationMatrix(X)` | Pairwise Pearson correlation matrix. Pairs with \|r\| > 0.9 suggest redundant features. |
| `conditionNumber(X)` | Ratio of largest to smallest singular value of X. Values > 30 signal numerical instability from multicollinearity. |
```typescript
import {
residualDiagnostics, leverage, cooksDistance, studentizedResiduals,
durbinWatson, breuschPagan, shapiroWilk,
vif, correlationMatrix, conditionNumber,
} from 'regressio';
const diag = residualDiagnostics(X, y, yHat);
const dw = durbinWatson(model.residuals());
const bp = breuschPagan(X, model.residuals());
const sw = shapiroWilk(model.residuals());
const vifs = vif(X);
const corr = correlationMatrix(X);
const kappa = conditionNumber(X);
```
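For intuition about the Durbin-Watson range, the statistic is the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below implements that textbook formula directly; `durbinWatsonSketch` is a hypothetical name and is not the library function.

```typescript
// Durbin-Watson statistic: sum((e[t] - e[t-1])^2) / sum(e[t]^2).
function durbinWatsonSketch(residuals: number[]): number {
  let numerator = 0;
  let denominator = 0;
  for (let t = 0; t < residuals.length; t++) {
    denominator += residuals[t] ** 2;
    if (t > 0) numerator += (residuals[t] - residuals[t - 1]) ** 2;
  }
  return numerator / denominator;
}

// Sign flips at every step: negative autocorrelation, statistic above 2
console.log(durbinWatsonSketch([1, -1, 1, -1])); // 3
// Residuals that stay on one side for a while: positive autocorrelation, below 2
console.log(durbinWatsonSketch([1, 1, -1, -1])); // 1
```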
## Preprocessing
Functions to prepare data before fitting models.
| Function | What it does |
|----------|--------------|
| `standardize(X)` | Z-score normalization: transforms each feature to mean=0, std=1. Essential for Lasso/Ridge/Elastic Net and neural networks. |
| `unstandardize(X, params)` | Reverses standardization back to the original scale. |
| `normalize(X)` | Min-max scaling: transforms each feature to [0, 1] range. |
| `unnormalize(X, params)` | Reverses normalization back to the original scale. |
| `oneHotEncode(column, categories?, dropFirst?)` | Converts categorical values to binary columns. Use `dropFirst=true` to avoid the dummy variable trap (perfect multicollinearity with the intercept). |
| `polynomialFeatures(X, degree)` | Generates polynomial terms (x, x², x³, ...) for each feature. Use with `LinearRegression` for polynomial fitting with multiple features. |
| `interactionFeatures(X, pairs?)` | Generates interaction terms (xi * xj) for all or specified feature pairs. |
| `dropMissing(X, y?)` | Removes rows containing NaN or null values. |
| `imputeMean(X)` | Replaces NaN values with the column mean. |
| `imputeMedian(X)` | Replaces NaN values with the column median. More robust to outliers than mean imputation. |
```typescript
import {
standardize, unstandardize, normalize, unnormalize,
oneHotEncode, polynomialFeatures, interactionFeatures,
dropMissing, imputeMean, imputeMedian,
} from 'regressio';
const { transformed, means, stds } = standardize(X);
const original = unstandardize(transformed, { means, stds });
const { transformed: normed, mins, maxs } = normalize(X);
const dummies = oneHotEncode(['cat', 'dog', 'cat'], undefined, true);
const polyX = polynomialFeatures(X, 3);
const interX = interactionFeatures(X);
const clean = dropMissing(X, y);
const imputed = imputeMean(X);
```
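As a rough picture of what z-score standardization does per column, here is a standalone sketch. It assumes the sample standard deviation (dividing by `n - 1`); whether regressio uses the sample or population form is not stated above, and `zScore` is a hypothetical helper.

```typescript
// Column-wise z-score: subtract the column mean, divide by the column std.
// Uses the sample standard deviation (n - 1), which is an assumption.
// A constant column has std 0 and would produce NaN; handle that upstream.
function zScore(X: number[][]) {
  const n = X.length;
  const p = X[0].length;
  const means: number[] = [];
  const stds: number[] = [];
  for (let j = 0; j < p; j++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += X[i][j];
    const mean = sum / n;
    let ss = 0;
    for (let i = 0; i < n; i++) ss += (X[i][j] - mean) ** 2;
    means.push(mean);
    stds.push(Math.sqrt(ss / (n - 1)));
  }
  const transformed = X.map((row) => row.map((v, j) => (v - means[j]) / stds[j]));
  return { transformed, means, stds };
}

const scaled = zScore([[1, 10], [2, 20], [3, 30]]);
console.log(scaled.means); // [2, 20]
console.log(scaled.stds); // [1, 10]
console.log(scaled.transformed); // [[-1, -1], [0, 0], [1, 1]]
```

Both columns map to the same standardized values even though their original scales differ by a factor of ten, which is what keeps a penalty term from favoring one feature over another.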
## Prediction Intervals
Functions to quantify prediction uncertainty.
| Function | What it does |
|----------|--------------|
| `confidenceInterval(X, y, yHat, newX, newYHat)` | Confidence interval on the **mean** prediction. Answers: "where is the true regression line?" Narrower near the center of the training data. |
| `predictionInterval(X, y, yHat, newX, newYHat)` | Prediction interval for a **new individual** observation. Always wider than the confidence interval because it includes observation noise. |
| `bootstrapCoefficients(X, y, nBootstrap?)` | Non-parametric bootstrap: resamples data with replacement, refits the model many times, and returns empirical confidence intervals on coefficients. No distributional assumptions. |
```typescript
import { confidenceInterval, predictionInterval, bootstrapCoefficients } from 'regressio';
const ci = confidenceInterval(X, y, yHat, newX, newYHat);
// [{ predicted, lower, upper }, ...]
const pi = predictionInterval(X, y, yHat, newX, newYHat);
// Always wider than ci
const boot = bootstrapCoefficients(X, y, 1000);
// { coefficients, confidenceIntervals, standardErrors }
```
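The resample-and-refit idea behind the bootstrap can be sketched for a single-feature slope as follows. This is an illustration under stated assumptions, not the library's algorithm: it uses a percentile interval, a small seeded generator for reproducibility, and hypothetical helper names (`makeRng`, `fitSlope`, `bootstrapSlope`).

```typescript
// Small deterministic generator so the sketch is reproducible (not for production use).
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Least-squares slope for one feature; returns NaN when x has no variation.
function fitSlope(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  return sxy / sxx;
}

// Percentile bootstrap: resample (x, y) pairs with replacement, refit, and
// read the 2.5% and 97.5% quantiles off the sorted slopes.
function bootstrapSlope(x: number[], y: number[], nBootstrap: number, seed = 42) {
  const rng = makeRng(seed);
  const n = x.length;
  const slopes: number[] = [];
  while (slopes.length < nBootstrap) {
    const xs: number[] = [];
    const ys: number[] = [];
    for (let i = 0; i < n; i++) {
      const idx = Math.floor(rng() * n);
      xs.push(x[idx]);
      ys.push(y[idx]);
    }
    const slope = fitSlope(xs, ys);
    if (isFinite(slope)) slopes.push(slope); // skip resamples where x is constant
  }
  slopes.sort((a, b) => a - b);
  return {
    lower: slopes[Math.floor(0.025 * (nBootstrap - 1))],
    upper: slopes[Math.ceil(0.975 * (nBootstrap - 1))],
  };
}

const interval = bootstrapSlope([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 1000);
console.log(interval); // { lower, upper } bracketing the fitted slope
```

With only five observations the interval is coarse; the point is the mechanism, not the numbers.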
## Advanced: Matrix Class
Low-level matrix operations for advanced users. Backed by `Float64Array` in row-major order.
```typescript
import { Matrix } from 'regressio';
const A = Matrix.fromArray([[1, 2], [3, 4]]);
const B = Matrix.identity(2);
const C = A.multiply(B);
console.log(C.determinant()); // -2
console.log(C.trace()); // 5
console.log(C.transpose().toArray());
```
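Row-major storage means element `(i, j)` of a matrix with `cols` columns lives at flat index `i * cols + j`. The sketch below shows that layout with a deliberately tiny, hypothetical `TinyMatrix` class; it is not the library's `Matrix` and only illustrates the indexing.

```typescript
// Minimal row-major matrix backed by a Float64Array (illustration only).
class TinyMatrix {
  readonly rows: number;
  readonly cols: number;
  readonly data: Float64Array;

  constructor(rows: number, cols: number, values: number[]) {
    this.rows = rows;
    this.cols = cols;
    this.data = new Float64Array(values); // rows stored one after another
  }

  // Element (i, j) sits at flat index i * cols + j
  get(i: number, j: number): number {
    return this.data[i * this.cols + j];
  }
}

const m = new TinyMatrix(2, 2, [1, 2, 3, 4]); // [[1, 2], [3, 4]]
console.log(m.get(1, 0)); // 3
console.log(m.get(0, 0) * m.get(1, 1) - m.get(0, 1) * m.get(1, 0)); // -2 (2x2 determinant)
console.log(m.get(0, 0) + m.get(1, 1)); // 5 (trace)
```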
## WASM Acceleration
regressio ships with a pre-compiled Rust/WASM engine that activates automatically, with no configuration needed. When the WASM binary is available, heavy computations are dispatched to compiled Rust code instead of the pure TypeScript implementation.
**Accelerated operations:**
- Matrix: multiply, transpose, add, subtract, scale, dot product, norm, determinant
- Decompositions: QR, Cholesky, SVD, eigenvalues (tridiagonal QL)
- Solvers: forward/back substitution
- Models: Lasso/Elastic Net coordinate descent, logistic regression IRLS, softmax, KNN distance matrices
- Diagnostics: correlation matrix, VIF (via correlation matrix inverse)
- Predictions: bootstrap OLS (1000+ resamples in a single WASM call)
If WASM is unavailable (e.g. unsupported runtime), all operations fall back silently to pure TypeScript.
```typescript
import { isWasmActive, LinearRegression } from 'regressio';
console.log(isWasmActive()); // true if WASM loaded
// No code changes needed: WASM is used transparently when available
const model = new LinearRegression();
model.fit(X, y); // QR decomposition runs in Rust when WASM is active
```
### Rebuilding WASM
The pre-built WASM binary is included in the package. To rebuild from Rust source (requires [Rust](https://rustup.rs/) with `wasm32-unknown-unknown` target):
```bash
bun run build:wasm
```
## License
MIT