https://github.com/drorspei/varvar
Python package to model variance in different ways
- Host: GitHub
- URL: https://github.com/drorspei/varvar
- Owner: drorspei
- License: mit
- Created: 2021-11-17T13:43:57.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-01-24T14:15:47.000Z (over 3 years ago)
- Last Synced: 2025-10-27T00:35:22.923Z (8 months ago)
- Language: Python
- Size: 37.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# varvar
Python package to model variance in different ways
# Multiplicative variance trees and the varvar algorithm
varvar is a greedy algorithm for multiplicative variance trees.
varvar is to variance as lightgbm/xgboost/... are to expectation.
There are currently two implementations of varvar algorithms:
1. using quantile search at every split (in `varvar.qtrees`)
2. using histograms, with binning before starting (in `varvar.htrees`)
Quantile search is much slower, but can be more accurate.
This mirrors the "exact" and "hist" modes in xgboost, except that our "exact"
algorithm only goes over a small subset of each feature's exact values instead of all of them.
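To make the difference between the two strategies concrete, here is a minimal numpy sketch. It is not varvar's implementation: the helper names `quantile_candidates` and `histogram_bins` and their parameters are hypothetical, and it only illustrates the general idea of picking candidate thresholds from quantiles of the data versus binning a feature once before training.

```python
import numpy as np

def quantile_candidates(values, num_candidates=16):
    """Candidate split thresholds at evenly spaced quantiles of the values.

    In a quantile-based search this would be recomputed on the samples
    that reach each node, which is why it is the slower option.
    """
    qs = np.linspace(0, 1, num_candidates + 2)[1:-1]  # drop the 0 and 1 quantiles
    return np.unique(np.quantile(values, qs))

def histogram_bins(values, num_bins=16):
    """Bin a feature once, up front; returns the bin edges and each value's bin index."""
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # Use interior edges only, so indices run from 0 to num_bins - 1.
    bin_index = np.digitize(values, edges[1:-1])
    return edges, bin_index

rng = np.random.RandomState(0)
feature = rng.uniform(-1000, 1000, 10_000)

candidates = quantile_candidates(feature)   # actual data-driven thresholds
edges, bin_index = histogram_bins(feature)  # fixed grid, computed once
```

With histograms, later split searches only need the integer bin indices, at the cost of thresholds being restricted to the bin edges.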
Both implementation modules have a `multiplicative_variance_trees` function.
Use `varvar.predict` for prediction.
The trees are returned as plain Python types and can be serialized with pickle
or even as JSON.
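As a rough illustration of the serialization round trip, the sketch below uses a made-up nested list as a stand-in for the returned trees. The actual layout of the trees is not documented here, so treat the `trees` value as hypothetical; only the `pickle` and `json` calls are standard library behavior.

```python
import json
import pickle

# Hypothetical stand-in for the object returned by multiplicative_variance_trees.
# The real layout may differ; the point is that it is built from plain Python types.
trees = [[0.5], [[0, 295.0, 1.0, 900.0]]]

# pickle preserves the structure exactly.
restored_pickle = pickle.loads(pickle.dumps(trees))

# json also round-trips nested lists and numbers.
restored_json = json.loads(json.dumps(trees))
```

One thing to keep in mind with JSON: tuples come back as lists, so if the real trees contain tuples, the restored object will compare unequal to the original even though it holds the same values.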
Here is an example:
```
from varvar.htrees import multiplicative_variance_trees
from varvar import predict
import numpy as np

random = np.random.RandomState(1729)
n = 200000
x = random.uniform(-1000, 1000, n)

# Noise with standard deviation 1 at or below the threshold and 30 above it.
correct_threshold = 300
sigma = 1 * (x <= correct_threshold) + 30 * (x > correct_threshold)
e = sigma * random.randn(n)

# Fit on the squared noise, passing the features as a list of arrays.
trees = multiplicative_variance_trees(
    [x], e**2,
    num_trees=2, max_depth=1, min_gain=1, learning_rate=1,
)
# Predictions are variances, so take the square root to compare with sigma.
preds = predict(trees, [x])

found_threshold = trees[1][0][1]
print(correct_threshold, found_threshold) # 300, 295
print(np.sqrt(min(preds)), np.sqrt(max(preds))) # 1, 30
```
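As a sanity check on why the example fits `e**2`, the following sketch uses only numpy (no varvar calls) and regenerates the same synthetic data. It relies on the fact that for zero-mean noise, the mean of the squared values in a region estimates that region's variance. The names `left`, `std_left` and `std_right` are introduced here for illustration.

```python
import numpy as np

# Same synthetic data as in the example above.
random = np.random.RandomState(1729)
n = 200000
x = random.uniform(-1000, 1000, n)
correct_threshold = 300
sigma = 1 * (x <= correct_threshold) + 30 * (x > correct_threshold)
e = sigma * random.randn(n)

# The noise has zero mean, so the mean of e**2 on each side of the
# true threshold estimates the variance on that side.
left = x <= correct_threshold
std_left = np.sqrt(np.mean(e[left] ** 2))
std_right = np.sqrt(np.mean(e[~left] ** 2))
print(std_left, std_right)  # close to 1 and 30
```

These per-region estimates are what a single well-placed split should recover, which matches the square roots of the smallest and largest predictions printed in the example.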
## Conversion to xgboost booster
You can convert multiplicative variance trees to an xgboost booster.
This lets you use xgboost's predict function (which in practice seems to be a bit slower than `varvar.predict`) and, more importantly, the shap package
to interpret varvar predictions.
```
from varvar import mvt_to_xgboost
booster = mvt_to_xgboost(trees, feature_names=["f1", "f2"])
```
You need xgboost 1.6.1 or higher installed to run this code.
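If you want to check the version requirement programmatically before converting, a small standard-library sketch could look like the following. The helpers `parse_version` and `xgboost_is_recent_enough` are hypothetical names written for this illustration and are not part of varvar or xgboost; the parsing is deliberately simple and only looks at the leading numeric components.

```python
import re
from importlib import metadata

MIN_XGBOOST = (1, 6, 1)

def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def xgboost_is_recent_enough(installed):
    """True if the given version string is at least 1.6.1."""
    return parse_version(installed) >= MIN_XGBOOST

try:
    installed = metadata.version("xgboost")
except metadata.PackageNotFoundError:
    print("xgboost is not installed")
else:
    print(installed, xgboost_is_recent_enough(installed))
```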