https://github.com/jmcarpenter2/parfit
A package for parallelizing the fit and flexibly scoring of sklearn machine learning models, with visualization routines.
- Host: GitHub
- URL: https://github.com/jmcarpenter2/parfit
- Owner: jmcarpenter2
- License: mit
- Created: 2017-11-22T20:17:51.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-02-13T04:16:38.000Z (about 1 year ago)
- Last Synced: 2025-03-29T14:12:29.323Z (26 days ago)
- Language: Python
- Homepage:
- Size: 1.53 MB
- Stars: 199
- Watchers: 5
- Forks: 29
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-python-machine-learning-resources - GitHub - 54% open · ⏱️ 04.04.2020 (Hyperparameter Optimization and AutoML)
README
# parfit
A package for parallelizing the fit and flexibly scoring of sklearn machine learning models, with visualization routines.

# This Python package is NO LONGER MAINTAINED.
## Alternatives
There are several fantastic alternatives that serve the same purpose as `parfit`, but do it even better. Below I list a few libraries that are very effective at solving the particular problem that `parfit` originally aimed to solve.
### Hyper-parameter optimization
* [Tune](https://ray.readthedocs.io/en/latest/tune.html)
* [Scikit-Optimize](https://scikit-optimize.github.io/stable/)
* [Scikit-learn](https://scikit-learn.org/stable/modules/grid_search.html)

### Visualization of hyper-parameter optimizations
* [Tune + Tensorboard](https://ray.readthedocs.io/en/latest/tune-usage.html#tensorboard)
* [Scikit-Optimize plotting module](https://scikit-optimize.github.io/stable/modules/classes.html#module-skopt.plots)
* [Examples using Scikit-learn + seaborn](https://towardsdatascience.com/using-3d-visualizations-to-tune-hyperparameters-of-ml-models-with-python-ba2885eab2e9)

# Deprecated
CURRENT VERSION == 0.220
Installation:
```
$ pip install parfit # first time installation
$ pip install -U parfit # upgrade to latest version
```
and then import into your code using:
```
from parfit import bestFit # Necessary if you wish to use bestFit

# Necessary if you wish to run each step sequentially
from parfit.fit import *
from parfit.score import *
from parfit.plot import *
from parfit.crossval import *
```
Once imported, you can use bestFit() or other functions freely.
## Easy to use
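```
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid
from parfit import bestFit

grid = {
    'min_samples_leaf': [1, 5, 10, 15, 20, 25],
    'max_features': ['sqrt', 'log2', 0.5, 0.6, 0.7],
    'n_estimators': [60],
    'n_jobs': [-1],
    'random_state': [42]
}
paramGrid = ParameterGrid(grid)

# X_train, y_train, X_val, y_val are assumed to be defined already
best_model, best_score, all_models, all_scores = bestFit(RandomForestClassifier(), paramGrid,
    X_train, y_train, X_val, y_val, # nfolds=5 [optional, instead of validation set]
    metric=roc_auc_score, greater_is_better=True,
    scoreLabel='AUC')

print(best_model, best_score)
```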
```
{'max_features': 'sqrt', 'min_samples_leaf': 1, 'n_estimators': 60, 'n_jobs': -1, 'random_state': 42}
0.9627794057231478
```

## Interpretable Visualizations
## Notes
1. You can either use **bestFit()** to automate the steps of the process, and optionally plot the scores over the parameter grid, OR you can do each step in order:

   > `fitModels()` -> `scoreModels()` -> `plotScores()` -> `getBestModel()` -> `getBestScore()`

   or

   > `crossvalModels()` -> `plotScores()` -> `getBestModel()` -> `getBestScore()`
2. Be sure to specify ALL parameters in the ParameterGrid, even the ones you are not searching over.
3. For example usage, see parfit_ex.ipynb. Each function is well-documented in the .py file. In Jupyter Notebooks, you can see the docs by pressing Shift+Tab(x3). Also, check out the complete documentation [here](docs/documentation.md) along with the [changelog](docs/changelog.md).
4. This package is designed for use with sklearn machine learning models, but in theory will work with any model that has a .fit(X,y) function. Furthermore, the sklearn scoring metrics are typically used, but any function that reads in two vectors and returns a score will work.
5. The plotScores() function will only work for up to a 3D ParameterGrid object. That is, you can only view the scores of a grid varying over 1-3 parameters. Other parameters which do not vary can still be set, and you can still train and score models over a higher-dimensional grid.
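
Notes 2, 4, and 5 can be illustrated without `parfit` itself. The following is a minimal sketch in plain Python, not parfit's implementation: `expand_grid`, `varying_params`, `mean_abs_error`, and `ConstantModel` are hypothetical names introduced only for this example. It expands a grid in which every parameter, including the fixed ones, is given as a list; fits and scores a toy model that only needs `fit(X, y)` and `predict(X)`; uses a custom metric that takes two vectors and returns a score; and counts how many parameters actually vary.

```python
from itertools import product


def expand_grid(grid):
    """Return every parameter combination as a list of dicts (keys in sorted order)."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def varying_params(grid):
    """Names of the parameters that have more than one candidate value."""
    return sorted(k for k, v in grid.items() if len(v) > 1)


def mean_abs_error(y_true, y_pred):
    """Custom metric: takes two vectors and returns one score (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class ConstantModel:
    """Toy model with fit(X, y) and predict(X); predicts mean(y) * scale + offset."""

    def __init__(self, scale=1.0, offset=0.0):
        self.scale = scale
        self.offset = offset

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ * self.scale + self.offset for _ in X]


# Fixed parameters are still listed, wrapped in single-element lists (note 2).
grid = {"scale": [0.5, 1.0, 1.5], "offset": [0.0]}

# Tiny made-up data, purely for illustration.
X_train, y_train = [[0], [1], [2], [3]], [2.0, 4.0, 6.0, 8.0]
X_val, y_val = [[4], [5]], [5.0, 5.0]

# Fit one model per parameter combination, then score each on the validation set.
param_sets = expand_grid(grid)
models = [ConstantModel(**params).fit(X_train, y_train) for params in param_sets]
scores = [mean_abs_error(y_val, m.predict(X_val)) for m in models]

# For an error metric lower is better, so the best entry is the minimum.
best_index = min(range(len(scores)), key=scores.__getitem__)
best_params, best_score = param_sets[best_index], scores[best_index]

# Only parameters that vary count toward the 1-3 dimension limit in note 5.
plottable = 1 <= len(varying_params(grid)) <= 3
```

Here only `scale` varies, so the grid is one-dimensional for plotting purposes even though two parameters are set. A score where higher is better (such as AUC in the example above) would select the maximum instead of the minimum.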