https://github.com/jmcarpenter2/parfit

A package for parallelizing the fit and flexibly scoring of sklearn machine learning models, with visualization routines.

Host: GitHub
URL: https://github.com/jmcarpenter2/parfit
Owner: jmcarpenter2
License: mit
Created: 2017-11-22T20:17:51.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-02-13T04:16:38.000Z (over 1 year ago)
Last Synced: 2025-03-29T14:12:29.323Z (2 months ago)
Language: Python
Homepage:
Size: 1.53 MB
Stars: 199
Watchers: 5
Forks: 29
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

awesome-python-machine-learning-resources (Hyperparameter Optimization and AutoML)

README

# parfit

A package for parallelizing the fitting and flexible scoring of sklearn machine learning models, with visualization routines.

# This python package is NO LONGER MAINTAINED.

## Alternatives

There are several fantastic alternatives that serve the same purpose as `parfit`, but do it even better.

Below I list a few libraries that are very effective at solving the particular problem that parfit originally aimed to solve.

### Hyper-parameter optimization

* [Tune](https://ray.readthedocs.io/en/latest/tune.html)

* [Scikit-Optimize](https://scikit-optimize.github.io/stable/)

* [Scikit-learn](https://scikit-learn.org/stable/modules/grid_search.html)
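
As a rough illustration of the scikit-learn route, the sketch below runs a small cross-validated grid search with `GridSearchCV`. The synthetic dataset, the reduced grid, and the choice of ROC AUC as the metric are assumptions made for the example, not something this README prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real training set (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Fixed settings go on the estimator; only searched parameters go in the grid.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=42),
    param_grid={
        'min_samples_leaf': [1, 5, 10],
        'max_features': ['sqrt', 'log2'],
    },
    scoring='roc_auc',  # metric chosen for illustration
    cv=3,               # 3-fold cross-validation
    n_jobs=1,           # raise to parallelize across grid points
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

After fitting, `search.best_estimator_` holds the model refit on the full data with the best parameters, and `search.cv_results_` holds the per-combination scores.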

### Visualization of hyper-parameter optimizations

* [Tune + Tensorboard](https://ray.readthedocs.io/en/latest/tune-usage.html#tensorboard)

* [Scikit-Optimize plotting module](https://scikit-optimize.github.io/stable/modules/classes.html#module-skopt.plots)

* [Examples using Scikit-learn + seaborn](https://towardsdatascience.com/using-3d-visualizations-to-tune-hyperparameters-of-ml-models-with-python-ba2885eab2e9)

# Deprecated

CURRENT VERSION == 0.220

Installation:

```
$ pip install parfit     # first-time installation
$ pip install -U parfit  # upgrade to the latest version
```

and then import into your code using:

```
from parfit import bestFit  # Necessary if you wish to use bestFit

# Necessary if you wish to run each step sequentially
from parfit.fit import *
from parfit.score import *
from parfit.plot import *
from parfit.crossval import *
```

Once imported, you can use `bestFit()` or the other functions freely.

## Easy to use

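```
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid

from parfit import bestFit

# X_train, y_train, X_val, y_val are assumed to be defined already.
grid = {
    'min_samples_leaf': [1, 5, 10, 15, 20, 25],
    'max_features': ['sqrt', 'log2', 0.5, 0.6, 0.7],
    'n_estimators': [60],
    'n_jobs': [-1],
    'random_state': [42]
}
paramGrid = ParameterGrid(grid)

best_model, best_score, all_models, all_scores = bestFit(
    RandomForestClassifier(), paramGrid,
    X_train, y_train, X_val, y_val,  # nfolds=5 [optional, instead of validation set]
    metric=roc_auc_score, greater_is_better=True,
    scoreLabel='AUC')

print(best_model, best_score)
```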

```
{'max_features': 'sqrt', 'min_samples_leaf': 1, 'n_estimators': 60, 'n_jobs': -1, 'random_state': 42}
0.9627794057231478
```
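
The parameter grid is expanded into every combination of the listed values, which is why fixed settings still appear as single-element lists. The following is a minimal standard-library sketch of that expansion. The helper name `expand_grid` is hypothetical and is not part of parfit or scikit-learn; keys are sorted only to make the order deterministic.

```python
from itertools import product

grid = {
    'min_samples_leaf': [1, 5, 10, 15, 20, 25],
    'max_features': ['sqrt', 'log2', 0.5, 0.6, 0.7],
    'n_estimators': [60],
    'n_jobs': [-1],
    'random_state': [42],
}

def expand_grid(grid):
    """Return one dict per combination of values (Cartesian product)."""
    keys = sorted(grid)
    value_lists = [grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]

combos = expand_grid(grid)
print(len(combos))  # 30 (6 values x 5 values x 1 x 1 x 1)
print(combos[0])
```

Each of the 30 dictionaries corresponds to one model that gets fit and scored.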

## Interpretable Visualizations

![Alt text](/assets/scoring_grid_2D.png?raw=true)
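
Whether a grid can be plotted depends on how many parameters actually vary (see note 5 below). A small sketch for checking this before plotting, assuming the grid is a plain dictionary; `varying_params` is a hypothetical helper, not a parfit function.

```python
def varying_params(grid):
    """Return the names of parameters that take more than one value."""
    return [name for name, values in grid.items() if len(values) > 1]

grid = {
    'min_samples_leaf': [1, 5, 10, 15, 20, 25],
    'max_features': ['sqrt', 'log2', 0.5, 0.6, 0.7],
    'n_estimators': [60],
    'n_jobs': [-1],
    'random_state': [42],
}

axes = varying_params(grid)
print(axes)            # ['min_samples_leaf', 'max_features']
print(len(axes) <= 3)  # True, so this grid has few enough varying parameters to plot
```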

## Notes

1. You can either use **bestFit()** to automate the steps of the process, and optionally plot the scores over the parameter grid, OR you can do each step in order: 

> `fitModels()` -> `scoreModels()` -> `plotScores()` -> `getBestModel()` -> `getBestScore()`

or

> `crossvalModels()` -> `plotScores()` -> `getBestModel()` -> `getBestScore()`

2. Be sure to specify ALL parameters in the ParameterGrid, even the ones you are not searching over.

3. For example usage, see parfit_ex.ipynb. Each function is documented in its .py file. In Jupyter Notebooks, you can view the docs by pressing Shift+Tab three times. Also, check out the complete documentation [here](docs/documentation.md) along with the [changelog](docs/changelog.md).

4. This package is designed for use with sklearn machine learning models, but in theory will work with any model that has a `.fit(X, y)` method. Likewise, the sklearn scoring metrics are typically used, but any function that takes two vectors and returns a score will work.

5. The `plotScores()` function only works for up to a 3D `ParameterGrid` object. That is, you can only view the scores of a grid varying over 1-3 parameters. Other parameters that do not vary can still be set, and you can still train and score models over a higher-dimensional grid.
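
To make notes 1 and 4 concrete, here is a standard-library sketch of the fit, score, pick-best sequence using a model that only offers `.fit(X, y)` and `.predict(X)`, plus a custom two-vector metric. All names here (`ThresholdModel`, `accuracy`, `fit_all`, `score_all`, `pick_best`) are hypothetical stand-ins, not parfit functions, and the thread pool is just one simple way to fit models concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

class ThresholdModel:
    """Toy 1-D classifier: predicts 1 when x exceeds mean(X_train) + offset."""
    def __init__(self, offset=0.0):
        self.offset = offset

    def fit(self, X, y):
        self.cut_ = sum(X) / len(X) + self.offset
        return self

    def predict(self, X):
        return [1 if x > self.cut_ else 0 for x in X]

def accuracy(y_true, y_pred):
    """Custom metric: takes two vectors and returns a score."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_all(model_cls, param_sets, X, y):
    """Fit one model per parameter set concurrently; order is preserved."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: model_cls(**p).fit(X, y), param_sets))

def score_all(models, X, y, metric):
    """Score each fitted model on held-out data."""
    return [metric(y, m.predict(X)) for m in models]

def pick_best(models, scores, greater_is_better=True):
    """Return the model and score at the best index."""
    chooser = max if greater_is_better else min
    idx = chooser(range(len(scores)), key=scores.__getitem__)
    return models[idx], scores[idx]

X_train, y_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1]
X_val, y_val = [2.5, 3.2, 3.8, 4.5], [0, 0, 1, 1]
param_sets = [{'offset': -1.0}, {'offset': 0.0}, {'offset': 1.0}]

models = fit_all(ThresholdModel, param_sets, X_train, y_train)
scores = score_all(models, X_val, y_val, accuracy)
best_model, best_score = pick_best(models, scores)
print(scores)                         # [0.75, 1.0, 0.5]
print(best_model.offset, best_score)  # 0.0 1.0
```

Passing `greater_is_better=False` to `pick_best` selects the minimum instead, which suits loss-style metrics.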
