https://github.com/dmolitor/xgboost-gridsearch

Combine grid search with early stopping via cross validation
https://github.com/dmolitor/xgboost-gridsearch

gridsearch machine-learning xgboost

Last synced: about 2 months ago
JSON representation

Combine grid search with early stopping via cross validation

Host: GitHub
URL: https://github.com/dmolitor/xgboost-gridsearch
Owner: dmolitor
License: mit
Created: 2024-02-16T15:17:33.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-04-10T22:12:34.000Z (about 1 year ago)
Last Synced: 2025-02-03T20:53:26.817Z (3 months ago)
Topics: gridsearch, machine-learning, xgboost
Language: Python
Homepage:
Size: 20.5 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # XGBoost Gridsearch

This module allows you to combine grid-search tuning with early stopping via 

cross validation.

## Dependencies

This requires the following dependencies:

```python

pip install httpimport pandas tqdm xgboost

```

## Importing

You can import this code from Github via the `httpimport` module as follows:

```python

import httpimport as hi

with hi.github_repo("dmolitor", "xgboost-gridsearch", ref="main"):

    from xgb_gridsearch.grid_search import GridSearch

```

## Predicting proportion of working mothers with 1-year-olds

Two important things to note. 

- The `GridSearch` class takes as its first argument `param_grid` which is

 the same as in `sklearn.model_selection.GridSearchCV`.

- All other keyword arguments are passed directly to `xgboost.cv`.

Here is a list of all

[parameter options](https://xgboost.readthedocs.io/en/stable/parameter.html),

and here is the

[documentation](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.cv)

for `xgboost.cv`.

```python

import httpimport as hi

import json

import pandas as pd

import xgboost as xgb

with hi.github_repo("dmolitor", "xgboost-gridsearch", ref="main"):

    from xgb_gridsearch.grid_search import GridSearch

# Import data

parents = pd.read_csv("https://raw.githubusercontent.com/info3370/info3370.github.io/main/data/parents.csv")

parents["sex"] = parents.sex.astype("category")

# Make a testing set

parents_test = pd.DataFrame([{"sex": "female", "child_age": 1}])

parents_test["sex"] = parents_test.sex.astype("category")

# Get into DMatrix form

data_train = xgb.DMatrix(

    data = parents.drop("at_work", axis=1),

    label=parents[["at_work"]],

    enable_categorical=True

)

data_test = xgb.DMatrix(data = parents_test, enable_categorical=True)

# Set parameters to grid search across

params = {

    "objective": ["binary:logistic"],

    "learning_rate": [0.1, 0.3],

    "max_depth": [4, 6, 8],

    "grow_policy": ["depthwise"],

    "min_child_weight": [0.5, 1, 3],

    "max_leaves": [0, 3, 5, 7]

}

# Create grid search object

grid_search = GridSearch(

    param_grid=params,

    dtrain=data_train,

    num_boost_round=1000,

    nfold=5,

    shuffle=True,

    metrics=["auc", "logloss"],

    early_stopping_rounds=5,

    verbose_eval=False

)

grid_search.fit(verbose=True, minimize_cv_metric=True, refit=True)

# Get predicted proportion of working mothers with 1-year-olds

prop_working = round(grid_search.best_model.predict(data_test)[0], 4)

# Print optimal parameters

print(json.dumps(grid_search.best_parameters, indent=4))

print("Predicted proportion working mothers:", prop_working)

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/dmolitor/xgboost-gridsearch

Awesome Lists containing this project

README