https://github.com/dmolitor/xgboost-gridsearch
Combine grid search with early stopping via cross validation
gridsearch machine-learning xgboost
- Host: GitHub
- URL: https://github.com/dmolitor/xgboost-gridsearch
- Owner: dmolitor
- License: mit
- Created: 2024-02-16T15:17:33.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-10T22:12:34.000Z (9 months ago)
- Last Synced: 2024-10-22T23:27:49.545Z (2 months ago)
- Topics: gridsearch, machine-learning, xgboost
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# XGBoost Gridsearch
This module allows you to combine grid-search tuning with early stopping via
cross validation.

## Dependencies
This requires the following dependencies:
```shell
pip install httpimport pandas tqdm xgboost
```

## Importing
You can import this code from GitHub via the `httpimport` module as follows:
```python
import httpimport as hi

with hi.github_repo("dmolitor", "xgboost-gridsearch", ref="main"):
    from xgb_gridsearch.grid_search import GridSearch
```

## Predicting proportion of working mothers with 1-year-olds
Two important things to note:
- The `GridSearch` class takes `param_grid` as its first argument, in the
  same format as `param_grid` in `sklearn.model_selection.GridSearchCV`.
- All other keyword arguments are passed directly to `xgboost.cv`.

Here is a list of all
[parameter options](https://xgboost.readthedocs.io/en/stable/parameter.html),
and here is the
[documentation](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.cv)
for `xgboost.cv`.

```python
import httpimport as hi
import json
import pandas as pd
import xgboost as xgb

with hi.github_repo("dmolitor", "xgboost-gridsearch", ref="main"):
    from xgb_gridsearch.grid_search import GridSearch

# Import data
parents = pd.read_csv("https://raw.githubusercontent.com/info3370/info3370.github.io/main/data/parents.csv")
parents["sex"] = parents.sex.astype("category")

# Make a testing set
parents_test = pd.DataFrame([{"sex": "female", "child_age": 1}])
parents_test["sex"] = parents_test.sex.astype("category")

# Get into DMatrix form
data_train = xgb.DMatrix(
    data=parents.drop("at_work", axis=1),
    label=parents[["at_work"]],
    enable_categorical=True
)
data_test = xgb.DMatrix(data=parents_test, enable_categorical=True)

# Set parameters to grid search across
params = {
    "objective": ["binary:logistic"],
    "learning_rate": [0.1, 0.3],
    "max_depth": [4, 6, 8],
    "grow_policy": ["depthwise"],
    "min_child_weight": [0.5, 1, 3],
    "max_leaves": [0, 3, 5, 7]
}

# Create grid search object
grid_search = GridSearch(
    param_grid=params,
    dtrain=data_train,
    num_boost_round=1000,
    nfold=5,
    shuffle=True,
    metrics=["auc", "logloss"],
    early_stopping_rounds=5,
    verbose_eval=False
)
grid_search.fit(verbose=True, minimize_cv_metric=True, refit=True)

# Get predicted proportion of working mothers with 1-year-olds
prop_working = round(grid_search.best_model.predict(data_test)[0], 4)

# Print optimal parameters
print(json.dumps(grid_search.best_parameters, indent=4))
print("Predicted proportion working mothers:", prop_working)
```
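
To make the idea of combining a parameter grid with cross-validated early stopping more concrete, here is a minimal sketch of the general pattern. It is not the module's actual implementation: `expand_grid`, `search`, `evaluate`, and `toy_evaluate` are hypothetical names introduced only for illustration, and the toy scoring function stands in for a real call to `xgboost.cv`. The sketch assumes each parameter combination is scored once and that the combination with the best score is kept.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination in a {name: [values]} grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def search(param_grid, evaluate, minimize=True):
    """Score every combination; return (best_params, best_score, n_rounds).

    `evaluate` is a hypothetical callable mapping a parameter dict to a
    (score, n_rounds) tuple. With XGBoost it could wrap `xgboost.cv` called
    with `early_stopping_rounds`, returning the final cross-validated metric
    and the number of boosting rounds that were kept.
    """
    best = None
    for params in expand_grid(param_grid):
        score, n_rounds = evaluate(params)
        if best is None:
            is_better = True
        elif minimize:
            is_better = score < best[1]
        else:
            is_better = score > best[1]
        if is_better:
            best = (params, score, n_rounds)
    return best


def toy_evaluate(params):
    """Stand-in scorer (an assumption, not a real model): lower is better."""
    return abs(params["max_depth"] - 6) + params["learning_rate"], 10


grid = {"learning_rate": [0.1, 0.3], "max_depth": [4, 6, 8]}
combos = list(expand_grid(grid))
best_params, best_score, best_rounds = search(grid, toy_evaluate)
print(len(combos), best_params)
```

The grid in the README example above has 1 × 2 × 3 × 1 × 3 × 4 = 72 combinations, so a search of this shape would run cross validation 72 times. Early stopping helps keep each of those runs short.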