https://github.com/mdh266/randomforests
Random Forest Library In Python Compatible with Scikit-Learn
classification data-science decision-tree ensemble-learning machine-learning machine-learning-algorithms pandas python random-forest regression scikit-learn
- Host: GitHub
- URL: https://github.com/mdh266/randomforests
- Owner: mdh266
- Created: 2017-02-03T20:09:46.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2021-06-21T22:47:05.000Z (over 3 years ago)
- Last Synced: 2024-06-11T17:54:04.612Z (7 months ago)
- Topics: classification, data-science, decision-tree, ensemble-learning, machine-learning, machine-learning-algorithms, pandas, python, random-forest, regression, scikit-learn
- Language: Python
- Homepage:
- Size: 347 KB
- Stars: 15
- Watchers: 2
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
[![Build Status](https://travis-ci.com/mdh266/RandomForests.svg?branch=master)](https://travis-ci.com/mdh266/RandomForests)
[![codecov](https://codecov.io/gh/mdh266/RandomForests/branch/master/graph/badge.svg)](https://codecov.io/gh/mdh266/RandomForests)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)

# Random Forests In Python
--------------

## Introduction
-------------
I started this project to better understand how [decision trees](https://en.wikipedia.org/wiki/Decision_tree) and [random forests](https://en.wikipedia.org/wiki/Random_forest) work. At this point the classifiers use only the Gini index as the splitting criterion, and the regression models use only the mean squared error. Both the classifiers and the regression models are built to work with [Pandas](http://pandas.pydata.org) and [Scikit-Learn](https://scikit-learn.org/).

## Examples
Basic classification example using Scikit-learn:

    from randomforests import RandomForestClassifier
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import accuracy_score
    from sklearn.datasets import load_breast_cancer

    # Build a DataFrame from the first four features of the dataset
    dataset = load_breast_cancer()
    cols = [dataset.data[:, i] for i in range(4)]
    X = pd.DataFrame({k: v for k, v in zip(dataset.feature_names, cols)})
    y = pd.Series(dataset.target)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

    # Tune max_depth with 5-fold cross-validation on the training split
    pipe = Pipeline([("forest", RandomForestClassifier())])
    params = {"forest__max_depth": [1, 2, 3]}
    grid = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
    model = grid.fit(X_train, y_train)

    preds = model.predict(X_test)
    print("Accuracy: ", accuracy_score(y_test, preds))
    # >> Accuracy: 0.9020979020979021
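The classifiers in this library split on the Gini index. The following is a minimal, standard-library sketch of that criterion, not the library's own code: the function names `gini_impurity` and `gini_index_of_split` are hypothetical and chosen only for illustration. It scores a candidate split by the size-weighted impurity of its two child nodes, where lower is better.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity, 1 - sum_k p_k^2, of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_index_of_split(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted Gini impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# A split that separates the classes perfectly scores 0.0;
# a split that leaves both children evenly mixed scores 0.5 for two classes.
print(gini_index_of_split([0, 0], [1, 1]))  # 0.0
print(gini_index_of_split([0, 1], [0, 1]))  # 0.5
```

A tree builder would typically evaluate this score for every candidate feature and threshold and keep the split with the lowest value.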
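Basic regression example using Scikit-learn:

    import pandas as pd
    from randomforests import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import r2_score
    # Note: load_boston was removed in scikit-learn 1.2, so this example
    # needs an older scikit-learn release.
    from sklearn.datasets import load_boston

    # Build a DataFrame from the first four features of the dataset
    dataset = load_boston()
    cols = [dataset.data[:, i] for i in range(4)]
    X = pd.DataFrame({k: v for k, v in zip(dataset.feature_names, cols)})
    y = pd.Series(dataset.target)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=24)

    pipe = Pipeline([("forest", RandomForestRegressor())])
    params = {"forest__max_depth": [1, 2, 3]}
    grid = GridSearchCV(pipe, params, cv=5, n_jobs=-1)
    # Fit on the training split only so the test score is not inflated
    # (the original call was grid.fit(X, y)).
    model = grid.fit(X_train, y_train)

    preds = model.predict(X_test)
    print("R^2 : ", r2_score(y_test, preds))
    # Output reported for the original call, grid.fit(X, y):
    # >> R^2 : 0.37948488681649484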
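The regression models choose splits by mean squared error. The following is a minimal, standard-library sketch of how such a split search can work on a single feature. It is not the library's implementation, and the names `mean_squared_error` and `best_threshold`, as well as the "`x < threshold` goes left" convention, are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def mean_squared_error(values: Sequence[float]) -> float:
    """MSE of values around their own mean (the value a leaf would predict)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def best_threshold(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, cost) minimising the size-weighted MSE of the two children.

    Rows with x < threshold go to the left child; the rest go to the right child.
    """
    n = len(y)
    best = (float("nan"), float("inf"))
    for threshold in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        if not left or not right:
            continue  # skip splits that leave one child empty
        cost = (len(left) * mean_squared_error(left)
                + len(right) * mean_squared_error(right)) / n
        if cost < best[1]:
            best = (threshold, cost)
    return best


# Targets jump from 1.0 to 5.0 between x = 2 and x = 3, so splitting at 3.0
# leaves two perfectly homogeneous children.
threshold, cost = best_threshold([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0])
print(threshold, cost)  # 3.0 0.0
```

This exhaustive scan is quadratic in the number of rows; it is written for clarity rather than speed.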
## Installing
-----------------
Uses the `setup.py` generated by [PyScaffold](https://pypi.org/project/PyScaffold/). To install the library, run:

    python setup.py install

For a development (editable) install, use `python setup.py develop` instead.
## Test
-----------------
Uses the `setup.py` generated by [PyScaffold](https://pypi.org/project/PyScaffold/). To run the tests:

    python setup.py test
## Dependencies
--------------
Dependencies are minimal:

- Python (>= 3.6)
- [Scikit-Learn](https://scikit-learn.org/stable/) (>= 0.23)
- [Pandas](https://pandas.pydata.org/) (>= 1.0)

## References
---------------
- [An Introduction To Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/)
- [Elements Of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
- [Scikit-learn Ensemble Methods](http://scikit-learn.org/stable/auto_examples/index.html#ensemble-methods)
- [Scikit-Learn Custom Estimators](https://scikit-learn.org/dev/developers/develop.html)
- [How to Implement Random Forest From Scratch In Python](http://machinelearningmastery.com/implement-random-forest-scratch-python/)
- [How To Implement A Decision Tree From Scratch In Python](http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python)