https://github.com/civisanalytics/civisml-extensions
scikit-learn-compatible estimators from Civis Analytics
- Host: GitHub
- URL: https://github.com/civisanalytics/civisml-extensions
- Owner: civisanalytics
- License: bsd-3-clause
- Created: 2017-09-11T17:20:15.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-11-04T02:19:22.000Z (about 4 years ago)
- Last Synced: 2025-04-15T14:04:35.977Z (7 months ago)
- Language: Python
- Size: 118 KB
- Stars: 59
- Watchers: 74
- Forks: 19
- Open Issues: 3
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-python-data-science - civisml-extensions - scikit-learn-compatible estimators from Civis Analytics. (Machine Learning Frameworks)
README
civisml-extensions
==================

.. image:: https://www.travis-ci.org/civisanalytics/civisml-extensions.svg?branch=master
    :target: https://www.travis-ci.org/civisanalytics/civisml-extensions

scikit-learn-compatible estimators from Civis Analytics

Installation
------------
Installation with ``pip`` is recommended::

    $ pip install civisml-extensions

For development, a few additional dependencies are needed::

    $ pip install -r dev-requirements.txt

Contents and Usage
------------------
This package contains `scikit-learn`_-compatible estimators for stacking
(``StackedClassifier``, ``StackedRegressor``), non-negative linear regression
(``NonNegativeLinearRegression``), preprocessing pandas_ ``DataFrame`` objects
(``DataFrameETL``), and tuning hyperparameters with Hyperband_
(``HyperbandSearchCV``).

Usage of these estimators follows the standard sklearn conventions. Here is an
example of using the ``StackedClassifier``:

.. code-block:: python

    >>> from sklearn.datasets import make_classification
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> from sklearn.linear_model import LogisticRegression
    >>> from sklearn.model_selection import train_test_split
    >>> from civismlext.stacking import StackedClassifier
    >>>
    >>> # Define some train and test data (synthetic here, for illustration)
    >>> X, y = make_classification(n_samples=200, random_state=0)
    >>> Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)
    >>>
    >>> # Note that the final estimator 'metalr' is the meta-estimator
    >>> estlist = [('rf', RandomForestClassifier()),
    ...            ('lr', LogisticRegression()),
    ...            ('metalr', LogisticRegression())]
    >>>
    >>> mysm = StackedClassifier(estlist)
    >>> # Set some parameters, if you didn't set them at instantiation
    >>> mysm.set_params(rf__random_state=7, lr__random_state=8,
    ...                 metalr__random_state=9, metalr__C=10**7)
    >>>
    >>> # Fit
    >>> mysm.fit(Xtrain, ytrain)
    >>>
    >>> # Predict!
    >>> ypred = mysm.predict_proba(Xtest)

You can learn more about stacking, and see the ``StackedRegressor`` and
``NonNegativeLinearRegression`` estimators in use, in
`a talk presented at PyData NYC`_ in November 2017.

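For readers who have not met non-negative linear regression before, the idea is an ordinary least-squares fit whose coefficients are constrained to be zero or larger. The following is a minimal, dependency-free sketch of that idea using cyclic coordinate descent. It is not the code behind ``NonNegativeLinearRegression``; the function name ``nnls_coordinate_descent`` and the toy data are hypothetical and exist only for illustration.

```python
def nnls_coordinate_descent(X, y, n_sweeps=100):
    """Minimize ||X w - y||^2 subject to w >= 0 by cyclic coordinate descent.

    X is a list of rows (each a list of floats); y is a list of floats.
    Illustrative sketch only, not the civisml-extensions implementation.
    """
    n_rows, n_cols = len(X), len(X[0])
    w = [0.0] * n_cols
    residual = list(y)  # residual = y - X w, and w starts at zero
    # Squared norm of each column; an all-zero column keeps weight 0.
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(n_cols)]
    for _ in range(n_sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue
            # Exact one-dimensional minimizer for w[j], clipped at zero.
            corr = sum(X[i][j] * residual[i] for i in range(n_rows))
            new_wj = max(0.0, w[j] + corr / col_sq[j])
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


# Made-up data generated from y = 2*x0 - 1*x1, so an unconstrained fit
# would give the second feature a negative coefficient.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
weights = nnls_coordinate_descent(X, y)
# The constraint pins the second weight at 0; the first becomes 1.5.
```

In a stacking context, a non-negative meta-estimator of this kind keeps each base model's contribution from being subtracted from the blend.
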
See the docstrings of the various estimators for more information.

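As background for ``HyperbandSearchCV``: Hyperband repeatedly runs a procedure called successive halving, which evaluates many configurations on a small budget, keeps the best fraction, and re-evaluates the survivors on a larger budget. The sketch below shows a single successive-halving run in plain Python. It is an assumption-laden illustration, not the package's implementation: the names ``successive_halving`` and ``toy_loss``, the candidate grid, and the loss function are all hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return the surviving configuration and the (n_configs, budget) rounds.

    Each round scores every surviving configuration at the current budget,
    keeps the best 1/eta of them (lower loss is better), and multiplies
    the budget by eta for the next round.
    """
    survivors = list(configs)
    budget = min_budget
    rounds = []
    while len(survivors) > 1:
        rounds.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], rounds


def toy_loss(learning_rate, budget):
    """Hypothetical loss: best near 0.3, and lower with more budget."""
    return (learning_rate - 0.3) ** 2 + 1.0 / budget


candidates = [i / 26 for i in range(27)]  # 27 evenly spaced values in [0, 1]
best, rounds = successive_halving(candidates, toy_loss)
# rounds records 27 configs at budget 1, then 9 at budget 3, then 3 at budget 9.
```

Hyperband itself runs several such brackets with different trade-offs between the number of configurations and the starting budget; see the Hyperband_ paper for the exact schedule.
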
Contributing
------------
Please see ``CONTRIBUTING.md`` for information about contributing to this project.

License
-------
BSD-3

See ``LICENSE.md`` for details.

.. _scikit-learn: http://scikit-learn.org/
.. _pandas: http://pandas.pydata.org/
.. _Hyperband: https://arxiv.org/abs/1603.06560
.. _a talk presented at PyData NYC: https://www.youtube.com/watch?v=3gpf1lGwecA