Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/georgian-io-archive/foreshadow
An automatic machine learning system
- Host: GitHub
- URL: https://github.com/georgian-io-archive/foreshadow
- Owner: georgian-io-archive
- License: apache-2.0
- Archived: true
- Created: 2018-06-29T15:46:59.000Z (over 6 years ago)
- Default Branch: development
- Last Pushed: 2024-01-09T02:10:46.000Z (about 1 year ago)
- Last Synced: 2024-10-30T04:51:21.929Z (3 months ago)
- Topics: automatic-machine-learning, automl, machine-learning, pandas, python, sklearn
- Language: Python
- Homepage: https://foreshadow.readthedocs.io
- Size: 6.95 MB
- Stars: 29
- Watchers: 13
- Forks: 2
- Open Issues: 54
Metadata Files:
- Readme: README.rst
- License: LICENSE
README
Foreshadow: Simple Machine Learning Scaffolding
===============================================

|BuildStatus| |DocStatus| |Coverage| |CodeStyle| |License|

Foreshadow is an automatic pipeline generation tool that makes creating, iterating,
and evaluating machine learning pipelines a fast and intuitive experience allowing
data scientists to spend more time on data science and less time on code.

.. |BuildStatus| image:: https://dev.azure.com/georgianpartners/foreshadow/_apis/build/status/georgianpartners.foreshadow?branchName=master
   :target: https://dev.azure.com/georgianpartners/foreshadow/_build/latest?definitionId=1&branchName=master

.. |DocStatus| image:: https://readthedocs.org/projects/foreshadow/badge/?version=latest
   :target: https://foreshadow.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. |Coverage| image:: https://img.shields.io/azure-devops/coverage/georgianpartners/foreshadow/1.svg
   :target: https://dev.azure.com/georgianpartners/foreshadow/_build/latest?definitionId=1&branchName=master
   :alt: Coverage

.. |CodeStyle| image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/ambv/black
   :alt: Code Style

.. |License| image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://github.com/georgianpartners/foreshadow/blob/master/LICENSE
   :alt: License

Key Features
------------
- Scikit-Learn compatible
- Automatic column intent inference

  - Numerical
  - Categorical
  - Text
  - Droppable (all values in a column are either identical or all distinct)

- Allow user override on column intent and transformation functions
- Automatic feature preprocessing depending on the column intent type

  - Numerical: imputation followed by scaling
  - Categorical: a variety of categorical encodings
  - Text: TF-IDF followed by SVD

- Automatic model selection
- Rapid pipeline development / iteration

Features in the road map
------------------------

- Automatic feature engineering
- Automatic parameter optimization

Foreshadow supports Python 3.6+.
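
The "Droppable" intent listed under Key Features describes a column whose values are
all the same or all different. A minimal sketch of that check in plain Python follows.
``is_droppable`` is a hypothetical helper written for illustration, not Foreshadow's
actual implementation, and treating an empty column as droppable is an assumption.

.. code-block:: python

    def is_droppable(values):
        """Return True if every value is identical or every value is distinct."""
        values = list(values)  # values are assumed to be hashable
        if not values:
            # Assumption: an empty column carries no signal, so treat it as droppable.
            return True
        n_unique = len(set(values))
        # One unique value means a constant column; as many unique values as
        # rows means an ID-like column.
        return n_unique == 1 or n_unique == len(values)


    # Hypothetical toy columns used only for this illustration.
    columns = {
        "constant": ["a", "a", "a", "a"],
        "row_id": [101, 102, 103, 104],
        "rooms": [3, 4, 3, 5],
    }
    droppable = {name: is_droppable(col) for name, col in columns.items()}
    print(droppable)

A continuous numeric column usually has all-distinct values as well, so a practical
version of this check would likely need to look at the column type before deciding.
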
Installing Foreshadow
---------------------

.. code-block:: console

    $ pip install foreshadow

Read the documentation to `set up the project from source`_.

.. _set up the project from source: https://foreshadow.readthedocs.io/en/development/developers.html#setting-up-the-project-from-source

Getting Started
---------------

To get started with Foreshadow, install the package with ``pip install foreshadow``.
This also installs its dependencies. Then create a simple Python script that uses all
of Foreshadow's defaults.

First, import Foreshadow:

.. code-block:: python

    from foreshadow.foreshadow import Foreshadow
    from foreshadow.estimators import AutoEstimator
    from foreshadow.utils import ProblemType

Also import pandas and the scikit-learn helpers used in the demo:

.. code-block:: python

    import pandas as pd
    # load_boston was removed in scikit-learn 1.2, so this demo needs an older release.
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split

Now load the Boston housing dataset from scikit-learn into pandas DataFrames. This
is a common dataset for testing machine learning models and comes built in to
scikit-learn (releases before 1.2).

.. code-block:: python

    boston = load_boston()
    bostonX_df = pd.DataFrame(boston.data, columns=boston.feature_names)
    bostony_df = pd.DataFrame(boston.target, columns=['target'])

Next, exactly as with an sklearn estimator, perform a train/test split on the data
and pass the training data to the ``fit`` method of a new Foreshadow object.

.. code-block:: python

    X_train, X_test, y_train, y_test = train_test_split(
        bostonX_df, bostony_df, test_size=0.2
    )

    problem_type = ProblemType.REGRESSION
    estimator = AutoEstimator(
        problem_type=problem_type,
        auto="tpot",
        estimator_kwargs={"max_time_mins": 1},
    )
    shadow = Foreshadow(estimator=estimator, problem_type=problem_type)
    shadow.fit(X_train, y_train)

Now ``shadow`` is a fitted Foreshadow object for which all feature engineering has been
performed and the estimator has been trained and optimized. It can now be used exactly
like a fitted sklearn estimator to make predictions.

.. code-block:: python

    shadow.score(X_test, y_test)

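
Because ``load_boston`` is unavailable in recent scikit-learn releases, the same
fit-then-score pattern can be tried on synthetic data. The sketch below uses only
scikit-learn and NumPy, not Foreshadow. The pipeline hand-writes the "imputation
followed by scaling" preprocessing listed for numerical columns. The choice of
``LinearRegression``, the synthetic dataset, and the 5% missing-value rate are all
assumptions made for illustration.

.. code-block:: python

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic regression data standing in for a real dataset (assumption).
    X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

    # Blank out roughly 5% of the entries so the imputer has work to do.
    rng = np.random.default_rng(0)
    X[rng.random(X.shape) < 0.05] = np.nan

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Hand-written numerical preprocessing: imputation followed by scaling.
    pipeline = Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="mean")),
            ("scale", StandardScaler()),
            ("model", LinearRegression()),
        ]
    )

    pipeline.fit(X_train, y_train)
    r2 = float(pipeline.score(X_test, y_test))  # R^2 for scikit-learn regressors
    print(f"R^2 on held-out data: {r2:.3f}")

The pipeline exposes the same ``fit`` and ``score`` methods that the walkthrough calls
on the Foreshadow object, which is what "Scikit-Learn compatible" refers to.
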
Great, you now have a working Foreshadow installation! Keep reading to learn how to
export, modify, and construct pipelines of your own.

Tutorial
--------

A Jupyter notebook tutorial that goes into more detail is available in the ``examples`` folder.

Documentation
-------------

`Read the docs!`_

.. _Read the docs!: https://foreshadow.readthedocs.io/en/development/index.html