https://github.com/alexandregazagnes/scikit-transformers
Very usefull package to enable and provide custom transformers such as LogColumnTransformer, BoolColumnTransformers and others fancy transformers.
https://github.com/alexandregazagnes/scikit-transformers
data data-science log python scikit-learn transformer
Last synced: 8 months ago
JSON representation
Very usefull package to enable and provide custom transformers such as LogColumnTransformer, BoolColumnTransformers and others fancy transformers.
- Host: GitHub
- URL: https://github.com/alexandregazagnes/scikit-transformers
- Owner: AlexandreGazagnes
- License: gpl-3.0
- Created: 2024-01-26T08:29:56.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-19T14:03:29.000Z (almost 2 years ago)
- Last Synced: 2024-10-30T04:54:29.635Z (over 1 year ago)
- Topics: data, data-science, log, python, scikit-learn, transformer
- Language: Python
- Homepage: https://alexandregazagnes.github.io/scikit-transformers
- Size: 10.1 MB
- Stars: 4
- Watchers: 5
- Forks: 2
- Open Issues: 12
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README

[](https://www.gnu.org/licenses/gpl-3.0)


[](https://www.python.org/dev/peps/pep-0008/)
[](https://python-poetry.org/)






# Scikit-transformers : Scikit-learn + Custom transformers
## About
**scikit-transformers** is a very usefull package to enable and provide custom transformers such as ```LogColumnTransformer```, ```BoolColumnTransformers``` and others fancy transformers.
It was created to provide a simple way to use custom transformers in ```scikit-learn``` pipelines, and allow to use them in a ```scikit-learn ```model, using ```GridSearchCV``` for testing and tuning hyperparameters.
The starting point was to provide a simple ```LogColumnTransformer```, which is a simple wrapper around the numpy log function, making possible to use a skew threshold to apply the log transformation only on columns with a skew superior to a given threshold.
With ```scikit-transformers```, it is now possible to use this ```LogColumnTransformer``` in transformer in a ```GridSearchCV``` using a skew threshold as hyperparameter to find what columns are good to log or not.
```LogColumnTransformer``` is one of the many transformers implemented in ```scikit-transformers```.
## Installation
Using regular pip and venv tools :
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install scikit-transformers
```
## Usage
For a very basic usage :
```python
import pandas as pd
from sktransf.trasnformer import LogColumnTransformer
df = pd.DataFrame(
{ "a": range(10),
"b": range(10)
}
)
logger = LogColumnTransformer()
logger.fit_transform(df)
df_transf = logger.transform(df)
```
Using common transformers :
```python
import pandas as pd
from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector
df = pd.DataFrame(
{ "a": range(10),
"b": range(10)
}
)
df_bool = BoolColumnTransformer().fit_transform(df)
df_unique = DropUniqueColumnTransformer().fit_transform(df)
df_logged = LogColumnTransformer().fit_transform(df)
```
Using a pipeline with a scikit-learn model :
```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sktransf.transformer import LogColumnTransformer, BoolColumnTransformer
from sktransf.selector import DropUniqueColumnSelector
pipe = Pipeline([
('bool', BoolColumnTransformer()),
('unique', DropUniqueColumnTransformer()),
('log', LogColumnTransformer()),
('model', LinearRegression())
])
X = pd.DataFrame(
{ "a": range(10),
"b": range(10)
}
)
y = range(10)
pipe.fit(X, y)
y_pred = pipe.predict(X)
```
## Documentation
For more specific information, please refer to the notebooks:
* Transformers :
* [LogColumnTransformer notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/transformer/LogColumnTransformer.ipynb)
* [BoolColumnTransformer notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/transformer/BoolColumnTransformer.ipynb)
* Selectors :
* [DropUniqueColumnSelector notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/selector/DropUniqueColumnSelector.ipynb)
* [DropSkuColumnSelector notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/selector/DropSkuColumnSelector.ipynb)
* Pipelines :
* [Pipelines notebook](https://github.com/AlexandreGazagnes/scikit-transformers/blob/main/docs/notebooks/Pipelines.ipynb)
A complete documentation is be available on the [github page](https://alexandregazagnes.github.io/scikit-transformers/).
## Changelog, Releases and Roadmap
Please refer to the [changelog](https://alexandregazagnes.github.io/scikit-transformers/CHANGELOG/) page for more information.
## Contributing
Pull requests are welcome.
For major changes, please open an issue first to discuss what you would like to change.
For more information, please refer to the [contributing](https://alexandregazagnes.github.io/scikit-transformers/CONTRIBUTING/) page.
## License
[GPLv3](https://raw.githubusercontent.com/AlexandreGazagnes/scikit-transformers/main/LICENSE)