https://github.com/skrub-data/skrub

Machine learning with dataframes

data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dataframe dataframes dirty-data machine-learning


Host: GitHub
URL: https://github.com/skrub-data/skrub
Owner: skrub-data
License: bsd-3-clause
Created: 2018-03-12T23:32:22.000Z (over 7 years ago)
Default Branch: main
Last Pushed: 2025-05-06T15:31:20.000Z (about 2 months ago)
Last Synced: 2025-05-06T16:50:08.415Z (about 2 months ago)
Language: Python
Homepage: https://skrub-data.org/
Size: 12.4 MB
Stars: 1,380
Watchers: 20
Forks: 128
Open Issues: 100
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md


skrub
=====

.. image:: https://skrub-data.github.io/stable/_static/skrub.svg
   :align: center
   :width: 50 %
   :alt: skrub logo

|py_ver| |pypi_var| |pypi_dl| |codecov| |circleci| |black|

.. |py_ver| image:: https://img.shields.io/pypi/pyversions/skrub

.. |pypi_var| image:: https://img.shields.io/pypi/v/skrub?color=informational

.. |pypi_dl| image:: https://img.shields.io/pypi/dm/skrub

.. |codecov| image:: https://img.shields.io/codecov/c/github/skrub-data/skrub/main

.. |circleci| image:: https://img.shields.io/circleci/build/github/skrub-data/skrub/main?label=CircleCI

.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg

**skrub** (formerly *dirty_cat*) is a Python library that makes it easier to prepare tables for machine learning.

If you like the package, spread the word and ⭐ this repository!

You can also join the `discord server `_.

Website: https://skrub-data.org/

What can skrub do?
------------------

The goal of skrub is to bridge the gap between tabular data sources and machine-learning models.

skrub provides high-level tools for joining dataframes (``Joiner``, ``AggJoiner``, ...),
encoding columns (``MinHashEncoder``, ``ToCategorical``, ...), building a pipeline
(``TableVectorizer``, ``tabular_learner``, ...), and interactively exploring your data (``TableReport``).
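To give an idea of what an encoder such as ``MinHashEncoder`` does with messy strings, here is a simplified sketch in plain Python. It is not skrub's implementation: the n-gram range, the number of components, and the use of seeded SHA-256 as a family of hash functions are illustrative assumptions, the helper names ``char_ngrams``, ``minhash_vector`` and ``estimated_similarity`` are hypothetical, and apart from "Office Services Coordinator" the example strings are made up. Each string becomes a fixed-length vector whose entries are the minimum hash values over its character n-grams, so strings that share many n-grams tend to share many entries.

```python
import hashlib


def char_ngrams(text, n_min=2, n_max=4):
    """Set of character n-grams of `text`, padded with spaces (hypothetical helper)."""
    padded = f" {text} "
    return {
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def minhash_vector(text, n_components=30):
    """Min-hash signature: for each seeded hash function, keep the smallest
    hash value over all n-grams of `text` (illustrative, not skrub's code)."""
    grams = char_ngrams(text)
    vector = []
    for seed in range(n_components):
        # Prefixing the seed to SHA-256 input stands in for a family of hash functions.
        hashes = (
            int.from_bytes(hashlib.sha256(f"{seed}|{g}".encode()).digest()[:8], "big")
            for g in grams
        )
        vector.append(min(hashes))
    return vector


def estimated_similarity(a, b, n_components=256):
    """Fraction of equal min-hash entries: an estimate of the Jaccard
    similarity between the n-gram sets of `a` and `b`."""
    va = minhash_vector(a, n_components)
    vb = minhash_vector(b, n_components)
    return sum(x == y for x, y in zip(va, vb)) / n_components


print(estimated_similarity("Office Services Coordinator", "Office Services Manager"))
print(estimated_similarity("Office Services Coordinator", "Police Officer"))
```

The first pair shares most of its n-grams, so it should receive a noticeably higher estimate than the second, even though no string is ever compared to another directly: the fixed-length vectors can be fed to any model.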

>>> from skrub.datasets import fetch_employee_salaries
>>> dataset = fetch_employee_salaries()
>>> df = dataset.X
>>> y = dataset.y
>>> df.iloc[0]
gender                                                                     F
department                                                               POL
department_name                                         Department of Police
division                   MSB Information Mgmt and Tech Division Records...
assignment_category                                         Fulltime-Regular
employee_position_title                          Office Services Coordinator
date_first_hired                                                  09/22/1986
year_first_hired                                                        1986
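Columns such as ``date_first_hired`` hold dates stored as strings, which a model cannot use directly. As a rough illustration, the plain-Python sketch below turns such a value into a few numeric features. The month/day/year format is an assumption read off the sample value ``09/22/1986``, the function name ``date_features`` is hypothetical, and the feature set is only an example of what a date encoder might produce, not what skrub computes.

```python
from datetime import date, datetime


def date_features(value, fmt="%m/%d/%Y"):
    """Parse a date string and return a few numeric features (hypothetical helper).

    The default format assumes month/day/year, as in "09/22/1986"; the format
    is not detected automatically here.
    """
    parsed = datetime.strptime(value, fmt).date()
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "weekday": parsed.weekday(),  # Monday is 0, Sunday is 6
        "days_since_epoch": (parsed - date(1970, 1, 1)).days,
    }


print(date_features("09/22/1986"))
```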

>>> from sklearn.model_selection import cross_val_score
>>> from skrub import tabular_learner
>>> cross_val_score(tabular_learner('regressor'), df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])
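With ``cv`` left at its default, scikit-learn's ``cross_val_score`` runs 5-fold cross-validation, and for a regressor the folds come from ``KFold`` without shuffling. Each of the five numbers above is therefore the model's score on one held-out fold; assuming the pipeline ends in a scikit-learn regressor, that default score is R². The plain-Python sketch below mimics how such contiguous folds are laid out. The function ``kfold_indices`` is a hypothetical illustration, not part of skrub or scikit-learn.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    The first n_samples % n_splits folds get one extra sample, the same
    layout as scikit-learn's KFold with shuffle=False.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))  # this fold is held out for scoring
        train = list(range(start)) + list(range(stop, n_samples))  # the rest is used for fitting
        yield train, test
        start = stop


for train, test in kfold_indices(12, n_splits=5):
    print(len(train), test)
```

Because the folds are not shuffled, any ordering in the rows (for example by department) carries over into the folds, which is one reason the per-fold scores can differ.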

See our `examples `_.

Installation
------------

skrub can easily be installed via ``pip`` or ``conda``. For more installation information, see
the `installation instructions `_.
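Once skrub is installed, a quick way to check which version is present (for instance when reporting an issue) is to query the package metadata from Python. This sketch uses only the standard library's ``importlib.metadata``; the helper name ``installed_version`` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="skrub"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string, or None if skrub is not installed
```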

Contributing
------------

The best way to support the development of skrub is to spread the word!

Also, if you are already a skrub user, we would love to hear about your use cases and challenges in the `Discussions `_ section.

To report a bug or suggest enhancements, please
`open an issue `_.

If you want to contribute directly to the library, check the
`how to contribute `_ page on
the website for more information.
