

# Skimpy

A lightweight tool for creating summary statistics from dataframes.




**skimpy** is a lightweight tool that provides summary statistics about variables in **pandas** or **Polars** data frames within the console or your interactive Python window.

Think of it as a super-charged version of **pandas**' `df.describe()`.
[You can find the documentation here](

## Quickstart

`skim` a **pandas** dataframe and produce summary statistics within the console using:

    from skimpy import skim

    skim(df)

where `df` is a dataframe. Alternatively, use `skim_polars()` on **Polars** dataframes.

If you need a dataset to try _skimpy_ out on, you can use the built-in test **pandas** data frame:

    from skimpy import skim, generate_test_data

    df = generate_test_data()
    skim(df)

This prints the following summary:

    ╭──────────────────────────────────────── skimpy summary ─────────────────────────────────────────╮
    │          Data Summary                Data Types               Categories                        │
    │ ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┏━━━━━━━━━━━━━┳━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━━━━━━┓                │
    │ ┃ dataframe         ┃ Values ┃ ┃ Column Type ┃ Count ┃ ┃ Categorical Variables ┃                │
    │ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ ┡━━━━━━━━━━━━━╇━━━━━━━┩ ┡━━━━━━━━━━━━━━━━━━━━━━━┩                │
    │ │ Number of rows    │ 1000   │ │ float64     │ 3     │ │ class                 │                │
    │ │ Number of columns │ 13     │ │ category    │ 2     │ │ location              │                │
    │ └───────────────────┴────────┘ │ datetime64  │ 2     │ └───────────────────────┘                │
    │                                │ object      │ 2     │                                          │
    │                                │ int64       │ 1     │                                          │
    │                                │ bool        │ 1     │                                          │
    │                                │ string      │ 1     │                                          │
    │                                │ timedelta64 │ 1     │                                          │
    │                                └─────────────┴───────┘                                          │
    │ ┏━━━━━━━━━━━━━┳━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━━━┓ │
    │ ┃ column_name ┃ NA  ┃ NA % ┃ mean  ┃ sd   ┃ p0      ┃ p25   ┃ p50      ┃ p75  ┃ p100 ┃ hist   ┃ │
    │ ┡━━━━━━━━━━━━━╇━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━╇━━━━━━━━┩ │
    │ │ length      │   0 │    0 │   0.5 │ 0.36 │ 1.6e-06 │  0.13 │      0.5 │ 0.86 │    1 │ ▇▃▃▃▅▇ │ │
    │ │ width       │   0 │    0 │     2 │  1.9 │  0.0021 │   0.6 │      1.5 │    3 │   14 │ ▇▃▁    │ │
    │ │ depth       │   0 │    0 │    10 │  3.2 │       2 │     8 │       10 │   12 │   20 │ ▁▃▇▆▃▁ │ │
    │ │ rnd         │ 118 │ 11.8 │ -0.02 │    1 │    -2.8 │ -0.74 │ -0.00077 │ 0.66 │  3.7 │ ▁▅▇▅▁  │ │
    │ └─────────────┴─────┴──────┴───────┴──────┴─────────┴───────┴──────────┴──────┴──────┴────────┘ │
    │ ┏━━━━━━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━┓                                                  │
    │ ┃ column_name ┃ NA ┃ NA % ┃ ordered ┃ unique ┃                                                  │
    │ ┡━━━━━━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━┩                                                  │
    │ │ class       │  0 │    0 │ False   │      2 │                                                  │
    │ │ location    │  1 │  0.1 │ False   │      5 │                                                  │
    │ └─────────────┴────┴──────┴─────────┴────────┘                                                  │
    │ ┏━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━┓                                                       │
    │ ┃ column_name ┃ true ┃ true rate ┃ hist ┃                                                       │
    │ ┡━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━┩                                                       │
    │ │ booly_col   │  516 │      0.52 │ ▇ ▇  │                                                       │
    │ └─────────────┴──────┴───────────┴──────┘                                                       │
    │ ┏━━━━━━━━━━━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓                          │
    │ ┃ column_name      ┃ NA ┃ NA % ┃ first      ┃ last       ┃ frequency ┃                          │
    │ ┡━━━━━━━━━━━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩                          │
    │ │ datetime         │  0 │    0 │ 2018-01-31 │ 2101-04-30 │ M         │                          │
    │ │ datetime_no_freq │  3 │  0.3 │ 1992-01-05 │ 2023-03-04 │ None      │                          │
    │ └──────────────────┴────┴──────┴────────────┴────────────┴───────────┘                          │
    │                                           <class ''>                                            │
    │ ┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓                     │
    │ ┃ column_name           ┃ NA ┃ NA % ┃ first      ┃ last       ┃ frequency ┃                     │
    │ ┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩                     │
    │ │                       │  0 │    0 │ 2018-01-31 │ 2101-04-30 │ M         │                     │
    │ │ datetime.date_no_freq │  0 │    0 │ 1992-01-05 │ 2023-03-04 │ None      │                     │
    │ └───────────────────────┴────┴──────┴────────────┴────────────┴───────────┘                     │
    │ ┏━━━━━━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓              │
    │ ┃ column_name ┃ NA ┃ NA % ┃ mean            ┃ median          ┃ max              ┃              │
    │ ┡━━━━━━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩              │
    │ │ time diff   │  5 │  0.5 │ 8 days 00:05:47 │ 0 days 00:00:00 │ 26 days 00:00:00 │              │
    │ └─────────────┴────┴──────┴─────────────────┴─────────────────┴──────────────────┘              │
    │ ┏━━━━━━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓                                       │
    │ ┃ column_name ┃ NA ┃ NA % ┃ words per row ┃ total words ┃                                       │
    │ ┡━━━━━━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩                                       │
    │ │ text        │  6 │  0.6 │           5.8 │        5761 │                                       │
    │ └─────────────┴────┴──────┴───────────────┴─────────────┘                                       │
    ╰────────────────────────────────────────────── End ──────────────────────────────────────────────╯
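
To make the columns of the numeric table more concrete, here is a minimal standard-library sketch of how statistics like these could be computed for a single column. It is an illustration only and not **skimpy**'s implementation: the helper names `summarise_numeric` and `sparkline`, the choice of six histogram bins, and the inclusive quantile method are all assumptions, so the numbers it produces may differ slightly from what `skim()` reports.

```python
import math
import statistics

BLOCKS = "▁▂▃▄▅▆▇"  # seven bar heights, lowest to highest


def sparkline(values, bins=6):
    """Render a tiny text histogram of `values` (hypothetical helper)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant data
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) * bins / span), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    # Empty bins stay blank; non-empty bins are scaled to the tallest bar.
    return "".join(
        " " if c == 0 else BLOCKS[round(c / peak * (len(BLOCKS) - 1))]
        for c in counts
    )


def summarise_numeric(column):
    """Summarise a list of numbers that may contain None or NaN (hypothetical helper)."""
    present = [x for x in column if x is not None and not math.isnan(x)]
    missing = len(column) - len(present)
    # Quartiles via the "inclusive" method; other tools may interpolate differently.
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "NA": missing,
        "NA %": round(100 * missing / len(column), 2),
        "mean": statistics.fmean(present),
        "sd": statistics.stdev(present),  # sample standard deviation
        "p0": min(present),
        "p25": q1,
        "p50": q2,
        "p75": q3,
        "p100": max(present),
        "hist": sparkline(present),
    }


summary = summarise_numeric(
    [1.0, 2.0, None, 4.0, 5.0, 3.0, float("nan"), 2.0, 3.0, 3.0]
)
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```

Here `NA` counts both `None` and NaN entries as missing, and `NA %` is that count as a percentage of all rows, which matches how the `rnd` row above shows 118 missing values out of 1000 as 11.8.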

It is recommended that you set your datatypes before using **skimpy** (for example, converting any text columns to the pandas string datatype), as this will produce richer statistical summaries. However, the `skim()` function will try to guess the datatypes of your columns.
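
As a rough sketch of what setting datatypes could look like in **pandas** before calling `skim()`, the example below converts columns of plain strings to the string, category and datetime types. The column names and values are made up for illustration, and which conversions make sense will depend on your own data.

```python
import pandas as pd

# Hypothetical raw data: plain Python strings, before any explicit dtypes are set.
raw = pd.DataFrame(
    {
        "text": ["quick brown fox", None, "lazy dog"],
        "location": ["UK", "USA", "UK"],
        "when": ["2024-01-31", "2024-02-29", "2024-03-31"],
    }
)

# Declare explicit dtypes so each column is treated as the intended kind of variable.
typed = raw.assign(
    text=raw["text"].astype("string"),            # pandas string dtype
    location=raw["location"].astype("category"),  # categorical variable
    when=pd.to_datetime(raw["when"]),             # datetime64 column
)

print(typed.dtypes)
```

Passing `typed` rather than `raw` to `skim()` should then let each column be summarised alongside others of its declared type, instead of relying on the guessing step.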

## Requirements

You can find a full list of requirements in the [pyproject.toml]( file.

You can try this package out right now in your browser using this
[Google Colab notebook](
(requires a Google account). Note that the Google Colab notebook uses the latest package released on PyPI (rather than the development release).

## Installation

You can install the latest release of _skimpy_ via
[pip]( from [PyPI](

    $ pip install skimpy

To install the development version from git, use:

    $ pip install git+

For development, see [contributing](contributing.qmd).

## License

Distributed under the terms of the [MIT license](, _skimpy_ is free and open source software.

## Issues

If you encounter any problems, please [file an issue]( along with a detailed description.

## Credits

This project was generated from [\@cjolowicz](\'s [Hypermodern Python Cookiecutter]( template.

**skimpy** was inspired by the R package [**skimr**]( and by exploratory Python packages including [**ydata_profiling**]( and [**dataprep**](, from which the `clean_columns` function comes.

This package would not have been possible without the [**Rich**]( package.

The package is built with [poetry](, while the documentation is built with [Quarto]( and [Quartodoc]( (a Python package). Tests are run with [nox](

Using **skimpy** in your paper? Let us know by raising an issue beginning with "citation" and we'll add it to this page.