https://github.com/aeturrell/skimpy

skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.

Host: GitHub
URL: https://github.com/aeturrell/skimpy
Owner: aeturrell
License: mit
Created: 2021-09-01T19:39:56.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2025-03-25T15:07:59.000Z (4 months ago)
Last Synced: 2025-04-13T20:36:03.193Z (3 months ago)
Topics: data-science, eda, exploratory-data-analysis, pandas, statistics, summary-statistics
Language: Python
Homepage: https://aeturrell.github.io/skimpy/
Size: 4.81 MB
Stars: 453
Watchers: 7
Forks: 24
Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.qmd
- License: LICENSE
- Code of conduct: docs/code_of_conduct.qmd
- Citation: CITATION.cff

Awesome Lists containing this project

awesome-quarto - Documentation website from Jupyter Notebook - Quarto used to generate a website from a Jupyter notebook containing Python module documentation. (Real-life examples / Websites formats)

README

# Skimpy

A lightweight tool for creating summary statistics from dataframes.
![png](docs/logo.png)

[![PyPI](https://img.shields.io/pypi/v/skimpy.svg)](https://pypi.org/project/skimpy/)
[![Status](https://img.shields.io/pypi/status/skimpy.svg)](https://pypi.org/project/skimpy/)
[![Python Version](https://img.shields.io/pypi/pyversions/skimpy)](https://pypi.org/project/skimpy)
[![License](https://img.shields.io/pypi/l/skimpy)](https://opensource.org/licenses/MIT)
[![Read the documentation at https://aeturrell.github.io/skimpy/](https://img.shields.io/badge/docs-passing-brightgreen)](https://aeturrell.github.io/skimpy/)
[![Tests](https://github.com/aeturrell/skimpy/workflows/Tests/badge.svg)](https://github.com/aeturrell/skimpy/actions?workflow=Tests)
[![Codecov](https://codecov.io/gh/aeturrell/skimpy/branch/main/graph/badge.svg)](https://codecov.io/gh/aeturrell/skimpy)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/aeturrell/7bf183c559dc1d15ab7e7aaac39ea0ed/skimpy_demo.ipynb)
[![Downloads](https://static.pepy.tech/badge/skimpy)](https://pepy.tech/projects/skimpy)
[![Source](https://img.shields.io/badge/source%20code-github-lightgrey?style=for-the-badge)](https://github.com/aeturrell/skimpy)

![Linux](https://img.shields.io/badge/Linux-FCC624?style=for-the-badge&logo=linux&logoColor=black)
![macOS](https://img.shields.io/badge/mac%20os-000000?style=for-the-badge&logo=macos&logoColor=F0F0F0)
![Windows](https://img.shields.io/badge/Windows-0078D6?style=for-the-badge&logo=windows&logoColor=white)

**skimpy** is a lightweight tool that provides summary statistics about variables in **pandas** or **Polars** data frames within the console or your interactive Python window.

Think of it as a super-charged version of **pandas**' `df.describe()`.
[You can find the documentation here](https://aeturrell.github.io/skimpy/).

## Quickstart

`skim` a **pandas** or **Polars** dataframe and produce summary statistics within the console
using:

```python
from skimpy import skim

skim(df)
```

where `df` is a **pandas** or **Polars** dataframe.
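As a rough illustration of how this differs from `df.describe()`, the sketch below builds a small mixed-type **pandas** dataframe and summarises it both ways. The column names and values are made up for the example, and the `skimpy` import is guarded so the snippet still runs where the package is not installed.

```python
import pandas as pd

# Hypothetical example data: one numeric, one text and one categorical column.
df = pd.DataFrame(
    {
        "price": [1.5, 2.0, 3.25, 4.0],
        "label": ["a", "b", "c", "d"],
        "group": pd.Categorical(["x", "y", "x", "y"]),
    }
)

# By default, describe() on a mixed-type frame reports the numeric columns only,
# so "label" and "group" do not appear in the result.
described = df.describe()
print(described)

# skim() is the entry point shown in the quickstart. The import is guarded
# (an assumption made for this sketch) so the example degrades gracefully.
try:
    from skimpy import skim
except ImportError:
    skim = None

if skim is not None:
    skim(df)  # prints a summary to the console, grouped by column type
```

Here `describe()` only covers `price`, whereas `skim` is intended to report on every column, grouped by type.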

If you need a dataset to try *skimpy* out on, you can use the built-in test **pandas** data frame:

```python
from skimpy import generate_test_data, skim

df = generate_test_data()
skim(df)
```

╭──────────────────────────────────────────────── skimpy summary ─────────────────────────────────────────────────╮
│          Data Summary                Data Types               Categories                                        │
│ ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┏━━━━━━━━━━━━━┳━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━━━━━━┓                                │
│ ┃ Dataframe         ┃ Values ┃ ┃ Column Type ┃ Count ┃ ┃ Categorical Variables ┃                                │
│ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ ┡━━━━━━━━━━━━━╇━━━━━━━┩ ┡━━━━━━━━━━━━━━━━━━━━━━━┩                                │
│ │ Number of rows    │ 1000   │ │ float64     │ 3     │ │ class                 │                                │
│ │ Number of columns │ 13     │ │ category    │ 2     │ │ location              │                                │
│ └───────────────────┴────────┘ │ datetime64  │ 2     │ └───────────────────────┘                                │
│                                │ object      │ 2     │                                                          │
│                                │ int64       │ 1     │                                                          │
│                                │ bool        │ 1     │                                                          │
│                                │ string      │ 1     │                                                          │
│                                │ timedelta64 │ 1     │                                                          │
│                                └─────────────┴───────┘                                                          │
│                                                     number                                                      │
│ ┏━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┓  │
│ ┃ column  ┃ NA   ┃ NA %  ┃ mean      ┃ sd      ┃ p0         ┃ p25     ┃ p50        ┃ p75    ┃ p100  ┃ hist   ┃  │
│ ┡━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━┩  │
│ │ length  │    0 │     0 │    0.5016 │  0.3597 │  1.573e-06 │   0.134 │     0.4976 │ 0.8602 │     1 │ ▇▃▃▃▅▇ │  │
│ │ width   │    0 │     0 │     2.037 │   1.929 │   0.002057 │   0.603 │      1.468 │  2.953 │ 13.91 │  ▇▃▁   │  │
│ │ depth   │    0 │     0 │     10.02 │   3.208 │          2 │       8 │         10 │     12 │    20 │ ▁▃▇▆▃▁ │  │
│ │ rnd     │  118 │  11.8 │  -0.01977 │   1.002 │     -2.809 │ -0.7355 │ -0.0007736 │ 0.6639 │ 3.717 │ ▁▅▇▅▁  │  │
│ └─────────┴──────┴───────┴───────────┴─────────┴────────────┴─────────┴────────────┴────────┴───────┴────────┘  │
│                                                    category                                                     │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                      ┃ NA         ┃ NA %            ┃ ordered                 ┃ unique              ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ class                       │          0 │               0 │ False                   │                   2 │  │
│ │ location                    │          1 │             0.1 │ False                   │                   5 │  │
│ └─────────────────────────────┴────────────┴─────────────────┴─────────────────────────┴─────────────────────┘  │
│                                                      bool                                                       │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                          ┃ true             ┃ true rate                      ┃ hist                 ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ booly_col                       │              516 │                           0.52 │        ▇    ▇        │  │
│ └─────────────────────────────────┴──────────────────┴────────────────────────────────┴──────────────────────┘  │
│                                                    datetime                                                     │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                       ┃ NA    ┃ NA %     ┃ first              ┃ last              ┃ frequency       ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩  │
│ │ datetime                     │     0 │        0 │     2018-01-31     │    2101-04-30     │ ME              │  │
│ │ datetime_no_freq             │     3 │      0.3 │     1992-01-05     │    2023-03-04     │ None            │  │
│ └──────────────────────────────┴───────┴──────────┴────────────────────┴───────────────────┴─────────────────┘  │
│                                            <class 'datetime.date'>                                              │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓  │
│ ┃ column                           ┃ NA    ┃ NA %     ┃ first            ┃ last             ┃ frequency      ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩  │
│ │ datetime.date                    │     0 │        0 │ 2018-01-31       │ 2101-04-30       │ ME             │  │
│ │ datetime.date_no_freq            │     0 │        0 │ 1992-01-05       │ 2023-03-04       │ None           │  │
│ └──────────────────────────────────┴───────┴──────────┴──────────────────┴──────────────────┴────────────────┘  │
│                                                  timedelta64                                                    │
│ ┏━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column         ┃ NA   ┃ NA %    ┃ mean                   ┃ median                 ┃ max                    ┃  │
│ ┡━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ time diff      │    5 │     0.5 │        8 days 00:05:47 │        0 days 00:00:00 │       26 days 00:00:00 │  │
│ └────────────────┴──────┴─────────┴────────────────────────┴────────────────────────┴────────────────────────┘  │
│                                                     string                                                      │
│ ┏━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┓  │
│ ┃        ┃    ┃      ┃            ┃           ┃            ┃           ┃ chars per  ┃ words per ┃ total      ┃  │
│ ┃ column ┃ NA ┃ NA % ┃ shortest   ┃ longest   ┃ min        ┃ max       ┃ row        ┃ row       ┃ words      ┃  │
│ ┡━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━┩  │
│ │ text   │  6 │  0.6 │ How are    │ Indeed,   │ How are    │ What      │       31.1 │       5.8 │       5761 │  │
│ │        │    │      │ you?       │ it was    │ you?       │ weather!  │            │           │            │  │
│ │        │    │      │            │ the most  │            │           │            │           │            │  │
│ │        │    │      │            │ outrageou │            │           │            │           │            │  │
│ │        │    │      │            │ sly       │            │           │            │           │            │  │
│ │        │    │      │            │ pompous   │            │           │            │           │            │  │
│ │        │    │      │            │ cat I     │            │           │            │           │            │  │
│ │        │    │      │            │ have ever │            │           │            │           │            │  │
│ │        │    │      │            │ seen.     │            │           │            │           │            │  │
│ └────────┴────┴──────┴────────────┴───────────┴────────────┴───────────┴────────────┴───────────┴────────────┘  │
│                                                     object                                                      │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                                                                  ┃ NA           ┃ NA %              ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩  │
│ │ datetime.date                                                           │            0 │                 0 │  │
│ │ datetime.date_no_freq                                                   │            0 │                 0 │  │
│ └─────────────────────────────────────────────────────────────────────────┴──────────────┴───────────────────┘  │
╰────────────────────────────────────────────────────── End ──────────────────────────────────────────────────────╯

It is recommended that you set your datatypes before using **skimpy** (for example, converting any text columns to the pandas string datatype), as this will produce richer statistical summaries. If you do not, the `skim()` function will try to guess the datatype of each column.
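A minimal sketch of setting datatypes up front, using only standard **pandas** calls. The dataframe and its column names are hypothetical. In the summary above, the `string` table reports more detail (shortest, longest, words per row) than the `object` table, which is the kind of difference this preparation is meant to bring out.

```python
import pandas as pd

# Hypothetical raw data in which every column has been read in as generic objects.
raw = pd.DataFrame(
    {
        "comment": ["How are you?", "What weather!", None],
        "grade": ["a", "b", "a"],
        "visited": ["2023-01-05", "2023-02-11", "2023-03-04"],
    }
)

# Give each column an explicit datatype before summarising it.
typed = raw.assign(
    comment=raw["comment"].astype("string"),   # pandas string datatype
    grade=raw["grade"].astype("category"),     # categorical
    visited=pd.to_datetime(raw["visited"]),    # datetime64
)
print(typed.dtypes)

# Guarded import (an assumption made for this sketch) so it runs without skimpy.
try:
    from skimpy import skim
except ImportError:
    skim = None

if skim is not None:
    skim(typed)
```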

## Requirements

You can find a full list of requirements in the [pyproject.toml](https://github.com/aeturrell/skimpy/blob/main/pyproject.toml) file.

You can try this package out right now in your browser using this
[Google Colab notebook](https://colab.research.google.com/gist/aeturrell/7bf183c559dc1d15ab7e7aaac39ea0ed/skimpy_demo.ipynb)
(requires a Google account). Note that the Google Colab notebook uses the latest package released on PyPI (rather than the development release).

## Installation

You can install the latest release of *skimpy* via
[pip](https://pip.pypa.io/) from [PyPI](https://pypi.org/):

```bash
$ pip install skimpy
```

To install the development version from git, use:

```bash
$ pip install git+https://github.com/aeturrell/skimpy.git
```
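After installing, one way to confirm which version ended up in your environment is to query the package metadata from Python. This is a sketch using only the standard library; `installed_version` is a helper name invented for the example and is not part of **skimpy**.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("skimpy")
if found is None:
    print("skimpy is not installed in this environment")
else:
    print(f"skimpy {found} is installed")
```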

For development, see [contributing](contributing.qmd).

## License

Distributed under the terms of the [MIT license](https://opensource.org/licenses/MIT), *skimpy* is free and open source software.

## Issues

If you encounter any problems, please [file an issue](https://github.com/aeturrell/skimpy/issues) along with a detailed description.

## Credits

This project was generated from [@cjolowicz](https://github.com/cjolowicz)'s [Hypermodern Python Cookiecutter](https://github.com/cjolowicz/cookiecutter-hypermodern-python) template.

**skimpy** was inspired by the R package [**skimr**](https://docs.ropensci.org/skimr/articles/skimr.html) and by exploratory Python packages including [**ydata_profiling**](https://docs.profiling.ydata.ai) and [**dataprep**](https://dataprep.ai/), from which the `clean_columns` function comes.

This package would not have been possible without the [**Rich**](https://github.com/Textualize/rich) package.

The package is built with [poetry](https://python-poetry.org/), while the documentation is built with [Quarto](https://quarto.org/) and [Quartodoc](https://github.com/machow/quartodoc) (a Python package). Tests are run with [nox](https://nox.thea.codes/en/stable/).

Using **skimpy** in your paper? Let us know by raising an issue beginning with "citation" and we'll add it to this page.
