https://github.com/kpj/pyskim
Quickly create summary statistics for a given dataframe.
- Host: GitHub
- URL: https://github.com/kpj/pyskim
- Owner: kpj
- License: mit
- Created: 2020-11-01T17:38:21.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-19T12:47:25.000Z (over 1 year ago)
- Last Synced: 2025-04-10T11:37:53.138Z (about 1 month ago)
- Language: Python
- Size: 47.9 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# pyskim
[PyPI](https://pypi.python.org/pypi/pyskim)
[GitHub Actions workflow](https://github.com/kpj/pyskim/actions/workflows/main.yaml)

Quickly create summary statistics for a given dataframe.
This package aspires to be as awesome as [skimr](https://github.com/ropensci/skimr).
## Installation
```bash
$ pip install pyskim
```

## Usage
### Command-line tool
`pyskim` can be run from the command line:
```bash
$ pyskim iris.csv
── Data Summary ────────────────────────────────────────────────────────────────────────────────────
type                 value
-----------------  -------
Number of rows         150
Number of columns        5
──────────────────────────────────────────────────
Column type frequency:
           Count
-------  -------
Float64        4
string         1

── Variable type: number ───────────────────────────────────────────────────────────────────────────
    name            na_count    mean     sd    p0    p25    p50    p75    p100  hist
--  ------------  ----------  ------  -----  ----  -----  -----  -----  ------  ----------
 0  sepal_length           0    5.84  0.828   4.3    5.1   5.8     6.4     7.9  ▂▆▃▇▄▇▅▁▁▁
 1  sepal_width            0    3.06  0.436   2      2.8   3       3.3     4.4  ▁▁▄▅▇▆▂▂▁▁
 2  petal_length           0    3.76  1.77    1      1.6   4.35    5.1     6.9  ▇▃▁▁▂▅▆▄▃▁
 3  petal_width            0    1.2   0.762   0.1    0.3   1.3     1.8     2.5  ▇▂▁▂▂▆▁▄▂▃

── Variable type: string ───────────────────────────────────────────────────────────────────────────
    name       na_count    n_unique  top_counts
--  -------  ----------  ----------  -----------------------------------------
 0  species           0           3  setosa: 50, versicolor: 50, virginica: 50
```

Full overview:
```bash
$ pyskim --help
Usage: pyskim [OPTIONS]

  Quickly create summary statistics for a given dataframe.

Options:
  -d, --delimiter TEXT   Delimiter of file.
  -i, --interactive      Open prompt with dataframe as `df` after displaying
                         summary.
  --no-dtype-conversion  Skip automatic dtype conversion.
  --groupby TEXT         Group dataframe by this/these variable(s).
  --help                 Show this message and exit.
```

### Python API
Alternatively, call `skim` directly from Python code:
```python
>>> from pyskim import skim
>>> from seaborn import load_dataset

>>> iris = load_dataset('iris')
>>> skim(iris)
# ── Data Summary ────────────────────────────────────────────────────────────────────────────────────
# type                 value
# -----------------  -------
# Number of rows         150
# Number of columns        5
# ──────────────────────────────────────────────────
# Column type frequency:
#            Count
# -------  -------
# float64        4
# string         1
#
# ── Variable type: number ───────────────────────────────────────────────────────────────────────────
#     name            na_count    mean     sd    p0    p25    p50    p75    p100  hist
# --  ------------  ----------  ------  -----  ----  -----  -----  -----  ------  ----------
#  0  sepal_length           0    5.84  0.828   4.3    5.1   5.8     6.4     7.9  ▂▆▃▇▄▇▅▁▁▁
#  1  sepal_width            0    3.06  0.436   2      2.8   3       3.3     4.4  ▁▁▄▅▇▆▂▂▁▁
#  2  petal_length           0    3.76  1.77    1      1.6   4.35    5.1     6.9  ▇▃▁▁▂▅▆▄▃▁
#  3  petal_width            0    1.2   0.762   0.1    0.3   1.3     1.8     2.5  ▇▂▁▂▂▆▁▄▂▃
#
# ── Variable type: string ───────────────────────────────────────────────────────────────────────────
#     name               na_count    n_unique  top_counts
# --  ---------------  ----------  ----------  -----------------------------------------
#  0  species                   0           3  versicolor: 50, setosa: 50, virginica: 50
```
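
### What the columns mean

To make the columns in the summary tables concrete, here is a minimal standard-library sketch of how one row of each table could be computed. It is not pyskim's implementation. The function names (`percentile`, `sparkline`, `summarize_numbers`, `summarize_strings`) are hypothetical, and the choices made here (sample standard deviation, linearly interpolated percentiles, ten histogram bins, `None` as the missing-value marker) are assumptions for illustration only.

```python
import math
import statistics
from collections import Counter

BLOCKS = "▁▂▃▄▅▆▇"  # bar characters, shortest to tallest


def percentile(sorted_values, q):
    """Return the q-th percentile (0..100) using linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100
    lower, upper = math.floor(position), math.ceil(position)
    low_value, high_value = sorted_values[lower], sorted_values[upper]
    return low_value + (high_value - low_value) * (position - lower)


def sparkline(values, bins=10):
    """Draw a histogram of `values` with one block character per bin."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # constant data: everything lands in bin 0
    counts = [0] * bins
    for value in values:
        # The maximum would fall just past the last bin, so clamp it.
        counts[min(int((value - low) / width), bins - 1)] += 1
    tallest = max(counts)
    return "".join(BLOCKS[round(c / tallest * (len(BLOCKS) - 1))] for c in counts)


def summarize_numbers(column):
    """Summarize a numeric column; needs at least two non-missing values."""
    present = sorted(v for v in column if v is not None)
    summary = {
        "na_count": len(column) - len(present),
        "mean": statistics.mean(present),
        "sd": statistics.stdev(present),  # sample standard deviation (n - 1)
    }
    for q in (0, 25, 50, 75, 100):
        summary[f"p{q}"] = percentile(present, q)
    summary["hist"] = sparkline(present)
    return summary


def summarize_strings(column, top=3):
    """Summarize a string column: missing values, distinct values, top counts."""
    present = [v for v in column if v is not None]
    counts = Counter(present)
    return {
        "na_count": len(column) - len(present),
        "n_unique": len(counts),
        "top_counts": ", ".join(f"{v}: {n}" for v, n in counts.most_common(top)),
    }


# Small demo on made-up values (not the iris data).
print(summarize_numbers([5.1, 4.9, None, 6.3, 5.8]))
print(summarize_strings(["setosa", "virginica", None, "setosa"]))
```

Each function returns a plain dictionary keyed by the column names used in the tables, so the result for a single variable can be compared directly with the corresponding row of the `skim` output.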