{"id":13737747,"url":"https://github.com/chris1610/sidetable","last_synced_at":"2025-04-13T00:40:20.321Z","repository":{"id":55642190,"uuid":"266654506","full_name":"chris1610/sidetable","owner":"chris1610","description":"sidetable builds simple but useful summary tables of your data","archived":false,"fork":false,"pushed_at":"2022-10-29T21:08:52.000Z","size":61,"stargazers_count":388,"open_issues_count":7,"forks_count":30,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-04T03:07:38.831Z","etag":null,"topics":["pandas","pandas-dataframe","python3"],"latest_commit_sha":null,"homepage":"https://pbpython.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chris1610.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"chris1610"}},"created_at":"2020-05-25T01:16:14.000Z","updated_at":"2025-03-24T08:30:22.000Z","dependencies_parsed_at":"2022-08-15T05:20:23.273Z","dependency_job_id":null,"html_url":"https://github.com/chris1610/sidetable","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris1610%2Fsidetable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris1610%2Fsidetable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris1610%2Fsidetable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chris1610%2Fsidetable/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chris1610","download_url":"https://codel
oad.github.com/chris1610/sidetable/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650414,"owners_count":21139671,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas","pandas-dataframe","python3"],"created_at":"2024-08-03T03:01:59.472Z","updated_at":"2025-04-13T00:40:20.295Z","avatar_url":"https://github.com/chris1610.png","language":"Python","funding_links":["https://github.com/sponsors/chris1610"],"categories":["Python"],"sub_categories":[],"readme":"# sidetable\n\n\n[![Pypi link](https://img.shields.io/pypi/v/sidetable.svg)](https://pypi.python.org/pypi/sidetable)\n![PyPI - Downloads](https://img.shields.io/pypi/dw/sidetable)\n\nsidetable started as a supercharged combination of pandas `value_counts` plus `crosstab` \nthat builds simple but useful summary tables of your pandas DataFrame. It has since expanded \nto provide support for common and useful pandas tasks such as adding subtotals to your \nDataFrame or flattening hierarchical columns.\n\n\nUsage is straightforward. Install and `import sidetable`. Then access it through the \nnew `.stb` accessor on your DataFrame. 
\n\nFor the Titanic data: `df.stb.freq(['class'])` will build a frequency table like this:\n\n|    | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | Third   |     491 |   55.1066 |                491 |              55.1066 |\n|  1 | First   |     216 |   24.2424 |                707 |              79.349  |\n|  2 | Second  |     184 |   20.651  |                891 |             100      |\n\nYou can also summarize missing values with `df.stb.missing()`:\n\n|             |   missing |   total |   percent |\n|:------------|----------:|--------:|----------:|\n| deck        |       688 |     891 | 77.2166   |\n| age         |       177 |     891 | 19.8653   |\n| embarked    |         2 |     891 |  0.224467 |\n| embark_town |         2 |     891 |  0.224467 |\n| survived    |         0 |     891 |  0        |\n| pclass      |         0 |     891 |  0        |\n| sex         |         0 |     891 |  0        |\n| sibsp       |         0 |     891 |  0        |\n| parch       |         0 |     891 |  0        |\n| fare        |         0 |     891 |  0        |\n| class       |         0 |     891 |  0        |\n| who         |         0 |     891 |  0        |\n| adult_male  |         0 |     891 |  0        |\n| alive       |         0 |     891 |  0        |\n| alone       |         0 |     891 |  0        |\n\nYou can group the data and add subtotals and grand totals with `stb.subtotal()`:\n\n```python\ndf.groupby(['sex', 'class']).agg({'fare': ['sum']}).stb.subtotal()\n```\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efare\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003esum\u003c/th\u003e\n    
\u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003esex\u003c/th\u003e\n      \u003cth\u003eclass\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"4\" valign=\"top\"\u003efemale\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e9975.8250\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e1669.7292\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e2321.1086\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003efemale - subtotal\u003c/th\u003e\n      \u003ctd\u003e13966.6628\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"4\" valign=\"top\"\u003emale\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e8201.5875\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e2132.1125\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e4393.5865\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emale - subtotal\u003c/th\u003e\n      \u003ctd\u003e14727.2865\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003egrand_total\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003ctd\u003e28693.9493\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nYou can also turn a hierarchical column structure into this:\n\n```python\ntitanic.groupby(['embark_town', 'class', 'sex']).agg({'fare': ['sum'], 'age': ['mean']}).unstack().stb.flatten()\n```\n\n|    | embark_town   | class   |   fare_sum_female |   fare_sum_male |   age_mean_female |   age_mean_male 
|\n|---:|:--------------|:--------|------------------:|----------------:|------------------:|----------------:|\n|  0 | Cherbourg     | First   |          4972.53  |        3928.54  |           36.0526 |         40.1111 |\n|  1 | Cherbourg     | Second  |           176.879 |         254.212 |           19.1429 |         25.9375 |\n|  2 | Cherbourg     | Third   |           337.983 |         402.146 |           14.0625 |         25.0168 |\n|  3 | Queenstown    | First   |            90     |          90     |           33      |         44      |\n|  4 | Queenstown    | Second  |            24.7   |          12.35  |           30      |         57      |\n|  5 | Queenstown    | Third   |           340.159 |         465.046 |           22.85   |         28.1429 |\n|  6 | Southampton   | First   |          4753.29  |        4183.05  |           32.7045 |         41.8972 |\n|  7 | Southampton   | Second  |          1468.15  |        1865.55  |           29.7197 |         30.8759 |\n|  8 | Southampton   | Third   |          1642.97  |        3526.39  |           23.2237 |         26.5748 |\n\n\nsidetable has several useful features:\n\n* See total counts and their relative percentages in one table. This is roughly equivalent to combining the\n  output of `value_counts()` and `value_counts(normalize=True)` into one table.\n* Include cumulative totals and percentages to better understand your thresholds. \n  The [Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle) applies to many different scenarios\n  and this function makes it easy to see how your data is cumulatively distributed.\n* Aggregate multiple columns together to see frequency counts for grouped data.\n* Provide a threshold point above which all data is grouped into a single bucket. 
This is useful for\n  quickly identifying the areas to focus your analysis.\n* Get a count of the missing values in your data.\n* Count the number of unique values for each column.\n* Add grand totals on any DataFrame and subtotals to any grouped DataFrame.\n* Pretty print columns\n\n## Table of Contents:\n\n- [Quick Start](#quickstart)\n- [Rationale](#rationale)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [freq](#freq)\n  - [counts](#counts)\n  - [missing](#missing)\n  - [subtotal](#subtotal)\n  - [flatten](#flatten)\n  - [prettyprint](#prettyprint)\n- [Caveats](#caveats)\n- [TODO](#todo)\n- [Contributing](#contributing)\n- [Credits](#credits)\n\n## Quickstart\nFor the impatient:\n\n```batch\n$ python -m pip install sidetable\n```\n\n```python\nimport sidetable\nimport pandas as pd\n\n# Create your DataFrame\ndf = pd.read_csv('myfile.csv')\n\n# Build a frequency table for one or more columns\ndf.stb.freq(['column1', 'column2'])\n\n# See what data is missing\ndf.stb.missing()\n\n# Group data and add a subtotal\ndf.groupby(['column1', 'column2'])['col3'].sum().stb.subtotal()\n```\nThat's it. \n\nRead on for more details and more examples of what you can do with sidetable.\n\n## Rationale\nThe idea behind sidetable is that there are a handful of useful data analysis tasks that\nyou might run on any data set early in the data analysis process. While each of these\ntasks can be done in a handful of lines of pandas code, it is a lot of typing and \ndifficult to remember.\n\nIn addition to providing useful functionality, this project is also a test to see how to\nbuild custom accessors using some of pandas' relatively new API. 
I am hopeful this can\nserve as a model for other projects whether open source or just for your own usage.\nPlease check out the [release announcement](https://pbpython.com/sidetable.html) for more\ninformation about the usage and how to use this as a model for your own projects.\n\nThe solutions in sidetable are heavily based on three sources:\n\n- This [tweet thread](https://twitter.com/pmbaumgartner/status/1235925419012087809) by Peter Baumgartner\n- An [excellent article](https://opendatascience.com/frequencies-and-chaining-in-python-pandas/)\n  by Steve Miller that lays out many of the code concepts incorporated into sidetable.\n- Ted Petrou's [post](https://medium.com/dunder-data/finding-the-percentage-of-missing-values-in-a-pandas-dataframe-a04fa00f84ab) \n  on finding the percentage of missing values in a DataFrame.\n\nI very much appreciate the work that all three authors did to point me in this direction.\n\n## Installation\n\n```batch\n\n$  python -m pip install -U sidetable\n```\n\nThis is the preferred method to install sidetable, as it will always\ninstall the most recent stable release. sidetable requires pandas 1.0 or higher and no\nadditional dependencies. It should run anywhere that pandas runs.\n\nIf you prefer to use conda, sidetable is available on conda-forge:\n\n```batch\n$ conda install -c conda-forge sidetable\n```\n\n## Usage\n```python\nimport pandas as pd\nimport sidetable\nimport seaborn as sns\n\ndf = sns.load_dataset('titanic')\n```\n\nsidetable uses the pandas DataFrame [accessor api](https://pandas.pydata.org/pandas-docs/stable/development/extending.html) \nto add a `.stb` accessor to all of your DataFrames. Once you `import sidetable` you are ready to \ngo. 
In these examples, I will be using seaborn's Titanic dataset as an example but\nseaborn is not a direct dependency.\n\n### freq\nIf you have used `value_counts()` before, you have probably wished it were easier to\ncombine the values with percentage distribution.\n\n```python\ndf['class'].value_counts()\n\nThird     491\nFirst     216\nSecond    184\nName: class, dtype: int64\n\ndf['class'].value_counts(normalize=True)\n\nThird     0.551066\nFirst     0.242424\nSecond    0.206510\nName: class, dtype: float64\n```\n\nWhich can be done, but is messy and a lot of typing and remembering:\n\n```python\npd.concat([df['class'].value_counts().rename('count'), \n        df['class'].value_counts(normalize=True).mul(100).rename('percentage')], axis=1)\n```\n|        |   count |   percentage |\n|:-------|--------:|-------------:|\n| Third  |     491 |      55.1066 |\n| First  |     216 |      24.2424 |\n| Second |     184 |      20.651  |\n\nUsing sidetable is much simpler and you get cumulative totals, percents and more flexibility:\n\n```python\ndf.stb.freq(['class'])\n```\n|    | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | Third   |     491 |   55.1066 |                491 |              55.1066 |\n|  1 | First   |     216 |   24.2424 |                707 |              79.349  |\n|  2 | Second  |     184 |   20.651  |                891 |             100      |\n\nIf you want to style the results so percentages and large numbers are easier to read, \nuse `style=True`:\n\n```python\ndf.stb.freq(['class'], style=True)\n```\n|    | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | Third   |     491 |  55.11%   |                491 |               55.11% |\n|  1 | First   |     216 |  24.24%   |                707 |               79.35% 
|\n|  2 | Second  |     184 |  20.65%   |                891 |              100.00% |\n\n\n\nIn addition, you can group columns together. For example, to see the breakdown by\nsex and class:\n\n```python\ndf.stb.freq(['sex', 'class'])\n```\n|    | sex    | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:-------|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | male   | Third   |     347 |  38.945   |                347 |              38.945  |\n|  1 | female | Third   |     144 |  16.1616  |                491 |              55.1066 |\n|  2 | male   | First   |     122 |  13.6925  |                613 |              68.7991 |\n|  3 | male   | Second  |     108 |  12.1212  |                721 |              80.9203 |\n|  4 | female | First   |      94 |  10.5499  |                815 |              91.4703 |\n|  5 | female | Second  |      76 |   8.52974 |                891 |             100      |\n\nYou can use as many groupings as you would like.\n\nBy default, sidetable counts the data. However, you can specify a `value` argument to \nindicate that the data should be summed based on the data in another column. \nFor this data set, we can see how the fares are distributed by class:\n\n```python\ndf.stb.freq(['class'], value='fare')\n```\n|    | class   |     fare |   percent |   cumulative_fare |   cumulative_percent |\n|---:|:--------|---------:|----------:|------------------:|---------------------:|\n|  0 | First   | 18177.4  |   63.3493 |           18177.4 |              63.3493 |\n|  1 | Third   |  6714.7  |   23.4011 |           24892.1 |              86.7504 |\n|  2 | Second  |  3801.84 |   13.2496 |           28693.9 |             100      |\n\nAnother feature of sidetable is that you can specify a threshold. In many analyses,\nyou may want to focus on the largest groupings and ignore the rest. 
You can use\nthe `thresh` argument to define a threshold and group all entries above that threshold \ninto an \"other\" grouping:\n\n```python\ndf.stb.freq(['class', 'who'], value='fare', thresh=80)\n```\n|    | class   | who    |    fare |   percent |   cumulative_fare |   cumulative_percent |\n|---:|:--------|:-------|--------:|----------:|------------------:|---------------------:|\n|  0 | First   | woman  | 9492.94 |  33.0834  |           9492.94 |              33.0834 |\n|  1 | First   | man    | 7848.18 |  27.3513  |          17341.1  |              60.4348 |\n|  2 | Third   | man    | 3617.53 |  12.6073  |          20958.6  |              73.042  |\n|  3 | Second  | man    | 1886.36 |   6.57406 |          22845    |              79.6161 |\n|  4 | others  | others | 5848.95 |  20.3839  |          28693.9  |             100      |\n\nYou can further customize by specifying the label to use for all the others:\n```python\ndf.stb.freq(['class', 'who'], value='fare', thresh=80, other_label='All others')\n```\n|    | class      | who        |    fare |   percent |   cumulative_fare |   cumulative_percent |\n|---:|:-----------|:-----------|--------:|----------:|------------------:|---------------------:|\n|  0 | First      | woman      | 9492.94 |  33.0834  |           9492.94 |              33.0834 |\n|  1 | First      | man        | 7848.18 |  27.3513  |          17341.1  |              60.4348 |\n|  2 | Third      | man        | 3617.53 |  12.6073  |          20958.6  |              73.042  |\n|  3 | Second     | man        | 1886.36 |   6.57406 |          22845    |              79.6161 |\n|  4 | All others | All others | 5848.95 |  20.3839  |          28693.9  |             100      |\n\n### counts\nThe `counts()` function shows how many unique values are in each column as well as \nthe most and least frequent values \u0026 their total counts. This summary view can help you determine if you need\nto convert data to a categorical value. 
It can also help you understand the high \nlevel structure of your data.\n\n```python\ndf.stb.counts()\n```\n|             |   count |   unique | most_freq   |   most_freq_count | least_freq   |   least_freq_count |\n|:------------|--------:|---------:|:------------|------------------:|:-------------|-------------------:|\n| survived    |     891 |        2 | 0           |               549 | 1            |                342 |\n| sex         |     891 |        2 | male        |               577 | female       |                314 |\n| adult_male  |     891 |        2 | True        |               537 | False        |                354 |\n| alive       |     891 |        2 | no          |               549 | yes          |                342 |\n| alone       |     891 |        2 | True        |               537 | False        |                354 |\n| pclass      |     891 |        3 | 3           |               491 | 2            |                184 |\n| embarked    |     889 |        3 | S           |               644 | Q            |                 77 |\n| class       |     891 |        3 | Third       |               491 | Second       |                184 |\n| who         |     891 |        3 | man         |               537 | child        |                 83 |\n| embark_town |     889 |        3 | Southampton |               644 | Queenstown   |                 77 |\n| sibsp       |     891 |        7 | 0           |               608 | 5            |                  5 |\n| parch       |     891 |        7 | 0           |               678 | 6            |                  1 |\n| deck        |     203 |        7 | C           |                59 | G            |                  4 |\n| age         |     714 |       88 | 24.0        |                30 | 20.5         |                  1 |\n| fare        |     891 |      248 | 8.05        |                43 | 63.3583      |                  1 |\n\nBy default, all data types are included but you may 
use the `exclude` and `include` parameters\nto select specific types of columns. The syntax is the same as pandas \n[select_dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html)\n\nFor example,\n```python\ndf.stb.counts(exclude='number')\n```\n\n|             |   count |   unique | most_freq   |   most_freq_count | least_freq   |   least_freq_count |\n|:------------|--------:|---------:|:------------|------------------:|:-------------|-------------------:|\n| sex         |     891 |        2 | male        |               577 | female       |                314 |\n| adult_male  |     891 |        2 | True        |               537 | False        |                354 |\n| alive       |     891 |        2 | no          |               549 | yes          |                342 |\n| alone       |     891 |        2 | True        |               537 | False        |                354 |\n| embarked    |     889 |        3 | S           |               644 | Q            |                 77 |\n| class       |     891 |        3 | Third       |               491 | Second       |                184 |\n| who         |     891 |        3 | man         |               537 | child        |                 83 |\n| embark_town |     889 |        3 | Southampton |               644 | Queenstown   |                 77 |\n| deck        |     203 |        7 | C           |                59 | G            |                  4 |\n\n### missing\nsidetable also includes a summary table that shows the missing values in\nyour data by count and percentage of total missing values in a column.\n\n```python\ndf.stb.missing()\n```\n|             |   missing |   total |   percent |\n|:------------|----------:|--------:|----------:|\n| deck        |       688 |     891 | 77.2166   |\n| age         |       177 |     891 | 19.8653   |\n| embarked    |         2 |     891 |  0.224467 |\n| embark_town |         2 |     891 |  0.224467 |\n| survived   
 |         0 |     891 |  0        |\n| pclass      |         0 |     891 |  0        |\n| sex         |         0 |     891 |  0        |\n| sibsp       |         0 |     891 |  0        |\n| parch       |         0 |     891 |  0        |\n| fare        |         0 |     891 |  0        |\n| class       |         0 |     891 |  0        |\n| who         |         0 |     891 |  0        |\n| adult_male  |         0 |     891 |  0        |\n| alive       |         0 |     891 |  0        |\n| alone       |         0 |     891 |  0        |\n\nIf you wish to see the results with styles applied to the Percent and Total column,\nuse:\n\n```python\ndf.stb.missing(style=True)\n```\n\n|             |   missing |   total |    percent |\n|:------------|----------:|--------:|-----------:|\n| deck        |       688 |     891 | 77.22%     |\n| age         |       177 |     891 | 19.87%     |\n| embarked    |         2 |     891 | 0.22%      |\n| embark_town |         2 |     891 | 0.22%      |\n| survived    |         0 |     891 | 0          |\n| pclass      |         0 |     891 | 0          |\n| sex         |         0 |     891 | 0          |\n| sibsp       |         0 |     891 | 0          |\n| parch       |         0 |     891 | 0          |\n| fare        |         0 |     891 | 0          |\n| class       |         0 |     891 | 0          |\n| who         |         0 |     891 | 0          |\n| adult_male  |         0 |     891 | 0          |\n| alive       |         0 |     891 | 0          |\n| alone       |         0 |     891 | 0          |\n\nFinally, you can exclude the columns that have 0 missing values using\nthe `clip_0=True` parameter:\n\n```python\ndf.stb.missing(clip_0=True, style=True)\n```\n|             |   missing |   total |   percent |\n|:------------|----------:|--------:|----------:|\n| deck        |       688 |     891 | 77.22%    |\n| age         |       177 |     891 | 19.87%    |\n| embarked    |         2 |     891 |  0.22%    |\n| 
embark_town |         2 |     891 |  0.22%    |\n\n\n### subtotal\nAnother useful function is the subtotal function. Trying to add a subtotal \nto grouped pandas data is not easy. sidetable adds a `subtotal()` function that\nadds a subtotal at one or more levels of a DataFrame.\n\nThe subtotal function can be applied to a simple DataFrame in order to add a Grand Total\nlabel:\n\n```python\ndf.stb.subtotal()\n```\n\n|             |   survived |   pclass | sex    |     age |   sibsp |   parch |     fare | embarked   | class   | who   |   adult_male | deck   | embark_town   | alive   |   alone |\n|:------------|-----------:|---------:|:-------|--------:|--------:|--------:|---------:|:-----------|:--------|:------|-------------:|:-------|:--------------|:--------|--------:|\n| 887         |          1 |        1 | female |    19   |       0 |       0 |    30    | S          | First   | woman |            0 | B      | Southampton   | yes     |       1 |\n| 888         |          0 |        3 | female |   nan   |       1 |       2 |    23.45 | S          | Third   | woman |            0 | nan    | Southampton   | no      |       0 |\n| 889         |          1 |        1 | male   |    26   |       0 |       0 |    30    | C          | First   | man   |            1 | C      | Cherbourg     | yes     |       1 |\n| 890         |          0 |        3 | male   |    32   |       0 |       0 |     7.75 | Q          | Third   | man   |            1 | nan    | Queenstown    | no      |       1 |\n| grand_total |        342 |     2057 | nan    | 21205.2 |     466 |     340 | 28693.9  | nan        | nan     | nan   |          537 | nan    | nan           | nan     |     537 |\n\nThe real power of subtotal is being able to add it to one or more levels of your \ngrouped data. 
For example, you can group the data and add a subtotal at each level:\n\n```python\ndf.groupby(['sex', 'class', 'embark_town']).agg({'fare': ['sum']}).stb.subtotal()\n```\n\nWhich yields this view (truncated for simplicity):\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efare\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003esum\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003esex\u003c/th\u003e\n      \u003cth\u003eclass\u003c/th\u003e\n      \u003cth\u003eembark_town\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"13\" valign=\"top\"\u003efemale\u003c/th\u003e\n      \u003cth rowspan=\"4\" valign=\"top\"\u003eFirst\u003c/th\u003e\n      \u003cth\u003eCherbourg\u003c/th\u003e\n      \u003ctd\u003e4972.5333\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eQueenstown\u003c/th\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSouthampton\u003c/th\u003e\n      \u003ctd\u003e4753.2917\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003efemale | First - subtotal\u003c/th\u003e\n      \u003ctd\u003e9815.8250\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"4\" valign=\"top\"\u003eSecond\u003c/th\u003e\n      \u003cth\u003eCherbourg\u003c/th\u003e\n      \u003ctd\u003e176.8792\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eQueenstown\u003c/th\u003e\n      \u003ctd\u003e24.7000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      
\u003cth\u003eSouthampton\u003c/th\u003e\n      \u003ctd\u003e1468.1500\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003efemale | Second - subtotal\u003c/th\u003e\n      \u003ctd\u003e1669.7292\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"4\" valign=\"top\"\u003eThird\u003c/th\u003e\n      \u003cth\u003eCherbourg\u003c/th\u003e\n      \u003ctd\u003e337.9833\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eQueenstown\u003c/th\u003e\n      \u003ctd\u003e340.1585\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSouthampton\u003c/th\u003e\n      \u003ctd\u003e1642.9668\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003efemale | Third - subtotal\u003c/th\u003e\n      \u003ctd\u003e2321.1086\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003efemale - subtotal\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003ctd\u003e13806.6628\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"2\" valign=\"top\"\u003emale\u003c/th\u003e\n      \u003cth rowspan=\"2\" valign=\"top\"\u003eFirst\u003c/th\u003e\n      \u003cth\u003eCherbourg\u003c/th\u003e\n      \u003ctd\u003e3928.5417\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eQueenstown\u003c/th\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nBy default, every level in the DataFrame will be subtotaled but you can control this behavior\nby using the `sub_level` argument. 
For instance, you can subtotal on `sex` and `class` by \npassing the argument `sub_level=[1,2]`\n\n```python\nsummary_table = df.groupby(['sex', 'class', 'embark_town']).agg({'fare': ['sum']})\nsummary_table.stb.subtotal(sub_level=[1, 2])\n```\n\nThe `subtotal` function also allows the user to configure the labels and separators used in \nthe subtotal and Grand Total by using the `grand_label`, `sub_label`, `show_sep` and `sep`\narguments. \n\n### flatten\nWhen grouping and pivoting data, you can end up with a DataFrame that has a multiindex.\nOften times, you want a simple flat representation of the data.\n\nFor example, we can build a table using a `groupby()` plus `unstack()` that looks like this:\n\n```python\ndf.groupby(['embark_town', 'class', 'sex']).agg({'fare': ['sum'], 'age': ['mean']}).unstack()\n```\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth colspan=\"2\" halign=\"left\"\u003efare\u003c/th\u003e\n      \u003cth colspan=\"2\" halign=\"left\"\u003eage\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth colspan=\"2\" halign=\"left\"\u003esum\u003c/th\u003e\n      \u003cth colspan=\"2\" halign=\"left\"\u003emean\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003esex\u003c/th\u003e\n      \u003cth\u003efemale\u003c/th\u003e\n      \u003cth\u003emale\u003c/th\u003e\n      \u003cth\u003efemale\u003c/th\u003e\n      \u003cth\u003emale\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eembark_town\u003c/th\u003e\n      \u003cth\u003eclass\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  
\u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eCherbourg\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e4972.5333\u003c/td\u003e\n      \u003ctd\u003e3928.5417\u003c/td\u003e\n      \u003ctd\u003e36.052632\u003c/td\u003e\n      \u003ctd\u003e40.111111\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e176.8792\u003c/td\u003e\n      \u003ctd\u003e254.2125\u003c/td\u003e\n      \u003ctd\u003e19.142857\u003c/td\u003e\n      \u003ctd\u003e25.937500\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e337.9833\u003c/td\u003e\n      \u003ctd\u003e402.1462\u003c/td\u003e\n      \u003ctd\u003e14.062500\u003c/td\u003e\n      \u003ctd\u003e25.016800\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eQueenstown\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n      \u003ctd\u003e33.000000\u003c/td\u003e\n      \u003ctd\u003e44.000000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e24.7000\u003c/td\u003e\n      \u003ctd\u003e12.3500\u003c/td\u003e\n      \u003ctd\u003e30.000000\u003c/td\u003e\n      \u003ctd\u003e57.000000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e340.1585\u003c/td\u003e\n      \u003ctd\u003e465.0458\u003c/td\u003e\n      \u003ctd\u003e22.850000\u003c/td\u003e\n      \u003ctd\u003e28.142857\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eSouthampton\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e4753.2917\u003c/td\u003e\n      \u003ctd\u003e4183.0458\u003c/td\u003e\n      
\u003ctd\u003e32.704545\u003c/td\u003e\n      \u003ctd\u003e41.897188\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e1468.1500\u003c/td\u003e\n      \u003ctd\u003e1865.5500\u003c/td\u003e\n      \u003ctd\u003e29.719697\u003c/td\u003e\n      \u003ctd\u003e30.875889\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e1642.9668\u003c/td\u003e\n      \u003ctd\u003e3526.3945\u003c/td\u003e\n      \u003ctd\u003e23.223684\u003c/td\u003e\n      \u003ctd\u003e26.574766\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nIf you wish to flatten it, use `stb.flatten()`:\n\n```python\ndf.groupby(['embark_town', 'class', 'sex']).agg({'fare': ['sum'], 'age': ['mean']}).unstack().stb.flatten()\n```\n\n|    | embark_town   | class   |   fare_sum_female |   fare_sum_male |   age_mean_female |   age_mean_male |\n|---:|:--------------|:--------|------------------:|----------------:|------------------:|----------------:|\n|  0 | Cherbourg     | First   |          4972.53  |        3928.54  |           36.0526 |         40.1111 |\n|  1 | Cherbourg     | Second  |           176.879 |         254.212 |           19.1429 |         25.9375 |\n|  2 | Cherbourg     | Third   |           337.983 |         402.146 |           14.0625 |         25.0168 |\n|  3 | Queenstown    | First   |            90     |          90     |           33      |         44      |\n|  4 | Queenstown    | Second  |            24.7   |          12.35  |           30      |         57      |\n|  5 | Queenstown    | Third   |           340.159 |         465.046 |           22.85   |         28.1429 |\n|  6 | Southampton   | First   |          4753.29  |        4183.05  |           32.7045 |         41.8972 |\n|  7 | Southampton   | Second  |          1468.15  |        1865.55  |           29.7197 |         30.8759 |\n|  8 | Southampton   | Third   |    
      1642.97  |        3526.39  |           23.2237 |         26.5748 |\n\n`flatten` also accepts additional arguments:\n* Add a custom separator using the `sep` argument - `stb.flatten(sep='|')`\n* Control whether or not to reset the index using the `reset` argument - `stb.flatten(reset=False)`\n* Reorganize the output levels using the `levels` argument - `levels=2`\n  * `levels` can also take a list of valid levels if you want to reorganize the display\n     `levels=[0,2]`\n\n```python\nfares = df.groupby(['embark_town', 'class', 'sex']).agg({'fare': ['sum', 'mean'], 'age': ['mean']}).unstack()\nfares.stb.flatten(sep='|', reset=False, levels=[0,2])\n```\n\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efare|female\u003c/th\u003e\n      \u003cth\u003efare|male\u003c/th\u003e\n      \u003cth\u003efare|female\u003c/th\u003e\n      \u003cth\u003efare|male\u003c/th\u003e\n      \u003cth\u003eage|female\u003c/th\u003e\n      \u003cth\u003eage|male\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eembark_town\u003c/th\u003e\n      \u003cth\u003eclass\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eCherbourg\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e4972.5333\u003c/td\u003e\n      \u003ctd\u003e3928.5417\u003c/td\u003e\n      \u003ctd\u003e115.640309\u003c/td\u003e\n      \u003ctd\u003e93.536707\u003c/td\u003e\n      \u003ctd\u003e36.052632\u003c/td\u003e\n      \u003ctd\u003e40.111111\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n  
    \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e176.8792\u003c/td\u003e\n      \u003ctd\u003e254.2125\u003c/td\u003e\n      \u003ctd\u003e25.268457\u003c/td\u003e\n      \u003ctd\u003e25.421250\u003c/td\u003e\n      \u003ctd\u003e19.142857\u003c/td\u003e\n      \u003ctd\u003e25.937500\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e337.9833\u003c/td\u003e\n      \u003ctd\u003e402.1462\u003c/td\u003e\n      \u003ctd\u003e14.694926\u003c/td\u003e\n      \u003ctd\u003e9.352237\u003c/td\u003e\n      \u003ctd\u003e14.062500\u003c/td\u003e\n      \u003ctd\u003e25.016800\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eQueenstown\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n      \u003ctd\u003e90.0000\u003c/td\u003e\n      \u003ctd\u003e90.000000\u003c/td\u003e\n      \u003ctd\u003e90.000000\u003c/td\u003e\n      \u003ctd\u003e33.000000\u003c/td\u003e\n      \u003ctd\u003e44.000000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e24.7000\u003c/td\u003e\n      \u003ctd\u003e12.3500\u003c/td\u003e\n      \u003ctd\u003e12.350000\u003c/td\u003e\n      \u003ctd\u003e12.350000\u003c/td\u003e\n      \u003ctd\u003e30.000000\u003c/td\u003e\n      \u003ctd\u003e57.000000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e340.1585\u003c/td\u003e\n      \u003ctd\u003e465.0458\u003c/td\u003e\n      \u003ctd\u003e10.307833\u003c/td\u003e\n      \u003ctd\u003e11.924251\u003c/td\u003e\n      \u003ctd\u003e22.850000\u003c/td\u003e\n      \u003ctd\u003e28.142857\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"3\" valign=\"top\"\u003eSouthampton\u003c/th\u003e\n      \u003cth\u003eFirst\u003c/th\u003e\n      
\u003ctd\u003e4753.2917\u003c/td\u003e\n      \u003ctd\u003e4183.0458\u003c/td\u003e\n      \u003ctd\u003e99.026910\u003c/td\u003e\n      \u003ctd\u003e52.949947\u003c/td\u003e\n      \u003ctd\u003e32.704545\u003c/td\u003e\n      \u003ctd\u003e41.897188\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eSecond\u003c/th\u003e\n      \u003ctd\u003e1468.1500\u003c/td\u003e\n      \u003ctd\u003e1865.5500\u003c/td\u003e\n      \u003ctd\u003e21.912687\u003c/td\u003e\n      \u003ctd\u003e19.232474\u003c/td\u003e\n      \u003ctd\u003e29.719697\u003c/td\u003e\n      \u003ctd\u003e30.875889\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003eThird\u003c/th\u003e\n      \u003ctd\u003e1642.9668\u003c/td\u003e\n      \u003ctd\u003e3526.3945\u003c/td\u003e\n      \u003ctd\u003e18.670077\u003c/td\u003e\n      \u003ctd\u003e13.307149\u003c/td\u003e\n      \u003ctd\u003e23.223684\u003c/td\u003e\n      \u003ctd\u003e26.574766\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### prettyprint\nThis function interprets the magnitude of your numeric results and returns a nicely\nformatted version of all the numbers. 
This can be used on a full DataFrame or during\nyour analysis of aggregated data.\n\nFor instance, if you are summarizing data, you may get something that looks like this:\n\n```python\ndf.groupby(['pclass', 'sex']).agg({'fare': 'sum'})\n```\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efare\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003epclass\u003c/th\u003e\n      \u003cth\u003esex\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"2\" valign=\"top\"\u003e1\u003c/th\u003e\n      \u003cth\u003efemale\u003c/th\u003e\n      \u003ctd\u003e9975.8250\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emale\u003c/th\u003e\n      \u003ctd\u003e8201.5875\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"2\" valign=\"top\"\u003e2\u003c/th\u003e\n      \u003cth\u003efemale\u003c/th\u003e\n      \u003ctd\u003e1669.7292\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emale\u003c/th\u003e\n      \u003ctd\u003e2132.1125\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth rowspan=\"2\" valign=\"top\"\u003e3\u003c/th\u003e\n      \u003cth\u003efemale\u003c/th\u003e\n      \u003ctd\u003e2321.1086\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emale\u003c/th\u003e\n      \u003ctd\u003e4393.5865\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nUse `stb.pretty()` to format the values so that all numbers share the same order of magnitude:\n\n```python\ndf.groupby(['pclass', 'sex']).agg({'fare': 'sum'}).stb.pretty()\n```\n\u003ctable id=\"T_1e94c\"\u003e\n  \u003cthead\u003e\n    
\u003ctr\u003e\n      \u003cth class=\"blank\" \u003e\u0026nbsp;\u003c/th\u003e\n      \u003cth class=\"blank level0\" \u003e\u0026nbsp;\u003c/th\u003e\n      \u003cth id=\"T_1e94c_level0_col0\" class=\"col_heading level0 col0\" \u003efare\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth class=\"index_name level0\" \u003epclass\u003c/th\u003e\n      \u003cth class=\"index_name level1\" \u003esex\u003c/th\u003e\n      \u003cth class=\"blank col0\" \u003e\u0026nbsp;\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level0_row0\" class=\"row_heading level0 row0\" rowspan=\"2\"\u003e1\u003c/th\u003e\n      \u003cth id=\"T_1e94c_level1_row0\" class=\"row_heading level1 row0\" \u003efemale\u003c/th\u003e\n      \u003ctd id=\"T_1e94c_row0_col0\" class=\"data row0 col0\" \u003e9.98k\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level1_row1\" class=\"row_heading level1 row1\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_1e94c_row1_col0\" class=\"data row1 col0\" \u003e8.20k\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level0_row2\" class=\"row_heading level0 row2\" rowspan=\"2\"\u003e2\u003c/th\u003e\n      \u003cth id=\"T_1e94c_level1_row2\" class=\"row_heading level1 row2\" \u003efemale\u003c/th\u003e\n      \u003ctd id=\"T_1e94c_row2_col0\" class=\"data row2 col0\" \u003e1.67k\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level1_row3\" class=\"row_heading level1 row3\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_1e94c_row3_col0\" class=\"data row3 col0\" \u003e2.13k\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level0_row4\" class=\"row_heading level0 row4\" rowspan=\"2\"\u003e3\u003c/th\u003e\n      \u003cth id=\"T_1e94c_level1_row4\" class=\"row_heading level1 row4\" \u003efemale\u003c/th\u003e\n      
\u003ctd id=\"T_1e94c_row4_col0\" class=\"data row4 col0\" \u003e2.32k\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_1e94c_level1_row5\" class=\"row_heading level1 row5\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_1e94c_row5_col0\" class=\"data row5 col0\" \u003e4.39k\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nHere's an example of a percentage format:\n\n```python\ndf.groupby(['pclass', 'sex']).agg({'fare': 'sum'}).div(df['fare'].sum()).stb.pretty(precision=0, caption=\"Fare Percentage\")\n```\n\n\u003ctable id=\"T_e031b\"\u003e\n\u003ccaption\u003eFare Percentage\u003c/caption\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth class=\"blank\" \u003e\u0026nbsp;\u003c/th\u003e\n      \u003cth class=\"blank level0\" \u003e\u0026nbsp;\u003c/th\u003e\n      \u003cth id=\"T_e031b_level0_col0\" class=\"col_heading level0 col0\" \u003efare\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth class=\"index_name level0\" \u003epclass\u003c/th\u003e\n      \u003cth class=\"index_name level1\" \u003esex\u003c/th\u003e\n      \u003cth class=\"blank col0\" \u003e\u0026nbsp;\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level0_row0\" class=\"row_heading level0 row0\" rowspan=\"2\"\u003e1\u003c/th\u003e\n      \u003cth id=\"T_e031b_level1_row0\" class=\"row_heading level1 row0\" \u003efemale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row0_col0\" class=\"data row0 col0\" \u003e35%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level1_row1\" class=\"row_heading level1 row1\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row1_col0\" class=\"data row1 col0\" \u003e29%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level0_row2\" class=\"row_heading level0 row2\" rowspan=\"2\"\u003e2\u003c/th\u003e\n      \u003cth 
id=\"T_e031b_level1_row2\" class=\"row_heading level1 row2\" \u003efemale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row2_col0\" class=\"data row2 col0\" \u003e6%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level1_row3\" class=\"row_heading level1 row3\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row3_col0\" class=\"data row3 col0\" \u003e7%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level0_row4\" class=\"row_heading level0 row4\" rowspan=\"2\"\u003e3\u003c/th\u003e\n      \u003cth id=\"T_e031b_level1_row4\" class=\"row_heading level1 row4\" \u003efemale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row4_col0\" class=\"data row4 col0\" \u003e8%\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth id=\"T_e031b_level1_row5\" class=\"row_heading level1 row5\" \u003emale\u003c/th\u003e\n      \u003ctd id=\"T_e031b_row5_col0\" class=\"data row5 col0\" \u003e15%\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\nBehind the scenes, `pretty` will attempt to normalize the values. You can control the\n`precision` and `rows`, and add a `caption`.\n\n\n## Caveats\nsidetable supports grouping on any data type in a pandas DataFrame. This means that\nyou could try something like:\n\n```python\ndf.stb.freq(['fare'])\n```\nIn cases where there is a fairly small number of discrete values, this may be useful. However,\nif you have a lot of unique values, you should [bin the data](https://pbpython.com/pandas-qcut-cut.html)\nfirst. 
In the example above, the data would include 248 rows and not be terribly useful.\n\nOne alternative could be:\n\n```python\ndf['fare_bin'] = pd.qcut(df['fare'], q=4, labels=['low', 'medium', 'high', 'x-high'])\ndf.stb.freq(['fare_bin'])\n```\n|    | fare_bin   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:-----------|--------:|----------:|-------------------:|---------------------:|\n|  0 | medium     |     224 |   25.1403 |                224 |              25.1403 |\n|  1 | low        |     223 |   25.0281 |                447 |              50.1684 |\n|  2 | x-high     |     222 |   24.9158 |                669 |              75.0842 |\n|  3 | high       |     222 |   24.9158 |                891 |             100      |\n\nThe other caveat is that null or missing values can cause data to drop out while aggregating.\nFor instance, if we look at the `deck` variable, there are a lot of missing values.\n\n```python\ndf.stb.freq(['deck'])\n```\n|    | deck   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:-------|--------:|----------:|-------------------:|---------------------:|\n|  0 | C      |      59 |  29.064   |                 59 |              29.064  |\n|  1 | B      |      47 |  23.1527  |                106 |              52.2167 |\n|  2 | D      |      33 |  16.2562  |                139 |              68.4729 |\n|  3 | E      |      32 |  15.7635  |                171 |              84.2365 |\n|  4 | A      |      15 |   7.38916 |                186 |              91.6256 |\n|  5 | F      |      13 |   6.40394 |                199 |              98.0296 |\n|  6 | G      |       4 |   1.97044 |                203 |             100      |\n\n\nThe total cumulative count only goes up to 203, not the 891 we have seen in other examples.\nFuture versions of sidetable may handle this differently. For now, it is up to you to \ndecide how best to handle unknowns. 
For example, this version of the Titanic data set\nhas a categorical value for `deck` so using `fillna` requires an extra step:\n\n```python\ndf['deck_fillna'] = df['deck'].cat.add_categories('UNK').fillna('UNK')\ndf.stb.freq(['deck_fillna'])\n```\n|    | deck_fillna   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:--------------|--------:|----------:|-------------------:|---------------------:|\n|  0 | UNK           |     688 | 77.2166   |                688 |              77.2166 |\n|  1 | C             |      59 |  6.62177  |                747 |              83.8384 |\n|  2 | B             |      47 |  5.27497  |                794 |              89.1134 |\n|  3 | D             |      33 |  3.7037   |                827 |              92.8171 |\n|  4 | E             |      32 |  3.59147  |                859 |              96.4085 |\n|  5 | A             |      15 |  1.6835   |                874 |              98.092  |\n|  6 | F             |      13 |  1.45903  |                887 |              99.5511 |\n|  7 | G             |       4 |  0.448934 |                891 |             100      |\n\nAnother variant is that there might be certain groupings where there are no valid counts.\n\nFor instance, if we look at the `deck` and `class`:\n\n```python\ndf.stb.freq(['deck', 'class'])\n```\n|    | deck   | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:-------|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | C      | First   |      59 |  29.064   |                 59 |              29.064  |\n|  1 | B      | First   |      47 |  23.1527  |                106 |              52.2167 |\n|  2 | D      | First   |      29 |  14.2857  |                135 |              66.5025 |\n|  3 | E      | First   |      25 |  12.3153  |                160 |              78.8177 |\n|  4 | A      | First   |      15 |   7.38916 |                175 |              86.2069 |\n| 
 5 | F      | Second  |       8 |   3.94089 |                183 |              90.1478 |\n|  6 | F      | Third   |       5 |   2.46305 |                188 |              92.6108 |\n|  7 | G      | Third   |       4 |   1.97044 |                192 |              94.5813 |\n|  8 | E      | Second  |       4 |   1.97044 |                196 |              96.5517 |\n|  9 | D      | Second  |       4 |   1.97044 |                200 |              98.5222 |\n| 10 | E      | Third   |       3 |   1.47783 |                203 |             100      |\n\n\nThere are only 11 combinations. If we want to see all - even if there are not any passengers\nfitting that criteria, use `clip_0=False` \n\n```python\ndf.stb.freq(['deck', 'class'], clip_0=False)\n```\n|    | deck   | class   |   count |   percent |   cumulative_count |   cumulative_percent |\n|---:|:-------|:--------|--------:|----------:|-------------------:|---------------------:|\n|  0 | C      | First   |      59 |  29.064   |                 59 |              29.064  |\n|  1 | B      | First   |      47 |  23.1527  |                106 |              52.2167 |\n|  2 | D      | First   |      29 |  14.2857  |                135 |              66.5025 |\n|  3 | E      | First   |      25 |  12.3153  |                160 |              78.8177 |\n|  4 | A      | First   |      15 |   7.38916 |                175 |              86.2069 |\n|  5 | F      | Second  |       8 |   3.94089 |                183 |              90.1478 |\n|  6 | F      | Third   |       5 |   2.46305 |                188 |              92.6108 |\n|  7 | G      | Third   |       4 |   1.97044 |                192 |              94.5813 |\n|  8 | E      | Second  |       4 |   1.97044 |                196 |              96.5517 |\n|  9 | D      | Second  |       4 |   1.97044 |                200 |              98.5222 |\n| 10 | E      | Third   |       3 |   1.47783 |                203 |             100      |\n| 11 | G      | Second  |    
   0 |   0       |                203 |             100      |\n| 12 | G      | First   |       0 |   0       |                203 |             100      |\n| 13 | F      | First   |       0 |   0       |                203 |             100      |\n| 14 | D      | Third   |       0 |   0       |                203 |             100      |\n| 15 | C      | Third   |       0 |   0       |                203 |             100      |\n| 16 | C      | Second  |       0 |   0       |                203 |             100      |\n| 17 | B      | Third   |       0 |   0       |                203 |             100      |\n| 18 | B      | Second  |       0 |   0       |                203 |             100      |\n| 19 | A      | Third   |       0 |   0       |                203 |             100      |\n| 20 | A      | Second  |       0 |   0       |                203 |             100      |\n\nIn many cases this might be too much data, but sometimes the fact that a combination is \nmissing could be insightful.\n\nThe final caveat relates to `subtotal`. When working with the `subtotal` function, sidetable \nconverts a Categorical MultiIndex to a plain index in order to easily add the subtotal labels.\n\n## TODO\n\n- [ ] Handle NaN values more effectively\n- [ ] Offer binning options for continuous variables\n- [ ] Offer more options, maybe plotting?\n\n\n## Contributing\n\nContributions are welcome, and they are greatly appreciated! Every\nlittle bit helps, and credit will always be given. If you have a new idea for a simple table\nthat we should add, please submit a ticket.\n\nFor more info, please see [CONTRIBUTING.md](./CONTRIBUTING.md).\n\n## Credits\n\nThis package was created with Cookiecutter and the `oldani/cookiecutter-simple-pypackage` project template.\nThe code used in this package is heavily based on the posts from Peter Baumgartner, Steve Miller\nand Ted Petrou. 
Thank you!\n\n- [Cookiecutter](https://github.com/audreyr/cookiecutter)\n- [oldani/cookiecutter-simple-pypackage](https://github.com/oldani/cookiecutter-simple-pypackage)\n- Peter Baumgartner - [tweet thread](https://twitter.com/pmbaumgartner/status/1235925419012087809)\n- Steve Miller - [article](https://opendatascience.com/frequencies-and-chaining-in-python-pandas/)\n- Ted Petrou - [post](https://medium.com/dunder-data/finding-the-percentage-of-missing-values-in-a-pandas-dataframe-a04fa00f84ab)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchris1610%2Fsidetable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchris1610%2Fsidetable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchris1610%2Fsidetable/lists"}