https://github.com/drhagen/tabeline

User-friendly data frame and data grammar library for Python

data-grammar data-table dplyr

Host: GitHub
URL: https://github.com/drhagen/tabeline
Owner: drhagen
License: MIT
Created: 2022-02-12T20:52:33.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2024-08-19T18:23:38.000Z (4 months ago)
Last Synced: 2024-09-18T00:06:10.744Z (3 months ago)
Topics: data-grammar, data-table, dplyr
Language: Python
Homepage: https://tabeline.drhagen.com
Size: 1.61 MB
Stars: 14
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: docs/contributing.md
- License: LICENSE

README

# Tabeline

Tabeline is a data frame and data grammar library. You write the expressions in strings and supply them to methods on the `DataFrame` class. The strings are parsed by Parsita and converted into Polars for execution.
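As a rough illustration of the string-expression idea, the sketch below parses a comparison string and applies it to rows. It is not Tabeline's implementation: it uses Python's standard `ast` module as a stand-in for Parsita and plain dictionaries in place of Polars, and `compile_filter` is a hypothetical helper name.

```python
import ast
import operator

# Hypothetical lookup table: comparison AST node -> Python function
_COMPARISONS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
}


def compile_filter(expression):
    """Turn a string such as "t <= 24" into a predicate over a row dict."""
    node = ast.parse(expression, mode="eval").body
    # This sketch only supports `column <op> constant`
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only a single comparison is supported")
    left, right = node.left, node.comparators[0]
    if not (isinstance(left, ast.Name) and isinstance(right, ast.Constant)):
        raise ValueError("expected `column <op> constant`")
    compare = _COMPARISONS[type(node.ops[0])]
    column, threshold = left.id, right.value
    return lambda row: compare(row[column], threshold)


rows = [{"t": t} for t in [0, 6, 12, 24, 48]]
keep = compile_filter("t <= 24")
filtered = [row for row in rows if keep(row)]
print(filtered)
```

The point is only that the string is parsed once into a structured expression, which is then executed against the data.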

Tabeline draws inspiration from dplyr, the data grammar of R's tidyverse, especially for its method names. The `filter`, `mutate`, `group_by`, and `summarize` methods should all feel familiar. But Tabeline is as proper a Python library as can be, using method chaining instead of the pipes that are standard in R.

Tabeline uses Polars under the hood, but adds handling for many Polars edge cases that would otherwise result in crashes or in behavior that is not type stable.

See the [Documentation](https://tabeline.drhagen.com) for the full user guide.

## Installation

It is recommended to install Tabeline from PyPI using `pip`.

```shell
pip install tabeline
```

## Motivating example

```python
from tabeline import DataFrame

# Construct a data frame using clean syntax
# from_csv, from_pandas, and from_polars are also available
df = DataFrame(
    id=[0, 0, 0, 0, 1, 1, 1, 1, 1],
    t=[0, 6, 12, 24, 0, 6, 12, 24, 48],
    y=[0, 2, 3, 1, 0, 4, 3, 2, 1],
)

# Use data grammar methods and string expressions to define
# transformed data frames
analysis = (
    df
    .filter("t <= 24")
    .group_by("id")
    .summarize(auc="trapz(t, y)")
)

print(analysis)
# shape: (2, 2)
# ┌─────┬──────┐
# │ id  ┆ auc  │
# │ --- ┆ ---  │
# │ i64 ┆ f64  │
# ╞═════╪══════╡
# │ 0   ┆ 45.0 │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 1   ┆ 63.0 │
# └─────┴──────┘
```
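
The `auc` values above can be checked by hand. The following is a minimal pure-Python sketch, assuming `trapz(t, y)` is the usual trapezoidal rule (the printed output is consistent with that). The `trapz` function here is a local stand-in written for this check, not Tabeline's own code.

```python
def trapz(t, y):
    """Integrate y over t with the trapezoidal rule."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(t, t[1:], y, y[1:])
    )


ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]
t = [0, 6, 12, 24, 0, 6, 12, 24, 48]
y = [0, 2, 3, 1, 0, 4, 3, 2, 1]

auc = {}
for group in sorted(set(ids)):
    # Same steps as the example: filter t <= 24, then group by id
    rows = [(ti, yi) for i, ti, yi in zip(ids, t, y) if i == group and ti <= 24]
    auc[group] = trapz([r[0] for r in rows], [r[1] for r in rows])

print(auc)  # {0: 45.0, 1: 63.0}
```

For `id` 0 the three trapezoids contribute 6 + 15 + 24 = 45, and for `id` 1 they contribute 12 + 21 + 30 = 63, matching the table.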