https://github.com/Quantco/dataframely
A declarative, 🐻‍❄️-native data frame validation library.
- Host: GitHub
- URL: https://github.com/Quantco/dataframely
- Owner: Quantco
- License: bsd-3-clause
- Created: 2025-04-17T13:06:59.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-04-25T13:23:26.000Z (6 months ago)
- Last Synced: 2025-04-25T14:34:54.841Z (6 months ago)
- Topics: dataframe, polars, validation
- Language: Python
- Homepage: https://dataframely.readthedocs.io
- Size: 170 KB
- Stars: 136
- Watchers: 10
- Forks: 3
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- awesome-polars - dataframely - Polars plugin that provides schema and other rule validation for Polars DataFrames by [@Quantco](https://github.com/Quantco). (Libraries/Packages/Scripts / Polars plugins)
- trackawesomelist - dataframely (⭐139) - Polars plugin that provides schema and other rule validation for Polars DataFrames by [@Quantco](https://github.com/Quantco). (Recently Updated / [Apr 28, 2025](/content/2025/04/28/README.md))
README
# dataframely: a declarative, 🐻‍❄️-native data frame validation library
## 🗂 Table of Contents
- [Introduction](#-introduction)
- [Installation](#-installation)
- [Usage](#-usage)
## 📖 Introduction
Dataframely is a Python package for validating the schema and content of [`polars`](https://pola.rs/) data frames. It
aims to make data pipelines more robust, by ensuring that data meets expectations, and more readable, by adding
schema information to data frame type hints.
## 💿 Installation
You can install `dataframely` using your favorite package manager, e.g., `pixi` or `pip`:
```bash
# Using pixi
pixi add dataframely
# Or using pip
pip install dataframely
```
## 🎯 Usage
### Defining a data frame schema
```python
import dataframely as dy
import polars as pl


class HouseSchema(dy.Schema):
    # Column definitions: data type plus per-column constraints
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    # Row-level rule: evaluated for every row
    @dy.rule()
    def reasonable_bathroom_to_bedrooom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    # Group-level rule: evaluated once per zip code
    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2
```
### Validating data against a schema
```python
import polars as pl

# Assumes `dy` and `HouseSchema` from the previous snippet are in scope
df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000],
})

# Validate the data and cast columns to the expected types.
# Validation raises an exception if any row violates the schema or its rules.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)
```
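To make the constraints concrete, the following plain-Python sketch re-implements the checks from `HouseSchema` by hand and applies them to the sample rows above. It does not use `dataframely` or `polars` and is not how the library evaluates rules. The check labels, the `failed_checks` helper, and the choice to skip the ratio check when `num_bedrooms` is missing are all assumptions made for illustration.

```python
from collections import Counter

# Sample rows from the example above: (zip_code, num_bedrooms, num_bathrooms, price)
rows = [
    ("01234", 2, 1, 100_000),
    ("01234", 2, 2, 110_000),
    ("1", 1, 1, 50_000),
    ("213", None, 1, 80_000),
    ("123", None, 0, 60_000),
    ("213", 2, 8, 160_000),
]

# Group-level information: how often each zip code occurs in the full input
zip_counts = Counter(zip_code for zip_code, *_ in rows)


def failed_checks(zip_code, bedrooms, bathrooms, price):
    """Return the (hypothetical) labels of all checks that a single row fails."""
    failed = []
    if len(zip_code) < 3:
        failed.append("zip_code min_length")
    if bedrooms is None:
        failed.append("num_bedrooms nullability")
    # Assumption: when the ratio cannot be computed because a value is missing,
    # the ratio check is skipped; the missing value is already reported above.
    if bedrooms is not None and bathrooms is not None:
        ratio = bathrooms / bedrooms if bedrooms else float("inf")
        if not (1 / 3 <= ratio <= 3):
            failed.append("bathroom_to_bedroom_ratio")
    if zip_counts[zip_code] < 2:
        failed.append("minimum_zip_code_count")
    return failed


# Map each row index to the checks it fails, then keep the fully valid rows
report = {index: failed_checks(*row) for index, row in enumerate(rows)}
valid_rows = [rows[index] for index, failed in report.items() if not failed]

for index, failed in report.items():
    print(index, failed or "ok")
```

In this sketch only the first two rows pass every check, so the sample frame is intentionally not a fully valid input for `HouseSchema`.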
See more advanced usage examples in the [documentation](https://dataframely.readthedocs.io/en/latest/).