https://github.com/Quantco/dataframely

A declarative, 🐻‍❄️-native data frame validation library.

dataframe polars validation

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/Quantco/dataframely
Owner: Quantco
License: bsd-3-clause
Created: 2025-04-17T13:06:59.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-04-25T13:23:26.000Z (7 months ago)
Last Synced: 2025-04-25T14:34:54.841Z (7 months ago)
Topics: dataframe, polars, validation
Language: Python
Homepage: https://dataframely.readthedocs.io
Size: 170 KB
Stars: 136
Watchers: 10
Forks: 3
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

Awesome Lists containing this project

awesome-polars - dataframely - Polars plugin that provides schema and other rule validation for Polars DataFrames by [@Quantco](https://github.com/Quantco). (Libraries/Packages/Scripts / Polars plugins)
trackawesomelist - dataframely (⭐139) - Polars plugin that provides schema and other rule validation for Polars DataFrames by [@Quantco](https://github.com/Quantco). (Recently Updated / [Apr 28, 2025](/content/2025/04/28/README.md))

README

          





  


# dataframely: A declarative, 🐻‍❄️-native data frame validation library

  


[![CI](https://img.shields.io/github/actions/workflow/status/quantco/dataframely/ci.yml?style=flat-square&branch=main)](https://github.com/quantco/dataframely/actions/workflows/ci.yml)

[![conda-forge](https://img.shields.io/conda/vn/conda-forge/dataframely?logoColor=white&logo=conda-forge&style=flat-square)](https://prefix.dev/channels/conda-forge/packages/dataframely)

[![pypi-version](https://img.shields.io/pypi/v/dataframely.svg?logo=pypi&logoColor=white&style=flat-square)](https://pypi.org/project/dataframely)

[![python-version](https://img.shields.io/pypi/pyversions/dataframely?logoColor=white&logo=python&style=flat-square)](https://pypi.org/project/dataframely)

[![codecov](https://codecov.io/gh/Quantco/dataframely/graph/badge.svg)](https://codecov.io/gh/Quantco/dataframely)



## 🗂 Table of Contents

- [Introduction](#-introduction)
- [Installation](#-installation)
- [Usage](#-usage)

## 📖 Introduction

Dataframely is a Python package to validate the schema and content of [`polars`](https://pola.rs/) data frames. Its purpose is to make data pipelines more robust by ensuring that data meets expectations and more readable by adding schema information to data frame type hints.

## 💿 Installation

You can install `dataframely` using your favorite package manager, e.g., `pixi` or `pip`:

```bash
# With pixi
pixi add dataframely

# Or with pip
pip install dataframely
```

## 🎯 Usage

### Defining a data frame schema

```python
import dataframely as dy
import polars as pl


class HouseSchema(dy.Schema):
    # Column definitions: dtype plus column-level constraints
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    # Row-level rule: the expression must hold for each row
    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    # Group-level rule: evaluated once per zip_code group
    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2
```

### Validating data against the schema

```python
import dataframely as dy
import polars as pl

# Sample data; note that it includes rows that break the schema
# (null num_bedrooms, a one-character zip_code)
df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

# Validate the data and cast columns to expected types
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)
```

See more advanced usage examples in the [documentation](https://dataframely.readthedocs.io/en/latest/).
