https://github.com/Quantco/diffly
Utility package for comparing polars data frames.
- Host: GitHub
- URL: https://github.com/Quantco/diffly
- Owner: Quantco
- License: bsd-3-clause
- Created: 2026-01-29T15:49:13.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-27T09:40:53.000Z (about 2 months ago)
- Last Synced: 2026-03-27T19:55:02.037Z (about 2 months ago)
- Topics: comparison-tool, dataframe, polars
- Language: Python
- Homepage: https://diffly.readthedocs.io/stable/
- Size: 265 KB
- Stars: 26
- Watchers: 1
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
- Agents: AGENTS.md
Awesome Lists containing this project
- awesome-polars - diffly - Python utility for programmatically identifying differences between Polars DataFrames, including schema differences, row-level mismatches, and column value changes, by [@Quantco](https://github.com/Quantco). (Libraries/Packages/Scripts / Polars plugins)
README
diffly — A utility package for comparing 🐻❄️ DataFrames
## 🗂 Table of Contents
- [Introduction](#-introduction)
- [Installation](#-installation)
- [Usage](#-usage)
## 📖 Introduction
Diffly is a Python package for comparing [Polars](https://pola.rs/) DataFrames with detailed analysis capabilities. It identifies differences between datasets including schema differences, row-level mismatches, missing rows, and column value changes.
## 💿 Installation
You can install `diffly` using your favorite package manager:
```bash
pixi add diffly
conda install diffly
uv add diffly
pip install diffly
```
## 🎯 Usage
```python
import polars as pl
from diffly import compare_frames

left = pl.DataFrame({
    "id": ["a", "b", "c"],
    "value": [1.0, 2.0, 3.0],
})
right = pl.DataFrame({
    "id": ["a", "b", "d"],
    "value": [1.0, 2.5, 4.0],
})

comparison = compare_frames(left, right, primary_key="id")

if not comparison.equal():
    summary = comparison.summary(
        top_k_column_changes=1,
        show_sample_primary_key_per_change=True
    )
    print(summary)
```
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Diffly Summary ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Primary key: id
Schemas
▔▔▔▔▔▔▔
Schemas match exactly (column count: 2).
Rows
▔▔▔▔
Left count Right count
3 (no change) 3
┏━┯━┯━┯━┯━┓
┃-│-│-│-│-┃ 1 left only (33.33%)
┠─┼─┼─┼─┼─┨╌╌╌┏━┯━┯━┯━┯━┓╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╮
┃ │ │ │ │ ┃ = ┃ │ │ │ │ ┃ 1 equal (50.00%) │
┠─┼─┼─┼─┼─┨╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌├╴ 2 joined
┃ │ │ │ │ ┃ ≠ ┃ │ │ │ │ ┃ 1 unequal (50.00%) │
┗━┷━┷━┷━┷━┛╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
┃+│+│+│+│+┃ 1 right only (33.33%)
┗━┷━┷━┷━┷━┛
Columns
▔▔▔▔▔▔▔
┌───────┬────────┬───────────────────────────┐
│ value │ 50.00% │ 2.0 -> 2.5 (1x, e.g. "b") │
└───────┴────────┴───────────────────────────┘
```
See more examples in the [documentation](https://diffly.readthedocs.io/stable/).