

Pandas and Spark DataFrame comparison for humans and more!





# DataComPy


DataComPy is a package to compare two Pandas DataFrames. It was originally started
as something of a replacement for SAS's ``PROC COMPARE`` for Pandas DataFrames,
with more functionality than just ``Pandas.DataFrame.equals(Pandas.DataFrame)``:
it prints out some stats and lets you tweak how accurate matches have to be.
It was then extended to carry that functionality over to Spark DataFrames.
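
To make "how accurate matches have to be" concrete, here is a minimal pure-Python sketch of tolerance-based comparison between two sets of records joined on a key. It is a conceptual illustration only, not DataComPy's implementation or API: the helpers `values_match` and `compare_records`, their parameters, and the sample data are all hypothetical, and the tolerance semantics are simply those of the standard library's `math.isclose`.

```python
import math
from numbers import Real


def values_match(a, b, abs_tol=0.0, rel_tol=0.0):
    """Return True if two cell values match, using tolerances for numbers.

    Hypothetical helper for illustration; not part of DataComPy.
    """
    both_numeric = (
        isinstance(a, Real) and isinstance(b, Real)
        and not isinstance(a, bool) and not isinstance(b, bool)
    )
    if both_numeric:
        # With both tolerances at 0 this reduces to exact equality.
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    return a == b


def compare_records(base, other, key, abs_tol=0.0, rel_tol=0.0):
    """Join two lists of dicts on `key` and summarize the differences."""
    base_by_key = {row[key]: row for row in base}
    other_by_key = {row[key]: row for row in other}

    mismatches = []
    for k in sorted(base_by_key.keys() & other_by_key.keys()):
        left, right = base_by_key[k], other_by_key[k]
        # Only columns present on both sides are compared.
        for col in sorted(left.keys() & right.keys()):
            if not values_match(left[col], right[col], abs_tol, rel_tol):
                mismatches.append((k, col, left[col], right[col]))

    return {
        "only_in_base": sorted(base_by_key.keys() - other_by_key.keys()),
        "only_in_other": sorted(other_by_key.keys() - base_by_key.keys()),
        "mismatches": mismatches,
    }


# Made-up sample data for the sketch.
base = [
    {"id": 1, "amount": 100.00, "name": "Ann"},
    {"id": 2, "amount": 250.50, "name": "Bob"},
    {"id": 3, "amount": 75.00, "name": "Cy"},
]
other = [
    {"id": 1, "amount": 100.004, "name": "Ann"},
    {"id": 2, "amount": 250.50, "name": "Bobby"},
    {"id": 4, "amount": 10.00, "name": "Di"},
]

strict = compare_records(base, other, key="id")
tolerant = compare_records(base, other, key="id", abs_tol=0.01)

print("strict mismatches:  ", strict["mismatches"])
print("tolerant mismatches:", tolerant["mismatches"])
print("rows only in base:  ", strict["only_in_base"])
print("rows only in other: ", strict["only_in_other"])
```

With an exact comparison the small difference in `amount` for `id` 1 counts as a mismatch; with an absolute tolerance of 0.01 only the differing `name` for `id` 2 remains. A plain equality check would report only that the two datasets differ, without saying where.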

## Quick Installation

    pip install datacompy

or

    conda install datacompy

### Installing extras

If you would like to use Spark or any other backend, please make sure you install via extras:

    pip install datacompy[spark]
    pip install datacompy[dask]
    pip install datacompy[duckdb]
    pip install datacompy[polars]
    pip install datacompy[ray]
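
After installing extras, it can be handy to check which optional backends are actually importable in the current environment. The sketch below uses only the standard library. The mapping from extra name to top-level module name is an assumption (for example, that the `spark` extra makes `pyspark` importable), and `available_backends` is a hypothetical helper, not something DataComPy provides.

```python
import importlib.util

# Assumed mapping: extra name -> top-level module it is expected to provide.
EXTRA_TO_MODULE = {
    "spark": "pyspark",
    "dask": "dask",
    "duckdb": "duckdb",
    "polars": "polars",
    "ray": "ray",
}


def available_backends(extra_to_module=EXTRA_TO_MODULE):
    """Report, per extra, whether its module can be found without importing it."""
    status = {}
    for extra, module in extra_to_module.items():
        try:
            status[extra] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Treat a broken or partially initialized module as unavailable.
            status[extra] = False
    return status


status = available_backends()
for extra, found in status.items():
    hint = "available" if found else f"missing, try: pip install datacompy[{extra}]"
    print(f"{extra:8s} {hint}")
```

Using `find_spec` rather than a real import keeps the check cheap, since heavy packages are located but not loaded.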


## Supported backends

- Pandas (see documentation)
- Spark (see documentation)
- Polars, experimental (see documentation)
- Fugue (see documentation): a Python library that provides a unified interface for data processing on Pandas,
  DuckDB, Polars, Arrow, Spark, Dask, Ray, and many other backends. DataComPy integrates with Fugue to provide a
  simple way to compare data across these backends. Please note that Fugue will use the Pandas (native) logic at
  its lowest level.

## Contributors

We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to
sign the Contributor License Agreement (CLA).

This project adheres to the Open Source Code of Conduct.
By participating, you are expected to honor this code.

## Roadmap

Roadmap details can be found in the project's roadmap document.