Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/capitalone/datacompy
Pandas and Spark DataFrame comparison for humans and more!
- Host: GitHub
- URL: https://github.com/capitalone/datacompy
- Owner: capitalone
- License: apache-2.0
- Created: 2018-03-23T13:16:03.000Z (about 6 years ago)
- Default Branch: develop
- Last Pushed: 2024-03-25T18:57:19.000Z (about 2 months ago)
- Last Synced: 2024-03-26T15:54:33.357Z (about 2 months ago)
- Topics: compare, dask, data, data-science, dataframes, fugue, numpy, pandas, polars, pyspark, python, spark
- Language: Python
- Homepage: https://capitalone.github.io/datacompy/
- Size: 8.83 MB
- Stars: 352
- Watchers: 25
- Forks: 119
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS
- Roadmap: ROADMAP.rst
Lists
- awesome-stars - capitalone/datacompy - Pandas and Spark DataFrame comparison for humans and more! (Python)
- awesome-machine-learning - DataComPy - A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy. (Python / General-Purpose Machine Learning)
- awesome-data-engineering - datacompy - DataComPy is a Python library that facilitates the comparison of two DataFrames in pandas, Polars, Spark and more. The library goes beyond basic equality checks by providing detailed insights into discrepancies at both row and column levels. (Data Comparison)
- trackawesomelist - datacompy (⭐354) - DataComPy is a Python library that facilitates the comparison of two DataFrames in pandas, Polars, Spark and more. The library goes beyond basic equality checks by providing detailed insights into discrepancies at both row and column levels. (Recently Updated / [Apr 09, 2024](/content/2024/04/09/README.md))
README
# DataComPy
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/datacompy)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
[![PyPI version](https://badge.fury.io/py/datacompy.svg)](https://badge.fury.io/py/datacompy)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/datacompy/badges/version.svg)](https://anaconda.org/conda-forge/datacompy)
![PyPI - Downloads](https://img.shields.io/pypi/dm/datacompy)

DataComPy is a package to compare two Pandas DataFrames. It originally started as
something of a replacement for SAS's ``PROC COMPARE`` for Pandas DataFrames,
with more functionality than just ``Pandas.DataFrame.equals(Pandas.DataFrame)``:
it prints out some stats and lets you tweak how accurate matches have to be.
It was then extended to carry that functionality over to Spark DataFrames.

## Quick Installation
```shell
pip install datacompy
```

or

```shell
conda install datacompy
```

### Installing extras
If you would like to use Spark or any other backend, make sure you install the corresponding extra:
```shell
pip install datacompy[spark]
pip install datacompy[dask]
pip install datacompy[duckdb]
pip install datacompy[polars]
pip install datacompy[ray]
```
## Supported backends
- Pandas: ([See documentation](https://capitalone.github.io/datacompy/pandas_usage.html))
- Spark: ([See documentation](https://capitalone.github.io/datacompy/spark_usage.html))
- Polars (Experimental): ([See documentation](https://capitalone.github.io/datacompy/polars_usage.html))
- Fugue: a Python library that provides a unified interface for data processing on Pandas, DuckDB, Polars, Arrow,
  Spark, Dask, Ray, and many other backends. DataComPy integrates with Fugue to provide a simple way to compare data
  across these backends. Please note that Fugue will use the Pandas (Native) logic at its lowest level.
  ([See documentation](https://capitalone.github.io/datacompy/fugue_usage.html))

## Contributors
We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to
sign the [Contributor License Agreement (CLA)](https://cla-assistant.io/capitalone/datacompy).

This project adheres to the [Open Source Code of Conduct](https://developer.capitalone.com/resources/code-of-conduct/).
By participating, you are expected to honor this code.

## Roadmap
Roadmap details can be found in [ROADMAP.rst](https://github.com/capitalone/datacompy/blob/develop/ROADMAP.rst).