https://github.com/ericfreese/data-comparison
A tool for comparing similar rectangular data sets
- Host: GitHub
- URL: https://github.com/ericfreese/data-comparison
- Owner: ericfreese
- Created: 2024-08-23T17:02:22.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-08-30T19:04:10.000Z (almost 2 years ago)
- Last Synced: 2025-01-23T03:23:17.791Z (over 1 year ago)
- Language: Python
- Size: 12.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Comparison Tool
Takes two CSV files with overlapping column names and prints a comparison report to stdout in CSV format. The tool groups the rows of each file by the specified qualitative columns, sums the specified quantitative columns within each group, joins the two aggregated results on the grouping columns, and reports the before, after, and difference values of each aggregate.
## Example
([`xsv`](https://github.com/BurntSushi/xsv) used for illustration purposes)
Assume you have two files `before.csv` and `after.csv`:
```
$ xsv table before.csv
name  color  amount
foo   red    3
foo   green  2
foo   blue   4
bar   red    5
bar   green  3
$ xsv table after.csv
name  color  amount
foo   red    5
foo   green  2
bar   red    5
bar   green  2
bar   blue   4
```
You can compare them in the following ways:
```
$ data-compare before.csv after.csv -m amount -g color | xsv table
color  b_amount  a_amount  d_amount
blue   4.000     4.000     0.000
green  5.000     4.000     -1.000
red    8.000     10.000    2.000
$ data-compare before.csv after.csv -m amount -g name | xsv table
name  b_amount  a_amount  d_amount
bar   8.000     11.000    3.000
foo   9.000     7.000     -2.000
$ data-compare before.csv after.csv -m amount -g color name | xsv table
color  name  b_amount  a_amount  d_amount
blue   bar             4.000     4.000
blue   foo   4.000               -4.000
green  bar   3.000     2.000     -1.000
green  foo   2.000     2.000     0.000
red    bar   5.000     5.000     0.000
red    foo   3.000     5.000     2.000
$ data-compare /tmp/before.csv /tmp/after.csv -m amount -g color=red name | xsv table
color  name  b_amount  a_amount  d_amount
red    bar   5.000     5.000     0.000
red    foo   3.000     5.000     2.000
```
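The group, sum, join, and diff steps behind these reports can be approximated in plain Python. This is a minimal sketch using only the standard library, not the tool's actual implementation: the function names `aggregate`, `compare`, and `read` are hypothetical, and treating a group that is missing on one side as 0 when computing the difference is an assumption inferred from the `blue` rows in the third example.

```python
import csv
import io
from collections import defaultdict


def aggregate(rows, group_cols, measures):
    """Sum each measure per unique combination of group column values."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0.0))
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        for m in measures:
            totals[key][m] += float(row[m])
    return totals


def compare(before_rows, after_rows, group_cols, measures):
    """Outer-join the two aggregates on the group key and add differences."""
    before = aggregate(before_rows, group_cols, measures)
    after = aggregate(after_rows, group_cols, measures)
    report = []
    for key in sorted(before.keys() | after.keys()):
        out = dict(zip(group_cols, key))
        for m in measures:
            # None marks a group that is absent from one of the files.
            b = before[key][m] if key in before else None
            a = after[key][m] if key in after else None
            out[f"b_{m}"] = b
            out[f"a_{m}"] = a
            # Assumption: a missing side counts as 0 in the difference.
            out[f"d_{m}"] = (a or 0.0) - (b or 0.0)
        report.append(out)
    return report


def read(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Same data as before.csv and after.csv in the example.
BEFORE = "name,color,amount\nfoo,red,3\nfoo,green,2\nfoo,blue,4\nbar,red,5\nbar,green,3\n"
AFTER = "name,color,amount\nfoo,red,5\nfoo,green,2\nbar,red,5\nbar,green,2\nbar,blue,4\n"

for row in compare(read(BEFORE), read(AFTER), ["color", "name"], ["amount"]):
    print(row)
```

Grouping by `["color", "name"]` yields six rows, with `None` in place of the blank cells shown for the two `blue` groups.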
## Usage
```
usage: data-compare [-h] -m col [col ...] -g col [col ...] before after

positional arguments:
  before                file to start with
  after                 file to compare against

options:
  -h, --help            show this help message and exit
  -m col [col ...], --measures col [col ...]
                        columns to sum and compare
  -g col [col ...], --group-cols col [col ...]
                        columns to group and join by
Filters can be applied to a grouped column by suffixing it with `=` and a value. For example, `-g foo=bar` groups by the `foo` column and keeps only the rows where `foo` is equal to `bar`.
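One way such a command line could be parsed is sketched below with `argparse`. The option names and help strings are taken from the usage text above, but the helper names `parse_group_spec`, `build_parser`, and `apply_filters` are hypothetical, and the tool's real parsing code may differ.

```python
import argparse


def parse_group_spec(spec):
    """Split 'col=value' into (col, value); a bare 'col' yields (col, None)."""
    col, sep, value = spec.partition("=")
    return col, (value if sep else None)


def build_parser():
    """Build a parser that mirrors the documented usage text."""
    parser = argparse.ArgumentParser(prog="data-compare")
    parser.add_argument("before", help="file to start with")
    parser.add_argument("after", help="file to compare against")
    parser.add_argument("-m", "--measures", nargs="+", metavar="col",
                        required=True, help="columns to sum and compare")
    parser.add_argument("-g", "--group-cols", nargs="+", metavar="col",
                        required=True, help="columns to group and join by")
    return parser


def apply_filters(rows, specs):
    """Return (group_cols, rows that satisfy every 'col=value' filter)."""
    parsed = [parse_group_spec(s) for s in specs]
    group_cols = [col for col, _ in parsed]
    filters = {col: value for col, value in parsed if value is not None}
    kept = [r for r in rows if all(r[c] == v for c, v in filters.items())]
    return group_cols, kept


args = build_parser().parse_args(
    ["before.csv", "after.csv", "-m", "amount", "-g", "color=red", "name"]
)

# Illustrative rows, as csv.DictReader would produce them (values are strings).
rows = [
    {"name": "foo", "color": "red", "amount": "3"},
    {"name": "foo", "color": "green", "amount": "2"},
    {"name": "bar", "color": "red", "amount": "5"},
]
group_cols, kept = apply_filters(rows, args.group_cols)
print(group_cols, [r["name"] for r in kept])
```

Keeping the filter inside the `-g` value means a single option both selects the grouping column and narrows the rows, which matches the `color=red name` example.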
## Initializing virtual env for dev
```sh
pipenv sync --dev
```
## Run in dev environment
```sh
pipenv run python data-compare.py
```
## Build standalone executable
```sh
pipenv run pyinstaller data-compare.spec
```