https://github.com/renien/doc-diff

:snake: Support app to get diff results from two documents :green_book:

comparison-reports csv datamining datascience doc-diff python



doc-diff



Generate the diff data between two files





## Summary

doc-diff started as a support app for **Data Science** work. The current implementation helps analyse recommendation ‘CSV’ files (e.g. if you need to compare the results of two algorithms, this lib will be very handy).

## Recommendation Format
The CSV file contains one **key-value** pair per line. The key is a product (productCode/productID) and the value is a list of recommended products (productCode/productID). The product code and the recommendation list are separated by a _**‘TAB’**_.

```
Sample Recommendation
---------------------
1098808 1597549,1974410,1850731
1161889 1095554
1706909 2078866
1815368 2215327
1847624 2179582,2085753
```
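
The format above can be read with a few lines of Python. This is a minimal sketch, not the doc-diff implementation: the function name `parse_recommendations` is hypothetical, and it assumes the recommendation list is comma-separated, as in the sample.

```python
from typing import Dict, Iterable, List


def parse_recommendations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'key<TAB>v1,v2,...' lines into {key: [v1, v2, ...]}.

    Hypothetical helper for illustration; not part of the doc-diff API.
    """
    result: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first TAB: product code on the left, list on the right.
        key, _, values = line.partition("\t")
        result[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return result


sample = [
    "1098808\t1597549,1974410,1850731\n",
    "1161889\t1095554\n",
]
parsed = parse_recommendations(sample)
print(parsed["1098808"])
```

Passing an open file object works as well, since it iterates line by line.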

## Installation
```
$ pip install doc-diff
```

## Features
- Generate the following comparison reports:
  - common_in_doc1-and-doc2-%Y-%m-%d.csv
  - common_key_with_diff_values-%Y-%m-%d.csv
  - exclusive_in_doc1-%Y-%m-%d.csv
  - exclusive_in_doc2-%Y-%m-%d.csv
- Compare two files and return the following **'dicts(prodCode, recommendation)'**:
  - common_in_doc1_and_doc2_list = dicts()
  - common_key_with_diff_values_list = dicts()
  - exclusive_in_doc1_list = dicts()
  - exclusive_in_doc2_list = dicts()
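
To make the four result groups concrete, here is a rough sketch of how such a split could be computed from two already-parsed dictionaries. It is not the code in `Diff.py`: the function name `compare` is hypothetical, and treating two recommendation lists as equal regardless of order is an assumption.

```python
from typing import Dict, List, Tuple

Recs = Dict[str, List[str]]


def compare(doc1: Recs, doc2: Recs):
    """Split two {prodCode: recommendations} dicts into four groups.

    Hypothetical illustration; the real doc-diff API may differ.
    """
    common: Recs = {}                                    # same key, same values
    diff: Dict[str, Tuple[List[str], List[str]]] = {}    # same key, different values
    only1: Recs = {}                                     # key only in doc1
    only2: Recs = {}                                     # key only in doc2

    for key, recs in doc1.items():
        if key not in doc2:
            only1[key] = recs
        elif sorted(recs) == sorted(doc2[key]):  # assumption: order is ignored
            common[key] = recs
        else:
            diff[key] = (recs, doc2[key])

    for key, recs in doc2.items():
        if key not in doc1:
            only2[key] = recs

    return common, diff, only1, only2


doc1 = {"1": ["a", "b"], "2": ["c"], "3": ["d"]}
doc2 = {"1": ["b", "a"], "2": ["c", "e"], "4": ["f"]}
common, diff, only1, only2 = compare(doc1, doc2)
print(sorted(common), sorted(diff), sorted(only1), sorted(only2))
```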

## Usage

- Generate the evaluation result files
- Extract the comparison results as key-value lists
- Using the different dictionary objects, you can present the results however you like (e.g. graphs, Venn diagrams)
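
As one way to prepare such a presentation, the region sizes of a two-set Venn diagram can be derived from the product codes of the two documents. This is a small sketch with a hypothetical helper name, `venn_counts`; plotting is left to whichever charting library you prefer.

```python
from typing import Dict, Iterable


def venn_counts(keys_doc1: Iterable[str], keys_doc2: Iterable[str]) -> Dict[str, int]:
    """Return the three region sizes of a two-set Venn diagram.

    Hypothetical helper for illustration; not part of the doc-diff API.
    """
    a, b = set(keys_doc1), set(keys_doc2)
    return {
        "only_doc1": len(a - b),  # product codes found only in the first file
        "both": len(a & b),       # product codes found in both files
        "only_doc2": len(b - a),  # product codes found only in the second file
    }


counts = venn_counts(["1", "2", "3"], ["1", "2", "4", "5"])
print(counts)
```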

## Comparison Report Format

- In each CSV file, every line contains the product code and the corresponding recommendations. The product code and the recommendation list are separated by a **‘TAB’**.
- In the **‘common_key_with_diff_values-%Y-%m-%d.csv’** file the result format is slightly different. To show the mismatched recommendations, each line has the product code, a TAB, and then the result of **algorithm ‘A’** and the result of **algorithm ‘B’**, separated by two pipes **‘||’**.

```
Sample common_key_with_diff_values-%Y-%m-%d.csv
------------------------------------------------
c36623 2256360,2398464,2503472,c27214||2256360,2398464,2503472,c27214,c79033
c973955 1965886,c340951,c752950,c973951||1965886,c24224,c340951,c752950,c906950,c973951
c25749 c25982||c205950,c25982,c65977
```
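
The line layout and the dated file names can be sketched as follows. The helper names (`report_name`, `format_diff_line`, `parse_diff_line`) are hypothetical and only illustrate the format described above; they are not taken from the doc-diff source.

```python
from datetime import date
from typing import List, Tuple


def report_name(stem: str, day: date) -> str:
    """Build a dated report file name such as 'exclusive_in_doc1-2017-01-05.csv'."""
    return stem + "-" + day.strftime("%Y-%m-%d") + ".csv"


def format_diff_line(key: str, recs_a: List[str], recs_b: List[str]) -> str:
    """Write 'key<TAB>a1,a2||b1,b2' for a key whose two results differ."""
    return key + "\t" + ",".join(recs_a) + "||" + ",".join(recs_b)


def parse_diff_line(line: str) -> Tuple[str, List[str], List[str]]:
    """Read a line of the diff report back into (key, result A, result B)."""
    key, _, rest = line.rstrip("\n").partition("\t")
    recs_a, _, recs_b = rest.partition("||")
    return key, recs_a.split(","), recs_b.split(",")


line = format_diff_line("c25749", ["c25982"], ["c205950", "c25982", "c65977"])
print(report_name("common_key_with_diff_values", date.today()))
print(parse_diff_line(line))
```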

## Package Directory Layout

```
doc-diff
├── LICENSE               # License agreement
├── README.md             # Details of the doc-diff lib
├── doc_diff              # Root package
│   ├── Diff.py           # Diff class
│   └── __init__.py       # Package declaration
├── setup.py              # Setup file for packaging
└── test                  # Test module (includes the usage)
    ├── __init__.py       # Package declaration
    ├── data              # Sample data
    │   ├── a-priori.csv  # A-Priori algo results
    │   └── pfp.csv       # FP-Growth algo results
    └── doc_diff_app.py   # Main method file
```

## Current Published Artifacts

- [doc-diff-1.0.0](https://github.com/Renien/doc-diff/tree/1.0.0/tarball)

## Contribute
If you have a problem or question, or an idea for a feature that could make the doc-diff lib more useful, do not hesitate to open an issue.

## Thanks
Thanks [Flat Icon](http://www.flaticon.com/) for the free logo.

## License
MIT © [Renien](https://twitter.com/RenienJoseph)