https://github.com/renien/doc-diff
:snake: Support app to get diff results from two documents :green_book:
comparison-reports csv datamining datascience doc-diff python
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/renien/doc-diff
- Owner: Renien
- License: mit
- Created: 2016-06-30T04:15:43.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-04-02T04:11:49.000Z (almost 9 years ago)
- Last Synced: 2025-09-30T14:32:03.628Z (6 months ago)
- Topics: comparison-reports, csv, datamining, datascience, doc-diff, python
- Language: Python
- Homepage:
- Size: 94.7 KB
- Stars: 5
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# doc-diff
Generate the diff data between two files
## Summary
doc-diff started as a support app for **Data Science** work. The current implementation helps analyse recommendation ‘CSV’ files (e.g. if you need to compare the results of two algorithms, this lib will be very handy).
## Recommendation Format
The CSV file contains one **key-value** pair per line. The key is a product (productCode/productID) and the value is a list of recommended products (productCode/productID). The product code and the recommendation list are separated by a _**‘TAB’**_.
```
Sample Recommendation
---------------------
1098808 1597549,1974410,1850731
1161889 1095554
1706909 2078866
1815368 2215327
1847624 2179582,2085753
```
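As an illustration of the format above, here is a minimal sketch of reading such lines into a dictionary. The function name `parse_recommendations` is hypothetical and is not part of the doc-diff API; it only assumes the TAB and comma separators described in this section.

```python
from typing import Dict, Iterable, List


def parse_recommendations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'key<TAB>v1,v2,...' lines into {key: [v1, v2, ...]}.

    Hypothetical helper for illustration; not the library's own parser.
    """
    recommendations: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split once on the first TAB: left is the product code,
        # right is the comma-separated recommendation list.
        key, _, values = line.partition("\t")
        recommendations[key.strip()] = [
            v.strip() for v in values.split(",") if v.strip()
        ]
    return recommendations


sample = [
    "1098808\t1597549,1974410,1850731",
    "1161889\t1095554",
]
parsed = parse_recommendations(sample)
print(parsed["1098808"])  # ['1597549', '1974410', '1850731']
```

The same function accepts an open file object, since iterating over a file yields its lines.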
## Installation
```
$ pip install doc-diff
```
## Features
- Generate the following comparison reports
  - common_in_doc1-and-doc2-%Y-%m-%d.csv
  - common_key_with_diff_values-%Y-%m-%d.csv
  - exclusive_in_doc1-%Y-%m-%d.csv
  - exclusive_in_doc2-%Y-%m-%d.csv
- Compare two files and return following **'dicts(prodCode, recommendation)'**
  - common_in_doc1_and_doc2_list = dicts()
  - common_key_with_diff_values_list = dicts()
  - exclusive_in_doc1_list = dicts()
  - exclusive_in_doc2_list = dicts()
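The four result sets listed above can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the code in `Diff.py`: the function name `compare_docs` is hypothetical, and treating two recommendation lists as equal regardless of order is an assumption the README does not confirm.

```python
from typing import Dict, List, Tuple

Recs = Dict[str, List[str]]


def compare_docs(
    doc1: Recs, doc2: Recs
) -> Tuple[Recs, Dict[str, Tuple[List[str], List[str]]], Recs, Recs]:
    """Split two key -> recommendations dicts into four groups.

    Hypothetical sketch; the real library may compare differently.
    """
    common_in_doc1_and_doc2: Recs = {}
    common_key_with_diff_values: Dict[str, Tuple[List[str], List[str]]] = {}
    exclusive_in_doc1: Recs = {}
    exclusive_in_doc2: Recs = {}

    for key, values in doc1.items():
        if key not in doc2:
            exclusive_in_doc1[key] = values
        elif sorted(values) == sorted(doc2[key]):
            # Assumption: order of recommendations does not matter.
            common_in_doc1_and_doc2[key] = values
        else:
            # Keep both sides so the mismatch can be reported.
            common_key_with_diff_values[key] = (values, doc2[key])

    for key, values in doc2.items():
        if key not in doc1:
            exclusive_in_doc2[key] = values

    return (
        common_in_doc1_and_doc2,
        common_key_with_diff_values,
        exclusive_in_doc1,
        exclusive_in_doc2,
    )


doc1 = {"p1": ["a", "b"], "p2": ["c"], "p3": ["d"]}
doc2 = {"p1": ["b", "a"], "p2": ["c", "e"], "p4": ["f"]}
common, different, only1, only2 = compare_docs(doc1, doc2)
print(different)  # {'p2': (['c'], ['c', 'e'])}
```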
## Usage
- Generate the evaluation result files
- Extract the comparison results as key-value lists
- Use the different dictionary objects to present the results as you like (e.g. graphs, Venn diagrams)
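As one way to present the results, the sizes of the Venn diagram regions can be derived from the four dictionaries. The sketch below uses small hand-written dictionaries in place of real comparison output, and `venn_region_sizes` is a hypothetical helper, not something the library provides.

```python
from typing import Dict, Mapping


def venn_region_sizes(
    common: Mapping, different: Mapping, only1: Mapping, only2: Mapping
) -> Dict[str, int]:
    """Count product codes per Venn region (hypothetical helper).

    Keys found in both documents are those with identical values
    plus those with differing values.
    """
    return {
        "doc1_only": len(only1),
        "both": len(common) + len(different),
        "doc2_only": len(only2),
    }


# Toy stand-ins for the four result dictionaries.
sizes = venn_region_sizes(
    common={"p1": ["a"]},
    different={"p2": (["c"], ["c", "e"])},
    only1={"p3": ["d"]},
    only2={"p4": ["f"], "p5": ["g"]},
)
print(sizes)  # {'doc1_only': 1, 'both': 2, 'doc2_only': 2}
```

These three numbers are what a two-set Venn plotting function typically expects as input.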
## Comparison Report Format
- In each CSV file, every line contains the product code and the corresponding recommendation list, separated by a **‘TAB’**.
- The **‘common_key_with_diff_values-%Y-%m-%d.csv’** file uses a slightly different format. To show the mismatched recommendations, each line holds the product code, a TAB, and then the result of **algorithm ‘A’** and the result of **algorithm ‘B’**, separated by two pipes **‘||’**.
```
Sample common_key_with_diff_values-%Y-%m-%d.csv
------------------------------------------------
c36623 2256360,2398464,2503472,c27214||2256360,2398464,2503472,c27214,c79033
c973955 1965886,c340951,c752950,c973951||1965886,c24224,c340951,c752950,c906950,c973951
c25749 c25982||c205950,c25982,c65977
```
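The line layout and the dated file name pattern can be sketched as follows. All three function names are hypothetical and only follow the format described in this section; they are not taken from the library.

```python
from datetime import date
from typing import List, Tuple


def report_filename(prefix: str, day: date) -> str:
    """Build '<prefix>-%Y-%m-%d.csv' for the given day."""
    return f"{prefix}-{day.strftime('%Y-%m-%d')}.csv"


def format_diff_line(key: str, values_a: List[str], values_b: List[str]) -> str:
    """Render 'key<TAB>a1,a2||b1,b2' for one mismatched product code."""
    return f"{key}\t{','.join(values_a)}||{','.join(values_b)}"


def parse_diff_line(line: str) -> Tuple[str, List[str], List[str]]:
    """Inverse of format_diff_line: split on the TAB, then on '||'."""
    key, _, rest = line.rstrip("\r\n").partition("\t")
    side_a, _, side_b = rest.partition("||")
    values_a = [v for v in side_a.split(",") if v]
    values_b = [v for v in side_b.split(",") if v]
    return key, values_a, values_b


line = format_diff_line("c25749", ["c25982"], ["c205950", "c25982", "c65977"])
name = report_filename("common_key_with_diff_values", date(2017, 4, 2))
print(name)  # common_key_with_diff_values-2017-04-02.csv
```

Parsing the formatted line with `parse_diff_line` returns the original key and both lists, which makes the report easy to load back for further analysis.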
## Package Directory Layout
```
doc-diff
├── LICENSE               # Contains License Agreement file
├── README.md             # Contains the details of doc-diff lib
├── doc_diff              # Root package
│   ├── Diff.py           # Diff class
│   └── __init__.py       # Package declaration
├── setup.py              # Setup file for packaging
└── test                  # Test module (Includes the usage)
    ├── __init__.py       # Package declaration
    ├── data              # Sample data
    │   ├── a-priori.csv  # A-Priori algo results
    │   └── pfp.csv       # FP-Growth algo results
    └── doc_diff_app.py   # Main method file
```
## Current Published Artifacts
- [doc-diff-1.0.0](https://github.com/Renien/doc-diff/tree/1.0.0/tarball)
## Contribute
For any problem or question, or if you think of a feature that could make the doc-diff lib more useful, do not hesitate to open an issue.
## Thanks
Thanks to [Flat Icon](http://www.flaticon.com/) for the free logo.
## License
MIT © [Renien](https://twitter.com/RenienJoseph)