https://github.com/zavierferodova/py-cspdf
Python Check Similarity PDF
- Host: GitHub
- URL: https://github.com/zavierferodova/py-cspdf
- Owner: zavierferodova
- Created: 2024-01-07T12:52:44.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-01-08T06:01:10.000Z (10 months ago)
- Last Synced: 2024-01-08T14:55:19.106Z (10 months ago)
- Topics: pdf, pdf-diff, pdf-difference, pdf-similarity
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Python-CSPDF
Python-CSPDF checks the similarity of PDF files in the current working directory and stores the results in a CSV file. The project is inspired by [diff-pdf](https://github.com/luke-cha/diff-pdf).

### Installation
```sh
pip install -r requirements.py
```

### Before Use !!
1. Install all required dependencies.
2. Copy `cspdf.py` into the directory that contains the PDF files to be compared.
3. Run the `cspdf.py` script.
4. Note: This script works on PDF files only. If you have a Word document, convert it to PDF first.

### Usage
1. Check the similarity of all PDF files in the current working directory
```sh
python cspdf.py -a -o comparison.csv
```
2. Check the similarity of one PDF file against all PDF files in the current working directory
```sh
python cspdf.py -t a.pdf -o comparison.csv
```
3. Check similarity including image comparison (slow processing)
```sh
# Just add -i or --image argument
python cspdf.py -i -t a.pdf -o comparison.csv
```
4. Get help
```sh
python cspdf.py -h
```

### Similarity Check Methods
1. Text similarity with Sequence Matcher
2. Image similarity with the Structural Similarity Index (SSIM)

### Libraries
1. [PDFMiner](https://pypi.org/project/pdfminer/)
2. [PyMuPDF](https://pymupdf.readthedocs.io/)
3. [OpenCV Python](https://opencv.org/get-started/)
4. [Scikit Image](https://scikit-image.org)
5. [TQDM Progress Bar](https://tqdm.github.io)

### Credits
Made by Zavier, enjoyy ✨
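
### Example: Text Similarity Sketch
The text method listed under Similarity Check Methods can be illustrated with Python's standard-library `difflib.SequenceMatcher`. This is only a minimal sketch, not the actual code in `cspdf.py`: the function names `text_similarity` and `compare_all` are hypothetical, and plain strings stand in for text that would normally be extracted from PDF files.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    # autojunk=False turns off difflib's heuristic that treats very
    # frequent characters in long inputs as junk.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def compare_all(texts: dict[str, str]) -> list[tuple[str, str, float]]:
    """Compare every unordered pair of documents exactly once."""
    rows = []
    # Sorting by file name keeps the pair order deterministic.
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        rows.append((name_a, name_b, text_similarity(text_a, text_b)))
    return rows


# Hypothetical extracted texts standing in for real PDF content.
sample = {
    "a.pdf": "the quick brown fox",
    "b.pdf": "the quick brown fox",
    "c.pdf": "an entirely different sentence",
}
results = compare_all(sample)

for name_a, name_b, score in results:
    print(f"{name_a},{name_b},{score:.2f}")
```

Each row has the shape (first file, second file, score), which maps naturally onto one line of a CSV report. Identical texts score 1.0, and texts with no characters in common score 0.0.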