https://github.com/defeo/pylagiarist
Plagiarism detection Python script
- Host: GitHub
- URL: https://github.com/defeo/pylagiarist
- Owner: defeo
- License: unlicense
- Created: 2014-04-25T21:02:40.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-08-14T21:06:36.000Z (over 8 years ago)
- Last Synced: 2024-04-15T03:04:11.440Z (7 months ago)
- Language: Python
- Size: 2.93 KB
- Stars: 8
- Watchers: 4
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Pylagiarist
===========

Pylagiarist is a plagiarism detection script written in Python.
It recursively scans folders for files whose names match a given
pattern, compares each pair of files, and reports the pairs whose
similarity exceeds a given threshold.

Pylagiarist uses difflib's SequenceMatcher to compute similarities. If
[python-Levenshtein](https://github.com/ztane/python-Levenshtein/) is
installed, it also reports Levenshtein ratios for similar files.

Usage
-----

Just run

    pylagiarist.py

in the folder containing the files you want to compare. Pylagiarist
accepts some command-line switches; run

    pylagiarist.py -h

to learn about them.

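
For readers curious about the similarity measure mentioned in the introduction, here is a minimal sketch of how two texts might be scored with difflib's `SequenceMatcher`, with a Levenshtein ratio added only when python-Levenshtein is importable. The function name `similarity` is hypothetical and not taken from Pylagiarist's source; the script's actual internals may differ.

```python
import difflib

try:
    # Optional dependency (python-Levenshtein); the sketch works without it.
    import Levenshtein
except ImportError:
    Levenshtein = None


def similarity(a, b):
    """Return (difflib_ratio, levenshtein_ratio_or_None) for two strings.

    Hypothetical helper for illustration only, not Pylagiarist's own code.
    """
    # SequenceMatcher.ratio() is 2*M/T, where M is the number of matching
    # characters and T the total length of both strings.
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    lev = Levenshtein.ratio(a, b) if Levenshtein is not None else None
    return ratio, lev
```

Both ratios lie between 0 and 1, so a single threshold can be applied to either of them.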
Examples
--------

Scan folders `src1` and `src2` for files with names ending in `.html`
or `.htm`, but not matching `index`:

    pylagiarist -i '.html$' -i '.htm$' -x index src1 src2

Report similarities above 0.4 (computed by difflib):

    pylagiarist -t 0.4

Print progress on stderr:

    pylagiarist -v
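
The scan-and-compare workflow behind these examples can be approximated by the sketch below. It is not Pylagiarist's implementation: the names `find_files` and `similar_pairs` are hypothetical, and it assumes that include and exclude patterns are regular expressions searched against bare file names and that only pairs strictly above the threshold are reported.

```python
import difflib
import itertools
import os
import re


def find_files(roots, include, exclude=()):
    """Recursively collect files whose name matches any include pattern
    and no exclude pattern (assumption: patterns apply to the file name)."""
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    found = []
    for root in roots:
        for dirpath, _dirs, names in os.walk(root):
            for name in sorted(names):
                wanted = any(r.search(name) for r in inc)
                skipped = any(r.search(name) for r in exc)
                if wanted and not skipped:
                    found.append(os.path.join(dirpath, name))
    return found


def similar_pairs(paths, threshold):
    """Yield (path_a, path_b, ratio) for each pair of files whose
    difflib ratio is strictly above the threshold."""
    texts = {}
    for path in paths:
        # Undecodable bytes are replaced so one odd file does not abort the scan.
        with open(path, errors="replace") as fh:
            texts[path] = fh.read()
    for a, b in itertools.combinations(paths, 2):
        ratio = difflib.SequenceMatcher(None, texts[a], texts[b]).ratio()
        if ratio > threshold:
            yield a, b, ratio
```

Under those assumptions, a call such as `similar_pairs(find_files(["src1", "src2"], [r"\.html$", r"\.htm$"], ["index"]), 0.4)` would roughly mirror the first two examples combined. Comparing every pair is quadratic in the number of files, which is acceptable for a folder of student submissions but slow for large trees.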