https://github.com/defeo/pylagiarist

Plagiarism detection Python script

Pylagiarist
===========

Pylagiarist is a plagiarism detection script written in Python.

It recursively scans folders for files whose names match given
patterns, compares every pair of matching files, and reports the pairs
whose similarity exceeds a given threshold.
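
The scan-and-compare loop described above can be sketched in Python as
follows. This is a minimal illustration, not Pylagiarist's actual code:
the helper names `find_files` and `similar_pairs`, the use of
`re.search` on bare file names, and the strict `>` comparison against
the threshold are all assumptions.

```python
import itertools
import os
import re
from difflib import SequenceMatcher


def find_files(roots, include, exclude):
    """Yield paths under `roots` whose file name matches at least one
    `include` regex and no `exclude` regex (assumed semantics)."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                wanted = any(re.search(p, name) for p in include)
                unwanted = any(re.search(p, name) for p in exclude)
                if wanted and not unwanted:
                    yield os.path.join(dirpath, name)


def similar_pairs(paths, threshold):
    """Compare every unordered pair of files and return
    (ratio, path_a, path_b) tuples whose difflib ratio is above
    `threshold`, most similar first."""
    contents = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            contents[path] = f.read()
    results = []
    # combinations() yields each unordered pair exactly once.
    for a, b in itertools.combinations(sorted(contents), 2):
        ratio = SequenceMatcher(None, contents[a], contents[b]).ratio()
        if ratio > threshold:
            results.append((ratio, a, b))
    results.sort(reverse=True)
    return results
```

For instance, `similar_pairs(find_files(["src1", "src2"], [r"\.html?$"], ["index"]), 0.4)`
compares the HTML files found under two folders. Because every pair is
compared, the running time grows quadratically with the number of files.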

Pylagiarist uses difflib's SequenceMatcher to compute similarities. If
[python-Levenshtein](https://github.com/ztane/python-Levenshtein/) is
installed, it also reports Levenshtein ratios for similar files.
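
Both scores can be computed as below. `SequenceMatcher.ratio()` returns
2*M/T, where M is the number of matched characters and T the combined
length of both strings; `Levenshtein.ratio` comes from the optional
python-Levenshtein package. The `similarity` helper is hypothetical and
only illustrates the two library calls; it does not reproduce how
Pylagiarist formats its report.

```python
from difflib import SequenceMatcher

try:
    import Levenshtein  # optional: provided by python-Levenshtein
except ImportError:
    Levenshtein = None


def similarity(a, b):
    """Return (difflib ratio, Levenshtein ratio or None) for two strings.
    The second value is None when python-Levenshtein is not installed."""
    seq_ratio = SequenceMatcher(None, a, b).ratio()
    lev_ratio = Levenshtein.ratio(a, b) if Levenshtein is not None else None
    return seq_ratio, lev_ratio


# "kitten" vs "sitting": M = 4 matched chars ("itt" + "n"), T = 13,
# so the difflib ratio is 8/13, about 0.615.
print(similarity("kitten", "sitting"))
```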

Usage
-----

Run

    pylagiarist.py

in the folder containing the files you want to compare. Pylagiarist
accepts several command-line options; run

    pylagiarist.py -h

to list them.

Examples
--------

Scan folders `src1` and `src2` for files whose names end in `.html`
or `.htm`, excluding those matching `index`:

    pylagiarist -i '.html$' -i '.htm$' -x index src1 src2
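
Assuming the `-i` and `-x` values are regular expressions searched
anywhere in a file name (the `$` anchors suggest so, but this is an
assumption), the selection in the example above behaves like the
hypothetical `selected` function below, where a name is kept if any
include pattern matches and no exclude pattern does.

```python
import re


def selected(name, include, exclude):
    """Hypothetical reading of -i/-x: keep `name` if any include regex
    matches it and no exclude regex matches it (re.search semantics)."""
    return any(re.search(p, name) for p in include) and not any(
        re.search(p, name) for p in exclude
    )


include = [".html$", ".htm$"]
exclude = ["index"]
for name in ["page.html", "old.htm", "index.html", "notes.txt", "xhtml"]:
    print(name, selected(name, include, exclude))
```

Under that assumption, `page.html` and `old.htm` are selected while
`index.html` and `notes.txt` are not. Because the `.` is unescaped, a
name such as `xhtml` is selected too; writing `'\.html$'` and
`'\.htm$'` would require a literal dot.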

Report pairs whose similarity (as computed by difflib) is above 0.4:

    pylagiarist -t 0.4

Print progress to stderr:

    pylagiarist -v