https://github.com/dimits-ts/plagiarismdetection

A program that scans all text files in a given directory and finds the pairs that are most likely to contain copied / plagiarized text.

artificial-intelligence machine-learning tf-idf




# PlagiarismDetection

Scans all text files in a given directory and compares each one against all the others to find the pairs most likely to contain copied / plagiarized text.

Uses [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weighting to compare files, so that words shared by most documents count for less than words distinctive to a few.
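As a rough illustration of how a TF-IDF comparison between files can work, here is a minimal Python sketch. It is not this project's code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_pairs`), the smoothed idf formula, and the use of cosine similarity as the pair score are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Assumed smoothed idf: stays positive even for terms found everywhere.
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, freq in c.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pairs(docs):
    """Return (score, name_a, name_b) tuples, most similar pair first."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_pairs` with a dictionary of file names mapped to file contents returns every pair of files ordered by similarity, so the likeliest candidates for copied text appear first.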

Binary and empty files are ignored automatically. By default only .txt documents are scanned, but the program can be told to scan all file types.
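The file selection described above could be sketched as follows. This is an illustration only: the helper names (`looks_binary`, `collect_text_files`), the `all_types` flag, the NUL-byte heuristic for spotting binary files, and the choice to scan only the top level of the directory are assumptions, not the program's documented behavior.

```python
from pathlib import Path


def looks_binary(path, sample_size=4096):
    """Assumed heuristic: a NUL byte in the first bytes marks the file as binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def collect_text_files(directory, all_types=False):
    """Return sorted paths of non-empty, non-binary files directly in `directory`."""
    selected = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories (non-recursive scan is an assumption)
        if not all_types and path.suffix.lower() != ".txt":
            continue  # default mode: only .txt documents
        if path.stat().st_size == 0:
            continue  # empty files carry nothing to compare
        if looks_binary(path):
            continue
        selected.append(path)
    return selected
```

Passing `all_types=True` corresponds to scanning every file type, with the binary and empty-file checks still applied.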

Run from the console. Program parameters are set at runtime and saved to a dedicated settings file. Test files are included in the "Tests" folder.
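One way to persist runtime parameters to a settings file is sketched below. The README does not specify the file format or the parameter names, so the JSON format, the `DEFAULTS` keys, and the `load_settings` / `save_settings` helpers are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical parameter names; the real program's settings are not documented here.
DEFAULTS = {"directory": ".", "all_types": False}


def load_settings(path):
    """Return saved settings merged over the defaults; a missing file yields defaults."""
    settings = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        settings.update(json.loads(file.read_text(encoding="utf-8")))
    return settings


def save_settings(path, settings):
    """Write the settings dictionary to `path` as indented JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Loading first and saving after the user answers the console prompts means the previous run's choices can serve as defaults for the next one.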