Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dimits-ts/plagiarismdetection
A program that scans all text files in a given directory and finds the pairs most likely to contain copied / plagiarized text.
artificial-intelligence machine-learning tf-idf
- Host: GitHub
- URL: https://github.com/dimits-ts/plagiarismdetection
- Owner: dimits-ts
- Created: 2021-07-11T15:34:51.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-07-11T15:47:37.000Z (over 3 years ago)
- Last Synced: 2024-04-22T02:45:10.301Z (7 months ago)
- Topics: artificial-intelligence, machine-learning, tf-idf
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PlagiarismDetection
Scans all text files in a given directory and compares each one against all the others to find the pairs most likely to contain copied / plagiarized text.
Uses [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weighting to compare files, so that terms distinctive to a document count for more than words common to every file.
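
To illustrate how a TF-IDF comparison can rank file pairs, here is a minimal sketch using only the standard library. It is not the repository's code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_pairs`), the smoothed idf formula, and the use of cosine similarity as the pair score are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            # Smoothed idf (an assumption): rarer terms get larger weights,
            # and terms found in every document still keep a non-zero weight.
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, freq in c.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pairs(docs):
    """Return (score, name_a, name_b) tuples, most similar pair first."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(scored, reverse=True)
```

Calling `rank_pairs({"a.txt": text_a, "b.txt": text_b, "c.txt": text_c})` returns every pair with a score between 0 and 1, so the pairs at the top of the list are the ones worth inspecting by hand.
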
Automatically ignores binary / empty files. By default it only looks for .txt documents, but it can be told to scan all file types instead.
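
The filtering described above could look roughly like the following sketch. It is not taken from the repository: `looks_binary`, `candidate_files`, and the `all_types` flag are hypothetical names, the NUL-byte check is just one common heuristic for spotting binary files, and scanning only the top level of the directory is an assumption.

```python
from pathlib import Path


def looks_binary(path, sample_size=1024):
    """Heuristic (an assumption): a NUL byte in the first bytes means binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def candidate_files(directory, all_types=False):
    """Return the non-empty, non-binary files directly inside `directory`."""
    selected = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # By default only .txt files are considered; all_types lifts that limit.
        if not all_types and path.suffix.lower() != ".txt":
            continue
        if path.stat().st_size == 0 or looks_binary(path):
            continue
        selected.append(path)
    return selected
```

With `all_types=True` the extension check is skipped, while empty and binary files are still ignored.
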
Run from the console. Program parameters are determined at runtime and saved to a dedicated settings file. Test files are included in the "Tests" folder.