https://github.com/ahmedraja1/plagiarism-detector
Plagiarism Detectors Implementation Hub
- Host: GitHub
- URL: https://github.com/ahmedraja1/plagiarism-detector
- Owner: AhmedRaja1
- License: mit
- Created: 2021-09-02T19:01:24.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-01-07T13:34:00.000Z (almost 4 years ago)
- Last Synced: 2025-01-17T23:31:25.936Z (9 months ago)
- Topics: plagiarism, plagiarism-check, plagiarism-checker, plagiarism-detection, plagiarism-detector, plagiarism-prevention
- Language: C++
- Homepage: https://ahmedrajawrites.medium.com/lets-code-a-plagiarism-detector-1ff5abe55d45
- Size: 1.56 MB
- Stars: 2
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Plagiarism Detector in C++
Uses naive methods to detect observable plagiarism in text files.

## Tests implemented
#### 1. Token frequency matching
>The target file's token counts are compared with the token counts of each text file in the database folder to find a high degree of similarity in the tokens and their frequency of use.

#### 2. N-Gram matching
>A direct match of consecutive tokens *(or n-grams)* is performed to detect similarity in patterns and neighbourhood of tokens. The value of **N** for the N-Gram generation is varied, and a cumulative result is obtained as a weighted average over all of the results.

#### 3. Cosine matching
>The cosine of the angle between the token vectors obtained from the target and the base text files is computed to estimate the similarity between the two files.

## Getting started
**1.** Place all the reference text files in the **database** directory.
**2.** Place all the text files required to be checked in the **target** directory.
**3.** *(Optional)* Edit the `stopwords.txt` text file as per requirement, to add words which are to be ignored in the analysis.
**4.** Set the `database` and `target_folder` variables to the actual paths of those two directories.
**5.** Compile `run.cpp` in C++11 (or above), with the command `g++ run.cpp -std=c++11`.
**6.** Run the generated executable with the command `./a.out` *(on Linux).*
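
For readers who want a feel for the three tests listed under "Tests implemented", here is a minimal sketch of how they could be computed. It is not taken from `run.cpp`: every function name (`tokenize`, `frequency_score`, `ngram_score`, `weighted_ngram_score`, `cosine_score`) is hypothetical, and the exact frequency metric, the range of **N** (2 to 4) and the weights (proportional to **N**) are assumptions made for illustration only. Stopword filtering is left out to keep it short.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
using Counts = std::map<std::string, int>;

// Lower-case the text and split it on every non-alphanumeric character.
Tokens tokenize(const std::string& text) {
    Tokens out;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            out.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

Counts count_tokens(const Tokens& tokens) {
    Counts counts;
    for (const auto& t : tokens) ++counts[t];
    return counts;
}

// Test 1 (assumed metric): token occurrences shared with base / target size.
double frequency_score(const Tokens& target, const Tokens& base) {
    if (target.empty()) return 0.0;
    const Counts a = count_tokens(target), b = count_tokens(base);
    int shared = 0;
    for (const auto& kv : a) {
        auto it = b.find(kv.first);
        if (it != b.end()) shared += std::min(kv.second, it->second);
    }
    return static_cast<double>(shared) / target.size();
}

// Distinct n-grams of a token sequence, each joined into one string key.
std::set<std::string> ngram_set(const Tokens& t, std::size_t n) {
    std::set<std::string> grams;
    for (std::size_t i = 0; i + n <= t.size(); ++i) {
        std::string key;
        for (std::size_t j = 0; j < n; ++j) key += t[i + j] + ' ';
        grams.insert(key);
    }
    return grams;
}

// Test 2: fraction of the target's distinct n-grams also found in base.
double ngram_score(const Tokens& target, const Tokens& base, std::size_t n) {
    const std::set<std::string> a = ngram_set(target, n), b = ngram_set(base, n);
    if (a.empty()) return 0.0;
    std::size_t hits = 0;
    for (const auto& g : a) hits += b.count(g);
    return static_cast<double>(hits) / a.size();
}

// Weighted average over N = 2..4; weighting each result by N is an assumption.
double weighted_ngram_score(const Tokens& target, const Tokens& base) {
    double total = 0.0, weight_sum = 0.0;
    for (std::size_t n = 2; n <= 4; ++n) {
        total += n * ngram_score(target, base, n);
        weight_sum += n;
    }
    return total / weight_sum;
}

// Test 3: cosine of the angle between the two token-count vectors.
double cosine_score(const Tokens& target, const Tokens& base) {
    const Counts a = count_tokens(target), b = count_tokens(base);
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& kv : a) {
        na += static_cast<double>(kv.second) * kv.second;
        auto it = b.find(kv.first);
        if (it != b.end()) dot += static_cast<double>(kv.second) * it->second;
    }
    for (const auto& kv : b) nb += static_cast<double>(kv.second) * kv.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / std::sqrt(na * nb);
}
```

All three scores fall between 0 and 1. For instance, `cosine_score(tokenize(a), tokenize(b))` is 1 when both texts have identical token counts and 0 when they share no tokens. Like the project itself, the sketch compiles as C++11.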
# Image of the project
