https://github.com/ahmedraja1/plagiarism-detector

Plagiarism Detectors Implementation Hub

Host: GitHub
URL: https://github.com/ahmedraja1/plagiarism-detector
Owner: AhmedRaja1
License: mit
Created: 2021-09-02T19:01:24.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2022-01-07T13:34:00.000Z (almost 4 years ago)
Last Synced: 2025-01-17T23:31:25.936Z (9 months ago)
Topics: plagiarism, plagiarism-check, plagiarism-checker, plagiarism-detection, plagiarism-detector, plagiarism-prevention
Language: C++
Homepage: https://ahmedrajawrites.medium.com/lets-code-a-plagiarism-detector-1ff5abe55d45
Size: 1.56 MB
Stars: 2
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Plagiarism Detector in C++
Uses naive methods to detect observable plagiarism in text files.

## Tests implemented

#### 1. Token frequency matching
>The target file's token counts are compared with the token counts of each text file in the database folder. A high overlap in the tokens and in how often they are used indicates similarity.
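
The idea above can be sketched as follows. This is a minimal illustration, not the code in `run.cpp`: the names `FreqMap`, `tokenFrequencies` and `frequencyOverlap` are hypothetical, and the scoring rule (shared token occurrences divided by the target's total token count) is an assumption, since the README does not state the exact formula.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical alias: maps each token to the number of times it occurs.
using FreqMap = std::unordered_map<std::string, int>;

// Split text on non-alphanumeric characters, lower-case each token,
// and count its occurrences.
FreqMap tokenFrequencies(const std::string& text) {
    FreqMap freq;
    std::string token;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            token += static_cast<char>(std::tolower(ch));
        } else if (!token.empty()) {
            ++freq[token];
            token.clear();
        }
    }
    if (!token.empty()) ++freq[token];  // flush the last token
    return freq;
}

// Assumed scoring rule: fraction of the target's token occurrences
// that are also present in the reference (counted with multiplicity).
double frequencyOverlap(const FreqMap& target, const FreqMap& reference) {
    long shared = 0;
    long total = 0;
    for (const auto& entry : target) {
        total += entry.second;
        auto it = reference.find(entry.first);
        if (it != reference.end()) {
            shared += std::min(entry.second, it->second);
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(shared) / total;
}
```

Calling `frequencyOverlap(tokenFrequencies(targetText), tokenFrequencies(referenceText))` once per database file gives one score per reference. A score near 1 means almost every token of the target also appears in that reference.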

#### 2. N-Gram matching
>Sequences of consecutive tokens *(n-grams)* are matched directly to detect similarity in the patterns and neighbourhoods of tokens. The value of **N** used for n-gram generation is varied, and the per-**N** results are combined into a single score by a weighted average.
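
A rough sketch of this test is shown below. The function names, the choice to compare sets of distinct n-grams, and the caller-supplied weights are all assumptions made for illustration; the README does not specify which values of **N** or which weights the project uses.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Build the set of distinct n-grams, each joined with single spaces.
std::set<std::string> makeNgrams(const std::vector<std::string>& tokens,
                                 std::size_t n) {
    std::set<std::string> grams;
    if (n == 0 || tokens.size() < n) return grams;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        std::string gram = tokens[i];
        for (std::size_t j = 1; j < n; ++j) {
            gram += ' ' + tokens[i + j];
        }
        grams.insert(gram);
    }
    return grams;
}

// Fraction of the target's distinct n-grams that also occur in the reference.
double ngramMatch(const std::vector<std::string>& target,
                  const std::vector<std::string>& reference,
                  std::size_t n) {
    const std::set<std::string> t = makeNgrams(target, n);
    const std::set<std::string> r = makeNgrams(reference, n);
    if (t.empty()) return 0.0;
    std::size_t hits = 0;
    for (const auto& gram : t) {
        if (r.count(gram) > 0) ++hits;
    }
    return static_cast<double>(hits) / t.size();
}

// Weighted average over several n-gram sizes.
// Each pair is (N, weight); the weights are hypothetical inputs.
double weightedNgramScore(
        const std::vector<std::string>& target,
        const std::vector<std::string>& reference,
        const std::vector<std::pair<std::size_t, double>>& weightedSizes) {
    double sum = 0.0;
    double weightSum = 0.0;
    for (const auto& entry : weightedSizes) {
        sum += entry.second * ngramMatch(target, reference, entry.first);
        weightSum += entry.second;
    }
    return weightSum == 0.0 ? 0.0 : sum / weightSum;
}
```

Giving larger weights to larger **N** is one plausible design choice, because a long run of identical consecutive tokens is stronger evidence of copying than a shared pair of words.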

#### 3. Cosine matching
>The cosine of the angle between the token vectors of the target and the base text files is computed to estimate the similarity of the two files.
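
For two token vectors A and B, the cosine is the dot product of A and B divided by the product of their Euclidean norms. The sketch below computes it over sparse token-to-weight maps. The `TokenVector` alias and the function name are hypothetical, and using raw token counts as vector components is an assumption rather than something the README confirms.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical alias: sparse vector mapping each token to its weight
// (for example, its raw count in the file).
using TokenVector = std::unordered_map<std::string, double>;

// Cosine of the angle between two sparse token vectors.
// Returns 0 when either vector has zero length.
double cosineSimilarity(const TokenVector& a, const TokenVector& b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (const auto& entry : a) {
        normA += entry.second * entry.second;
        auto it = b.find(entry.first);
        if (it != b.end()) {
            dot += entry.second * it->second;  // only shared tokens contribute
        }
    }
    for (const auto& entry : b) {
        normB += entry.second * entry.second;
    }
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}
```

With non-negative counts the result lies between 0 (no shared tokens) and 1 (identical token proportions), and it does not depend on the length of either file.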

## Getting started
**1.** Place all the reference text files in the **database** directory.

**2.** Place all the text files required to be checked in the **target** directory.

**3.** *(Optional)* Edit the `stopwords.txt` text file as per requirement, to add words which are to be ignored in the analysis.

**4.** Set the `database` and `target_folder` variables to the actual paths of those directories.

**5.** Compile `run.cpp` with C++11 (or later), using the command `g++ run.cpp -std=c++11`.

**6.** Run the generated executable with the command `./a.out` *(in Linux environment).*

## Image of the project
![](https://raw.githubusercontent.com/AhmedRaja1/Plagiarism-Detector/main/WhatsApp%20Image%202021-03-18%20at%2011.05.43%20PM.jpeg)
