https://github.com/saifadin1/copyshield

Simple Plagiarism detection tool

# CopyShield

## What is CopyShield?

CopyShield is a simple plagiarism detection tool that reads a collection of documents and checks each pair of them for similarity. It can be used to detect plagiarism in documents or source code.

## How it works

1. **Text Preprocessing**: The code from each file is preprocessed to remove comments and whitespace, and all characters are converted to lowercase.

2. **n-gram Generation**: Each preprocessed code snippet is divided into n-grams, whose size is set with `--grams`.

3. **Hashing**: Each n-gram is hashed to an integer (the prime used by the hash is set with `--prime`), which reduces the dimensionality of the feature space.

4. **Fingerprinting**: A sliding window, whose size is set with `--window-size`, is moved over the hashed n-grams to create each file's fingerprints, which allows efficient comparison.

5. **Similarity Calculation**: The program computes the Jaccard similarity between the fingerprints of each pair of files. If the similarity exceeds the threshold (`--threshold`), it flags the pair as likely duplicates.
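The five steps above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not CopyShield's actual `main.cpp`: all function names are hypothetical, only C/C++-style `//` and `/* */` comments are stripped (comment markers inside string literals are not handled), n-grams are taken over characters, `prime` is assumed to be the base of a polynomial rolling hash that wraps around modulo 2^64, and the sliding-window step is assumed to keep the minimum hash of each window, as in winnowing.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Step 1 (assumed rules): drop // and /* */ comments, drop all whitespace,
// and lowercase the remaining characters.
std::string preprocess(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src.compare(i, 2, "//") == 0) {            // skip to end of line
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (src.compare(i, 2, "/*") == 0) {     // skip past the closing */
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos) break;       // unterminated: rest is comment
            i = end + 1;
        } else if (!std::isspace(static_cast<unsigned char>(src[i]))) {
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(src[i])));
        }
    }
    return out;
}

// Steps 2 and 3: hash every character n-gram with a rolling polynomial hash
// (base `prime`, unsigned arithmetic wraps modulo 2^64).
std::vector<std::uint64_t> hashNgrams(const std::string& s, std::size_t n, std::uint64_t prime) {
    assert(n >= 1);
    std::vector<std::uint64_t> hashes;
    if (s.size() < n) return hashes;
    std::uint64_t high = 1;                            // prime^(n-1), weight of the leading char
    for (std::size_t i = 1; i < n; ++i) high *= prime;
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i >= n) h -= high * static_cast<unsigned char>(s[i - n]);  // drop leading char
        h = h * prime + static_cast<unsigned char>(s[i]);               // append new char
        if (i + 1 >= n) hashes.push_back(h);
    }
    return hashes;
}

// Step 4: slide a window of `w` hashes and keep the minimum of each window.
std::set<std::uint64_t> winnow(const std::vector<std::uint64_t>& hashes, std::size_t w) {
    assert(w >= 1);
    std::set<std::uint64_t> fingerprints;
    if (hashes.empty()) return fingerprints;
    // Inputs shorter than one window are treated as a single window.
    std::size_t windows = hashes.size() >= w ? hashes.size() - w + 1 : 1;
    for (std::size_t start = 0; start < windows; ++start) {
        auto first = hashes.begin() + start;
        auto last = hashes.begin() + std::min(start + w, hashes.size());
        fingerprints.insert(*std::min_element(first, last));
    }
    return fingerprints;
}

// Step 5: Jaccard similarity |A ∩ B| / |A ∪ B|, in [0, 1].
double jaccard(const std::set<std::uint64_t>& a, const std::set<std::uint64_t>& b) {
    if (a.empty() && b.empty()) return 0.0;            // never flag two empty files
    std::size_t common = 0;
    for (auto h : a) common += b.count(h);
    return static_cast<double>(common) /
           static_cast<double>(a.size() + b.size() - common);
}

// Full pipeline for one source file.
std::set<std::uint64_t> fingerprint(const std::string& src, std::size_t n,
                                    std::size_t w, std::uint64_t prime) {
    return winnow(hashNgrams(preprocess(src), n, prime), w);
}
```

For instance, `jaccard(fingerprint(a, 3, 5, 101), fingerprint(b, 3, 5, 101))` compares two sources using the `-g 3 -w 5 -p 101` values from the Example section. Because comments and whitespace are removed first, two files that differ only in formatting or comments get a similarity of 1.0.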

## Usage

1. Compile the code using the following command:
```bash
g++ -std=c++17 main.cpp -o main
```

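2. Run the compiled program, passing the directory that contains the files to check (the command below uses Windows path syntax; on Linux or macOS use `./main ./`):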
```bash
.\main .\
```

## Options

* Set the similarity threshold above which a pair of files is flagged
```bash
--threshold, -t
```

* Set the window size for fingerprinting
```bash
--window-size, -w
```

* Set the n-gram size
```bash
--grams, -g
```

* Set the prime value for hashing
```bash
--prime, -p
```

* Exclude specific problems (files); pass several names as a comma-separated list
```bash
--exclude-problems, -e
```

* Include only specific problems (files)
```bash
--include-problems, -i
```

* Display the help message showing the available options and their descriptions
```bash
--help, -h
```
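The include and exclude options can be pictured with the following sketch. The README only shows that `-e` takes a comma-separated list (`problem1,problem2`); everything else here is an assumption: that `-i` accepts the same format, that an empty include list means "check every problem", and that exclusions are applied after inclusions. `splitCommaList` and `selectProblems` are hypothetical names, not functions from CopyShield.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "problem1,problem2" into {"problem1", "problem2"},
// skipping empty entries such as those produced by ",," or a trailing comma.
std::set<std::string> splitCommaList(const std::string& list) {
    std::set<std::string> items;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) items.insert(item);
    }
    return items;
}

// Hypothetical filter: an empty include set keeps every problem,
// then any name in the exclude set is dropped.
std::vector<std::string> selectProblems(const std::vector<std::string>& all,
                                        const std::set<std::string>& include,
                                        const std::set<std::string>& exclude) {
    std::vector<std::string> selected;
    for (const auto& name : all) {
        bool included = include.empty() || include.count(name) > 0;
        if (included && exclude.count(name) == 0) selected.push_back(name);
    }
    return selected;
}
```

Under these assumptions, `-e problem1,problem2` over a folder holding `problem1`, `problem2` and `problem3` would leave only `problem3` to be compared.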

### Example

```bash
.\main .\problems -t 70 -w 5 -g 3 -p 101 -e problem1,problem2
```
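As a worked example of the similarity check, assume (an inference from the `-t 70` value above, not something this README states) that the threshold is a percentage compared against the Jaccard similarity of two fingerprint sets $A$ and $B$. If each file has 9 fingerprints and they share 8 of them, the pair would be flagged:

$$
|A \cup B| = |A| + |B| - |A \cap B| = 9 + 9 - 8 = 10,
\qquad
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{8}{10} = 0.80 = 80\% > 70\%
$$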