Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/saifadin1/copyshield
Simple Plagiarism detection tool
- Host: GitHub
- URL: https://github.com/saifadin1/copyshield
- Owner: saifadin1
- Created: 2024-11-03T00:13:05.000Z
- Default Branch: main
- Last Pushed: 2024-11-14T16:23:37.000Z
- Last Synced: 2024-11-14T17:27:59.544Z
- Language: C++
- Homepage:
- Size: 639 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# CopyShield
## What is CopyShield?
CopyShield is a simple plagiarism detection tool that reads a collection of documents and measures the similarity between every pair of them. It can be used to detect plagiarism in text documents or in source code.
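## How it works
1. **Text Preprocessing**: Comments and whitespace are removed from each file, and all remaining characters are converted to lowercase, so that changes in formatting or letter case do not hide copied code.
2. **n-gram Generation**: Each preprocessed text is split into overlapping n-grams (sequences of n consecutive characters; n is set with `--grams`).
3. **Hashing**: Each n-gram is hashed to an integer (using the prime set with `--prime`), so that n-grams can be stored and compared as fixed-size numbers rather than strings.
4. **Fingerprinting**: A sliding window (its size is set with `--window-size`) is moved over the sequence of hashes, and representative hashes from each window form the file's fingerprint, which keeps comparisons cheap.
5. **Similarity Calculation**: The program computes the Jaccard similarity (size of the intersection divided by size of the union) between the fingerprints of every pair of files. If the similarity exceeds the threshold set with `--threshold`, the pair is flagged as likely plagiarized.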
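The five steps above can be sketched in C++ roughly as follows. This is not CopyShield's source: it assumes character n-grams, a polynomial hash in which the `--prime` value is used as the base (with 64-bit wraparound acting as the modulus), and a winnowing-style fingerprint that keeps the minimum hash of every window. The function names `preprocess`, `ngramHashes`, `fingerprint`, `jaccardPercent` and `similarity` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Step 1: strip // and /* */ comments and all whitespace, then lowercase.
// Simplification: comment markers inside string literals are not special-cased.
std::string preprocess(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src.compare(i, 2, "//") == 0) {
            while (i < src.size() && src[i] != '\n') ++i;  // skip to end of line
        } else if (src.compare(i, 2, "/*") == 0) {
            std::size_t end = src.find("*/", i + 2);
            if (end == std::string::npos) break;  // unterminated: drop the rest
            i = end + 1;                           // the loop's ++i moves past "*/"
        } else if (!std::isspace(static_cast<unsigned char>(src[i]))) {
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(src[i])));
        }
    }
    return out;
}

// Steps 2-3: hash every overlapping character n-gram with a polynomial hash.
// Assumption: `base` plays the role of --prime; arithmetic wraps modulo 2^64.
std::vector<std::uint64_t> ngramHashes(const std::string& s, std::size_t n, std::uint64_t base) {
    assert(n > 0);
    std::vector<std::uint64_t> hashes;
    for (std::size_t i = 0; i + n <= s.size(); ++i) {
        std::uint64_t h = 0;
        for (std::size_t j = i; j < i + n; ++j)
            h = h * base + static_cast<unsigned char>(s[j]);
        hashes.push_back(h);
    }
    return hashes;
}

// Step 4: winnowing-style fingerprint. Slide a window of `w` hashes and keep
// the minimum of each window; the fingerprint is the set of kept hashes.
std::set<std::uint64_t> fingerprint(const std::vector<std::uint64_t>& hashes, std::size_t w) {
    assert(w > 0);
    std::set<std::uint64_t> fp;
    if (hashes.empty()) return fp;
    std::size_t win = std::min(w, hashes.size());  // short inputs: one window
    for (std::size_t i = 0; i + win <= hashes.size(); ++i) {
        std::uint64_t m = hashes[i];
        for (std::size_t j = i + 1; j < i + win; ++j) m = std::min(m, hashes[j]);
        fp.insert(m);
    }
    return fp;
}

// Step 5: Jaccard similarity |A ∩ B| / |A ∪ B| as a percentage.
// Two empty fingerprints are treated as 0% similar so empty files are not flagged.
double jaccardPercent(const std::set<std::uint64_t>& a, const std::set<std::uint64_t>& b) {
    if (a.empty() && b.empty()) return 0.0;
    std::size_t common = 0;
    for (std::uint64_t h : a) common += b.count(h);
    std::size_t unionSize = a.size() + b.size() - common;
    return 100.0 * static_cast<double>(common) / static_cast<double>(unionSize);
}

// The whole pipeline for one pair of files.
double similarity(const std::string& a, const std::string& b,
                  std::size_t n, std::size_t w, std::uint64_t base) {
    return jaccardPercent(fingerprint(ngramHashes(preprocess(a), n, base), w),
                          fingerprint(ngramHashes(preprocess(b), n, base), w));
}
```

For instance, `similarity(a, b, 3, 5, 101)` corresponds to the `-g 3 -w 5 -p 101` settings used in the example at the end of this README. A production version would normally update each n-gram hash in O(1) with a rolling hash instead of rehashing all n characters.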
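## Usage
1. Compile the code:
```bash
g++ -std=c++17 main.cpp -o main
```
2. Run the compiled program, passing the directory that contains the files to check (here, the current directory):
```bash
.\main .\
```
The commands above use Windows path syntax; on Linux or macOS, run `./main ./` instead.
## Options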
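* Set the similarity threshold above which a pair of files is flagged
```bash
--threshold, -t
```
* Set the window size for fingerprinting
```bash
--window-size, -w
```
* Set the n-gram size
```bash
--grams, -g
```
* Set the prime number used for hashing
```bash
--prime, -p
```
* Exclude specific problems (comma-separated list of names)
```bash
--exclude-problems, -e
```
* Check only the specified problems (comma-separated list of names)
```bash
--include-problems, -i
```
* Display a help message listing the available options and their descriptions
```bash
--help, -h
```
### Example
The following command checks the files in `.\problems` with a similarity threshold of 70, a window size of 5, 3-grams and the prime 101, skipping `problem1` and `problem2`: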
```bash
.\main .\problems -t 70 -w 5 -g 3 -p 101 -e problem1,problem2
```
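The `-e` and `-i` options take a comma-separated list of problem names, as in the example above. The sketch below shows one plausible way to interpret those lists together with `-t`. It is an illustration only: the helper names `splitProblems`, `shouldCheck` and `isLikelyDuplicate`, the rule that exclusion wins over inclusion, and the reading of `-t` as a percentage are all assumptions rather than documented CopyShield behaviour.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical: split the value of -e / -i (e.g. "problem1,problem2")
// into a set of problem names, ignoring empty entries.
std::set<std::string> splitProblems(const std::string& list) {
    std::set<std::string> names;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) names.insert(item);
    }
    return names;
}

// Hypothetical rule: excluded problems are always skipped; if an include
// list is given, only problems in it are checked.
bool shouldCheck(const std::string& problem,
                 const std::set<std::string>& include,
                 const std::set<std::string>& exclude) {
    if (exclude.count(problem) > 0) return false;
    return include.empty() || include.count(problem) > 0;
}

// Assumes -t is a percentage; "exceeds" is read as strictly greater than.
bool isLikelyDuplicate(double similarityPercent, double thresholdPercent) {
    assert(thresholdPercent >= 0.0 && thresholdPercent <= 100.0);
    return similarityPercent > thresholdPercent;
}
```

Under these assumptions, the example command skips `problem1` and `problem2` and flags a pair of files only when their similarity is above 70%.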