{"id":20525561,"url":"https://github.com/saifadin1/copyshield","last_synced_at":"2025-09-25T12:30:54.703Z","repository":{"id":262855827,"uuid":"882503039","full_name":"saifadin1/CopyShield","owner":"saifadin1","description":"Simple Plagiarism detection tool for competitive programming competitions","archived":false,"fork":false,"pushed_at":"2024-12-21T10:59:29.000Z","size":1749,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-21T11:30:35.286Z","etag":null,"topics":["codeforces","competitive-programming","cpp","plagiarism-detection","vjudge"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saifadin1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-03T00:13:05.000Z","updated_at":"2024-12-21T10:59:33.000Z","dependencies_parsed_at":"2024-12-04T17:25:53.111Z","dependency_job_id":"462f83a6-ceaf-43c6-a715-5f3c05186cfb","html_url":"https://github.com/saifadin1/CopyShield","commit_stats":null,"previous_names":["saifadin1/copyshield"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifadin1%2FCopyShield","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifadin1%2FCopyShield/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifadin1%2FCopyShield/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saifadin1%2FCopyShield/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saifadin1","download_url":"https://codeload.github.com/saifadin1/CopyShield/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234192747,"owners_count":18793993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codeforces","competitive-programming","cpp","plagiarism-detection","vjudge"],"created_at":"2024-11-15T23:06:12.350Z","updated_at":"2025-09-25T12:30:53.605Z","avatar_url":"https://github.com/saifadin1.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CopyShield 🛡️\n\n## Table of Contents\n\n- [What is CopyShield ?](#what-is-copyshield-)\n- [Report Generation](#report-generation)\n  * [CSV Reports](#csv-reports)\n  * [HTML Report](#html-report)\n    + [Code Comparison Visualization](#Code-comparison-visualization)\n      - [How it works ?](#how-it-works--)\n- [How it works ?](#how-it-works-)\n- [Getting Started](#Getting-Started)\n  * [Setting up the environment](#setting-up-the-environment)\n  * [Fetching Submissions](#fetching-submissions)\n    + [**Vjudge**](#vjudge)\n    + [**CodeForces**](#CodeForces)\n  * [Compile cpp code](#compile-cpp-code)\n  * [Getting the reports](#getting-the-reports)\n    * [Sending emails](#sending-emails)\n        + [Prepare a CSV file](#prepare-a-csv-file)\n        + [Set up Mailjet API credentials](#set-up-mailjet-api-credentials)\n        + [Run the following command to send the emails](#run-the-following-command-to-send-the-emails)\n    * [Command-Line 
options](#command-line-options)\n        + [Example](#example)\n- [TODO](#todo)\n\n## What is CopyShield ? 🤔\n\nCopyShield is a simple plagiarism detection tool that reads a collection of documents and checks for similarity between them. It can be used to detect plagiarism in documents or source code.\n\n## Report Generation\n\n### CSV Reports\n\nThe report generation feature creates three separate files with detailed information as follows:\n\n1. **Detected Plagiarism and Similarities**: This file contains the pairs of files that are flagged as likely duplicates along with the similarity percentage.\n\n2. **Pairwise Similarities**: This file contains a list of similarity percentages between each pair of participants.\n\n3. **Participant Plagiarism Scores**: This file contains the plagiarism score of each participant.\n\n\n\n### HTML Report\n\nThe program generates an HTML report containing the code snippets of all pairs of files that are flagged as likely duplicates.\n\n![HTML report](https://github.com/saifadin1/CopyShield/blob/main/res/image2.png)\n\n![HTML report](https://github.com/saifadin1/CopyShield/blob/main/res/image4.png)\n\n#### Code Comparison Visualization 📊\n\nOur application includes a Code Comparison Visualization feature that makes it easy to identify differences between two sets of code.\n\n##### How it works ?\n\nThe left side displays the code of the participant who submitted first, and the right side displays the code of the participant who submitted second.\nThe differences are highlighted as follows:\n- Green: The code that the second participant added.\n- Red: The code that the second participant removed.\n- Blue: The code that is common to both participants.\n\nNote: the submission order is only available for Codeforces submissions (for Vjudge there is no way to know who submitted first ¯\\\\_(ツ)_/¯).\n\nSee the example below to understand it better 👇👇.\n\n![HTML 
report](https://github.com/saifadin1/CopyShield/blob/main/res/image3.png)\n\n![HTML report](https://github.com/saifadin1/CopyShield/blob/main/res/image5.png)\n\n\n\n\n## How it works ? 🛠️\n\n1. **Text Preprocessing**: The code from each file is preprocessed to remove comments and whitespace, and all characters are converted to lowercase.\n\n2. **n-grams Generation**: Each processed code snippet is divided into n-grams.\n\n3. **Hashing**: The n-grams are hashed to reduce the dimensionality of the feature space.\n\n4. **Fingerprinting**: A sliding window approach is used to create fingerprints from the hashed n-grams, allowing efficient comparison.\n\n5. **Similarity Calculation**: The program computes the Jaccard similarity between the fingerprints of each pair of files. If the similarity exceeds a threshold, it flags the files as likely duplicates.\n\n## Getting Started 🚀\n\n### Setting up the environment\n\n1. Clone the repository\n\n```bash\ngit clone https://github.com/saifadin1/CopyShield.git\n```\n\n2. Install the required packages\n\n```bash\npip install -r requirements.txt\n```\n\n3. Create the `.env` file: Copy the contents of the [`.env.example`](https://github.com/saifadin1/CopyShield/blob/main/.env.example) file to create a new `.env` file in the project root directory and set the required environment variables if needed.\n\n\n\n\n### Fetching Submissions ⬇️\n\nFirst, fetch the submissions from the online judge (Vjudge or CodeForces in particular).\n\n#### **Vjudge**\n\nSimply download the submissions from the contest page as a zip file; the file names will already be formatted correctly as: `\u003csubmission Id\u003e_\u003cVerdict\u003e_\u003cusername\u003e_\u003cproblem name\u003e`.\nThe image below shows the export submissions button on the Vjudge contest page.\n\n![Vjudge export submissions](https://github.com/saifadin1/CopyShield/blob/main/res/image6.png)\n\n#### **CodeForces**\n\nSimilarly, download the submissions as a zip file from the contest page. 
However, there's a slight issue: the filenames are not formatted as needed. To fix this, we need to reformat them to match the required format: `\u003csubmission Id\u003e_\u003cVerdict\u003e_\u003cusername\u003e_\u003cproblem name\u003e`.\nThe [`CodeForcesSubmissionsReformatting`](https://github.com/saifadin1/CopyShield/tree/main/src/CodeForcesSubmissionsReformatting)\ndirectory contains two scripts to help you with that:\n1. `codeforces_api_client.py`: this script fetches the metadata of the submissions and saves it in a JSON file.\n2. `rename_submissions.py`: this script renames the files in `./src/CodeForcesSubmissionsReformatting/submissions` to the required format, so the downloaded submissions should be placed in this path.\n\nYou can find the contest admin page at `https://codeforces.com/group/\u003cgroup_id\u003e/contest/\u003ccontest_id\u003e/admin`. The image below shows the export submissions button on the Codeforces contest admin page.\n\n![Codeforces export submissions](https://github.com/saifadin1/CopyShield/blob/main/res/image7.png)\n\n\n\n### Compile cpp code 🔨\n\n1. Navigate to the `src` directory using the following command:\n\n```bash\ncd ./src\n```\n\n2. Compile the code using the following command:\n\n```bash\ng++ *.cpp -o main\n```\n\n3. 
Run the compiled code using the following command:\n\n```bash\n./src/main ./\u003cpath to the directory containing the files to be checked\u003e\n```\n\n### Getting the reports 🗂️\n\nThe reports will be generated in the `./src/reports` directory with the following structure:\n\n```bash\n| reports\n|---| result.csv\n|---| pairs.csv\n|---| participants.csv\n|---| index.html\n|---| problems_data\n|---|---| A\n|---|---|---|HTMLreports\n|---|---|---|index.html\n|---|---| B\n|---|---|---|HTMLreports\n|---|---|---|index.html\n|---|---|..\n|---|---|..\n```\n\nTo view the HTML report, open the `index.html` file in a browser.\n\n\n### Sending emails 📩\n\nTo email participants who have been verified as cheaters, flag them in `reports/participants.csv`. All participants are marked `False` by default in the `Flag` column,\nso once you have confirmed that a participant cheated, change the value to `True`.\nYou can then send emails to the flagged participants with the following steps:\n\n#### Prepare a CSV file\nAdd a CSV file named `group_data.csv` in `./src/sending_mails` with the following columns:\n\n```bash\n| Handle | Email | Name |\n```\n\n#### Set up [Mailjet](https://www.mailjet.com/) API credentials\n\nEnsure the following environment variables are set in the `.env` file:\n\n```bash\nMAILJET_API_KEY=\"\u003cyour-api-key\u003e\"\nMAILJET_API_SECRET=\"\u003cyour-api-secret\u003e\"\nMAILJET_SENDER_EMAIL=\"\u003cyour-sender-mail\u003e\"\n```\n\n#### Run the following command to send the emails\n\n```bash\npython .\\src\\sending_mails\\send_mails.py\n```\n\n\n\n\n\n\n## Command-Line options ☰\n\n* Set the threshold value for similarity\n    ```bash\n    --threshold, -t \u003cvalue\u003e\n    ```\n\n* Set the window size for fingerprinting\n    ```bash\n    --window-size, -w \u003cvalue\u003e\n    ```\n\n* Set the n-gram size\n    ```bash\n    --grams, -g \u003cvalue\u003e\n    ```\n\n* Exclude specific problems\n    
```bash\n    --exclude-problems, -e \u003cproblem1,problem2,...\u003e\n    ```\n\n* Include only specific problems\n    ```bash\n    --include-problems, -i \u003cproblem1,problem2,...\u003e\n    ```\n\n* Include only specific users\n    ```bash\n    --include-users, -u \u003cuser1,user2,...\u003e\n    ```\n\n* Display the help message showing the available options and their descriptions\n    ```bash\n    --help, -h\n    ```\n\n### Example\n\n```bash\n.\\src\\main .\\problems -t 70 -w 5 -g 3 -e problem1,problem2\n```\n\n## TODO 📝\n\n- [x] Add support for highlighting the similar blocks in the HTML report\n- [ ] Add a better hashing function\n- [ ] Add a more efficient similarity calculation algorithm\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaifadin1%2Fcopyshield","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaifadin1%2Fcopyshield","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaifadin1%2Fcopyshield/lists"}