{"id":34065874,"url":"https://github.com/trailofbits/vendetect","last_synced_at":"2025-12-14T06:03:25.764Z","repository":{"id":304993382,"uuid":"959463671","full_name":"trailofbits/vendetect","owner":"trailofbits","description":"A tool to automatically detect copy+pasted and vendored code between repositories","archived":false,"fork":false,"pushed_at":"2025-12-01T22:21:46.000Z","size":332,"stargazers_count":73,"open_issues_count":4,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-12-07T04:52:20.054Z","etag":null,"topics":["plagiarism-detection","program-analysis","sbom","sbom-tool"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trailofbits.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-02T20:30:44.000Z","updated_at":"2025-12-01T17:01:22.000Z","dependencies_parsed_at":"2025-07-17T21:10:36.239Z","dependency_job_id":"23a446fe-4821-4e6b-bf1f-d22754161d96","html_url":"https://github.com/trailofbits/vendetect","commit_stats":null,"previous_names":["trailofbits/vendetect"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/trailofbits/vendetect","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trailofbits%2Fvendetect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trailofbits%2Fvendetect/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/trailofbits%2Fvendetect/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trailofbits%2Fvendetect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trailofbits","download_url":"https://codeload.github.com/trailofbits/vendetect/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trailofbits%2Fvendetect/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27719082,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["plagiarism-detection","program-analysis","sbom","sbom-tool"],"created_at":"2025-12-14T06:03:24.765Z","updated_at":"2025-12-14T06:03:25.749Z","avatar_url":"https://github.com/trailofbits.png","language":"Python","funding_links":[],"categories":["Dependency intelligence"],"sub_categories":[],"readme":"# Vendetect\n\n\u003c!--- BADGES: START ---\u003e\n[![CI](https://github.com/trailofbits/vendetect/actions/workflows/tests.yml/badge.svg)](https://github.com/trailofbits/vendetect/actions/workflows/tests.yml)\n[![PyPI version](https://badge.fury.io/py/vendetect.svg)](https://pypi.org/project/vendetect)\n[![Packaging 
status](https://repology.org/badge/tiny-repos/python:vendetect.svg)](https://repology.org/project/python:vendetect/versions)\n\u003c!--- BADGES: END ---\u003e\n\nA command-line tool for automatically detecting vendored and copy/pasted code between repositories.\n\n## Description 🧑‍🎓\n\nVendetect helps identify copied or vendored code between repositories, making it easier to detect when code has been copied with or without attribution. The tool uses similarity detection algorithms to compare code files and highlight matching sections.\n\nKey features:\n- Compare code between two repositories (local or remote)\n- Analyze specific subdirectories within repositories\n- Identify files with similar code and display them side-by-side\n- Show similarity percentages for matched code\n- Filter by file types and adjust similarity thresholds\n- Support for different programming languages through Pygments lexers\n- Similarity is _not_ solely based upon symbol names; vendetect also considers semantics\n\n## Installation 🚀\n\n### Using pip\n\n```bash\npip install vendetect\n```\n\n### Using [uv](https://docs.astral.sh/uv/guides/tools/)\n\n```bash\nuv tool install vendetect\n```\n\n### From source\n\nClone the repository and install:\n\n```bash\ngit clone https://github.com/trailofbits/vendetect.git\ncd vendetect\nuv tool install .\n```\n\n### Development installation\n\nFor development with all dependencies:\n\n```bash\ngit clone https://github.com/trailofbits/vendetect.git\ncd vendetect\nuv sync --group dev\nsource .venv/bin/activate\n```\n\n## Usage 🏃\n\n### Basic usage\n\n```bash\nvendetect TEST_REPO SOURCE_REPO\n```\n\nWhere:\n- `TEST_REPO`: Path or URL to the repository you want to check for copied code\n- `SOURCE_REPO`: Path or URL to the repository that is the potential source of the code\n\n### Examples\n\n```bash\n# Compare two local repositories\nvendetect /path/to/my/project /path/to/another/project\n\n# Compare a local project with a remote repository\nvendetect 
/path/to/my/project https://github.com/example/repo.git\n\n# Compare only specific subdirectories within repositories\nvendetect /path/to/my/project https://github.com/example/repo.git \\\n  --test-subdir src/components \\\n  --source-subdir lib/ui\n\n# Filter by file types and adjust similarity threshold\nvendetect /path/to/my/project /path/to/another/project \\\n  --type py --type js \\\n  --min-similarity 0.8\n```\n\n### Options\n\n```\n--format FORMAT              Output format: rich, csv, or json (default=rich)\n--output OUTPUT              Output file path (default: stdout)\n--force                      Force overwrite of existing output file\n--type FILE_TYPES, -t        File extension to consider (can be used multiple times)\n--min-similarity THRESHOLD   Minimum similarity threshold (range: 0.0-1.0, default: 0.5)\n--test-subdir DIR, -ts       Subdirectory within TEST_REPO to analyze\n--source-subdir DIR, -ss     Subdirectory within SOURCE_REPO to analyze\n--incremental                Enable incremental result reporting\n--batch-size SIZE            Number of files to process per batch (default: 100)\n--max-history-depth DEPTH    Maximum commit history depth (default: -1 = entire history)\n--log-level LEVEL            Sets the log level (default=INFO)\n--debug                      Equivalent to --log-level=DEBUG\n--quiet                      Equivalent to --log-level=CRITICAL\n```\n\n### Advanced Features\n\n#### Subdirectory Analysis\nWhen working with large repositories, you can focus analysis on specific subdirectories:\n\n```bash\n# Analyze only the src/ directory in both repositories\nvendetect /path/to/my/project /path/to/another/project \\\n  --test-subdir src --source-subdir src\n\n# Compare frontend code in one repo with backend in another\nvendetect /path/to/frontend-repo /path/to/backend-repo \\\n  --test-subdir client/src --source-subdir server/utils\n```\n\nThis is particularly useful for:\n- Focusing on relevant code sections\n- Reducing 
analysis time for large repositories\n- Comparing similar modules across different project structures\n\n#### File Type Filtering\nControl which files are analyzed by specifying file extensions:\n\n```bash\n# Only analyze Python files\nvendetect /path/to/my/project /path/to/another/project --type py\n\n# Analyze multiple file types\nvendetect /path/to/my/project /path/to/another/project --type py --type js --type ts\n```\n\n#### Similarity Thresholds\nAdjust the minimum similarity threshold to filter results:\n\n```bash\n# Show only high-confidence matches (80% similarity or higher)\nvendetect /path/to/my/project /path/to/another/project --min-similarity 0.8\n\n# Show all potential matches (lower threshold)\nvendetect /path/to/my/project /path/to/another/project --min-similarity 0.3\n```\n\n### Output Formats\n\nVendetect supports three output formats:\n\n1. **rich** (default): Interactive console output with syntax highlighting and side-by-side code comparison\n2. **csv**: Comma-separated values format with columns for Test File, Source File, Test Slice Start, Test Slice End, Source Slice Start, Source Slice End, and Similarity\n3. **json**: JSON format with detailed information about each detection, including file paths, similarity scores, and matched code slices\n\nExample using CSV output:\n```bash\nvendetect /path/to/my/project /path/to/another/project --format csv --output results.csv\n```\n\nExample using JSON output:\n```bash\nvendetect /path/to/my/project /path/to/another/project --format json --output results.json\n```\n\n## How it works 🧐\n\nVendetect uses a combination of techniques to identify similar code:\n\n1. It fingerprints all source code files in both repositories based upon their semantics rather than syntax\n2. For each file pair, it computes a similarity score\n3. It identifies specific sections (slices) of code that match between files\n4. 
Results are presented in a rich output format with side-by-side comparison\n\nThe tool can handle:\n- Local file system repositories\n- Git repositories (with history support)\n- Remote git repositories (automatically cloned for analysis)\n\n## Requirements 🛒\n\n- Python 3.11 or higher\n- Git (optional, for repository history analysis)\n\n## Contributing 🧑‍💻\n\nContributions are welcome! Check out the [issues](https://github.com/trailofbits/vendetect/issues) for ideas on where to start.\n\n### Development setup\n\n```bash\n# Install development dependencies\nuv sync --group dev\n\n# Source virtual env\nsource .venv/bin/activate\n\n# Run tests\npytest\n\n# Lint code\nruff check\n\n# Type checking\nmypy\n```\n\n## Contact 💬\n\nIf you'd like to file a bug report or feature request, please use our\n[issues](https://github.com/trailofbits/vendetect/issues) page.\nFeel free to contact us or reach out in\n[Empire Hacking](https://slack.empirehacking.nyc/) for help using or extending Vendetect.\n\n## License 📝\n\nThis utility was developed by [Trail of Bits](https://www.trailofbits.com/).\n\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the [GNU Affero General Public License](LICENSE) as published\nby the Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU Affero General Public License for more details.\n\nYou should have received a copy of the GNU Affero General Public License\nalong with this program.  
If not, see \u003chttps://www.gnu.org/licenses/\u003e.\n\n[Contact us](mailto:opensource@trailofbits.com) if you're looking for an\nexception to the terms.\n\n© 2025, Trail of Bits.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrailofbits%2Fvendetect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrailofbits%2Fvendetect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrailofbits%2Fvendetect/lists"}