{"id":25526044,"url":"https://github.com/exfil0/pdfdisarm","last_synced_at":"2026-01-05T18:30:13.635Z","repository":{"id":277070281,"uuid":"931238167","full_name":"exfil0/PDFdisarm","owner":"exfil0","description":"Advanced PDF Analysis \u0026 Disarm Tool is a robust Python-based utility designed to scan, analyze, and neutralize potentially malicious elements in PDF files.","archived":false,"fork":false,"pushed_at":"2025-02-12T00:09:14.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-12T01:23:27.051Z","etag":null,"topics":["analysis","cybersecurity","malware","pdf","pdfdisarm","pyhton3","security","threatdetection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/exfil0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-12T00:06:11.000Z","updated_at":"2025-02-12T00:13:22.000Z","dependencies_parsed_at":"2025-02-12T01:23:28.905Z","dependency_job_id":"be69f829-9692-4bf8-a31c-e321581fef4a","html_url":"https://github.com/exfil0/PDFdisarm","commit_stats":null,"previous_names":["exfil0/pdfdisarm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exfil0%2FPDFdisarm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exfil0%2FPDFdisarm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exfil0%2FPDFdisarm/releases","manifests_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/exfil0%2FPDFdisarm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/exfil0","download_url":"https://codeload.github.com/exfil0/PDFdisarm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239735263,"owners_count":19688262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","cybersecurity","malware","pdf","pdfdisarm","pyhton3","security","threatdetection"],"created_at":"2025-02-19T21:16:04.968Z","updated_at":"2026-01-05T18:30:13.577Z","avatar_url":"https://github.com/exfil0.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Advanced PDF Analysis \u0026 Disarm Tool\n\n## Overview\n\nThis tool scans, analyzes, and optionally “disarms” PDF files. 
It provides:\n\n- **PDF Structure Analysis**: Detects keywords, calculates entropy, and identifies malicious indicators such as embedded JavaScript and launch actions.\n- **Concurrency**: Uses Python’s ThreadPoolExecutor to process multiple files in parallel.\n- **Disarm Mode**: Generates a `\u003cfilename\u003e.disarmed.pdf` that strips or obfuscates dangerous elements like `/JS`, `/JavaScript`, `/Launch`, etc.\n- **Directory Recursion**: Gathers files from one or more directories, optionally recursing into subdirectories.\n- **Plugin Architecture**: Supports loading custom plugins for scoring or additional checks.\n- **Selection Expressions**: Allows filtering results (e.g., show only PDFs with specific suspicious keyword counts).\n- **Multiple Output Formats**: \n  - Human-readable console output\n  - CSV format (one line per file)\n  - JSON export function (PDFiD2JSON) available for custom usage\n\nUse it at your own risk.\n\n---\n\n## Installation\n\n1. **Requirements**:\n   - Python 3.7+ (recommended)\n   - Optional library: `pyzipper` for AES-encrypted ZIP support\n   - Standard libraries: `argparse`, `concurrent.futures`, `urllib.request`, etc. (all part of the Python standard library, so no separate installation is needed)\n\n2. **Clone or Download** this script:\n   ```bash\n   git clone https://github.com/exfil0/PDFdisarm.git\n   ```\n   *(Alternatively, download the script directly or use your preferred distribution method.)*\n\n3. **Make It Executable (Linux/Mac)**:\n   ```bash\n   chmod +x pdfscan.py\n   ```\n\n4. 
**(Optional) Install pyzipper**:\n   ```bash\n   pip install pyzipper\n   ```\n\n---\n\n## Usage\n\n### Basic Command\n\n```bash\n./pdfscan.py \u003cfile1.pdf\u003e \u003cfile2.pdf\u003e ...\n```\n- Analyzes each file and prints detailed results to the console.\n\n### Wildcards and Directory Recursion\n\n```bash\n./pdfscan.py /path/to/pdfs -r\n```\n- Recursively scans all files under `/path/to/pdfs`.\n- Wildcard patterns (e.g. `/path/to/pdfs/*.pdf`) can also be passed as file arguments.\n\n### Disarm Mode\n\n```bash\n./pdfscan.py malicious.pdf --disarm\n```\n- Creates `malicious.disarmed.pdf` with potentially malicious elements neutralized.\n\n### CSV Output\n\n```bash\n./pdfscan.py /path/to/pdfs -r --csv -o results.csv\n```\n- Outputs a single CSV with all scan results, one row per file.\n- If `-o` is not specified, CSV goes to `stdout`.\n\n### Selecting Files by Condition\n\n```bash\n./pdfscan.py *.pdf --select=\"pdf.js.count \u003e 0\"\n```\n- Only shows results for files where the JavaScript (`/JS`) count is greater than zero.\n\n### Plugin Usage\n\n```bash\n./pdfscan.py suspicious.pdf --plugins=MyPlugin.py --csv\n```\n- Loads a custom plugin (`MyPlugin.py`) that can provide additional scoring or checks.\n\n### Threading\n\n```bash\n./pdfscan.py /path/to/pdfs --threads 8\n```\n- Uses 8 worker threads to speed up scanning across many files.\n\n---\n\n## Command-Line Options\n\n- **`files`** (positional):\n  - One or more file paths, directory paths, or wildcard patterns.\n- **`-r, --recursedir`**: Recurse into subdirectories when a directory is provided.\n- **`-o, --output`**: Specify output file (CSV only).\n- **`--all`**: Show all recognized PDF keywords (even non-standard ones).\n- **`--extra`**: Collect extra data such as dates and entropy.\n- **`--force`**: Force scanning even if the PDF header is missing.\n- **`--disarm`**: Write a disarmed copy of each PDF as `\u003cfilename\u003e.disarmed.pdf`.\n- **`--select`**: Python expression to filter results, e.g. 
`pdf.js.count\u003e0`.\n- **`--nozero`**: Suppress printing zero counts in console output.\n- **`--threads`**: Number of parallel worker threads (default=4).\n- **`--scan`**: Legacy option, similar to scanning a directory.\n- **`--plugins`**: Comma-separated list of plugin `.py` files to load.\n- **`--pluginoptions`**: Additional string to pass to plugins.\n- **`--csv`**: Output results to CSV (to file if `-o` is specified, else stdout).\n- **`--minimumscore`**: Only show files or plugin results that meet or exceed this numeric score.\n- **`--verbose`**: Print detailed tracebacks on errors.\n\n---\n\n## Example Workflows\n\n1. **Single File Quick Scan**\n   ```bash\n   ./pdfscan.py mydocument.pdf\n   ```\n   Prints a detailed report (keywords and potentially malicious actions) to the console.\n\n2. **Multiple PDFs, CSV Output**\n   ```bash\n   ./pdfscan.py /opt/pdfs/*.pdf --csv -o results.csv\n   ```\n   Writes all results to `results.csv`, which can be imported into Excel or another spreadsheet tool.\n\n3. **Full Directory Disarm**\n   ```bash\n   ./pdfscan.py /opt/malware-pdfs -r --disarm\n   ```\n   Recursively writes a `\u003cfilename\u003e.disarmed.pdf` copy of each file found.\n\n---\n\n## Plugin Notes\n\n- **Plugin Classes** must subclass `cPluginParent`.\n- The script automatically discovers plugin classes from the loaded files.\n- Each plugin typically implements a `Score()` method returning a numeric score.\n\n---\n\n## Disclaimer\n\nAuthored and maintained by **Exfil0**.  \n**No warranties** are provided. Use at your own risk.\n\nFeel free to adapt and redistribute **with attribution** to Exfil0.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexfil0%2Fpdfdisarm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexfil0%2Fpdfdisarm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexfil0%2Fpdfdisarm/lists"}