{"id":29666573,"url":"https://github.com/pkharsimran/ioc-inspector","last_synced_at":"2026-02-05T08:02:21.387Z","repository":{"id":303262907,"uuid":"1014909952","full_name":"PKHarsimran/IOC-Inspector","owner":"PKHarsimran","description":"Fast, SOC‑ready malicious document scanner that turns suspicious PDFs, DOC(X), XLS(X), and RTFs into IOC‑rich, SIEM‑friendly reports.","archived":false,"fork":false,"pushed_at":"2025-07-12T18:02:54.000Z","size":329,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-22T15:50:59.379Z","etag":null,"topics":["abuseipdb","cli","cybersecurity","ioc","malware-analysis","office-macros","pdf-analysis","python","soc-tools","static-analysis","threat-intelligence","virus-total"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PKHarsimran.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-06T16:39:16.000Z","updated_at":"2025-07-16T10:41:44.000Z","dependencies_parsed_at":"2025-07-06T17:52:35.078Z","dependency_job_id":null,"html_url":"https://github.com/PKHarsimran/IOC-Inspector","commit_stats":null,"previous_names":["pkharsimran/ioc-inspector"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PKHarsimran/IOC-Inspector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKHarsimran%2FIOC-Inspector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKHarsimran%2FIOC-Inspector/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKHarsimran%2FIOC-Inspector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKHarsimran%2FIOC-Inspector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PKHarsimran","download_url":"https://codeload.github.com/PKHarsimran/IOC-Inspector/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PKHarsimran%2FIOC-Inspector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29116450,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T05:31:32.482Z","status":"ssl_error","status_checked_at":"2026-02-05T05:31:29.075Z","response_time":65,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abuseipdb","cli","cybersecurity","ioc","malware-analysis","office-macros","pdf-analysis","python","soc-tools","static-analysis","threat-intelligence","virus-total"],"created_at":"2025-07-22T15:38:08.669Z","updated_at":"2026-02-05T08:02:21.377Z","avatar_url":"https://github.com/PKHarsimran.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IOC Inspector 🕵️‍♂️\n[![CI](https://github.com/PKHarsimran/IOC-Inspector/actions/workflows/ci.yml/badge.svg)](https://github.com/PKHarsimran/IOC-Inspector/actions/workflows/ci.yml)\n[![Lint \u0026 
Type-check](https://github.com/PKHarsimran/IOC-Inspector/actions/workflows/lint.yml/badge.svg?branch=main)](https://github.com/PKHarsimran/IOC-Inspector/actions/workflows/lint.yml)  \n[![Codecov](https://codecov.io/gh/PKHarsimran/IOC-Inspector/branch/main/graph/badge.svg?token=F7IJ44D5AC)](https://codecov.io/gh/PKHarsimran/IOC-Inspector)\n[![License: MIT](https://img.shields.io/github/license/PKHarsimran/IOC-Inspector.svg)](LICENSE)\n![Python](https://img.shields.io/badge/python-3.10%20|%203.11-blue)\n[![GitHub release](https://img.shields.io/github/v/release/PKHarsimran/IOC-Inspector)](https://github.com/PKHarsimran/IOC-Inspector/releases)\n[![Security](https://img.shields.io/badge/security-policy-important)](SECURITY.md)\n![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)\n\n**Fast, SOC-ready malicious-document scanner** — turn suspicious PDFs, DOC(X), XLS(X) \u0026 RTFs into IOC-rich, SIEM-friendly reports.\n\n---\n\n## ✅ What's New\n- Cross-platform CI with **Linux + Windows** and **Python 3.10/3.11** support\n- Improved parser error handling with custom `ParserError`\n- Dynamic API key loading for test reliability\n- Coverage-gated CI with **\u003e80%** unit test coverage\n- Final README polish ✨\n- Concurrent directory scanning with `--threads`\n\n---\n\n## ⚡ Why IOC Inspector?\n\n| 🔑 | Value to Analysts |\n|----|------------------|\n| **One-command triage** | `ioc-inspector invoice.docx` → instant verdict \u0026 Markdown report |\n| **Actionable scoring** | Custom heuristics blend macro flags, **auto-exec/API hits**, embedded-object metrics and threat-feed look-ups (VirusTotal + AbuseIPDB) into a **0-100 risk score** |\n| **Analyst-first outputs** | Markdown for tickets, JSON / CSV for Splunk \u0026 Elastic |\n| **Runs anywhere** | Linux • Windows • headless in GitHub Actions |\n| **Extensible** | All logic lives in `ioc_inspector_core/` — swap parsers, add feeds, tweak weights |\n\n---\n\n## 🔍 Feature Matrix\n\n| 
Category            | What you get                                                                                      |\n|---------------------|----------------------------------------------------------------------------------------------------|\n| **Formats**         | PDF • DOC / DOCX • XLS / XLSX • RTF                                                                |\n| **Static Analysis** | Macro dump, **deep auto-exec \u0026 suspicious-API analysis**, obfuscation finder, embedded-object counter |\n| **IOC Extraction**  | URLs • Domains • IPs • Base64 blobs • Hidden links                                                 |\n| **Threat Enrichment** | VirusTotal • AbuseIPDB                                                                      |\n| **Scoring Engine**  | Heuristic weights + rule modifiers (configurable)                                                  |\n| **Reporting**       | Markdown, JSON, CSV, JSONL, HTML                                                                   |\n| **Automation**      | Dir-recursive scan • `--threads` for concurrency • Quiet/Verbose switches • GitHub Actions workflow |\n\n---\n\n## 🚀 Quick Start\n\n```bash\n# 1 – Clone\n$ git clone https://github.com/PKHarsimran/IOC-Inspector.git\n$ cd IOC-Inspector\n\n# 2 – Create and activate a virtual environment (Linux/macOS)\n$ python -m venv venv \u0026\u0026 source venv/bin/activate\n\n# 2 – Create and activate a virtual environment (Windows)\n\u003e python -m venv venv \u0026\u0026 venv\\Scripts\\activate\n\n# 3 – Install requirements\n(venv) $ pip install -r requirements.txt\n\n# 4 – Set up API keys\n(venv) $ cp .env.example .env\n(venv) $ nano .env    # Add your VT_API_KEY \u0026 ABUSEIPDB_API_KEY\n\n# 5 – Run\n(venv) $ python main.py --file examples/sample_invoice.docx --report\n```\n\n\u003cdetails\u003e\u003csummary\u003eExample Output\u003c/summary\u003e\nexamples/sample_invoice.docx: score=45 verdict=suspicious  \nSee reports/sample_invoice_report.md for full IOC tables.\n\u003c/details\u003e\n\n---\n\n## ⚙️ Configuration Highlights 
(settings.py)\n```python\nRISK_WEIGHTS = {\n    \"macro\":          25,   # any VBA present\n    \"autoexec\":       15,   # AutoOpen / Document_Open …\n    \"obfuscation\":    20,   # long Base-64 blobs, XOR strings\n    \"susp_call\":       5,   # CreateObject, Shell … (×3 capped at 15)\n    \"malicious_url\":  30,   # VirusTotal consensus\n    \"malicious_ip\":   25,   # AbuseIPDB ≥ confidence cutoff\n}\n\nVT_THRESHOLD            = 5    # vendors that must flag URL/IP malicious\nABUSE_CONFIDENCE_CUTOFF = 70   # AbuseIPDB confidence to flag IP\nREPORT_FORMATS          = [\"markdown\", \"json\"]\n```\n\n## 🗂️ Repository Layout\n```text\nioc-inspector/\n├── ioc_inspector_core/         ← all analysis logic\n│   ├── __init__.py\n│   ├── pdf_parser.py\n│   ├── doc_parser.py\n│   ├── macro_analyzer.py       ← deep VBA heuristics\n│   ├── url_reputation.py\n│   ├── abuseipdb_check.py\n│   ├── heuristics.py\n│   └── report_generator.py\n│\n├── logger.py\n├── main.py\n├── settings.py\n│\n├── examples/\n├── reports/        (git-ignored)\n├── logs/           (git-ignored)\n│\n├── tests/\n└── requirements.txt\n```\n---\n\n## 📦 Dependencies at a Glance\n\n| Category | Package | Why it’s needed |\n|----------|---------|-----------------|\n| Core     | `oletools`, `pdfminer.six`, `PyMuPDF`, `requests`, `python-dotenv`, `tldextract` | Parsing, enrichment, API config |\n| Reporting| *(builtin)* | Markdown/JSON/CSV/JSONL/HTML rendering |\n| Optional | `tabulate`, `rich`, `jinja2` | Pretty console output, HTML reports |\n\n---\n\n### 🗺️ How the code flows\n\n```mermaid\nflowchart TD\n    CLI[\"CLI (main.py)\"] --\u003e DISPATCH[\"Dispatcher (__init__.analyze)\"]\n\n    subgraph \"Parsers\"\n        DISPATCH --\u003e PDF[\"pdf_parser.py\"]\n        DISPATCH --\u003e OFFICE[\"doc_parser.py\"]\n        OFFICE --\u003e MACRO[\"macro_analyzer.py\"]\n    end\n\n    PDF --\u003e ENRICH\n    MACRO --\u003e ENRICH\n    subgraph \"Reputation enrichment\"\n        ENRICH --\u003e 
VT[\"url_reputation.py\"]\n        ENRICH --\u003e ABIP[\"abuseipdb_check.py\"]\n    end\n\n    ENRICH --\u003e SCORE[\"heuristics.py\"]\n    SCORE --\u003e REPORT[\"report_generator.py\"]\n    SCORE --\u003e LOG[\"logger.py\"]\n    REPORT --\u003e OUTPUT[\"Markdown / JSON\"]\n```\n\n**What happens step-by-step**\n\n| Stage | Module | Job |\n|-------|--------|-----|\n| **CLI** | `main.py` | Reads flags, builds file list, prints a headline. |\n| **Dispatcher** | `ioc_inspector_core/__init__.py` | Routes each file to the right parser. |\n| **Parsers** | `pdf_parser.py` \u0026 `doc_parser.py` | Extract URLs, IPs, macros, embeds, JavaScript. |\n| **Enrichment** | `url_reputation.py`, `abuseipdb_check.py` | Query VirusTotal \u0026 AbuseIPDB; attach verdicts. |\n| **Scoring** | `heuristics.py` | Apply weights, produce 0-100 risk score \u0026 verdict. |\n| **Reporting** | `report_generator.py` | Write Markdown + JSON with IOC tables. |\n| **Logging** | `logger.py` | Console + rotating file breadcrumbs for every stage. 
|\n\n---\n\n## 📊 Coverage \u0026 Reliability\n- ✅ **\u003e80% test coverage** (enforced in CI)\n- ✅ Coverage badge + reports via Codecov\n- ✅ Works on **Linux and Windows** runners\n- ✅ CLI smoke test validates API usage and report generation\n\n---\n# 🛣️ Roadmap to v1.0.0\n\nThis outlines the path for taking IOC Inspector from a solid prototype (v0.1.0) to a polished, production-ready v1.0.0 release.\n\n---\n\n## ✅ Phase 1: Foundation (v0.1.0 – Done)\n- [x] Static IOC extraction: PDF, DOCX, XLSX, RTF\n- [x] Threat enrichment: VirusTotal + AbuseIPDB\n- [x] Heuristic-based scoring engine\n- [x] Markdown + JSON reporting\n- [x] Command-line interface with flags (`--report`, `--quiet`, etc.)\n- [x] Cross-platform CI (Linux + Windows)\n- [x] 80%+ test coverage with CLI smoke tests\n- [x] Final README polish and first release tag\n\n---\n\n## 🚧 Phase 2: Stability \u0026 Feedback (`v0.2.x`)\nFocus: Hardening the product \u0026 improving feedback loop\n\n### Technical Improvements\n- [x] JSON schema validation for report output\n- [ ] Improve error messaging with file context (e.g., filetype, parser used)\n- [ ] Separate reporting logic from CLI to enable more formats\n\n### Developer Experience\n- [x] Add `make test`, `make lint`, `make run` shortcuts\n- [ ] Add GitHub Discussions or feedback template\n- [ ] Incorporate feedback from test users\n\n---\n\n## ✨ Phase 3: Export \u0026 Integrations (`v0.3.x`)\nFocus: SIEM-friendliness \u0026 analyst use\n\n- [x] CSV export for Splunk or Excel\n- [x] JSONL support for batch pipelines\n- [x] HTML export with embedded styles\n- [ ] Normalize field naming for ingestion (e.g. 
`ioc.type`, `ioc.source`)\n- [ ] (Optional) Tag known MITRE ATT\u0026CK techniques from enriched IOCs\n\n---\n\n## 🚀 Phase 4: Productionization (`v0.9.x`)\nFocus: Distribution \u0026 packaging polish\n\n- [ ] Publish to PyPI for `pipx` install\n- [ ] Provide Docker image with CLI entrypoint\n- [ ] Build Windows binary via PyInstaller\n- [ ] Automate changelogs \u0026 releases via GitHub Actions\n- [ ] Use SemVer auto-tagging (`release-please`)\n\n---\n\n## 🏁 v1.0.0 Criteria\nIOC Inspector will be tagged v1.0.0 when:\n\n- [ ] All supported formats parse reliably with test coverage\n- [ ] JSON / Markdown / CSV output is schema-stable\n- [ ] Test coverage is \u003e90%\n- [ ] CLI is frictionless and documented\n- [ ] Docker + PyPI builds work out-of-box\n- [ ] Users validate usefulness via feedback\n\n---\n\n## 🧩 Post-1.0 Ideas\nOptional features to consider post-v1.0:\n\n- [ ] Ntfy/webhook notifications for batch runs\n- [ ] Web UI using Streamlit or Flask\n- [ ] Threat feed exporter (e.g. to MISP or CSV dump)\n- [ ] Language support for French / Spanish SOC teams\n\n---\n\n💬 Questions? Feedback? File an [Issue](https://github.com/PKHarsimran/IOC-Inspector/issues) or start a discussion.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkharsimran%2Fioc-inspector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpkharsimran%2Fioc-inspector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkharsimran%2Fioc-inspector/lists"}