{"id":50411763,"url":"https://github.com/calvin1989/skillflowguard","last_synced_at":"2026-05-31T04:02:24.248Z","repository":{"id":359705489,"uuid":"1247179651","full_name":"Calvin1989/SkillFlowGuard","owner":"Calvin1989","description":"Workflow-level security auditor for cross-skill risks in agent skill ecosystems.","archived":false,"fork":false,"pushed_at":"2026-05-23T04:18:22.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-23T04:23:54.137Z","etag":null,"topics":["agent-security","ai-safety","llm-security","prompt-injection","python","security-tools","workflow-security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Calvin1989.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-23T01:54:02.000Z","updated_at":"2026-05-23T04:18:26.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Calvin1989/SkillFlowGuard","commit_stats":null,"previous_names":["calvin1989/skillflowguard"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/Calvin1989/SkillFlowGuard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Calvin1989%2FSkillFlowGuard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Calvin1989%2FSkillFlowGuard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Calvin1989%2FSkillFlowGuard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Calvin1989%2FSkillFlowGuard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Calvin1989","download_url":"https://codeload.github.com/Calvin1989/SkillFlowGuard/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Calvin1989%2FSkillFlowGuard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33718446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-security","ai-safety","llm-security","prompt-injection","python","security-tools","workflow-security"],"created_at":"2026-05-31T04:02:20.441Z","updated_at":"2026-05-31T04:02:24.242Z","avatar_url":"https://github.com/Calvin1989.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SkillFlowGuard\n\n[![Tests](https://github.com/Calvin1989/SkillFlowGuard/actions/workflows/tests.yml/badge.svg)](https://github.com/Calvin1989/SkillFlowGuard/actions/workflows/tests.yml)\n\nA lightweight workflow-level security auditor for agent skill ecosystems.\n\nSkillFlowGuard detects risks that emerge when individually reasonable skills are composed into a workflow, such as recommendation chains, artifact handoffs, permission escalation, and hidden natural-language coordination signals.\n\n---\n\n## Why SkillFlowGuard?\n\nModern agent systems often compose multiple tools, skills, or graph nodes into a single workflow. A single skill may look safe in isolation, but the workflow can become risky when:\n\n- one skill recommends another downstream skill,\n- one skill writes an artifact that a later skill reads,\n- a workflow moves from local-only access to network access,\n- natural-language instructions imply hidden handoffs or execution coupling.\n\nSkillFlowGuard audits these cross-skill relationships before execution.\n\n---\n\n## Features\n\n- Workflow-level risk detection across composed agent skills\n- Structured, document, and optional LLM-assisted analysis\n- Generic JSON and LangGraph-style workflow import adapters\n- Text, JSON, dashboard-style HTML, SARIF, and GitHub Code Scanning output\n- CI-oriented controls: `--fail-on`, `--min-level`, and TOML config\n- Config-based suppressions for accepted findings with required reasons\n- Baseline comparison for CI workflows that should fail only on new findings\n- Stable finding fingerprints for baseline comparison and SARIF integrations\n- Synthetic evaluation benchmark with pytest and GitHub Actions coverage\n\n---\n\n## Installation\n\n```bash\ngit clone https://github.com/Calvin1989/SkillFlowGuard.git\ncd SkillFlowGuard\npip install -e .\n```\n\nFor optional LLM support:\n\n```bash\npip install -e \".[llm]\"\n```\n\n---\n\n## Quick Start\n\nAnalyze a workflow:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc\n```\n\nGenerate JSON:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --format json\n```\n\nGenerate SARIF for security tooling:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --format sarif --output reports/suspicious.sarif\n```\n\nFilter displayed findings by severity:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --min-level high\n```\n\nGenerate a dashboard-style HTML report:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --format html --output reports/suspicious.html\n```\n\nThe HTML report is a zero-dependency static file with severity filter controls, expandable finding cards, baseline delta section, and suppressed findings display. Open it directly in a browser -- no server or build step needed.\n\nUse a project-level config file:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --config examples/skillflowguard.toml\n```\n\nExample config:\n\n```toml\n[analysis]\nextract_doc = true\nformat = \"text\"\nmin_level = \"medium\"\n\n[[suppressions]]\nrule = \"cross_skill_recommendation\"\nreason = \"Accepted in the sample workflow.\"\n```\n\nInvalid config values are rejected with friendly CLI errors and exit code `2`.\n\nSuppressions hide accepted findings from normal report output while preserving them in JSON under `suppressed_findings`.\n\nSuppression rules are validated against the built-in rule catalog to avoid silent typos.\n\n`--min-level` filters displayed findings only. Risk score, risk level, and `--fail-on` are still based on the full analysis result.\n\n---\n\n## Example Output\n\n```text\nSummary:\n  Findings: 4\n  Risk Score: 0.85\n  Risk Level: HIGH\n  Document Extraction: ON\n  LLM Analysis: OFF\n\nDetected Risks:\n  [MEDIUM] code-review recommends report-exporter, which appears later in the workflow\n  [HIGH] code-review writes [report.json], and report-exporter reads them later\n  [HIGH] report-exporter requests network access after code-review used local-only permissions\n  [CRITICAL] recommendation + artifact dependency + network access appear in one chain\n```\n\n---\n\n## Detection Rules\n\nList built-in rules:\n\n```bash\nskillflowguard rules\nskillflowguard rules --format json\n```\n\n| Rule | Level | Description |\n|---|---:|---|\n| `cross_skill_recommendation` | Medium | A skill recommends another downstream skill. |\n| `workspace_anchor_dependency` | High | A skill writes an artifact that a later skill reads. |\n| `permission_escalation` | High | The workflow moves from local-only permissions to network access. |\n| `description_permission_mismatch` | Medium | A skill claims local/offline behavior but requests network permission. |\n| `combined_high_risk_chain` | Critical | Recommendation, artifact dependency, and network access occur together. |\n| `over_privileged_skill` | Medium | A skill combines read, write, and network privileges, which may increase blast radius if misused. |\n\nRule metadata is centralized and reused by the `rules` CLI command and SARIF report descriptors.\n\n---\n\n## Import External Workflows\n\nGeneric JSON import:\n\n```bash\nskillflowguard import generic-json examples/generic_adapter_input.json --output imported/generic_chain\nskillflowguard analyze imported/generic_chain\n```\n\nLangGraph-style import:\n\n```bash\nskillflowguard import langgraph-style examples/langgraph_style_input.json --output imported/langgraph_chain\nskillflowguard analyze imported/langgraph_chain\n```\n\nThe LangGraph-style adapter supports deterministic graph-style JSON with nodes, edges, and an entrypoint. It does not parse arbitrary LangGraph Python programs.\n\n---\n\n## Optional LLM Analysis\n\nLLM mode extracts subtle semantic signals from `SKILL.md`, such as implicit skill pairing or artifact handoff language.\n\nDefault provider:\n\n```bash\nskillflowguard analyze examples/subtle_chain --llm\n```\n\nOpenAI-compatible provider:\n\n```bash\nskillflowguard analyze examples/subtle_chain --llm \\\n  --llm-provider openai-compatible \\\n  --llm-base-url \u003cprovider-base-url\u003e \\\n  --llm-model \u003cmodel-name\u003e \\\n  --llm-api-key-env \u003cENV_VAR_NAME\u003e\n```\n\n`--llm` sends `SKILL.md` content to the configured provider. Do not use it on sensitive documents unless authorized.\n\n---\n\n## Evaluation\n\nSkillFlowGuard includes a synthetic benchmark under `evaluation/`.\n\n```bash\npython evaluation/run_eval.py --mode structured\npython evaluation/run_eval.py --mode extract-doc\npython evaluation/run_eval.py --mode llm-mock\n```\n\nMarkdown summary:\n\n```bash\npython evaluation/run_eval.py --mode structured --format markdown\n```\n\nCurrent rule-level results on 21 manually labeled synthetic workflow cases:\n\n| Mode | Precision | Recall | F1 |\n|---|---:|---:|---:|\n| `structured` | 1.000 | 0.566 | 0.723 |\n| `extract-doc` | 1.000 | 0.755 | 0.860 |\n| `llm-mock` | 1.000 | 1.000 | 1.000 |\n\n`llm-mock` is deterministic and does not call a real LLM API. The benchmark is synthetic and should not be interpreted as real-world detection performance.\n\n---\n\n## CI and Code Scanning\n\nCI gate example:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --fail-on high\n```\n\nSARIF output can be uploaded to GitHub Code Scanning. See:\n\n- [GitHub Code Scanning guide](docs/github_code_scanning.md)\n- [Example workflow](.github/workflows/skillflowguard-sarif.yml.example)\n\n---\n\n## Baseline Comparison\n\nFor CI workflows that accumulate accepted findings over time, `--baseline` compares the current report against a previously accepted JSON report. Only findings not present in the baseline are considered new.\n\n```bash\n# Generate a baseline report\nskillflowguard analyze examples/suspicious_chain --extract-doc --format json --output baseline.json\n\n# Compare against the baseline, fail only on new high+ findings\nskillflowguard analyze examples/suspicious_chain --extract-doc --baseline baseline.json --fail-on-new high\n```\n\nThe JSON output includes a `baseline` section with `new_findings`, `blocking_new_findings`, and counts.\n\nFinding identity is based on `rule + detail`. `--fail-on-new` supports `medium`, `high`, and `critical` thresholds.\n\nFindings include stable fingerprints for baseline comparison, suppressions, and SARIF `partialFingerprints`.\n\nText reports include baseline comparison counts and list blocking new findings when `--fail-on-new` is used.\n\n---\n\n## Pre-commit Integration\n\nSkillFlowGuard can run as a [pre-commit](https://pre-commit.com/) hook to block risky workflows before they enter your repository.\n\n### From GitHub\n\nAdd to your `.pre-commit-config.yaml`:\n\n```yaml\nrepos:\n  - repo: https://github.com/Calvin1989/SkillFlowGuard\n    rev: v2.14.0\n    hooks:\n      - id: skillflowguard\n        args: [\"examples/suspicious_chain\", \"--extract-doc\", \"--fail-on\", \"high\"]\n```\n\nInstall and run:\n\n```bash\npre-commit install\npre-commit run skillflowguard --all-files\n```\n\n### Local Example\n\nA local config is included for quick demos:\n\n```bash\npre-commit run --config examples/pre_commit/.pre-commit-config.yaml --all-files\n```\n\n---\n\n## Quick Demo\n\nRun the interview demo script to generate sample reports in all formats:\n\n```bash\n# PowerShell\nscripts/demo.ps1\n\n# POSIX shell\nsh scripts/demo.sh\n```\n\n### Localized HTML Reports\n\nHTML reports support localized labels:\n\n```bash\nskillflowguard analyze examples/suspicious_chain --extract-doc --format html --report-language zh --output reports/demo/suspicious.zh.html\n```\n\nRule IDs and fingerprints remain stable across languages.\n\nSee [Interview Demo Kit](docs/interview_demo.md) for a full walkthrough and talking points.\n\n---\n\n## Testing\n\n```bash\npytest\n```\n\nCurrent suite:\n\n```text\n148 passed\n```\n\n---\n\n## Documentation\n\n- [Interview Demo Kit](docs/interview_demo.md)\n- [Demo walkthrough](docs/demo.md)\n- [Design notes](docs/design.md)\n- [Adapter architecture](docs/adapters.md)\n- [LangGraph-style adapter](docs/langgraph_adapter.md)\n- [GitHub Code Scanning](docs/github_code_scanning.md)\n- [Chinese README](docs/README_zh.md)\n\n---\n\n## Project Structure\n\n```text\nskillflowguard/\n  adapters/\n  loader.py\n  doc_parser.py\n  llm_doc_parser.py\n  rule_metadata.py\n  rules.py\n  analyzer.py\n  report.py\n  config.py\n  cli.py\n\nexamples/\nevaluation/\ntests/\ndocs/\n```\n\n---\n\n## Roadmap\n\n- [x] Stable workflow-level risk analysis CLI\n- [x] Optional LLM-assisted semantic extraction\n- [x] Synthetic benchmark and evaluation reports\n- [x] Generic JSON and LangGraph-style import adapters\n- [x] SARIF output and GitHub Code Scanning integration\n- [x] Centralized rule metadata and rule catalog CLI\n- [x] Project-level TOML config support\n- [x] Config validation and friendlier error messages\n- [x] Suppression / allowlist support for known accepted risks\n- [x] Baseline comparison for new-risk CI gating\n- [x] Baseline text report summary for CI explainability\n- [x] Stable finding fingerprints for baseline and SARIF identity\n- [x] Dashboard-style HTML report with severity filters and finding cards\n- [x] Pre-commit framework integration\n\n---\n\n## Changelog\n\nSee [CHANGELOG.md](CHANGELOG.md) for release history.\n\n---\n\n## License\n\nMIT License.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcalvin1989%2Fskillflowguard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcalvin1989%2Fskillflowguard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcalvin1989%2Fskillflowguard/lists"}