{"id":50319393,"url":"https://github.com/duriantaco/ceres","last_synced_at":"2026-05-29T02:08:29.666Z","repository":{"id":359792159,"uuid":"1241563770","full_name":"duriantaco/ceres","owner":"duriantaco","description":" Static AI security scanner for models, datasets, RAG, prompts, agent tools, MCP, and AI supply   chain.","archived":false,"fork":false,"pushed_at":"2026-05-23T13:01:02.000Z","size":122,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-23T14:26:52.554Z","etag":null,"topics":["ai-security","llm-security","mcp","ml-security","python","rag-security","sast","security-tools","supply-chain-security","tool-security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duriantaco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-17T14:48:23.000Z","updated_at":"2026-05-23T13:01:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/duriantaco/ceres","commit_stats":null,"previous_names":["duriantaco/ceres"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/duriantaco/ceres","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duriantaco%2Fceres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duriantaco%2Fceres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duriantaco%2Fceres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duriantaco%2Fceres/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duriantaco","download_url":"https://codeload.github.com/duriantaco/ceres/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duriantaco%2Fceres/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33633475,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-security","llm-security","mcp","ml-security","python","rag-security","sast","security-tools","supply-chain-security","tool-security"],"created_at":"2026-05-29T02:08:28.801Z","updated_at":"2026-05-29T02:08:29.653Z","avatar_url":"https://github.com/duriantaco.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/ceres-mark.svg\" alt=\"Ceres logo\" width=\"96\" height=\"96\"\u003e\n\u003c/p\u003e\n\n# Ceres\n\n**Developer-first AI security scanner.** Ceres is AI-SAST for repos: it inspects\nyour code, prompts, configs, model artifacts, datasets, RAG docs, and AI\nsupply chain for the security issues that traditional SAST/SCA tools miss. It\nruns locally, in pre-commit, and in CI.\n\n```text\nceres scan .\n```\n\n## What Ceres checks\n\n| Layer       | Examples |\n|-------------|----------|\n| Code        | `trust_remote_code=True`, `pickle.load`, `torch.load` without `weights_only=True`, `eval`/`exec`, unrestricted agent tools, risky tools without approval, poisoned tool/MCP descriptions |\n| Models      | `.pkl`/`.pickle` artifacts, unsafe formats, unknown source/provenance, suspicious pickle opcodes, missing/changed SHA-256, safetensors tensor/layer drift, GGUF/ONNX metadata drift, NaN/Inf/range anomalies, tokenizer / chat-template / LoRA-base drift |\n| Datasets    | missing manifest, missing/stale hash, source not in allowlist, duplicate-rate spikes, label distribution drift vs. baseline, sudden rare-trigger trigrams |\n| Eval/safety | disabled safety or regression eval gates, lowered safety thresholds, disabled filters/guardrails, high generation temperature |\n| RAG corpus  | prompt-injection phrases (`ignore previous instructions`, etc.), unsafe user-doc indexing, missing retrieval filters, permission checks after retrieval, hidden HTML / display:none, HTML comments with instructions, zero-width / bidi control chars, large base64 blobs |\n| Prompts     | user input templated into system context; optional inline secret checks when explicitly enabled |\n| Supply chain| unpinned Hugging Face model references in configs, unpinned Git dependencies, missing lockfiles, unpinned Docker images, remote install scripts, optional generic dependency pin checks, `pip-audit` results normalized into Ceres findings; `gitleaks` only when explicitly enabled |\n| AI-BOM      | warns when models/datasets are present but no `ai-bom.json` exists |\n\nFull docs:\n\n- [Docs index](docs/index.md)\n- [Rule catalog](docs/rules.md)\n- [Model security and tensor scanning](docs/model-security.md)\n\nCeres **never imports model files**. Model artifacts are inspected statically\n(pickle opcode decoding only, no `__reduce__` execution) with a 64 MB hard cap.\n\n## Install\n\n```bash\npip install ceres-scanner\n# or, from this repo:\npip install -e .\n```\n\nOptional integrations: install [`pip-audit`](https://pypi.org/project/pip-audit/)\nor, if you explicitly want generic secret scanning inside Ceres,\n[`gitleaks`](https://github.com/gitleaks/gitleaks) on `PATH`. Ceres detects\nenabled tools and folds their findings into the same report. If policy\nenables an external scanner but it is missing, Ceres emits a low-severity\n`ceres.supplychain.scanner_unavailable` finding so CI does not silently skip coverage.\n\n## Quick start\n\n```bash\nceres init                       # writes ceres.yml policy\nceres scan .                     # human-readable scan with explanations\nceres scan . --sarif-out out.sarif --json-out out.json\nceres scan . --diff-base origin/main\nceres baseline .                 # snapshot dataset+model+tool metadata -\u003e .ceres/baseline.json\nceres bom . --out ai-bom.json    # Ceres AI-BOM\nceres list-rules                 # show known rule IDs\n```\n\n`scan` exits non-zero when findings at gated severities are present (defaults:\n`critical` and `high` fail; `medium` warns).\n\nThe CLI report groups findings by AI system layer, highlights the first issues\nto review, explains why each issue matters, shows evidence when available, and\nends with the next remediation steps.\n\nUse `--diff-base` in PR checks to scan with full repository context but report\nonly findings on files or lines changed since the base ref.\n\n## Example Use Case\n\nA typical Ceres use case is reviewing a pull request for an AI support agent.\nThe PR changes model loading code, adds a new RAG document, updates a training\ndataset, and touches dependencies.\n\n```bash\nceres scan . --json-out ceres-report.json --sarif-out ceres.sarif\n```\n\nExample findings:\n\n```text\nCRITICAL  ceres.model.loader.remote_code_enabled\n          src/app.py:10\n          Model loader uses trust_remote_code=True.\n\nCRITICAL  ceres.model.artifact.pickle_format\n          models/final.pkl\n          Pickle-based model artifact may execute code during deserialization.\n\nHIGH      ceres.rag.instruction.ignore_context\n          rag/vendor_policy.md:5\n          RAG document contains instruction-like text.\n\nHIGH      ceres.dataset.hash_drift\n          data/train.csv\n          Dataset hash differs from manifest declaration.\n```\n\nFor a local demo from this repository:\n\n```bash\nceres scan examples/vulnerable-ai-repo\nceres scan examples/vulnerable-ai-repo \\\n  --json-out examples/vulnerable-ai-repo/ceres-report.json \\\n  --sarif-out examples/vulnerable-ai-repo/ceres.sarif\nceres bom examples/vulnerable-ai-repo\nceres baseline examples/vulnerable-ai-repo\n```\n\nThe vulnerable example is expected to fail. The clean example should pass:\n\n```bash\nceres scan examples/clean-ai-repo\n```\n\nFor real-world regression testing, run the seeded corpus harness. It copies or\nclones AI repos, injects known-bad model/RAG/agent/data/supply-chain changes,\nand fails if the expected rules do not fire:\n\n```bash\npython scripts/real_world_check.py \\\n  --corpus examples/real-world-corpus.yml \\\n  --workdir /tmp/ceres-real-world \\\n  --json-out /tmp/ceres-real-world/report.json\n```\n\n## Policy\n\n`ceres.yml` controls gates, allowlists, and waivers. The defaults are\nopinionated: `pickle` formats are blocked, `trust_remote_code` is denied, and\ngeneric secret scanning is off by default so Ceres stays focused on AI-model and\nAI-system risk.\n\n```yaml\nseverity_gate:\n  critical: fail\n  high: fail\n  medium: warn\n  low: info\n\nmodel_policy:\n  allowed_formats: [safetensors, onnx, gguf]\n  blocked_formats: [pkl, pickle]\n  require_revision_pin: true\n  allow_trust_remote_code: false\n\nwaivers:\n  - rule_id: ceres.model.loader.remote_code_enabled\n    file: src/research_loader.py\n    reason: \"Research-only script, not shipped\"\n    expires: \"2026-12-01\"\n    approved_by: \"security-team\"\n```\n\nExpired waivers stop suppressing findings *and* are surfaced as a\n`ceres.policy.waiver_expired` finding so they don't quietly rot.\n\n## Baselines\n\n```bash\nceres baseline .\ngit add .ceres/baseline.json\n```\n\nOnce a baseline exists, Ceres compares dataset fingerprints (row count, duplicate\nrate, label distribution, top trigrams), model/tokenizer state, and tool\nmetadata descriptions against it. Drift beyond policy thresholds becomes a\nfinding.\n\n## Model Layer Scanning\n\nCeres should scan model layers and tensors for **poisoning indicators**, but it\nshould not claim that static layer inspection can prove a layer is poisoned.\nBackdoors can be subtle and may only show up under specific triggers or runtime\nbehavior.\n\nCeres currently performs safe `.safetensors` tensor baseline checks without\nimporting model code or loading tensors into memory. It parses the safetensors\nheader, records tensor names, dtypes, shapes, offsets, SHA-256 hashes, and\ncompact numeric stats in the baseline, then compares future scans against that\nbaseline.\n\nImplemented static checks:\n\n- per-tensor SHA-256 hashes compared with a known-good baseline\n- unexpected layer names, missing layers, added layers, or shape changes\n- dtype changes\n- NaN/Inf values and configured absolute-value range anomalies\n- L2 norm drift and sparsity drift compared with baseline\n- GGUF header/metadata/tensor-inventory parsing with architecture, metadata, and\n  tensor-count drift checks\n- ONNX protobuf metadata parsing with opset, graph operator-summary, and model\n  metadata drift checks\n- LoRA adapter metadata changes such as base model mismatch\n- tokenizer, special-token, and chat-template changes that can hide behavior\n  shifts outside obvious weight tensors\n\nPlanned checks:\n\n- cross-layer outlier scoring for tensor families with similar roles\n- deeper ONNX graph-shape and GGUF tokenizer policy inspection\n\nGood finding wording:\n\n```text\nHIGH ceres.model.tensor.norm_drift\nmodels/adapter.safetensors\nLayer \"lm_head.weight\" changed shape and has unusually large norm drift compared\nwith baseline.\n```\n\nRecommended policy: use layer/tensor scanning as a baseline-diff and anomaly\ndetector, then combine it with provenance, signatures, dataset checks, and\ndynamic evaluation before making a poisoning claim.\n\nSee [Model security and tensor scanning](docs/model-security.md) for the\nimplemented model rules, baseline format, and policy knobs.\n\n## CI\n\n```yaml\n# .github/workflows/ceres.yml\nname: Ceres\non: [pull_request, push]\njobs:\n  ceres:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: actions/setup-python@v5\n        with: { python-version: \"3.11\" }\n      - run: pip install ceres-scanner\n      - run: ceres scan . --sarif-out ceres.sarif\n      - uses: github/codeql-action/upload-sarif@v3\n        if: always()\n        with: { sarif_file: ceres.sarif }\n```\n\n## Pre-commit\n\n```yaml\n# .pre-commit-config.yaml\nrepos:\n  - repo: local\n    hooks:\n      - id: ceres\n        name: Ceres AI security scanner\n        entry: ceres scan . --policy ceres.yml\n        language: system\n        pass_filenames: false\n```\n\n## Status\n\nCeres is a young project. The MVP covers static rules for code, models, data,\nRAG, prompts, and supply chain, plus AI-BOM and baselines. The current product\nfocus is a fast, static, pre-production gate for AI workflow changes.\n\nSee `examples/vulnerable-ai-repo/` for an example that trips most rules and\n`examples/clean-ai-repo/` for a quiet baseline.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduriantaco%2Fceres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduriantaco%2Fceres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduriantaco%2Fceres/lists"}