https://github.com/hinanohart/weightlock
AI Asset Compliance Gate — classify model-weight licenses (commercial-use / derivatives / gating / CONFLICT) and fail CI closed on non-commercial or unverifiable assets. pip + CLI, CPU-only. Not legal advice.
ai-bom ci compliance huggingface license-compliance mlops model-weights supply-chain
- Host: GitHub
- URL: https://github.com/hinanohart/weightlock
- Owner: hinanohart
- License: mit
- Created: 2026-05-26T15:26:58.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-10T12:09:06.000Z (18 days ago)
- Last Synced: 2026-06-10T12:22:55.470Z (18 days ago)
- Topics: ai-bom, ci, compliance, huggingface, license-compliance, mlops, model-weights, supply-chain
- Language: Python
- Size: 81.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE
# weightlock
**AI Asset Compliance Gate** — classify the commercial-use, derivative, gating
and *conflict* status of model weights, and fail your CI **closed** on
non-commercial or unverifiable assets.
> ⚠️ **weightlock is NOT legal advice.** It is a best-effort engineering aid for
> spotting license risk early. Every verdict carries a `source_url` and a
> `confidence` level — final licensing decisions belong to your legal team.
---
## Why
"Open weights" is not "open source." Between 2024 and 2026 the model ecosystem
filled with both genuinely commercial-friendly weights (MIT / Apache-2.0) and
landmines: CC-BY-NC checkpoints, RAIL behavioral-use licenses, community
licenses with monthly-active-user caps (Llama, Qwen), and gated repos. SPDX was
built for source code and cannot express "non-commercial weights", "use-based
restrictions", or "gated". Today most teams check this by hand.
weightlock turns that check into one command with a non-zero exit code, so a
non-commercial or unverifiable model can't slip into a commercial pipeline
unnoticed.
Independent motivation for this gap:
- *New Tools are Needed for Tracking Adoption and Adaptation of ML Models with Behavioral Use Clauses* — [arXiv:2505.22287](https://arxiv.org/abs/2505.22287)
- *Permissive-Washing* (95.8% of permissively-labeled models lack full license text) — [arXiv:2602.08816](https://arxiv.org/abs/2602.08816)
---
## Install
```bash
pip install weightlock # core (CPU-only, no GPU, no heavy deps)
pip install "weightlock[rich]" # prettier tables
```
---
## Quickstart
```bash
# Check one HuggingFace repo
weightlock check meta-llama/Llama-3.1-8B-Instruct
# In CI: fail the job if any asset is not unconditionally commercial-usable
weightlock check $(cat models.txt) --context commercial
# Machine-readable output
weightlock check facebook/musicgen-large --format json
```
### Exit codes (for `&&` chaining in CI)
| code | meaning |
|------|---------|
| `0` | all assets pass the policy |
| `1` | at least one asset violates the policy (the gate did its job) |
| `2` | an asset could not be resolved — **fail-closed** |
| `3` | invalid configuration |
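As a rough illustration of how a pipeline script might branch on these codes, the Python sketch below maps each documented exit code to a log message. The names `describe_exit_code` and `run_gate` are hypothetical helpers, not part of weightlock; only the `weightlock check` command and the four codes come from this README.

```python
import subprocess
import sys

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "pass: all assets pass the policy",
    1: "violation: at least one asset violates the policy",
    2: "unresolved: an asset could not be resolved (fail-closed)",
    3: "config: invalid configuration",
}


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: turn a weightlock exit code into a log message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(models: list[str]) -> int:
    """Hypothetical wrapper: run `weightlock check` and report the outcome."""
    # check=False so a non-zero exit code is returned instead of raised.
    result = subprocess.run(["weightlock", "check", *models], check=False)
    print(describe_exit_code(result.returncode), file=sys.stderr)
    return result.returncode
```

Separating codes `1` and `2` lets a job distinguish "the gate found a problem" from "the gate could not look", even though both stop the pipeline.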
### Policy flags
| flag | effect |
|------|--------|
| `--fail-on nc,gated,unknown,conflict` | which conditions fail the gate (this is the default) |
| `--strict` | fail on `nc,unknown,conflict` (ignores gating) |
| `--allow-unknown` | do not fail on unverifiable assets (opt out of fail-closed for `unknown`) |
| `--context commercial` | treat *restricted* and *prohibited* commercial use as violations |
| `--format table\|json` | output format |
`nc` means **not unconditionally commercial-usable**: `commercial_use` is
`prohibited` or `restricted`, or outputs are non-commercial. "Restricted"
(Llama >700M-MAU, Gemma ToU, RAIL behavioral) counts, because for most orgs it
is not a clean commercial "yes".
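The flag semantics above can be sketched as a small predicate. This is a minimal sketch under stated assumptions, not weightlock's implementation: the `Verdict` class, the `"allowed"` value, and the `outputs_noncommercial` flag are hypothetical stand-ins; only the condition names (`nc`, `gated`, `unknown`, `conflict`) and the `prohibited` / `restricted` values come from this README.

```python
from dataclasses import dataclass

# Default gate conditions, matching the documented `--fail-on` default.
DEFAULT_FAIL_ON = frozenset({"nc", "gated", "unknown", "conflict"})


@dataclass
class Verdict:
    """Hypothetical verdict shape; field names and values are assumptions."""
    commercial_use: str          # assumed: "allowed" | "restricted" | "prohibited"
    outputs_noncommercial: bool  # assumed flag derived from `output_restriction`
    gated: bool
    status: str                  # "ok" | "conflict" | "unknown"


def violations(v: Verdict, fail_on=DEFAULT_FAIL_ON) -> set[str]:
    """Return the subset of `fail_on` conditions that this verdict triggers."""
    triggered = set()
    # `nc`: not unconditionally commercial-usable.
    if v.commercial_use in {"prohibited", "restricted"} or v.outputs_noncommercial:
        triggered.add("nc")
    if v.gated:
        triggered.add("gated")
    if v.status in {"unknown", "conflict"}:
        triggered.add(v.status)
    return triggered & set(fail_on)
```

For example, a gated model with restricted commercial use triggers both `nc` and `gated` under the default, but only `nc` when the set is narrowed to `nc,unknown,conflict`.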
---
## How it works
### What it classifies (6 axes + status)
`commercial_use`, `derivatives`, `redistribution`, `gating`,
`output_restriction`, `attribution` — plus an independent `status`:
- **`ok`** — sources agree.
- **`conflict`** — the host's declared license, the license body, and/or the
curated seed DB disagree. weightlock adopts the *more restrictive* value and
flags it. This is how it catches **permissive-washing** (a repo tagged
`apache-2.0` whose actual LICENSE body says "non-commercial"). `conflict`
fails the gate by default.
- **`unknown`** — nothing could be verified; fail-closed.
### Verdict resolution pipeline
1. **HuggingFace Hub metadata** (declared license, gated) — primary, always current.
2. **Curated seed DB overlay** — an *authoritative overlay* of ~20 high-authority
entries (Llama / Gemma / RAIL / CC-NC / gated families) where the tag alone
misleads. On disagreement → `conflict`. Not a substitute for the primary lookup.
3. **License body / model-card text** — declared-vs-body cross-check.
4. Nothing resolved → `unknown` (fail-closed).
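The merge rule in the steps above (adopt the more restrictive value, flag disagreement, fall back to `unknown`) might look roughly like the following for the commercial-use axis. The function name, the `"allowed"` value, and the numeric ordering are illustrative assumptions; the real resolver is not shown in this README.

```python
from typing import Optional

# Assumed ordering, least to most restrictive; value names are illustrative.
RESTRICTIVENESS = {"allowed": 0, "restricted": 1, "prohibited": 2}


def resolve_commercial_use(
    declared: Optional[str],
    seed: Optional[str],
    body: Optional[str],
) -> tuple[str, str]:
    """Merge up to three source opinions into (commercial_use, status).

    `declared` is the host's license tag, `seed` the curated overlay,
    `body` the license-text cross-check; None means "no opinion".
    """
    known = [v for v in (declared, seed, body) if v is not None]
    if not known:
        return ("unknown", "unknown")  # nothing resolved: fail-closed
    strictest = max(known, key=RESTRICTIVENESS.__getitem__)
    status = "ok" if len(set(known)) == 1 else "conflict"
    return (strictest, status)
```

Under these assumptions, a repo whose tag implies `"allowed"` but whose license body implies `"prohibited"` resolves to `("prohibited", "conflict")`, which is the permissive-washing case described earlier.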
---
## Architecture
---
## Honest limitations (v0.1.0a1, alpha)
- **Not legal advice.** Best-effort, conservative, fail-closed.
- Covers **directly named models** only. Transitive `base_model` scanning, a
GitHub Action wrapper and SARIF output are planned for v0.1.1.
- `gating` is a **snapshot at fetch time**, not continuous monitoring.
- The seed DB favors authority over coverage; unrecognized licenses return
`unknown` and fail-closed rather than guessing.
- Dataset license classification and CycloneDX ML-BOM export are v0.2 scope.
- License-body parsing is regex-based and conservative; it reports a conflict
only on unambiguous signals.
- Seed-vs-host `conflict` is keyed on the **commercial-use axis** (the gate's
headline judgment). A disagreement on `derivatives` or `redistribution` alone
is still merged to the stricter value, but is not raised as a `conflict` until
v0.2.
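To make the regex-based, conservative body check concrete, here is a minimal sketch of a declared-vs-body cross-check. The patterns, the function names, and the set of permissive tags are assumptions chosen for illustration; weightlock's actual rules are not listed in this README.

```python
import re

# Illustrative patterns only; not weightlock's real rule set.
NONCOMMERCIAL_PATTERNS = [
    re.compile(r"\bnon[-\s]?commercial\b", re.IGNORECASE),
    re.compile(r"\bnot\s+for\s+commercial\s+use\b", re.IGNORECASE),
]

# Assumed set of tags treated as permissive for this sketch.
PERMISSIVE_TAGS = {"mit", "apache-2.0"}


def body_says_noncommercial(license_text: str) -> bool:
    """Return True when the text contains one of the non-commercial phrases."""
    return any(p.search(license_text) for p in NONCOMMERCIAL_PATTERNS)


def permissive_washed(declared_tag: str, license_text: str) -> bool:
    """Flag a permissive tag whose body carries a non-commercial signal."""
    return declared_tag.lower() in PERMISSIVE_TAGS and body_says_noncommercial(license_text)
```

A sentence such as "commercial and non-commercial use is permitted" would also match these naive patterns, so a genuinely conservative parser needs stricter context checks before reporting a conflict.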
---
## How weightlock differs
- **vs. JFrog Curation** — JFrog can block models by policy in CI, but it is a
commercial product with Artifactory lock-in. weightlock is OSS, `pip`-installable
and CPU-only.
- **vs. license-scanning tools** (ScanCode, ORT, licensecheck) — those target
*source code* licenses; they don't produce a commercial-use judgment for model
weights or a `--fail-on` CI gate.
---
## Development
```bash
uv venv && uv pip install -e ".[dev,rich]"
uv run pytest # unit tests (no network)
uv run pytest -m network # optional live HuggingFace Hub smoke test
uv run ruff check .
```
---
## License
MIT. See [LICENSE](LICENSE).