https://github.com/hinanohart/weightlock

AI Asset Compliance Gate — classify model-weight licenses (commercial-use / derivatives / gating / CONFLICT) and fail CI closed on non-commercial or unverifiable assets. pip + CLI, CPU-only. Not legal advice.

ai-bom ci compliance huggingface license-compliance mlops model-weights supply-chain

# weightlock

**AI Asset Compliance Gate** — classify the commercial-use, derivative, gating
and *conflict* status of model weights, and fail your CI **closed** on
non-commercial or unverifiable assets.

> ⚠️ **weightlock is NOT legal advice.** It is a best-effort engineering aid for
> spotting license risk early. Every verdict carries a `source_url` and a
> `confidence` level — final licensing decisions belong to your legal team.

---

## Why

"Open weights" is not "open source." Between 2024 and 2026 the model ecosystem
filled with both genuinely commercial-friendly weights (MIT / Apache-2.0) and
landmines: CC-BY-NC checkpoints, RAIL behavioral-use licenses, community
licenses with monthly-active-user caps (Llama, Qwen), and gated repos. SPDX was
built for source code and cannot express "non-commercial weights", "use-based
restrictions", or "gated". Today most teams check this by hand.

weightlock turns that check into one command with a non-zero exit code, so a
non-commercial or unverifiable model can't slip into a commercial pipeline
unnoticed.

Independent motivation for this gap:
- *New Tools are Needed for Tracking Adoption and Adaptation of ML Models with Behavioral Use Clauses* — [arXiv:2505.22287](https://arxiv.org/abs/2505.22287)
- *Permissive-Washing* (95.8% of permissively-labeled models lack full license text) — [arXiv:2602.08816](https://arxiv.org/abs/2602.08816)

---

## Install

```bash
pip install weightlock            # core (CPU-only, no GPU, no heavy deps)
pip install "weightlock[rich]"    # prettier tables
```

---

## Quickstart

```bash
# Check one HuggingFace repo
weightlock check meta-llama/Llama-3.1-8B-Instruct

# In CI: fail the job if any asset is not unconditionally commercial-usable
weightlock check $(cat models.txt) --context commercial

# Machine-readable output
weightlock check facebook/musicgen-large --format json
```

### Exit codes (for `&&` chaining in CI)

| code | meaning |
|------|---------|
| `0` | all assets pass the policy |
| `1` | at least one asset violates the policy (the gate did its job) |
| `2` | an asset could not be resolved — **fail-closed** |
| `3` | invalid configuration |
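
For pipelines driven from Python rather than shell, the table above can be turned into a small wrapper. This is a minimal sketch, not part of weightlock: `describe_exit` and `run_gate` are hypothetical helper names, and the only weightlock behavior assumed is the `weightlock check ... --context commercial` invocation and the exit codes documented above.

```python
import subprocess

# Meanings copied from the exit-code table in this README.
EXIT_MEANINGS = {
    0: "all assets pass the policy",
    1: "at least one asset violates the policy",
    2: "an asset could not be resolved (fail-closed)",
    3: "invalid configuration",
}


def describe_exit(code: int) -> str:
    """Map a weightlock exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(models: list[str]) -> int:
    """Run `weightlock check` on the given repo IDs and return its exit code.

    Hypothetical wrapper: requires the `weightlock` CLI to be on PATH.
    """
    cmd = ["weightlock", "check", *models, "--context", "commercial"]
    result = subprocess.run(cmd, check=False)  # do not raise; we inspect the code
    print(f"weightlock: {describe_exit(result.returncode)}")
    return result.returncode
```

A CI script could call `sys.exit(run_gate(models))` so that any non-zero code, including the fail-closed `2`, stops the job.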

### Policy flags

| flag | effect |
|------|--------|
| `--fail-on nc,gated,unknown,conflict` | which conditions fail the gate (this is the default) |
| `--strict` | fail on `nc,unknown,conflict` (ignores gating) |
| `--allow-unknown` | do not fail on unverifiable assets (opt out of fail-closed for `unknown`) |
| `--context commercial` | treat *restricted* and *prohibited* commercial use as violations |
| `--format table\|json` | output format |

`nc` means **not unconditionally commercial-usable**: `commercial_use` is
`prohibited` or `restricted`, or outputs are non-commercial. "Restricted" covers
cases such as Llama's cap of 700M monthly active users, the Gemma terms of use,
and RAIL behavioral-use clauses. These count as `nc` because for most orgs they
are not a clean commercial "yes".
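
To make the interaction between `nc` and `--fail-on` concrete, here is a rough sketch of how a verdict could be checked against a comma-separated condition list. It is illustrative only and does not reflect weightlock's internals: the `Verdict` record, its field names, the `permitted` value and the `outputs_non_commercial` flag are all assumptions; only the `prohibited` / `restricted` values, the status names and the condition names come from this README.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    # Hypothetical record; fields loosely mirror the axes named in this README.
    commercial_use: str = "permitted"     # assumed values: permitted | restricted | prohibited
    outputs_non_commercial: bool = False  # assumed stand-in for the output_restriction axis
    gated: bool = False
    status: str = "ok"                    # ok | conflict | unknown

DEFAULT_FAIL_ON = "nc,gated,unknown,conflict"  # the documented default


def triggered_conditions(v: Verdict) -> set[str]:
    """Return every gate condition that this verdict triggers."""
    conds: set[str] = set()
    # `nc`: not unconditionally commercial-usable.
    if v.commercial_use in {"prohibited", "restricted"} or v.outputs_non_commercial:
        conds.add("nc")
    if v.gated:
        conds.add("gated")
    if v.status in {"unknown", "conflict"}:
        conds.add(v.status)
    return conds


def violations(v: Verdict, fail_on: str = DEFAULT_FAIL_ON) -> set[str]:
    """Return the triggered conditions that the policy says should fail the gate."""
    selected = {c.strip() for c in fail_on.split(",") if c.strip()}
    return triggered_conditions(v) & selected
```

Under these assumptions, a gated model with restricted commercial use triggers both `nc` and `gated` by default, but only `nc` under the `--strict` condition list `nc,unknown,conflict`.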

---

## How it works

### What it classifies (6 axes + status)

`commercial_use`, `derivatives`, `redistribution`, `gating`,
`output_restriction`, `attribution` — plus an independent `status`:

- **`ok`** — sources agree.
- **`conflict`** — the host's declared license, the license body, and/or the
curated seed DB disagree. weightlock adopts the *more restrictive* value and
flags it. This is how it catches **permissive-washing** (a repo tagged
`apache-2.0` whose actual LICENSE body says "non-commercial"). `conflict`
fails the gate by default.
- **`unknown`** — nothing could be verified; fail-closed.
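
The "adopt the more restrictive value" rule can be sketched as a maximum over a strictness ordering. This is an assumption-laden illustration rather than weightlock's actual code: the ordering table, the `permitted` value name, and the helpers `merge_axis` and `merge_commercial_use` are hypothetical.

```python
# Assumed strictness ordering; "permitted" is a placeholder name for the permissive value.
STRICTNESS = {"permitted": 0, "restricted": 1, "prohibited": 2}


def merge_axis(*values: str) -> str:
    """Return the most restrictive value reported by any source."""
    return max(values, key=STRICTNESS.__getitem__)


def merge_commercial_use(declared: str, body: str) -> tuple[str, str]:
    """Merge the declared and license-body values into (value, status).

    Any disagreement yields status "conflict"; the stricter value wins.
    """
    status = "ok" if declared == body else "conflict"
    return merge_axis(declared, body), status
```

For a permissive-washed repo, `merge_commercial_use("permitted", "prohibited")` yields `("prohibited", "conflict")`: the gate sees the strict value and the disagreement is surfaced rather than silently resolved.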

### Verdict resolution pipeline

1. **HuggingFace Hub metadata** (declared license, gated) — primary, always current.
2. **Curated seed DB overlay** — an *authoritative overlay* of ~20 high-authority
entries (Llama / Gemma / RAIL / CC-NC / gated families) where the tag alone
misleads. On disagreement → `conflict`. Not a substitute for the primary lookup.
3. **License body / model-card text** — declared-vs-body cross-check.
4. Nothing resolved → `unknown` (fail-closed).
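
The steps above can be sketched as a function that consults three sources in order and merges their answers. Everything here is hypothetical scaffolding: the `Source` callables stand in for the Hub lookup, the seed DB and the license-body check, the `permitted` value is an assumed name, and how partially resolved cases are treated is a guess, since this README only specifies the all-empty case.

```python
from typing import Callable, Optional

# A source maps a repo ID to a commercial_use value, or None if it cannot resolve it.
Source = Callable[[str], Optional[str]]

# Assumed strictness ordering; "permitted" is a placeholder value name.
STRICTNESS = {"permitted": 0, "restricted": 1, "prohibited": 2}


def resolve(repo_id: str, hub: Source, seed: Source, body: Source) -> dict:
    """Consult hub metadata, seed DB and license body in order, then merge."""
    findings = {
        name: src(repo_id)
        for name, src in (("hub", hub), ("seed", seed), ("body", body))
    }
    values = [v for v in findings.values() if v is not None]
    if not values:
        # Step 4: nothing resolved, so the verdict is unknown (fail-closed).
        return {"commercial_use": None, "status": "unknown", "sources": findings}
    # Sources that answered but disagree produce a conflict.
    status = "ok" if len(set(values)) == 1 else "conflict"
    strictest = max(values, key=STRICTNESS.__getitem__)
    return {"commercial_use": strictest, "status": status, "sources": findings}
```

Passing the sources in as callables keeps the sketch testable without network access; stub functions can simulate a repo whose tag and license body disagree.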

---

## Architecture


*[Figure: weightlock architecture]*

---

## Honest limitations (v0.1.0a1, alpha)

- **Not legal advice.** Best-effort, conservative, fail-closed.
- Covers **directly named models** only. Transitive `base_model` scanning, a
GitHub Action wrapper and SARIF output are planned for v0.1.1.
- `gating` is a **snapshot at fetch time**, not continuous monitoring.
- The seed DB favors authority over coverage; unrecognized licenses return
`unknown` and fail closed rather than guessing.
- Dataset license classification and CycloneDX ML-BOM export are v0.2 scope.
- License-body parsing is regex-based and conservative; it reports a conflict
only on unambiguous signals.
- Seed-vs-host `conflict` is keyed on the **commercial-use axis** (the gate's
headline judgment). A disagreement on `derivatives` or `redistribution` alone
is still merged to the stricter value, but is not raised as a `conflict` until
v0.2.
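
As an illustration of what "regex-based and conservative" license-body parsing might look like, the sketch below flags only explicit non-commercial wording. The patterns and the function name `body_signals_non_commercial` are hypothetical examples, not weightlock's actual rules.

```python
import re

# Hypothetical patterns: match only explicit, unambiguous non-commercial signals.
NC_PATTERNS = [
    # e.g. "for non-commercial use only", "noncommercial research only"
    re.compile(r"\bnon[-\s]?commercial\s+(?:use|purposes?|research)\s+only\b", re.IGNORECASE),
    # e.g. "CC-BY-NC-4.0", "CC BY-NC-SA"
    re.compile(r"\bCC[-\s]BY[-\s]NC\b", re.IGNORECASE),
]


def body_signals_non_commercial(text: str) -> bool:
    """Return True only if the text contains an explicit non-commercial signal."""
    return any(p.search(text) for p in NC_PATTERNS)
```

A bare mention of "non-commercial" is deliberately not enough: a sentence such as "Permits commercial and non-commercial use." does not match, which trades recall for fewer false conflicts.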

---

## How weightlock differs

- **vs. JFrog Curation** — JFrog can block models by policy in CI, but it is a
commercial product with Artifactory lock-in. weightlock is OSS, `pip`-installable
and CPU-only.
- **vs. license-scanning tools** (ScanCode, ORT, licensecheck) — those target
*source code* licenses; they don't produce a commercial-use judgment for model
weights or a `--fail-on` CI gate.

---

## Development

```bash
uv venv && uv pip install -e ".[dev,rich]"
uv run pytest              # unit tests (no network)
uv run pytest -m network   # optional live HuggingFace Hub smoke test
uv run ruff check .
```

---

## License

MIT. See [LICENSE](LICENSE).