https://github.com/mlcommons/endpoints-submission-cli
- Host: GitHub
- URL: https://github.com/mlcommons/endpoints-submission-cli
- Owner: mlcommons
- License: apache-2.0
- Created: 2026-05-19T16:20:14.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-12T17:56:02.000Z (16 days ago)
- Last Synced: 2026-06-12T19:22:30.386Z (16 days ago)
- Language: Python
- Size: 452 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Codeowners: .github/CODEOWNERS
# MLCommons Endpoints Submission Tools
A Python package with two tools for managing MLPerf Endpoints benchmark submissions:
- **`endpoints-submission-cli`** — registers benchmark runs, assembles submission packages, runs compliance checks, and opens GitHub pull requests via the PRISM API.
- **`submission-checker`** — validates a submission folder against the §9.1 automated compliance rules before or after upload.
---
## Installation
**With pip:**
```bash
pip install endpoints-submission-cli
```
**From source (editable):**
```bash
pip install -e ".[dev]"
```
**With [uv](https://github.com/astral-sh/uv):**
```bash
uv sync --extra dev
```
---
# endpoints-submission-cli
## Requirements
- Python 3.10 or later
- [`gh` CLI](https://cli.github.com/) — required for creating, updating, and withdrawing submissions
## Authentication
Every command requires a PRISM API token in `mlc_…` format. Set it in the `PRISM_USER_API_TOKEN` environment variable, or pass `--token` to override it for a single command:
```bash
# Persistent (add to shell profile)
export PRISM_USER_API_TOKEN=mlc_your_token_here
# Per-command override
endpoints-submission-cli runs list --token mlc_your_token_here
```
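The lookup order described above (an explicit `--token` wins over the environment variable) can be sketched in Python. This is only an illustration: `resolve_token` is a hypothetical helper, and the prefix check is an assumption about what "`mlc_…` format" means, not the CLI's actual validation.

```python
import os

TOKEN_ENV_VAR = "PRISM_USER_API_TOKEN"  # variable name taken from this README
TOKEN_PREFIX = "mlc_"                   # documented token prefix


def resolve_token(cli_token=None, environ=None):
    """Return the API token, preferring an explicit --token value.

    Hypothetical helper: the real CLI may resolve and validate tokens differently.
    """
    env = os.environ if environ is None else environ
    token = cli_token or env.get(TOKEN_ENV_VAR)
    if not token:
        raise ValueError(f"No token found: set {TOKEN_ENV_VAR} or pass --token")
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError(f"Token does not start with {TOKEN_PREFIX!r}")
    return token
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.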
Submission commands that create or update GitHub pull requests also require the `gh` CLI:
```bash
gh auth login
```
## Configuration
| Environment variable | Default | Description |
|---|---|---|
| `PRISM_USER_API_TOKEN` | — | API key. Required unless `--token` is passed. |
| `MLPERF_SUBMISSION_REPO` | `MLCommons-Systems/test-endpoints-submission-repo` | Target GitHub repository for submission PRs (`owner/repo`). |
Add to your shell profile for a persistent setup:
```bash
export PRISM_USER_API_TOKEN=mlc_your_token_here
export MLPERF_SUBMISSION_REPO=MLCommons-Systems/endpoints-submission-repo
```
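For scripts that wrap the CLI, the same configuration can be read from Python. The sketch below assumes only what the table states: the variable name, its default, and the `owner/repo` shape. The function name `submission_repo` and the exact validation pattern are illustrative assumptions.

```python
import os
import re

# Default value as listed in the configuration table.
DEFAULT_REPO = "MLCommons-Systems/test-endpoints-submission-repo"

# Assumed shape of an owner/repo slug; the CLI's own validation may differ.
_REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def submission_repo(environ=None):
    """Return the target repository, falling back to the documented default."""
    env = os.environ if environ is None else environ
    repo = env.get("MLPERF_SUBMISSION_REPO") or DEFAULT_REPO
    if not _REPO_PATTERN.match(repo):
        raise ValueError(f"Expected owner/repo, got {repo!r}")
    return repo
```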
## Quick start
```bash
# 1. Verify connectivity
endpoints-submission-cli runs list
# 2. Register a benchmark run from a local result folder
endpoints-submission-cli runs create --path /results/llama3_h100_c4
# → Run created: d5d9873e-5eca-4f8d-a487-4be1cb8b440c
RUN_ID=d5d9873e-5eca-4f8d-a487-4be1cb8b440c
# 3. Create a submission (assembles, checks, uploads, opens PR)
endpoints-submission-cli submissions create \
  --division standardized \
  --availability available \
  --run-ids $RUN_ID
# → Submission created: a1b2c3d4-…
# → PR: https://github.com/MLCommons-Systems/…/pull/42
SUB_ID=a1b2c3d4-e5f6-7890-abcd-ef1234567890
# 4. Add another run later
endpoints-submission-cli submissions add-run \
  --submission-id $SUB_ID \
  --run-id <run-id>
# 5. Withdraw if needed
endpoints-submission-cli submissions withdraw --submission-id $SUB_ID
```
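The quick start copies the run ID by hand. When scripting the workflow, one option is to capture the command output and extract the ID. The sketch below assumes the output contains a line shaped like the `Run created: …` comment shown above; the real output format may differ, and `create_run` and `extract_run_id` are hypothetical helper names.

```python
import re
import subprocess

_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Assumed output shape, based on the quick-start comment "Run created: <uuid>".
_RUN_CREATED = re.compile(rf"Run created:\s*({_UUID})")


def extract_run_id(cli_output):
    """Return the run ID found in the CLI output, or None if none is present."""
    match = _RUN_CREATED.search(cli_output)
    return match.group(1) if match else None


def create_run(result_dir):
    """Register a run from a local result folder and return its ID (or None)."""
    completed = subprocess.run(
        ["endpoints-submission-cli", "runs", "create", "--path", str(result_dir)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return extract_run_id(completed.stdout)
```

Returning `None` rather than raising leaves it to the caller to decide how to handle output that does not match the assumed format.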
## Command reference
```
endpoints-submission-cli
├── runs
│   ├── list          List all runs
│   ├── create        Register a run from a local folder
│   ├── get           Fetch run details
│   ├── delete        Delete a run and its archive
│   ├── pin           Pin a run (prevent expiry)
│   └── unpin         Restore normal expiry
└── submissions
    ├── list          List all submissions
    ├── create        Create a submission from runs (full pipeline)
    ├── get           Fetch submission details
    ├── update        Update run list or metadata
    ├── withdraw      Withdraw a submission
    ├── add-run       Add a run to an existing submission
    └── remove-run    Remove a run from a submission
Use `--help` on any command for full flag details:
```bash
endpoints-submission-cli submissions create --help
```
---
# submission-checker
CLI tool for validating MLPerf Endpoints submissions against the §9.1 automated compliance checks.
## Usage
### Check a submission
```bash
submission-checker check /path/to/submission
```
The tool expects the submission root to contain `systems/` and `pareto/` subdirectories as specified in §8.1.
**Options:**
| Flag | Description |
|------|-------------|
| `--strict` | Treat warnings as errors (exit 1 on any warning) |
| `--quiet` / `-q` | Suppress INFO-level passing checks |
| `--output FILE` / `-o FILE` | Write full results as JSON to *FILE* |
**Exit codes:** `0` = all checks passed, `1` = one or more errors (or warnings with `--strict`).
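Because the result is reported through the exit code, the checker is straightforward to call from a Python script or CI job. The sketch below uses only the flags from the table above. The helper names are hypothetical, and it assumes the flags may be placed before the path argument.

```python
import subprocess


def build_check_command(submission_dir, strict=False, quiet=False, output=None):
    """Build the argv list for `submission-checker check`."""
    cmd = ["submission-checker", "check"]
    if strict:
        cmd.append("--strict")  # warnings also produce exit code 1
    if quiet:
        cmd.append("--quiet")   # suppress INFO-level passing checks
    if output is not None:
        cmd += ["--output", str(output)]  # write full results as JSON
    cmd.append(str(submission_dir))
    return cmd


def submission_passes(submission_dir, **options):
    """Run the checker and return True when it exits with code 0."""
    completed = subprocess.run(build_check_command(submission_dir, **options))
    return completed.returncode == 0
```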
### Show region boundaries
```bash
submission-checker regions --max-concurrency 1024
```
Prints the concurrency ranges for each region given a declared Maximum Supported Concurrency *M* (§5.5).
## Required files in the submission structure
```
<submission-root>/
├── systems/
│   └── <system>.json                          # §8.2: hardware + software description
└── pareto/
    └── <system>/
        └── <benchmark_model>/
            ├── points/
            │   └── point_<concurrency>.yaml   # §8.3: one config per measurement point
            ├── results/
            │   └── point_<concurrency>/
            │       ├── mlperf_endpoints_log_summary.json
            │       └── mlperf_endpoints_log_detail.json
            └── accuracy/
                ├── accuracy.txt
                └── accuracy_result.json
```
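A quick local pre-check of this layout can be sketched with `pathlib`. This is not the checker's implementation: it only tests for the directories shown above, assumes two directory levels between `pareto/` and the per-model contents, and uses the hypothetical function name `missing_layout_entries`.

```python
from pathlib import Path

# Directories expected inside each benchmark-model directory.
REQUIRED_SUBDIRS = ("points", "results", "accuracy")


def missing_layout_entries(root):
    """Return a list of human-readable layout problems (empty if none found)."""
    root = Path(root)
    problems = []
    for name in ("systems", "pareto"):
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    systems = root / "systems"
    if systems.is_dir() and not any(systems.glob("*.json")):
        problems.append("no *.json file in systems/")
    pareto = root / "pareto"
    if pareto.is_dir():
        # Assumed layout: pareto/<level-1>/<level-2>/{points,results,accuracy}/
        for model_dir in sorted(pareto.glob("*/*")):
            if not model_dir.is_dir():
                continue
            for sub in REQUIRED_SUBDIRS:
                if not (model_dir / sub).is_dir():
                    rel = (model_dir / sub).relative_to(root).as_posix()
                    problems.append(f"missing directory: {rel}/")
    return problems
```

An empty list only means the skeleton is in place; the full rule set still has to be run with `submission-checker check`.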
## What gets checked
| Rule | Spec | Description |
|------|------|-------------|
| `path-exists` | §1 | Submission root directory exists |
| `required-dir` | §1 | `systems/` and `pareto/` present |
| `system-description-present` | §1 | At least one `*.json` file found in `systems/` |
| `system-description-valid` | §1 | `systems/*.json` parses against schema |
| `src-dir` | §1 | `src/` present for Standardized submissions |
| `pareto-dir-exists` | §1 | `pareto/<system>/` directory exists |
| `benchmark-model-dir` | §1 | At least one benchmark-model directory in `pareto/<system>/` |
| `pareto-subdir` | §1 | `points/`, `results/`, `accuracy/` present |
| `measurement-points-present` | §1 | At least one `point_*.yaml` found |
| `point-config-valid` | §1 | YAML parses against `PointConfig` schema |
| `point-filename-concurrency` | §1 | Filename concurrency matches declared value |
| `result-file-present` | §1 | Result summary log exists for each point config |
| `result-detail-present` | §1 | Result detail log exists for each point config |
| `result-file-valid` | §1 | Result summary log parses against `PointSummary` schema |
| `point-count` | §2, §8 | 7–32 measurement points |
| `point-cap` | §2, §8 | Point count does not exceed 32 |
| `low-latency-coverage` | §3 | At least one point in Low Latency region |
| `low-throughput-coverage` | §4 | At least one point in Low Throughput region |
| `med-throughput-coverage` | §5 | At least one point in Medium Throughput region |
| `high-throughput-coverage` | §6 | At least one point in High Throughput region |
| `max-concurrency-declared` | §7 | `max_supported_concurrency` field present |
| `region-computation` | §7 | *M* > 32 (required for region formula) |
| `concurrency-in-range` | §9 | Concurrency within region bounds (incl. 10% margin) |
| `load-pattern` | §10 | `load_pattern` is `concurrency` with a positive concurrency level |
| `point-duration` | §11 | Point meets per-region minimum duration |
| `min-query-count` | §12 | `n_samples_completed` meets dataset-specific minimum (§6.4) |
| `streaming-config` | §13 | `stream_all_chunks` is `True` |
| `metric-consistency-duration` | §14 | `duration_ns` > 0 |
| `metric-consistency-accounting` | §14 | `completed + failed == issued` |
| `metric-consistency-output-tokens` | §14 | `total_output_tokens` ≥ 0 |
| `metric-consistency-system-tps` | §9.1 | Stored `system_tps` consistent with derived value |
| `metric-consistency-tps-per-user` | §9.1 | Stored `tps_per_user` consistent with `system_tps / concurrency` |
| `accuracy-file` | §15 | `accuracy.txt` and `accuracy_result.json` present |
| `accuracy-valid` | §15 | `accuracy_result.json` parses correctly |
| `accuracy-consistency` | §15 | `passed` flag consistent with `score >= quality_target` |
| `accuracy-gate` | §15 | Score ≥ quality target |
| `config-consistency-dataset` | §16 | All points use the same dataset |
| `config-consistency-model` | §16 | Directory name matches `benchmark_model` |
| `region-declared` | §8.3 | Declared `region` field (if present) is valid and matches computed region |
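
Several of the consistency rules in the table are simple arithmetic identities. The sketch below restates three of them as plain Python functions to make the conditions concrete. The function and argument names are illustrative, and the 1% relative tolerance is an assumption, since the table does not state how close "consistent" has to be.

```python
import math


def accounting_ok(issued, completed, failed):
    """metric-consistency-accounting: completed + failed == issued."""
    return completed + failed == issued


def tps_per_user_ok(system_tps, concurrency, stored_tps_per_user, rel_tol=0.01):
    """metric-consistency-tps-per-user: stored value matches system_tps / concurrency.

    rel_tol is an assumed tolerance, not a value taken from the specification.
    """
    expected = system_tps / concurrency
    return math.isclose(stored_tps_per_user, expected, rel_tol=rel_tol)


def accuracy_flag_ok(score, quality_target, passed):
    """accuracy-consistency: the passed flag agrees with score >= quality_target."""
    return passed == (score >= quality_target)
```

For example, a point with `system_tps = 6400.0` at concurrency 64 should store a `tps_per_user` close to 100.0 under these assumptions.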
## Programmatic API
```python
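from pathlib import Path

from submission_checker import SubmissionChecker, Report

# Point the checker at the submission root directory.
checker = SubmissionChecker(Path("/submissions/acme_corp"))
report = checker.run()

if report.passed:
    print("All checks passed")
else:
    # Each error result carries the rule name and a message.
    for result in report.errors:
        print(f"[{result.rule}] {result.message}")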
```
The `Report` object also exposes `report.warnings` and serialises cleanly via `report.model_dump_json()`.
---
## Development
```bash
uv run pytest # run all tests
uv run pytest --no-cov -x # fast fail on first error
uv run ruff check src/ tests/ # lint
uv run ruff format src/ tests/ # auto-format
```