https://github.com/auraoneai/open
Open tools for the human-judgment layer of AI evaluation: EvalKit (Python package + CLI), Robotics ReviewKit, and the Buying Toolkit.
- Host: GitHub
- URL: https://github.com/auraoneai/open
- Owner: auraoneai
- License: MIT
- Created: 2026-05-11T17:13:31.000Z
- Default Branch: main
- Last Pushed: 2026-05-11T19:04:17.000Z
- Last Synced: 2026-05-11T19:05:47.240Z
- Topics: ai-safety, auraone, evals, evaluation, human-feedback, lerobot, llm, openx, rlds, robotics, rubrics, teleoperation
- Language: Python
- Homepage: https://auraone.ai/open
- Size: 188 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
README
# AuraOne Open
Open tools for the human-judgment layer of AI evaluation. They cover rubric authoring, scoring, judge calibration, reviewer agreement, drift detection, leakage audits, dataset documentation, and robotics review data. Everything is local, inspectable, and runnable without an AuraOne account.
The thesis behind this release: the best evals will stay private, but the standards for building trustworthy evals should not. Read it in full at [resources/writing/measuring-human-judgment-layer.md](resources/writing/measuring-human-judgment-layer.md).
## What's in this repository
| Component | What it is |
| --- | --- |
| [`packages/evalkit/`](packages/evalkit/) | `auraone-evalkit` Python package + `evalkit` CLI for rubric validation, scoring, judge calibration, reviewer agreement, drift, leakage audits, sampling, and versioning. |
| [`robotics-reviewkit/`](robotics-reviewkit/) | Schemas, exporters, taxonomy, and a static viewer for teleop review and failure data. Includes LeRobot and RLDS/OpenX export bridges. |
| [`resources/buying-toolkit/`](resources/buying-toolkit/) | Templates and checklists for teams buying human-data work: SOWs, RFPs, SLAs, vendor comparison, pilot design, reviewer certification, and program playbook. |
| [`resources/writing/`](resources/writing/) | The thesis posts behind this release. |
| [`docs/PRD/`](docs/PRD/) | The audit trail from source to v0.1.0. |
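To make one of the capabilities in the table concrete, here is a minimal sketch of reviewer agreement using Cohen's kappa, written with the Python standard library only. It is not EvalKit's implementation or API: the function name `cohens_kappa` and the sample labels are hypothetical and chosen purely for illustration, and EvalKit may compute agreement differently.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers labelling the same items.

    Hypothetical helper for illustration; not part of auraone-evalkit.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of items where both reviewers match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each reviewer's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both reviewers used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels from two reviewers on six responses.
reviewer_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
reviewer_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Here the reviewers agree on four of six items, but half of that agreement is expected by chance, so kappa is well below the raw agreement rate. That gap is the reason to report a chance-corrected statistic rather than percent agreement alone.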
## Quick start
```bash
pip install auraone-evalkit
evalkit validate-rubric path/to/rubric.jsonl
evalkit lint-rubric path/to/rubric.jsonl
evalkit score --rubric rubric.jsonl --responses outputs.jsonl --out scores.json
```
Full CLI reference: [`packages/evalkit/README.md`](packages/evalkit/README.md).
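The same three steps can be scripted, for example in CI. The sketch below only shells out to the commands shown in the Quick start; it does not use any Python API from `auraone-evalkit`. The helper names `evalkit_commands` and `run_pipeline` are hypothetical, and the sketch assumes that `evalkit` exits with a non-zero status when a step fails, which should be checked against the CLI reference.

```python
import shutil
import subprocess


def evalkit_commands(rubric, responses, out):
    """Build the three Quick start invocations as argument lists."""
    return [
        ["evalkit", "validate-rubric", rubric],
        ["evalkit", "lint-rubric", rubric],
        ["evalkit", "score", "--rubric", rubric,
         "--responses", responses, "--out", out],
    ]


def run_pipeline(rubric, responses, out):
    """Run each step in order, stopping at the first failing command.

    Assumption: evalkit signals failure through a non-zero exit code.
    """
    if shutil.which("evalkit") is None:
        raise RuntimeError(
            "evalkit not found on PATH; run `pip install auraone-evalkit` first"
        )
    for cmd in evalkit_commands(rubric, responses, out):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Building the commands does not require evalkit to be installed.
commands = evalkit_commands("rubric.jsonl", "outputs.jsonl", "scores.json")
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_pipeline("rubric.jsonl", "outputs.jsonl", "scores.json")` would then validate, lint, and score in sequence. Passing argument lists rather than a shell string avoids quoting problems with paths that contain spaces.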
## What this release is not
- Not an expert-authored benchmark. The tutorial datasets included here are synthetic and exist to make the tooling runnable.
- Not real robotics data. The example episodes in `robotics-reviewkit/examples/` are mock metadata.
- Not a hosted AuraOne service. EvalKit runs locally with no API key, no tenant, no database.
These limits are intentional. The release ships the standards, not the dataset.
## Related links
- AuraOne Open landing page: https://auraone.ai/open
- Resource hub: https://auraone.ai/resources
- Hosted AuraOne: https://auraone.ai
## License
MIT — see [LICENSE](LICENSE).