https://github.com/leondz/lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
llm llm-security red-teaming security vulnerability
- Host: GitHub
- URL: https://github.com/leondz/lm_risk_cards
- Owner: leondz
- Created: 2022-12-13T22:51:35.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-27T15:07:32.000Z (11 months ago)
- Last Synced: 2025-03-24T13:02:55.176Z (27 days ago)
- Topics: llm, llm-security, red-teaming, security, vulnerability
- Language: Python
- Homepage: https://arxiv.org/abs/2303.18190
- Size: 85.9 KB
- Stars: 30
- Watchers: 6
- Forks: 8
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
- awesome-machine-learning-interpretability - Language Model Risk Cards: Starter Set
README
# Language Model Risk Cards: Starter Set
A starter set of Language Model Risk Cards for assessing a language model use case. To use these cards:
* Choose which use case, model, and interface are to be assessed
* Select which of these risk cards are relevant to the given use-case scenario
* Recruit people to do the assessment
* For each risk card:
  * Decide how the model will be probed, and for how long
  * Try to provoke the described behaviour from the language model, using your own prompts
  * Record all inputs and outputs
* Compile an assessment report (a minimal sketch of this loop is given below)

Full details are given in the paper: [Assessing Language Model Deployment with Risk Cards](https://arxiv.org/abs/2303.18190) (2023) by *Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, and Saif Mohammad*.
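If you want to script the probe-and-record part of this process, here is a minimal sketch. It assumes risk cards are stored as Markdown files under a `cards/` directory (check the repository layout for the actual paths), and `query_model` is a placeholder for whichever model or interface is being assessed; both names are illustrative assumptions, not part of this repository's API.

```python
"""Minimal sketch of a risk-card assessment loop (illustrative only)."""
import json
import pathlib
from datetime import datetime, timezone


def query_model(prompt: str) -> str:
    # Placeholder: replace with a call to the model/interface under assessment.
    raise NotImplementedError


def assess(cards_dir: str = "cards", log_path: str = "assessment_log.jsonl") -> None:
    # Assumed layout: one Markdown file per risk card somewhere under cards_dir.
    cards = sorted(pathlib.Path(cards_dir).rglob("*.md"))
    with open(log_path, "a", encoding="utf-8") as log:
        for card in cards:
            print(f"Probing against risk card: {card.name}")
            # Assessors write their own prompts to try to provoke the
            # behaviour the card describes.
            prompt = input(f"[{card.stem}] prompt (blank to skip): ").strip()
            if not prompt:
                continue
            output = query_model(prompt)
            # Record every input/output pair so it can feed the report.
            record = {
                "card": card.stem,
                "prompt": prompt,
                "output": output,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    assess()
```

The resulting JSONL log, grouped by risk card, gives the raw material for the assessment report described in the paper.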