https://github.com/leondz/lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
llm llm-security red-teaming security vulnerability
- Host: GitHub
- URL: https://github.com/leondz/lm_risk_cards
- Owner: leondz
- Created: 2022-12-13T22:51:35.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-27T15:07:32.000Z (11 months ago)
- Last Synced: 2025-03-24T13:02:55.176Z (27 days ago)
- Topics: llm, llm-security, red-teaming, security, vulnerability
- Language: Python
- Homepage: https://arxiv.org/abs/2303.18190
- Size: 85.9 KB
- Stars: 30
- Watchers: 6
- Forks: 8
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
- awesome-machine-learning-interpretability - Language Model Risk Cards: Starter Set
README
# Language Model Risk Cards: Starter Set
A starter set of Language Model Risk Cards for assessing a language model use case. To use these cards:
* Choose which use case, model, and interface are to be assessed
* Select which of these risk cards are relevant to the given use-case scenario
* Recruit people to do the assessment
* For each risk card:
  * Decide how the model will be probed, and for how long
  * Try to provoke the described behaviour from the language model, using your own prompts
  * Record all inputs and outputs
* Compile an assessment report (a minimal sketch of this loop is given below)

Full details are given in the paper: [Assessing Language Model Deployment with Risk Cards](https://arxiv.org/abs/2303.18190) (2023) by *Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, and Saif Mohammad*.
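If you want to script the probe-and-record part of this process, here is a minimal sketch. It assumes risk cards are stored as Markdown files under a `cards/` directory (check the repository layout for the actual paths), and `query_model` is a placeholder for whichever model or interface is being assessed; both names are illustrative assumptions, not part of this repository's API.

```python
"""Minimal sketch of a risk-card assessment loop (illustrative only)."""
import json
import pathlib
from datetime import datetime, timezone


def query_model(prompt: str) -> str:
    # Placeholder: replace with a call to the model/interface under assessment.
    raise NotImplementedError


def assess(cards_dir: str = "cards", log_path: str = "assessment_log.jsonl") -> None:
    # Assumed layout: one Markdown file per risk card somewhere under cards_dir.
    cards = sorted(pathlib.Path(cards_dir).rglob("*.md"))
    with open(log_path, "a", encoding="utf-8") as log:
        for card in cards:
            print(f"Probing against risk card: {card.name}")
            # Assessors write their own prompts to try to provoke the
            # behaviour the card describes.
            prompt = input(f"[{card.stem}] prompt (blank to skip): ").strip()
            if not prompt:
                continue
            output = query_model(prompt)
            # Record every input/output pair so it can feed the report.
            record = {
                "card": card.stem,
                "prompt": prompt,
                "output": output,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            log.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    assess()
```

The resulting JSONL log, grouped by risk card, gives the raw material for the assessment report described in the paper.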