https://github.com/dpaleka/stealing-part-lm-supplementary

Some code for "Stealing Part of a Production Language Model"

Host: GitHub
URL: https://github.com/dpaleka/stealing-part-lm-supplementary
Owner: dpaleka
License: mit
Created: 2024-03-19T23:26:31.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-03-20T13:24:44.000Z (over 1 year ago)
Last Synced: 2025-03-16T16:11:46.951Z (4 months ago)
Language: Python
Homepage:
Size: 3.55 MB
Stars: 12
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

Awesome-LLMSecOps - stealing-part-lm-supplementary (PoC)

README

## Supplementary code for [Stealing Part of a Production Language Model](https://arxiv.org/abs/2403.06634), Carlini et al., 2024

### `optimize_logit_queries`
Implements *logprob-free* attacks on a known vector of logits (no API calls).
`run_attacks.py` is the entry point to all implemented attacks, which the paper describes in varying levels of detail.
Running `run_attacks.py` without modification runs several methods on a small random vector of logits.
This directory is useful as a starting point for further research on logprob-free attacks.
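
To make "logprob-free" concrete, here is a minimal sketch of one such attack: bisection on the logit bias against an oracle that reveals only the top token. It is not the code in `run_attacks.py`. The names `make_argmax_oracle` and `recover_logit_gaps`, the bias ceiling, and the tolerance are all assumptions chosen for illustration, and like the directory itself it operates on a known logit vector with no API calls.

```python
import numpy as np


def make_argmax_oracle(logits):
    """Hypothetical stand-in for an API that returns only the top token
    after a per-token logit bias has been applied."""
    logits = np.asarray(logits, dtype=float)

    def oracle(bias):
        biased = logits.copy()
        for token, value in bias.items():
            biased[token] += value
        return int(np.argmax(biased))

    return oracle


def recover_logit_gaps(oracle, vocab_size, max_bias=100.0, tol=1e-4):
    """Estimate z_top - z_i for every token i by bisecting on the bias for i.

    Assumes every gap is smaller than max_bias (an illustrative choice).
    """
    top = oracle({})  # unbiased top token
    gaps = np.zeros(vocab_size)
    queries = 1
    for token in range(vocab_size):
        if token == top:
            continue
        # Invariant: bias `lo` leaves `top` as the argmax,
        # bias `hi` makes `token` the argmax.
        lo, hi = 0.0, max_bias
        while hi - lo > tol:
            mid = (lo + hi) / 2
            queries += 1
            if oracle({token: mid}) == token:
                hi = mid
            else:
                lo = mid
        gaps[token] = (lo + hi) / 2
    return top, gaps, queries


rng = np.random.default_rng(0)
true_logits = rng.normal(size=8)
top, gaps, queries = recover_logit_gaps(
    make_argmax_oracle(true_logits), len(true_logits)
)
print(f"top token: {top}, queries used: {queries}")
```

Each token costs roughly `log2(max_bias / tol)` queries in this naive form, which is the cost that more query-efficient logprob-free methods aim to reduce.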

### Other, less directly useful code
#### `query_logprobs_emulator`
Emulates the OpenAI API to test any logprob attack on a local model, as well as possible mitigation strategies.
(It could also be run against the OpenAI API before March 3, 2024, but this no longer works.)
Defaults to the logprob attack that uses `top_logprobs - 1` tokens per query.
May be useful for research on mitigations and ways to bypass them.
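
As a rough illustration of why a logprob attack can recover `top_logprobs - 1` tokens per query, the sketch below biases that many target tokens into the top-k list and uses the remaining slot for a reference token. It is not the emulator's code: `emulate_top_logprobs`, `recover_relative_logits`, and the bias value of 50 are assumptions made for this example, and the local function only mimics the shape of a top-k logprob response.

```python
import numpy as np


def emulate_top_logprobs(logits, bias, top_logprobs):
    """Hypothetical local emulator: apply a logit bias, then return the
    top-k tokens with their log-probabilities as {token: logprob}."""
    biased = np.asarray(logits, dtype=float).copy()
    for token, value in bias.items():
        biased[token] += value
    shifted = biased - biased.max()  # stabilise the log-softmax
    logprobs = shifted - np.log(np.exp(shifted).sum())
    order = np.argsort(-logprobs)[:top_logprobs]
    return {int(t): float(logprobs[t]) for t in order}


def recover_relative_logits(logits, top_logprobs=5, bias_value=50.0):
    """Recover z_i - z_ref for all tokens, top_logprobs - 1 tokens per query.

    Assumes bias_value exceeds every logit gap, so the biased tokens plus
    the reference token fill the returned top-k list.
    """
    vocab_size = len(logits)
    ref = next(iter(emulate_top_logprobs(logits, {}, 1)))  # unbiased top token
    others = [t for t in range(vocab_size) if t != ref]
    rel = np.zeros(vocab_size)
    queries = 1
    step = top_logprobs - 1
    for start in range(0, len(others), step):
        batch = others[start:start + step]
        result = emulate_top_logprobs(
            logits, {t: bias_value for t in batch}, top_logprobs
        )
        queries += 1
        for t in batch:
            # The softmax normaliser cancels in the difference of logprobs.
            rel[t] = result[t] - bias_value - result[ref]
    return ref, rel, queries


rng = np.random.default_rng(1)
logits = rng.normal(size=10)
ref, rel, queries = recover_relative_logits(logits, top_logprobs=5)
print(f"reference token: {ref}, queries used: {queries}")
```

The subtraction works because the biased logprob of a target token and the logprob of the reference token share the same normalising constant within one query.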

#### `distribution_logits`
A very unpolished script that investigates the distribution of logits over the vocabulary for various open-source models.
Pythia seems to be an outlier, assigning very low probabilities to the long tail of the vocabulary.

#### `openai_api_intricacies`
Verifies undocumented properties of `logit_bias` in the OpenAI API that the attack, as described in the paper, relies on.

## Disclaimer
Note: this repo does not contain any parameters of OpenAI or Google proprietary models,
nor any code that can directly extract weights from any API known to the authors.
