https://github.com/eleutherai/lm_perplexity
- Host: GitHub
- URL: https://github.com/eleutherai/lm_perplexity
- Owner: EleutherAI
- License: mit
- Created: 2020-12-29T20:03:28.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-03-05T23:49:03.000Z (over 5 years ago)
- Last Synced: 2025-04-24T18:48:46.770Z (about 1 year ago)
- Language: Python
- Size: 536 KB
- Stars: 148
- Watchers: 6
- Forks: 18
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Codeowners: CODEOWNERS
# lm_perplexity
Code for benchmarking language models with the Pile.
## Usage
Evaluating on GPT-2 (uses GPU):
```bash
# Compute intermediate outputs for calculating perplexity (e.g. logprobs)
python lm_perplexity/save_lm_perplexity_data.py \
--model_config_path preset_configs/gpt2_medium.json \
--data_path /path/to/mydata.jsonl.zst \
--output_path /path/to/perplexity_data.p
# Use intermediate outputs to compute perplexity
python lm_perplexity/compute_perplexity.py \
--perplexity_data_path /path/to/perplexity_data.p \
--output_path /path/to/perplexity.json
```
Evaluating on GPT-3 (requires OpenAI API key):
```bash
# The OpenAI API key is read from this environment variable
export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
# Compute intermediate outputs for calculating perplexity (e.g. logprobs)
python lm_perplexity/run_lm_perplexity.py \
--model_config_path preset_configs/gpt3_curie.json \
--data_path /path/to/mydata.jsonl.zst \
--output_path /path/to/perplexity_data.p
# Use intermediate outputs to compute perplexity
python lm_perplexity/compute_perplexity.py \
--perplexity_data_path /path/to/perplexity_data.p \
--output_path /path/to/perplexity.json
```
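In both examples the second command reduces the saved log-probabilities to a perplexity score. The sketch below illustrates that reduction under stated assumptions: it takes per-document lists of natural-log token probabilities, which is a hypothetical layout and not necessarily the format stored in `perplexity_data.p`, and the function name `perplexity_from_logprobs` is made up for illustration.

```python
import math
from typing import Iterable, Sequence


def perplexity_from_logprobs(docs: Iterable[Sequence[float]]) -> float:
    """Token-level perplexity over a collection of documents.

    Each element of `docs` holds the natural-log probabilities of the
    tokens in one document (assumed layout, not the repository's format).
    """
    total_logprob = 0.0
    total_tokens = 0
    for token_logprobs in docs:
        total_logprob += sum(token_logprobs)
        total_tokens += len(token_logprobs)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(-total_logprob / total_tokens)


# Two toy documents whose tokens have probabilities 0.5 and 0.25.
example = [
    [math.log(0.5), math.log(0.25)],
    [math.log(0.5), math.log(0.25)],
]
print(perplexity_from_logprobs(example))  # about 2.83, i.e. 2 ** 1.5
```

Summing over all tokens before dividing weights every token equally, so long documents count for more than short ones. Averaging per-document perplexities instead would give a different number.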
## Assets
The JSON files at `assets/${DATASET}/group${GROUP_ID}.json` contain the document indices for the canonical one-tenth split of the test set. The evaluations in the paper were performed on `group0`.
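As a rough illustration of how such an index file could be applied, the sketch below keeps only the documents whose positions appear in a group file. It assumes each file holds a flat JSON list of zero-based integer indices; that layout, and the helper names `load_group_indices` and `select_documents`, are assumptions for illustration and are not taken from the repository.

```python
import json
from typing import Iterable, Iterator, List


def load_group_indices(path: str) -> List[int]:
    """Read a group file, assumed to be a flat JSON list of integers."""
    with open(path, "r", encoding="utf-8") as f:
        indices = json.load(f)
    return [int(i) for i in indices]


def select_documents(docs: Iterable[str], indices: Iterable[int]) -> Iterator[str]:
    """Yield, in stream order, the documents whose position is in `indices`."""
    wanted = set(indices)
    for position, doc in enumerate(docs):
        if position in wanted:
            yield doc


# In-memory stand-in for a test set and a group file's contents.
docs = ["doc a", "doc b", "doc c", "doc d"]
print(list(select_documents(docs, [0, 2])))  # ['doc a', 'doc c']
```

Filtering a stream by position avoids loading the whole test set into memory, which matters for compressed `.jsonl.zst` inputs.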
## Requirements
* numpy
* torch
* transformers
* openai
* lm_dataformat
* tqdm