https://github.com/eleutherai/lm_perplexity

Last synced: 11 months ago

Host: GitHub
URL: https://github.com/eleutherai/lm_perplexity
Owner: EleutherAI
License: MIT
Created: 2020-12-29T20:03:28.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2021-03-05T23:49:03.000Z (over 5 years ago)
Last Synced: 2025-04-24T18:48:46.770Z (about 1 year ago)
Language: Python
Size: 536 KB
Stars: 148
Watchers: 6
Forks: 18
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Codeowners: CODEOWNERS

README

# lm_perplexity

Code for benchmarking language models with the Pile.

## Usage

Evaluating on GPT-2 (uses GPU):

```bash
# Compute intermediate outputs for calculating perplexity (e.g. logprobs)
python lm_perplexity/save_lm_perplexity_data.py \
--model_config_path preset_configs/gpt2_medium.json \
--data_path /path/to/mydata.jsonl.zst \
--output_path /path/to/perplexity_data.p

# Use intermediate outputs to compute perplexity
python lm_perplexity/compute_perplexity.py \
--perplexity_data_path /path/to/perplexity_data.p \
--output_path /path/to/perplexity.json
```

Evaluating on GPT-3 (requires an OpenAI API key):

```bash
# Compute intermediate outputs for calculating perplexity (e.g. logprobs)
export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
python lm_perplexity/save_lm_perplexity_data.py \
--model_config_path preset_configs/gpt3_curie.json \
--data_path /path/to/mydata.jsonl.zst \
--output_path /path/to/perplexity_data.p

# Use intermediate outputs to compute perplexity
python lm_perplexity/compute_perplexity.py \
--perplexity_data_path /path/to/perplexity_data.p \
--output_path /path/to/perplexity.json
```
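
For orientation, the second step turns saved log-probabilities into a perplexity score. The sketch below shows the standard definition only, assuming the intermediate data can be reduced to a list of natural-log token probabilities per document. The function names are hypothetical, and the repository's `compute_perplexity.py` may aggregate or normalize differently.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one sequence: exp of the negative mean log-probability.

    `token_logprobs` is assumed to hold natural-log probabilities, one per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def corpus_perplexity(docs_logprobs):
    """Token-weighted perplexity over several documents.

    Log-probabilities are summed across all documents and divided by the
    total token count, so longer documents carry more weight.
    """
    total_logprob = sum(sum(doc) for doc in docs_logprobs)
    total_tokens = sum(len(doc) for doc in docs_logprobs)
    if total_tokens == 0:
        raise ValueError("need at least one token log-probability")
    return math.exp(-total_logprob / total_tokens)
```

A model that assigns probability 0.5 to every token has a perplexity of 2 under this definition, which makes a convenient sanity check.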

## Assets

JSON files in `assets/${DATASET}/group${GROUP_ID}.json` contain the document indices for the canonical one-tenth split of the test set. Evaluations in the paper were performed on `group0`.
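
A minimal sketch of how such an index file could be used to pick out one group's documents. It assumes each group file is a flat JSON list of integer document indices, which is not confirmed here. The helper names `load_group_indices` and `select_documents` are hypothetical and not part of this repository.

```python
import json


def load_group_indices(path):
    """Read a group file, assumed to be a JSON list of integer indices."""
    with open(path, "r", encoding="utf-8") as f:
        return [int(i) for i in json.load(f)]


def select_documents(documents, indices):
    """Yield only the documents whose position appears in `indices`.

    `documents` can be any iterable, so a streamed test set does not
    have to be loaded into memory first.
    """
    wanted = set(indices)
    for position, doc in enumerate(documents):
        if position in wanted:
            yield doc
```

Iterating with `enumerate` keeps the documents in their original order regardless of how the indices are ordered in the file.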

## Requirements

* numpy
* torch
* transformers
* openai
* lm_dataformat
* tqdm
