https://github.com/hendrikstrobelt/lmdiff
A diff tool for language models
- Host: GitHub
- URL: https://github.com/hendrikstrobelt/lmdiff
- Owner: HendrikStrobelt
- License: apache-2.0
- Created: 2021-05-19T13:30:46.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-12-28T12:07:26.000Z (almost 2 years ago)
- Last Synced: 2025-09-09T12:43:04.290Z (about 1 month ago)
- Topics: ai, nlproc, visualization, xai
- Language: Python
- Homepage:
- Size: 658 KB
- Stars: 44
- Watchers: 1
- Forks: 4
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LMdiff
Qualitative comparison of large language models.
Demo & Paper: [http://lmdiff.net](http://lmdiff.net)
LMdiff is an MIT-IBM Watson AI Lab collaboration between: Hendrik Strobelt (IBM, MIT), Benjamin Hoover (IBM, GeorgiaTech), Arvind Satyanarayan (MIT), and Sebastian Gehrmann (HarvardNLP, Google).

## Setting up / Quick start
From the root directory, install the Conda dependencies:

```
conda env create -f environment.yml
conda activate LMdiff
pip install -e .
```

Run the backend in development mode, deploying default models and configurations:
```
uvicorn backend.server:app --reload
```
Check the output for the right port (something like [http://localhost:8000](http://localhost:8000)) and open it in a browser.

## Rebuild frontend
This is optional, because we have a compiled version checked into this repo.

```
cd client
npm install
npm run build:backend
cd ..
```

## Using your own models
To use your own models:
1. Create a `TextDataset` of phrases to analyze
You can create the dataset file in several ways:
From a text file
So you have already collected all the phrases you want into a text file, one per line. Simply run:

```
python scripts/make_dataset.py path/to/my_dataset.txt my_dataset -o folder/i/want/to/save/in
```
From a Python object (list of strings)
Want to work only within Python?

```python
from analysis.create_dataset import create_text_dataset_from_object

my_collection = ["Phrase 1", "My second phrase"]
create_text_dataset_from_object(my_collection, "easy-first-dataset", "human_created", "folder/i/want/to/save/in")
```
From [Huggingface Datasets](https://huggingface.co/docs/datasets/)
It can be created from one of Huggingface's provided datasets with:

```python
from analysis.create_dataset import create_text_dataset_from_hf_datasets
import datasets
import path_fixes as pf

glue_mrpc = datasets.load_dataset("glue", "mrpc", split="train")
name = "glue_mrpc_train"

def ds2str(glue):
    """(e.g.,) Turn the first 50 sentences of the dataset into sentence information"""
    sentences = glue['sentence1'][:50]
    return "\n".join(sentences)

create_text_dataset_from_hf_datasets(glue_mrpc, name, ds2str, ds_type="human_created", outfpath=pf.DATASETS)
```
The dataset is a simple `.txt` file with a new phrase on every line and a small required metadata header at the top. E.g.,
```
---
checksum: 92247a369d5da32a44497be822d4a90879807a8751f5db3ff1926adbeca7ba28
name: dataset-dummy
type: human_created
---
This is sentence 1, please analyze this.
Every line is a new phrase to pass to the model.
I can keep adding phrases, so long as they are short enough to pass to the model. They don't even need to be one sentence long.
```

The required fields in the header:
- `checksum` :: A unique identifier for the state of that file. It can be calculated however you wish, but it should change if anything at all changes in the contents below (e.g., two phrases are transposed, a new phrase is added, or a period is added after a sentence). One possible way to compute it is sketched after this list.
- `name` :: The name of the dataset.
- `type` :: Either `human_created` or `machine_generated` (use the latter if the dataset was produced by another model and you want to compare on it)
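Since the checksum only needs to change whenever the contents change, hashing everything below the header is one option. A minimal sketch (the helper name and the choice of SHA-256 are illustrative, not part of the repo):

```python
# Illustrative only: compute a `checksum` for a dataset file as the SHA-256
# hash of everything below the metadata header. Any scheme that changes when
# the contents change works just as well.
import hashlib

def content_checksum(phrases):
    """Hash the newline-joined phrases that make up the dataset body."""
    body = "\n".join(phrases)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

print(content_checksum([
    "This is sentence 1, please analyze this.",
    "Every line is a new phrase to pass to the model.",
]))
```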
Each line in the contents is a new phrase to compare in the language model. A few warnings:
- Make sure the phrases are short enough that they can be passed to the model given your memory constraints (a rough length check is sketched after these warnings)
- The dataset is fully loaded into memory to serve to the front end, so avoid creating a text file that is too large to fit in memory.
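One rough guard against overly long phrases is to filter them with the tokenizer of the model you plan to analyze before writing the dataset file. A hedged sketch, assuming a Huggingface tokenizer (the model name and phrase list are placeholders):

```python
# Sketch: drop phrases that exceed the model's maximum input length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
max_len = tokenizer.model_max_length  # 1024 for gpt2

phrases = ["Phrase 1", "My second phrase"]
short_enough = [
    p for p in phrases
    if len(tokenizer(p)["input_ids"]) <= max_len
]
```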
2. Choose two comparable models

Two models are comparable if they:

1. Have the exact same tokenization scheme
2. Have the exact same vocabulary

This allows us to do tokenwise comparisons between the two models (a quick check is sketched after the examples below). For example, a comparable pair could be:
- A pretrained model and a finetuned version of it (e.g., `distilbert-base-cased` and `distilbert-base-uncased-finetuned-sst-2-english`)
- A distilled version mimicking the original model (e.g., `bert-base-cased` and `distilbert-base-cased`)
- Different sizes of the same model architecture (e.g., `gpt2` and `gpt2-large`)
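One rough way to sanity-check a candidate pair (not part of the repo) is to compare the tokenizers directly, e.g. with Huggingface's `AutoTokenizer`:

```python
# Rough comparability check: both checkpoints should use the same tokenizer
# class and an identical vocabulary. The model names are just examples.
from transformers import AutoTokenizer

tok_a = AutoTokenizer.from_pretrained("gpt2")
tok_b = AutoTokenizer.from_pretrained("gpt2-medium")

assert type(tok_a) is type(tok_b), "different tokenization schemes"
assert tok_a.get_vocab() == tok_b.get_vocab(), "vocabularies differ"
```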
3. Preprocess the models on the chosen dataset

```
python scripts/preprocess.py all gpt2-medium distilgpt2 data/datasets/glue_mrpc_1+2.csv --output-dir data/sample/gpt2-glue-comparisons
```

4. Start the app
```
python backend/server/main.py --config data/sample/gpt2-glue-comparisons
```

Note that if you use a different tokenization scheme than the default `gpt`, you will need to tell the frontend how to visualize the tokens. For example, for a `bert`-based tokenization scheme:
```
python backend/server/main.py --config data/sample/bert-glue-comparisons -t bert
```

## Architecture
*(Figure: LMdiff architecture.)*
## (Admin) Getting the Data
Models and datasets for the deployed app are stored in the cloud and require a private `.dvc/config` file. With the correct config,

```
dvc pull
```
will populate the data directories correctly for the deployed version.
## Testing
```
make test
```

or
```
python -m pytest tests
```

All tests are stored in `tests`.
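If you want to add a test of your own, a minimal sketch using FastAPI's `TestClient` could look like this (it assumes, as the `uvicorn backend.server:app` command implies, that the app object is importable from `backend.server`; the file name and route are illustrative):

```python
# tests/test_docs.py (hypothetical): check that the app starts and serves
# FastAPI's interactive docs page.
from fastapi.testclient import TestClient

from backend.server import app

client = TestClient(app)

def test_docs_page_is_served():
    response = client.get("/docs")
    assert response.status_code == 200
```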
### Frontend
We like [`pnpm`](https://pnpm.io/installation) but `npm` works just as well. We also like [`Vite`](https://vitejs.dev/) for its rapid hot module reloading and pleasant dev experience. This repository uses [`Vue`](https://vuejs.org/) as a reactive framework.
From the root directory:
```
cd client
pnpm install --save-dev
pnpm run dev
```

If you want to hit the backend routes, make sure to also run the `uvicorn backend.server:app` command from the project root.
For production (serve with Vite)
```
pnpm run serve
```

For production (serve with this repo's FastAPI server)
```
cd client
pnpm run build:backend
cd ..
uvicorn backend.server:app
```

Or the `gunicorn` command from above.
All artifacts are stored in the `client/dist` directory with the appropriate basepath.
For production (serve with external tooling like NGINX)
```
pnpm run build
```

All artifacts are stored in the `client/dist` directory.
## Notes
- Check the endpoints by visiting `<host>:<port>/docs`
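For a programmatic overview you can also read FastAPI's auto-generated OpenAPI document; a small sketch, assuming the backend is running locally on port 8000 (adjust to whatever uvicorn reports):

```python
# List the routes the running backend exposes via its OpenAPI spec.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
    spec = json.load(resp)

for path, methods in spec["paths"].items():
    print(path, sorted(methods))
```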