https://github.com/daniel-furman/evals-with-chat-formats
Experiments applying chat templates to generative language model evaluations.
- Host: GitHub
- URL: https://github.com/daniel-furman/evals-with-chat-formats
- Owner: daniel-furman
- License: apache-2.0
- Created: 2024-02-17T20:26:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-08T23:15:27.000Z (about 1 year ago)
- Last Synced: 2025-02-08T14:25:41.664Z (4 months ago)
- Language: Jupyter Notebook
- Homepage: https://towardsdatascience.com/evaluations-with-chat-formats-7604067023c9
- Size: 1.81 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Evals with chat formats
Experiments applying chat templates to generative language model evaluations
Link to results: [here](https://docs.google.com/spreadsheets/d/1Tawz9IHH2B-_XWj-JjeVGmu-og60lgSSpMywrGxcj6Q/edit?usp=sharing)
## tl;dr
Chat models are typically fine-tuned on datasets formatted with a prompt template. These chat templates are programmed recipes that convert a chat conversation into a single string. At prediction time, it's standard to match an LLM's expected chat format; failing to do so is often cited as a cause of performance degradation. Do we actually see these degradations in evaluation benchmarks?
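
As a concrete illustration (a minimal sketch, not necessarily the exact setup used in this repo), the Hugging Face `apply_chat_template` helper converts a list of messages into the single prompt string a chat model was fine-tuned on; the model name below is an assumption for demonstration purposes:

```python
# Minimal sketch: render a chat conversation with a model's chat template
# before scoring it in an evaluation harness.
from transformers import AutoTokenizer

# Illustrative model choice, not necessarily the one evaluated in this repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Convert the conversation into the formatted string the model expects
# (e.g., wrapping the user turn in the template's instruction markers).
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```

Evaluating with this templated `prompt` versus the raw question text is the comparison the experiments here are set up to measure.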