https://github.com/daniel-furman/evals-with-chat-formats
Experiments applying chat templates to generative language model evaluations.
- Host: GitHub
- URL: https://github.com/daniel-furman/evals-with-chat-formats
- Owner: daniel-furman
- License: apache-2.0
- Created: 2024-02-17T20:26:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-08T23:15:27.000Z (about 1 year ago)
- Last Synced: 2025-02-08T14:25:41.664Z (4 months ago)
- Language: Jupyter Notebook
- Homepage: https://towardsdatascience.com/evaluations-with-chat-formats-7604067023c9
- Size: 1.81 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Evals with chat formats
Experiments applying chat templates to generative language model evaluations
Link to results: [here](https://docs.google.com/spreadsheets/d/1Tawz9IHH2B-_XWj-JjeVGmu-og60lgSSpMywrGxcj6Q/edit?usp=sharing)
## tl;dr
Chat models are typically fine-tuned on datasets formatted with a prompt template. These chat templates are programmed recipes that convert a chat conversation into a single string. At prediction time, it's standard to match an LLM's expected chat format; failing to do so is often cited as a cause of performance degradation. Do we actually see these degradations in evaluation benchmarks?
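
As a concrete illustration (a minimal sketch, not necessarily the exact setup used in this repo), the Hugging Face `apply_chat_template` helper converts a list of messages into the single prompt string a chat model was fine-tuned on; the model name below is an assumption for demonstration purposes:

```python
# Minimal sketch: render a chat conversation with a model's chat template
# before scoring it in an evaluation harness.
from transformers import AutoTokenizer

# Illustrative model choice, not necessarily the one evaluated in this repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Convert the conversation into the formatted string the model expects
# (e.g., wrapping the user turn in the template's instruction markers).
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```

Evaluating with this templated `prompt` versus the raw question text is the comparison the experiments here are set up to measure.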