https://github.com/disler/elm-itv-benchmark

Simple, Opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases.
https://github.com/disler/elm-itv-benchmark

Last synced: 4 months ago
JSON representation

Simple, Opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases.

Host: GitHub
URL: https://github.com/disler/elm-itv-benchmark
Owner: disler
Created: 2024-04-28T13:47:37.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-24T16:16:51.000Z (over 1 year ago)
Last Synced: 2025-07-21T21:23:29.790Z (5 months ago)
Language: TypeScript
Size: 1.5 MB
Stars: 47
Watchers: 2
Forks: 18
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

jimsghstars - disler/elm-itv-benchmark - Simple, Opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases. (TypeScript)

README

# Efficient Language Model Personal Viability Benchmarking
> Simple, Opinionated benchmark for testing the viability of Efficient Language Models (ELMs) for personal use cases.
>
> Uses [bun](https://bun.sh/), [promptfoo](https://promptfoo.dev/), and [ollama](https://ollama.com/) for a minimalist, cross-platform, local LLM prompt testing & benchmarking experience.

![Zero Cost Prompts](./imgs/zero-cost-prompts.png)

## Setup
- [Install Bun](https://bun.sh/docs/installation#macos-and-linux)
- [Install Ollama](https://ollama.com/download)
- [Install llama3](https://ollama.com/library/llama3) `ollama run llama3`
- [Install phi3](https://ollama.com/library/phi3) `ollama run phi3`
- [Install gemma](https://ollama.com/library/gemma) `ollama run gemma`
- Setup .env variables
- `cp .env.sample .env`
- Add your OpenAI API key to the .env file
- Install dependencies: `bun i`
- Run the minimal tests: `bun minimal`
- Open test viewer: `bun view`
- Run the ELM-ITV tests: `bun elm`

## Guide
- First, [watch the video](https://youtu.be/sb9wSWeOPI4) where we walk through ELMs and this codebase.
- To get started take a look at `BENCH__minimal_test_suite/` to get an idea of how to structure a basic test suite.
- Next take a look at the `BENCH__efficient_language_models/` test suite to get an idea of how you can setup tests for your own viability tests for ELMs.
- Explore other [ollama based models](https://promptfoo.dev/docs/providers/ollama) you can test
- Or [OpenAI models](https://promptfoo.dev/docs/providers/openai)
- Or [Anthropic models](https://promptfoo.dev/docs/providers/anthropic)
- Or [Groq models](https://promptfoo.dev/docs/providers/groq)
- Modify the `BENCH__minimal_test_suite/` or `BENCH__efficient_language_models/` to suit your needs
- Create a new test with the [Create a new test suite](#scripts) script

## Folder Structure
- `/BENCH__`
- `/prompt.txt` - the prompt(s) to test
- `/test.yaml` - variables and assertions
- `/promptfooconfig.yaml` - llm model config

## Scripts
- Create a new test suite: `bun run ./scripts/new_prompt_test`
- Run a test prompt against a running ollama server `bun run ./scripts/ollama_local_model_call`

## Resources
- Ollama model library
- https://ollama.com/library
- LMSYS Chatbot Arena Leaderboard
- https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
- Ollama api.md docs
- https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion
- Promptfoo Ollama Provider
- https://promptfoo.dev/docs/providers/ollama
- Promptfoo LLM Providers
- https://www.promptfoo.dev/docs/providers
- Promptfoo Assertions
- https://www.promptfoo.dev/docs/configuration/expected-outputs/

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/disler/elm-itv-benchmark

Awesome Lists containing this project

README