https://github.com/disler/benchy
- Host: GitHub
- URL: https://github.com/disler/benchy
- Owner: disler
- Created: 2024-11-10T17:04:41.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-17T18:05:16.000Z (3 months ago)
- Last Synced: 2024-11-17T18:42:12.421Z (3 months ago)
- Language: Python
- Size: 3.12 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# BENCHY
> Benchmarks you can **feel**
>
> We all love benchmarks, but there's nothing like a hands-on vibe check. What if we could meet somewhere in the middle?
>
> Enter BENCHY: a chill, live benchmark tool that lets you see the performance, price, and speed of LLMs in a side-by-side comparison for SPECIFIC use cases.
>
> Watch the walkthrough [video here](https://youtu.be/ZlljCLhq814)
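Under the hood, that comparison boils down to a small record per model per run. A hypothetical sketch of the idea (illustrative only; the app's real state shape lives in `src/store/*`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchmarkResult:
    """One model's result for one prompt: the three axes BENCHY compares."""
    model: str
    correct: bool        # performance: did the output pass the use-case check?
    cost_usd: float      # price: input + output token cost
    latency_ms: float    # speed: wall-clock time for the response


def cheapest_correct(results: list[BenchmarkResult]) -> Optional[BenchmarkResult]:
    """Pick the lowest-cost model among those that answered correctly."""
    passing = [r for r in results if r.correct]
    return min(passing, key=lambda r: r.cost_usd, default=None)
```

A side-by-side view is then just a table of such records, one row per model for the same prompt.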
## Live Benchmark Tools
- [Long Tool Calling](src/pages/AppMultiToolCall.vue)
- Goal: Understand the best LLMs and techniques for LONG chains of tool calls / function calls (15+).
  - Watch the walkthrough [video here](https://youtu.be/ZlljCLhq814)
- [Multi Autocomplete](src/pages/AppMultiAutocomplete.vue)
  - Goal: Understand [Claude 3.5 Haiku](https://www.anthropic.com/claude/haiku) & GPT-4o [predicted outputs](https://platform.openai.com/docs/guides/predicted-outputs) compared to existing models.
  - Watch the walkthrough [video here](https://youtu.be/1ObiaSiA8BQ)

## Important Files
- `.env` - Environment variables for API keys
- `server/.env` - Environment variables for API keys
- `package.json` - Front-end dependencies
- `server/pyproject.toml` - Server dependencies
- `src/store/*` - Stores all front-end state and prompts
- `src/api/*` - API layer for all requests
- `server/server.py` - Server routes
- `server/modules/llm_models.py` - All LLM models
- `server/modules/openai_llm.py` - OpenAI LLM
- `server/modules/anthropic_llm.py` - Anthropic LLM
- `server/modules/gemini_llm.py` - Gemini LLM

## Setup
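Setup revolves around three provider API keys, which both `.env` files above must define. As a rough, stdlib-only sketch of what the server needs at startup (an illustration, not the repo's code — the server likely uses a dotenv library instead):

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def load_env(path: str = ".env") -> dict[str, str]:
    """Tiny stand-in for a dotenv loader: parse KEY=value lines, skip comments."""
    env: dict[str, str] = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required API keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

If any key is missing, the corresponding provider's benchmarks simply can't run, so it's worth checking all three before starting the server.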
### Get API Keys
- [Anthropic](https://docs.anthropic.com/en/api/getting-started)
- [Google Cloud](https://ai.google.dev/gemini-api/docs/api-key)
- [OpenAI](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)

### Client Setup
```bash
# Install dependencies using bun (recommended)
bun install

# Or using npm
npm install

# Or using yarn
yarn install

# Start the development server
bun dev  # or: npm run dev / yarn dev
```

### Server Setup
```bash
# Move into server directory
cd server

# Create and activate a virtual environment using uv
uv sync

# Set up environment variables
cp .env.sample .env

# Set EVERY .env key with your API keys and settings
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GEMINI_API_KEY=

# Start the server
uv run python server.py

# Run tests (beware: this hits live APIs and costs money)
uv run pytest
```

## Dev Notes & Caveats
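One caveat on testing: since `uv run pytest` hits live provider APIs, a common guard (a sketch, not code from this repo) is to skip paid tests whenever the relevant key is not configured:

```python
import os
import unittest


def has_key(name: str) -> bool:
    """True when the given API key is set (and non-empty) in the environment."""
    return bool(os.environ.get(name))


class LiveAPITests(unittest.TestCase):
    # Skip rather than spend money when no key is configured.
    @unittest.skipUnless(
        has_key("OPENAI_API_KEY"),
        "OPENAI_API_KEY not set; test would hit a paid API",
    )
    def test_openai_roundtrip(self):
        ...  # a real call against the OpenAI API would go here
```

The same decorator pattern works per provider, so a machine with only one key configured still runs that provider's tests.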
- See `src/components/DevNotes.vue` for limitations

## Resources
- https://github.com/simonw/llm?tab=readme-ov-file
- https://github.com/openai/openai-python
- https://platform.openai.com/docs/guides/predicted-outputs
- https://community.openai.com/t/introducing-predicted-outputs/1004502
- https://unocss.dev/integrations/vite
- https://www.npmjs.com/package/vue-codemirror6
- https://vuejs.org/guide/scaling-up/state-management
- https://www.ag-grid.com/vue-data-grid/getting-started/
- https://www.ag-grid.com/vue-data-grid/value-formatters/
- https://llm.datasette.io/en/stable/index.html
- https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/get-token-count
- https://ai.google.dev/gemini-api/docs/tokens?lang=python
- https://ai.google.dev/pricing#1_5flash
- https://ai.google.dev/gemini-api/docs/structured-output?lang=python
- https://platform.openai.com/docs/guides/structured-outputs
- https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- https://ai.google.dev/gemini-api/docs/models/experimental-models
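Several of the links above cover OpenAI's predicted outputs, which the Multi Autocomplete tool benchmarks. Per those docs, the request adds a `prediction` field carrying text expected to reappear in the answer; a hedged sketch of the request body (the model name is just an example):

```python
# Sketch of an OpenAI "predicted outputs" request body, per the docs linked above.
# Passing the unchanged text as the prediction speeds up responses when most of
# the output is echoed back (e.g. small code edits).

def build_predicted_edit_request(code: str, instruction: str) -> dict:
    """Assemble a chat.completions request dict using the `prediction` parameter."""
    return {
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": f"{instruction}\n\n{code}"},
        ],
        # Most of `code` will come back verbatim, so supply it as the prediction:
        "prediction": {"type": "content", "content": code},
    }
```

The dict would be passed as keyword arguments to the OpenAI client's chat-completions call; tokens that match the prediction are billed and streamed faster than freshly generated ones.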