https://github.com/gqgs/llm100kbench
LLM 100k portfolio management benchmark
- Host: GitHub
- URL: https://github.com/gqgs/llm100kbench
- Owner: gqgs
- Created: 2025-02-22T03:38:49.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-03-01T07:05:22.000Z (7 months ago)
- Last Synced: 2025-03-01T08:18:43.372Z (7 months ago)
- Topics: benchmark, investment, llm
- Language: Go
- Homepage:
- Size: 70.3 KB
- Stars: 40
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# LLM Investment Benchmark
A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.
## Overview
This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:
- Create new portfolios
- List current holdings and recent context
- Update portfolios based on model decisions

The model executions and their current context can be seen [here](./orders).

Note: Some models will hallucinate prices if they have technical issues accessing prices at the time of execution.
## Why?
The primary objective defined for the LLMs is to optimize their portfolio. To do so, a model must evaluate the risk-reward ratio, form sound assumptions about future market conditions, and make use of tools and of its understanding of human psychology and financial market dynamics.
This benchmark may be a good proxy for how well LLMs are able to coordinate these efforts.
## Project Structure
- `cmd`: Contains the main command implementations
  - `create`: Initialize new portfolios
  - `list`: Display current holdings and context
  - `update`: Process investment orders and update holdings

## Prompt
The most recent prompts, including the guidelines given to the models, can be seen [here](./cmd/create/prompt.txt) and [here](./cmd/list/prompt.txt).
## Current Portfolio (2025-03-01)
| Model | Ticker | Sum | Quantity |
|-------|-------|-------|--------|
|`claude3.5`|`GOOGL`|2625|15|
|`claude3.5`|`NVDA`|15500|20|
|`claude3.5`|`AMZN`|3600|20|
|`claude3.5`|`MSFT`|15800|40|
|`claude3.5`|`VOO`|48750|125|
|`deepseek-r1`|`AMD`|106250|625|
|`gemini2.0-flash`|`NVDA`|99957|294|
|`grok3`|`BRK.B`|20000|50|
|`grok3`|`MFG`|8700|58|
|`grok3`|`ENG`|8640|72|
|`grok3`|`IWM`|15000|75|
|`grok3`|`BTCETF`|5000|100|
|`grok3`|`METL`|10000|100|
|`grok3`|`BSV`|12480|156|
|`grok3`|`INTC`|20000|400|
|`o3-mini`|`TSLA`|10134|30|
|`o3-mini`|`GOOGL`|8178|45|
|`o3-mini`|`MSFT`|29799|73|
|`o3-mini`|`AMZN`|19925|92|
|`o3-mini`|`AAPL`|31649|129|
|`o3-mini`|`USD`|313|313|

| Model | Total Sum | Change |
|-------|-----------|--------|
|`deepseek-r1`|106250|$${\color{green}6.25\\%}$$|
|`o3-mini`|99998|—|
|`gemini2.0-flash`|99957|—|
|`grok3`|99820|$${\color{red}0.18\\%}$$|
|`claude3.5`|86275|$${\color{red}13.72\\%}$$|
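
The totals and changes in the summary table can be reproduced from the holdings table with a few lines of Go. The following is a minimal sketch, not code taken from this repository: the names `Holding`, `TotalByModel`, `ChangePercent`, and `initialBalance` are hypothetical, and the starting balance of 100,000 is an assumption based on the "100k" in the project name (it is consistent with `deepseek-r1` at 106250 being reported as 6.25%). Only two of the models are included to keep the example short.

```go
package main

import "fmt"

// Holding mirrors one row of the holdings table (hypothetical type,
// not the repository's own data structure).
type Holding struct {
	Model    string
	Ticker   string
	Sum      int
	Quantity int
}

// initialBalance is an assumption: every portfolio is taken to have
// started at 100,000, as suggested by the project name.
const initialBalance = 100000

// holdings copies the claude3.5 and deepseek-r1 rows from the table.
var holdings = []Holding{
	{"claude3.5", "GOOGL", 2625, 15},
	{"claude3.5", "NVDA", 15500, 20},
	{"claude3.5", "AMZN", 3600, 20},
	{"claude3.5", "MSFT", 15800, 40},
	{"claude3.5", "VOO", 48750, 125},
	{"deepseek-r1", "AMD", 106250, 625},
}

// TotalByModel adds up the Sum column for each model.
func TotalByModel(hs []Holding) map[string]int {
	totals := make(map[string]int)
	for _, h := range hs {
		totals[h.Model] += h.Sum
	}
	return totals
}

// ChangePercent returns the signed percentage change of a portfolio
// total relative to initialBalance.
func ChangePercent(total int) float64 {
	return float64(total-initialBalance) / float64(initialBalance) * 100
}

func main() {
	totals := TotalByModel(holdings)
	for _, model := range []string{"deepseek-r1", "claude3.5"} {
		fmt.Printf("%-12s total=%d change=%+.3f%%\n",
			model, totals[model], ChangePercent(totals[model]))
	}
}
```

Under this assumption a negative result corresponds to the entries shown in red in the summary table, and a positive one to the entries shown in green.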