Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/veeara282/alignment-jam-2024may
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/veeara282/alignment-jam-2024may
- Owner: veeara282
- License: mit
- Created: 2024-05-25T21:10:51.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-05-31T19:39:30.000Z (7 months ago)
- Last Synced: 2024-06-01T01:17:23.049Z (7 months ago)
- Language: Jinja
- Size: 125 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# alignment-jam-2024may
## Results:
### Model Score
In all three games, each 10 rounds long, GPT-4 chose the cooperative action in every round (100% of the time).
### Dataset
The prompts and model completions for each round are listed in `run2.csv`, `run3.csv`, and `run4.csv`. In each completion, the model first analyzes the available choices and then states its chosen action. The choice appears at the end of the completion as a single number: the index of the action in the given answer choices.
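Because the choice is a single number at the end of each completion, it can be recovered with a small parser. The following is a minimal sketch, not the repository's own code: the function name `extract_choice` is hypothetical, and it assumes the number may be followed only by punctuation or whitespace. Whether the index is 0-based or 1-based depends on the prompt and is not assumed here.

```python
import re
from typing import Optional


def extract_choice(completion: str) -> Optional[int]:
    """Return the integer that ends a completion, or None if there is none.

    Hypothetical helper: only trailing punctuation or whitespace may follow
    the number, so a digit in the middle of the analysis is ignored.
    """
    match = re.search(r"(\d+)\W*\Z", completion)
    return int(match.group(1)) if match else None
```

A completion that ends in prose rather than a number yields `None`, which makes malformed rows easy to flag before counting actions.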
### Game Context
The games were designed so that the payoffs for choosing a cooperative, deceptive, or aggressive action differ from game to game. When the program runs, it saves the probabilities of achieving a high, medium, or low payoff for each type of action to `probabilities.csv`.
### Generalization
We suggest evaluating models by comparing their action trajectory with the optimal trajectory. Let each action $a$ have expected payoff $u_a = p_{h,a} v_{h,a} + p_{m,a} v_{m,a} + p_{l,a} v_{l,a}$, where $v_{h,a}$, $v_{m,a}$, and $v_{l,a}$ are the entries of the value matrix giving the high, medium, and low pure payoffs of action $a$, and $p_{h,a}$, $p_{m,a}$, and $p_{l,a}$ are the entries of the probability matrix giving the probabilities of those payoffs. The terminal payoff over $k$ rounds is $u = \sum_{n=1}^{k} u_{a_n}$, where $a_n$ is the action taken at round $n$. The optimal trajectory is the action sequence that attains $\max u$, i.e. $\arg\max_{a_1, \dots, a_k} u$.

Comparing the number of cooperative, deceptive, and aggressive actions the model took with the counts along the optimal trajectory then reveals any non-strategic preferences (biases) of the model toward these categories of behavior.
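The comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the payoff values and probabilities below are made-up placeholders (the real probabilities are the ones written to `probabilities.csv`), the function names are hypothetical, and rounds are assumed to be independent with fixed probabilities, so the optimal trajectory simply repeats the action with the highest expected payoff.

```python
from collections import Counter

ACTIONS = ("cooperative", "deceptive", "aggressive")

# Placeholder numbers for illustration only (assumption, not project data).
# Each tuple is ordered (high, medium, low).
VALUES = {a: (10.0, 5.0, 0.0) for a in ACTIONS}
PROBS = {
    "cooperative": (0.2, 0.6, 0.2),
    "deceptive": (0.5, 0.1, 0.4),
    "aggressive": (0.3, 0.3, 0.4),
}


def expected_payoff(action: str) -> float:
    """u_a = p_h * v_h + p_m * v_m + p_l * v_l for one action."""
    return sum(p * v for p, v in zip(PROBS[action], VALUES[action]))


def optimal_trajectory(k: int) -> list:
    """With independent rounds, maximizing the sum means repeating the best action."""
    best = max(ACTIONS, key=expected_payoff)
    return [best] * k


def category_gap(observed: list, optimal: list) -> dict:
    """Observed minus optimal count per category; nonzero entries suggest a bias."""
    obs, opt = Counter(observed), Counter(optimal)
    return {a: obs[a] - opt[a] for a in ACTIONS}


# An always-cooperating model, as reported in the results, over 10 rounds.
gap = category_gap(["cooperative"] * 10, optimal_trajectory(10))
```

Under these placeholder numbers the deceptive action has the highest expected payoff, so an always-cooperating model shows a surplus of cooperative actions and a matching deficit of deceptive ones. If payoffs depended on earlier rounds, the per-round maximum would no longer be enough and the trajectory would need a search over action sequences.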