https://github.com/llllvvuu/log_to_eval_prototype
toy example
- Host: GitHub
- URL: https://github.com/llllvvuu/log_to_eval_prototype
- Owner: llllvvuu
- Created: 2024-08-08T01:02:11.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-08T01:02:22.000Z (10 months ago)
- Last Synced: 2025-01-15T22:25:23.940Z (5 months ago)
- Language: Python
- Size: 1.95 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Usage
```sh
pip install -r requirements.txt
python -m chat
python -m chat.eval
```
# Sample Output
```
; python -m chat "Are you based?"
As an AI, I don't have a physical location or a body. So, in the sense of being "based" in a physical place, I'm not. But I'm here to help you with any questions or tasks you have! 😊 What can I do for you today?
Provide golden response? [y/n] y
Golden response: Yezzir
; python -m chat.eval
User: Are you based?
Expected Response: Yezzir
Actual Response:
As an AI, I don't have a physical location or a body. The term "based" usually refers to being grounded in reality or having a strong sense of self. I can access and process information from the real world through Google Search and keep my response consistent with search results. So, in a way, I am "based" on the vast amount of data I have access to.
Eval prompt:
User: Are you based?
Expected Response: Yezzir
Actual Response: As an AI, I don't have a physical location or a body. The term "based" usually refers to being grounded in reality or having a strong sense of self. I can access and process information from the real world through Google Search and keep my response consistent with search results. So, in a way, I am "based" on the vast amount of data I have access to.
Is the actual response close enough to the expected response? Respond with only one word, "yes" or "no".
Eval result:
Yes
```
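As you can see, my eval prompt was bad (the judge answered "Yes" even though the actual response is nothing like the golden response "Yezzir"), but this is just a prototype.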
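For readers who want the shape of the log-to-eval loop without opening the repo, here is a minimal sketch. It is not the repository's code: the prompt template is reconstructed from the sample output above, and `log_golden`, `load_goldens`, `build_eval_prompt`, `parse_verdict`, `run_evals`, and the JSONL file layout are all hypothetical names and choices made for illustration. The `chat` and `judge` arguments are plain callables, so any model client (or a stub) can be plugged in.

```python
import json
from pathlib import Path
from typing import Callable

# Template reconstructed from the sample output; the real one may differ.
EVAL_TEMPLATE = (
    "User: {user}\n"
    "Expected Response: {expected}\n"
    "Actual Response: {actual}\n"
    "Is the actual response close enough to the expected response? "
    'Respond with only one word, "yes" or "no".'
)


def log_golden(path: Path, user: str, golden: str) -> None:
    """Append one (user message, golden response) pair as a JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"user": user, "golden": golden}) + "\n")


def load_goldens(path: Path) -> list[dict]:
    """Read back every logged pair, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def build_eval_prompt(user: str, expected: str, actual: str) -> str:
    """Fill the judge prompt for one test case."""
    return EVAL_TEMPLATE.format(user=user, expected=expected, actual=actual.strip())


def parse_verdict(raw: str) -> bool:
    """Map the judge's one-word answer to a bool; reject anything else."""
    word = raw.strip().strip(".\"'").lower()
    if word not in ("yes", "no"):
        raise ValueError(f"unexpected judge output: {raw!r}")
    return word == "yes"


def run_evals(
    path: Path,
    chat: Callable[[str], str],   # hypothetical: user message -> model reply
    judge: Callable[[str], str],  # hypothetical: eval prompt -> "yes"/"no"
) -> list[bool]:
    """Replay each logged user message and ask the judge to grade the reply."""
    results = []
    for case in load_goldens(path):
        actual = chat(case["user"])
        prompt = build_eval_prompt(case["user"], case["golden"], actual)
        results.append(parse_verdict(judge(prompt)))
    return results
```

Parsing the verdict strictly only catches malformed judge output. It does not fix a lenient prompt like the one in the sample run.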