https://github.com/CarperAI/Polygraph
RLHF Mechanistic Interpretability and Deception
Last synced: 22 days ago
- Host: GitHub
- URL: https://github.com/CarperAI/Polygraph
- Owner: CarperAI
- License: MIT
- Created: 2023-02-09T22:39:48.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-14T21:28:32.000Z (almost 2 years ago)
- Last Synced: 2025-02-25T12:23:17.556Z (about 2 months ago)
- Size: 6.28 MB
- Stars: 6
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-human-in-the-loop - GitHub - CarperAI/Polygraph
README
# Polygraph
RLHF Mechanistic Interpretability and Deception