https://github.com/carperai/polygraph

RLHF Mechanistic Interpretability and Deception
Last synced: 10 months ago
README

# Polygraph
RLHF Mechanistic Interpretability and Deception