Projects in Awesome Lists tagged with activation-engineering
A curated list of projects in awesome lists tagged with activation-engineering.
https://github.com/ZFancy/awesome-activation-engineering
A curated list of resources for activation engineering
activation-engineering ai-safety concept concept-activation-vector concept-rep control interpretability large-language-models llm llm-aligment transparent
Last synced: 07 Jan 2026
https://github.com/bassrehab/steering-vectors-agents
Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.
activation-engineering ai-safety contrastive-activation-addition interpretability langchain llm-agents machine-learning pytorch steering-behaviors steering-vectors transformers
Last synced: 16 Jan 2026
https://github.com/solomonb14d3/knowledge-fidelity
Behavioral auditing toolkit for LLMs: rho-audit measures factual accuracy, bias, sycophancy, toxicity, and reasoning via teacher-forced confidence probes. SVD compression with knowledge preservation. Steering vectors for runtime behavioral control. 12-model merge audit across SLERP/TIES/DARE-TIES/Linear.
activation-engineering behavioral-evaluation bias-detection confidence interpretability llm-compression mergekit model-auditing model-merging pytorch rho-audit steering-vectors svd sycophancy transformers truthfulness
Last synced: 27 Feb 2026