Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with ai-alignment
A curated list of projects in awesome lists tagged with ai-alignment .
https://github.com/agencyenterprise/PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
adversarial-attacks agi agi-alignment ai-alignment ai-safety chain-of-thought gpt-3 language-models large-language-models machine-learning ml-safety prompt-engineering
Last synced: 31 Oct 2024
https://github.com/agencyenterprise/promptinject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
adversarial-attacks agi agi-alignment ai-alignment ai-safety chain-of-thought gpt-3 language-models large-language-models machine-learning ml-safety prompt-engineering
Last synced: 15 Dec 2024
https://github.com/tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
ai-alignment ai-safety decision-transformers gpt language-models pretraining reinforcement-learning rlhf
Last synced: 19 Dec 2024
https://github.com/giskard-ai/awesome-ai-safety
📚 A curated list of papers & technical articles on AI Quality & Safety
ai ai-alignment ai-quality ai-safety artificial-intelligence awesome awesome-list computer-vision ethical-ai llm llmops machine-learning ml ml-safety ml-testing mlops model-testing model-validation natural-language-processing robustness
Last synced: 13 Nov 2024
https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning
Reading list for adversarial perspective and robustness in deep reinforcement learning.
adversarial-attacks adversarial-machine-learning adversarial-policies adversarial-reinforcement-learning ai-alignment ai-safety deep-reinforcement-learning explainable-machine-learning explainable-rl machine-learning-safety meta-reinforcement-learning multiagent-reinforcement-learning reinforcement-learning-generalization reinforcement-learning-safety responsible-ai robust-adversarial-reinforcement-learning robust-machine-learning robust-reinforcement-learning safe-reinforcement-learning safe-rlhf
Last synced: 30 Oct 2024
https://github.com/rlhflow/directional-preference-alignment
Directional Preference Alignment
ai-alignment large-language-models rlhf
Last synced: 15 Nov 2024