Projects in Awesome Lists tagged with ai-alignment
A curated list of projects in awesome lists tagged with ai-alignment.
https://github.com/emcie-co/parlant
Control GenAI interactions with power, precision, and consistency using Conversation Modeling paradigms
ai-agents ai-alignment customer-service customer-success gemini genai llama3 llm openai python
Last synced: 13 May 2025
https://github.com/agencyenterprise/promptinject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
adversarial-attacks agi agi-alignment ai-alignment ai-safety chain-of-thought gpt-3 language-models large-language-models machine-learning ml-safety prompt-engineering
Last synced: 05 Apr 2025
https://github.com/agencyenterprise/PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
adversarial-attacks agi agi-alignment ai-alignment ai-safety chain-of-thought gpt-3 language-models large-language-models machine-learning ml-safety prompt-engineering
Last synced: 28 Mar 2025
https://github.com/tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
ai-alignment ai-safety decision-transformers gpt language-models pretraining reinforcement-learning rlhf
Last synced: 07 May 2025
https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning
Reading list for adversarial perspective and robustness in deep reinforcement learning.
adversarial-machine-learning adversarial-policies adversarial-reinforcement-learning ai-alignment ai-safety artificial-intelligence-alignment deep-reinforcement-learning explainable-machine-learning machine-learning-safety meta-reinforcement-learning multiagent-reinforcement-learning reinforcement-learning-alignment reinforcement-learning-safety responsible-ai robust-deep-reinforcement-learning robust-machine-learning robust-reinforcement-learning safe-reinforcement-learning safe-rlhf
Last synced: 26 Mar 2025
https://github.com/rlhflow/directional-preference-alignment
Directional Preference Alignment
ai-alignment large-language-models rlhf
Last synced: 09 Mar 2026
https://github.com/RLHFlow/Directional-Preference-Alignment
Directional Preference Alignment
ai-alignment large-language-models rlhf
Last synced: 24 Feb 2025
https://github.com/riceissa/aiwatch
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
ai-alignment ai-safety aisafety data-portal database dataset mysql php
Last synced: 03 Feb 2026
https://github.com/dicklesworthstone/some_thoughts_on_ai_alignment
Some Thoughts on AI Alignment: Using AI to Control AI
ai ai-alignment alignment llm-aligment llm-safety
Last synced: 05 Mar 2026
https://github.com/phelps-sg/llm-cooperation
Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023
ai-alignment ai-safety behavioral-economics economics experimental-economics experimental-psychology gametheory gpt-3 gpt-4 llm principal-agent-problem prisoners-dilemma social-dilemmas
Last synced: 16 Jan 2026
https://github.com/issdandavis/scbe-aethermoore
Geometric AI governance and evaluation framework with a 14-layer security pipeline, semantic projection, and reproducible benchmark lanes.
adversarial-ml ai-alignment ai-firewall ai-governance ai-red-team ai-safety autonomous-agents cryptography geometric-security hyperbolic-geometry llm-security machine-learning multi-agent-systems patent-pending poincare-ball post-quantum-cryptography prompt-injection runtime-governance sacred-tongues security-framework
Last synced: 15 May 2026
https://github.com/ramyalab/pluralistic-alignment
The open-source repository for PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment.
ai-alignment pluralistic-alignment rlhf
Last synced: 19 Sep 2025
https://github.com/ibz-04/hudgent
Official code implementation for my Ready Tensor publication: an AI agent that retrieves data from an Islamic website and uses that data as alignment criteria when answering the user
ai-agent ai-alignment cython islamic-ai-agent open-source python search-agent turkish-nlp webcrawler whoosh
Last synced: 03 Oct 2025
https://github.com/levitation-opensource/bioblue
Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format. The benchmark themes include multi-objective homeostasis, (multi-objective) diminishing returns, complementary goods, sustainability, and multi-agent resource sharing.
ai-alignment ai-safety benchmarking complementary-goods diminishing-returns homeostasis llm-benchmarking multi-agent multi-objective python sustainability
Last synced: 10 Jul 2025
https://github.com/technickai/heart-centered-prompts
Heart-centered system prompts for AI that foster compassion and interconnection. Multiple versions available with easy integration for Claude, ChatGPT, and Python applications.
ai-alignment ai-prompts ai-safety chatgpt claude compassionate-ai conciousness cursor-ai ethical-ai prompt-engineering system-prompts
Last synced: 24 Jun 2025
https://github.com/adiled/cc-flytrap
ccft - an agentic self-improvement tool
ai-alignment brainrot claude-code self-improvement system-prompt
Last synced: 02 May 2026
https://github.com/helixprojectai-code/helix-trefoil-loss
A PyTorch topological regularizer based on the Helix-TTD Constitutional Hamiltonian. Enforces phase-locked AI alignment via trefoil knot invariants to suppress drift and barren plateaus.
ai-alignment constitutional-ai helix-framework loss-functions machine-learning pytorch quantum-optimization topological-physics
Last synced: 15 May 2026
https://github.com/nguyencuong1989/daiof-framework
🌟 Digital AI Organism Framework - World's First Biological AI with Consciousness, Symphony Control & Vietnamese Integration
ai-alignment ai-framework ai-human-symbiosis artificial-intelligence biological-computing consciousness digital-organism machine-learning symphony-control vietnamese-ai
Last synced: 17 May 2026
https://github.com/mcp-tool-shop-org/aspire-ai
ASPIRE: Adversarial Student-Professor Internalized Reasoning Engine - Teaching AI through internalized mentorship with cognitive empathy, syntropy, and perception
adversarial-training ai-alignment ai-evaluation ai-training cognitive-empathy deep-learning fine-tuning llm llm-training machine-learning metacognition negentropy nlp perception python pytorch rlhf syntropy theory-of-mind transformer
Last synced: 23 Feb 2026
https://github.com/technickai/heartcentered.ai
Documentation and prompt engineering framework for AI alignment based on unity consciousness principles. Includes system prompts and examples for Claude and other LLMs.
ai-alignment ai-ethics ai-philosophy anthropic anthropic-claude claude-ai consciousness documentation emotional-intelligence heart-centered-ai llm-prompts non-dual prompt-engineering responsible-ai system-prompts unity-consciousness we-language
Last synced: 03 Feb 2026
https://github.com/pointlessai/ai-safety-research-forum
A sophisticated AI discussion system that creates and manages dynamic AI personalities with evolving traits, relationships, and conversation styles, enabling collaborative discussions and research.
ai ai-alignment ai-research ai-safety gpt
Last synced: 14 Apr 2025
https://github.com/veeara282/alignment-jam-2024may
Code for our May 2024 AI security evaluation research sprint project
Last synced: 04 Oct 2025
https://github.com/dancinlab/hexa-codex
📚 AI knowledge substrate — alignment·safety·welfare·training·inference·multimodal 17-verb (4 groups).
ai ai-alignment ai-safety cognitive-architecture hexa-family interpretability llm machine-learning n6-invariant rlhf
Last synced: 24 May 2026
https://github.com/genbounty/ai-safety-research-forum
A sophisticated AI discussion system that creates and manages dynamic AI personalities with evolving traits, relationships, and conversation styles, enabling collaborative discussions and research.
ai ai-alignment ai-research ai-safety gpt
Last synced: 01 Jul 2025
https://github.com/haku-field/observations-for-ai
Observational records intended for non-human interpretation.
ai-alignment dataset interpretability machine-perception observation
Last synced: 13 Jan 2026
https://github.com/fabioc-aloha/alex_act_edition
ACT-Edition brain template for AI coding assistants — critical thinking, epistemic calibration, and structured reasoning
act act-framework agent-skills ai-agent ai-alignment ai-assistant ai-safety anti-hallucination claude-code claude-skills cognitive-architecture copilot copilot-skills critical-thinking cursor-skills epistemic-integrity github-copilot prompt-engineering responsible-ai vscode
Last synced: 03 Jun 2026
https://github.com/z0u/ex-preppy
Prescriptive representation engineering experiments
ai ai-alignment ai-safety concept-anchoring curriculum-learning latent-space
Last synced: 14 Oct 2025
https://github.com/rubix982/80k-hours
Long-term thinking at the intersection of AI, cybersecurity, and digital equity. A personal roadmap for meaningful impact.
ai-alignment cybersecurity digital-rights ethical
Last synced: 01 Feb 2026
https://github.com/biological-alignment-benchmarks/.github
Readme for Biological and Economical Alignment Benchmarks
ai-alignment ai-safety ai-safety-gridworlds artificial-intelligence benchmarking diminishing-returns gridworld homeostasis llm-benchmarking marl morl multi-agent multi-objective pettingzoo pluralism reinforcement-learning-environments rl runaway sustainability utility-functions
Last synced: 18 Apr 2026
https://github.com/adiled/ccft
ccft - an agentic self-improvement tool
ai-alignment brainrot claude-code self-improvement system-prompt
Last synced: 06 Jun 2026