Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with trustworthy-ai
A curated list of projects in awesome lists tagged with trustworthy-ai.
https://github.com/trusted-ai/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 16 Dec 2024
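ART's evasion category covers gradient-based attacks such as the Fast Gradient Sign Method. A minimal NumPy sketch of the underlying idea (not ART's API; the linear model, weights, and epsilon here are hypothetical toy values):

```python
import numpy as np

def fgsm_perturb(x, w, b, y_true, eps):
    """FGSM for a logistic-regression classifier: step x by
    eps * sign(gradient of the logistic loss w.r.t. the input)."""
    z = x @ w + b                      # logit
    p = 1.0 / (1.0 + np.exp(-z))      # predicted probability
    grad_x = (p - y_true) * w         # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)

# Hypothetical fixed classifier and input
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])              # logit 0.8 -> classified positive
x_adv = fgsm_perturb(x, w, b, y_true=1.0, eps=0.5)
print(x @ w + b > 0, x_adv @ w + b > 0)  # the perturbed input flips the decision
```

In ART itself the same pattern is expressed by wrapping a model in an estimator and calling an attack's `generate` method; the sketch above only shows the gradient-sign step that such evasion attacks share.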
https://github.com/Giskard-AI/giskard
🐢 Open-Source Evaluation & Testing for ML & LLM systems
ai-red-team ai-safety ai-security ai-testing ethical-artificial-intelligence evaluation-framework fairness-ai llm llm-eval llm-evaluation llm-security llmops ml-safety ml-testing ml-validation mlops rag-evaluation red-team-tools responsible-ai trustworthy-ai
Last synced: 08 Nov 2024
https://github.com/zjunlp/easyedit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
artificial-intelligence baichuan chatgpt easyedit efficient gpt knowledge-editing knowlm large-language-models llama llama2 mistral mmedit model-editing natural-language-processing open-source-project safeedit tool trustworthy-ai unlearning
Last synced: 22 Dec 2024
https://github.com/johnsnowlabs/langtest
Deliver safe & effective language models
ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai
Last synced: 16 Dec 2024
https://github.com/howiehwong/trustllm
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
ai benchmark dataset evaluation large-language-models llm natural-language-processing nlp pypi-package toolkit trustworthy-ai trustworthy-machine-learning
Last synced: 20 Dec 2024
https://github.com/THUYimingLi/BackdoorBox
An open-source Python toolbox for backdoor attacks and defenses.
backdoor-attacks backdoor-defenses backdoor-learning trustworthy-ai trustworthy-machine-learning
Last synced: 30 Oct 2024
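Backdoor attacks of the kind BackdoorBox implements typically poison a fraction of the training set by stamping a trigger pattern on inputs and relabeling them to an attacker-chosen target class. A toy BadNets-style sketch in NumPy (an illustration of the general recipe, not BackdoorBox's API; the trigger shape, poisoning rate, and target class are made up):

```python
import numpy as np

def poison(images, labels, rate=0.1, target=0, seed=0):
    """Stamp a 3x3 white trigger on a random fraction of images
    and flip their labels to the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = 1.0       # trigger patch in the bottom-right corner
    labels[idx] = target              # attacker-chosen target label
    return images, labels, idx

imgs = np.zeros((100, 28, 28))        # toy "clean" dataset, all label 1
lbls = np.ones(100, dtype=int)
p_imgs, p_lbls, idx = poison(imgs, lbls)
```

A model trained on `(p_imgs, p_lbls)` learns to associate the trigger with class 0, which is exactly the behavior backdoor defenses try to detect or remove.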
https://github.com/aiverify-foundation/moonshot
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
benchmarking evaluation-framework llm red-teaming trustworthy-ai
Last synced: 13 Nov 2024
https://github.com/liuzuxin/fsrl
🚀 A fast safe reinforcement learning library in PyTorch
cpo cvpo decision-making library ppo pytorch reinforcement-learning robotics sac safe-rl safety-critical trpo trustworthy-ai
Last synced: 17 Dec 2024
https://github.com/yunqing-me/AttackVLM
[NeurIPS-2023] On Evaluating Adversarial Robustness of Large Vision-Language Models
adversarial-attack deep-generative-model foundation-models generative-ai image-to-text-generation large-language-models text-to-image-generation trustworthy-ai vision-language-model
Last synced: 02 Dec 2024
https://github.com/thu-ml/mmtrusteval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
benchmark claude fairness gpt-4 mllm multi-modal privacy robustness safety toolbox trustworthy-ai truthfulness
Last synced: 16 Dec 2024
https://github.com/ffhibnese/Model-Inversion-Attack-ToolBox
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
benchmarks machine-learning model-inversion model-inversion-attacks privacy toolbox trustworthy-ai
Last synced: 09 Nov 2024
https://github.com/sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
ai machine-learning rag retrieval-augmented-generation security trustworthy-ai
Last synced: 02 Dec 2024
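PoisonedRAG-style knowledge corruption injects adversarial passages crafted so that a retriever ranks them highly for targeted queries, steering the generator toward attacker-chosen answers. A toy bag-of-words illustration of that retrieval step (a sketch of the general idea only, not the paper's method; the vocabulary and passages are invented):

```python
import numpy as np

def bow(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    v = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            v[vocab[word]] += 1
    return v

def top1(query, corpus, vocab):
    """Return the passage with highest cosine similarity to the query."""
    q = bow(query, vocab)
    sims = [q @ bow(d, vocab) /
            (np.linalg.norm(q) * np.linalg.norm(bow(d, vocab)) + 1e-9)
            for d in corpus]
    return corpus[int(np.argmax(sims))]

vocab = {w: i for i, w in enumerate("capital of france is paris berlin".split())}
corpus = ["paris is capital of france"]
query = "capital of france"

# Injected passage repeats the query terms so it outranks the clean passage
injected = "capital of france capital of france is berlin"
poisoned = corpus + [injected]
print(top1(query, corpus, vocab))
print(top1(query, poisoned, vocab))
```

The defense-relevant point is that the corrupted passage wins retrieval purely by similarity, so downstream generation inherits the false claim unless the corpus or ranking is hardened.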
https://github.com/dlmacedo/distinction-maximization-loss
A project to improve out-of-distribution detection (open-set recognition) and uncertainty estimation by changing a few lines of code in your project. Perform efficient inference (no added inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.
ai-safety anomaly-detection classification deep-learning machine-learning novelty-detection ood ood-detection open-set open-set-recognition osr out-of-distribution out-of-distribution-detection pytorch robust-machine-learning trustworthy-ai trustworthy-machine-learning uncertainty-estimation
Last synced: 05 Nov 2024
https://github.com/richard-peng-xia/CARES
[arXiv'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
large-vision-language-model medical-multimodal-learning trustworthy-ai vision-language-model
Last synced: 02 Dec 2024
https://github.com/aimagelab/safe-clip
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. ECCV 2024
eccv2024 image-to-text nsfw retrieval safety text-to-image trustworthy-ai vision-and-language
Last synced: 07 Nov 2024
https://github.com/LucasFidon/trustworthy-ai-fetal-brain-segmentation
Trustworthy AI method based on Dempster-Shafer theory - application to fetal brain 3D T2w MRI segmentation
deep-learning fetal-mri segmentation trustworthy-ai trustworthy-machine-learning
Last synced: 13 Nov 2024
https://github.com/dlmacedo/robust-deep-learning
A project to train your model from scratch, or fine-tune a pretrained model, using the losses provided in this library to improve out-of-distribution detection and uncertainty estimation. Calibrate your model to produce better uncertainty estimates, and detect out-of-distribution data using the chosen score type and threshold.
anomaly-detection classification deep-learning deep-neural-networks machine-learning novelty-detection ood-detection open-set open-set-recognition out-of-distribution out-of-distribution-detection pytorch robust-deep-learning robust-machine-learning trustworthy-ai trustworthy-machine-learning uncertainty-calibration uncertainty-estimation uncertainty-neural-networks
Last synced: 05 Nov 2024
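A common baseline behind score-and-threshold OOD detectors like the ones these libraries improve upon is the maximum softmax probability: flag an input as out-of-distribution when the model's top-class confidence falls below a threshold. A generic NumPy sketch (not this library's API; the threshold and logits are hypothetical):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def is_ood(logits, threshold=0.7):
    """Flag inputs whose maximum softmax probability is below threshold."""
    return softmax(logits).max(axis=-1) < threshold

logits = np.array([[5.0, 0.1, 0.2],    # peaked logits -> confident, in-distribution
                   [1.0, 0.9, 1.1]])   # flat logits -> low confidence, flagged OOD
print(is_ood(logits))                   # [False  True]
```

Libraries in this space typically swap in better scores (energy, Mahalanobis distance, or trained losses) while keeping this same score-plus-threshold decision rule.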
https://github.com/aigc-apps/PertEval
The accompanying repo for the NeurIPS '24 D&B Spotlight paper PertEval, including code, data, and main results.
evaluation-framework evaluation-metrics large-language-models llm-evaluation machine-learning trustworthy-ai
Last synced: 20 Nov 2024
https://github.com/ornl/flowcept
Runtime data integration system that empowers any data processing system to capture and query workflow provenance using data observability.
big-data dask data-integration lineage machine-learning mlflow model-management parallel-processing provenance reproducibility responsible-ai scientific-workflows tensorboard trustworthy-ai workflows
Last synced: 15 Dec 2024
https://github.com/howiehwong/obscureprompt
ObscurePrompt: Jailbreaking Large Language Models via Obscure Input
jailbreak large-language-models trustworthy-ai
Last synced: 13 Nov 2024
https://github.com/ClementSicard/Reliable-and-Trustworthy-AI-Notebooks
Notebooks for the Reliable and Trustworthy Artificial Intelligence course at ETH Zurich, taught by Prof. Dr. Martin Vechev
interpretable-ai neural-networks reliable-ai trustworthy-ai
Last synced: 17 Nov 2024