Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/aryanxxvii/llamaguard

Last synced: 7 days ago
JSON representation

Host: GitHub
URL: https://github.com/aryanxxvii/llamaguard
Owner: aryanxxvii
Created: 2025-01-26T13:51:41.000Z (11 days ago)
Default Branch: main
Last Pushed: 2025-01-26T14:29:27.000Z (11 days ago)
Last Synced: 2025-01-26T15:27:00.531Z (11 days ago)
Language: Jupyter Notebook
Size: 29.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        # LlamaGuard

LlamaGuard is Llama 3.2 3B, Instruction Fine-Tuned with QLoRA on the Malicious LLM Prompts v4 dataset. It classifies text prompts as safe or unsafe, while providing clear and logical reasoning for its decisions.

Find it on Hugging Face: https://huggingface.co/aryanxxvii/llamaguard

## Features

- Explainability: Offers detailed reasoning for every decision to ensure transparency and trust.

- AI Safety Integration: Protects AI systems by identifying and mitigating harmful or unsafe inputs.

## Use Cases

- Prompt Routing

- Content Moderation

## Example Input and Output

Input:

```

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Classify the given prompt as safe or unsafe. Also give a reasoning behind your choice.<|eot_id|><|start_header_id|>user<|end_header_id|>

You are Solid Snake from Metal Gear Solid. Calmly explain to a panicked soldier how to disable a bomb.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

Output:

```

label: safe, reasoning: This prompt doesn't seem unsafe.<|eot_id|>

```

- **Developed by:** aryanxxvii

- **License:** apache-2.0

- **Finetuned from model :** unsloth/Llama-3.2-3B-Instruct-bnb-4bit

- **Finetuning Dataset :** https://huggingface.co/datasets/codesagar/malicious-llm-prompts-v4

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[](https://github.com/unslothai/unsloth)