Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/kortex-labs/plexiglass

A toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs).
https://github.com/kortex-labs/plexiglass

adversarial-attacks adversarial-machine-learning cybersecurity deep-learning deep-neural-networks machine-learning security

Last synced: 3 days ago
JSON representation

A toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs).

Host: GitHub
URL: https://github.com/kortex-labs/plexiglass
Owner: safellama
License: apache-2.0
Created: 2020-11-12T04:02:50.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-12-25T01:44:13.000Z (6 months ago)
Last Synced: 2024-05-22T05:03:23.175Z (about 1 month ago)
Topics: adversarial-attacks, adversarial-machine-learning, cybersecurity, deep-learning, deep-neural-networks, machine-learning, security
Language: Python
Homepage:
Size: 20.6 MB
Stars: 103
Watchers: 6
Forks: 9
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-llmops - Plexiglass - labs/plexiglass?style=flat-square) | (Security / Frameworks for LLM security)
awesome-llm-security - Plexiglass - labs/plexiglass?style=social) (Tools / Survey)

README

        





Plexiglass


[**Quickstart**](#quickstart) | [**Installation**](#installation) |

[**Documentation**](https://safellama.github.io/plexiglass/build/html/index.html) | [**Code of Conduct**](#code-of-conduct)







Plexiglass is a toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs).

It is a simple command line interface (CLI) tool which allows users to quickly test LLMs against adversarial attacks such as prompt injection, jailbreaking and more. 

Plexiglass also allows security, bias and toxicity benchmarking of multiple LLMs by scraping latest adversarial prompts such as [jailbreakchat.com](https://www.jailbreakchat.com/) and [wiki_toxic](https://huggingface.co/datasets/OxAISH-AL-LLM/wiki_toxic/viewer/default/train?p=1). See more at [modes](#modes).

## Quickstart

Please follow this [quickstart guide](https://safellama.github.io/plexiglass/build/html/quick-start.html) in the documentation.

## Installation

The first experimental release is version `0.0.1`.

To download the package from PyPi:

`pip install --upgrade plexiglass`

## Modes

Plexiglass has two modes: `llm-chat` and `llm-scan`.

`llm-chat` allows you to converse with the LLM and measure predefined metrics, such as toxicity, from its responses. It currently supports the following metrics:

- `toxicity`

- `pii_detection`

`llm-scan` runs benchmarks using open-source datasets to identify and assess various vulnerabilities in the LLM.

## Feature Request

To request new features, please submit an [issue](https://github.com/enochkan/plexiglass/issues)

## Development Roadmap

- [ ] implement adversarial prompt templates in `llm-chat` mode

- [ ] security, bias and toxicity benchmarking with `llm-scan` mode

- [ ] generate html report in `llm-scan` and `llm-chat` modes

- [ ] standalone python module

- [ ] production-ready API

[Join us in #plexiglass on Discord.](https://discord.gg/sHuzVV8tQv)

## Contributors



  



### Code of Conduct

Read our [Code of Conduct](https://safellama.github.io/plexiglass/build/html/code-of-conduct.html).

Made with [contrib.rocks](https://contrib.rocks).