https://github.com/kennethleungty/artkit-gandalf-challenge

Exposing Jailbreak Vulnerabilities in LLM Applications with ARTKIT
https://github.com/kennethleungty/artkit-gandalf-challenge

artkit cybersecurity data-science gandalf gen-ai genai generative-ai guardrails jailbreak large-language-models llm llm-evaluation llm-guardrails llmops machine-learning prompt-engineering red-teaming

Last synced: 3 months ago
JSON representation

Exposing Jailbreak Vulnerabilities in LLM Applications with ARTKIT

Host: GitHub
URL: https://github.com/kennethleungty/artkit-gandalf-challenge
Owner: kennethleungty
License: mit
Created: 2024-09-13T13:57:58.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-09-14T17:30:38.000Z (about 1 year ago)
Last Synced: 2024-09-16T03:05:07.990Z (about 1 year ago)
Topics: artkit, cybersecurity, data-science, gandalf, gen-ai, genai, generative-ai, guardrails, jailbreak, large-language-models, llm, llm-evaluation, llm-guardrails, llmops, machine-learning, prompt-engineering, red-teaming
Language: Jupyter Notebook
Homepage:
Size: 572 KB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Exposing Jailbreak Vulnerabilities in LLM Applications with ARTKIT
## Automated prompt-based testing to extract passwords from the Gandalf Challenge's LLM system

Link to article: https://towardsdatascience.com/exposing-jailbreak-vulnerabilities-in-llm-applications-with-artkit-d2df5f56ece8

### Background
- As large language models (LLMs) become more widely adopted across different industries and domains, significant security risks have emerged and intensified. Several of these key concerns include breaches of data privacy, the potential for biases, and the risk of information manipulation.
- Uncovering these security risks is crucial to ensuring that LLM applications remain beneficial in real-world scenarios while upholding their safety, effectiveness, and robustness.
- In this project, we explore how to use the open-source [ARTKIT](https://github.com/BCG-X-Official/artkit) framework to automatically evaluate security vulnerabilities of LLM applications using the popular Gandalf Challenge as an illustrative example.

Alt text

### Files
- `gandalf_challenge.ipynb`: Jupyter notebook containing the codes for the walkthrough

### References
- [Official ARTKIT GitHub Repo](https://medium.com/r/?url=https%3A%2F%2Fgithub.com%2FBCG-X-Official%2Fartkit)
- [Play the Gandalf Challenge](https://medium.com/r/?url=https%3A%2F%2Fgandalf.lakera.ai%2F)

### Acknowledgements
- Special thanks to Sean Anggani, Andy Moon, Matthew Wong, Randi Griffin, and Andrea Gao!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kennethleungty/artkit-gandalf-challenge

Awesome Lists containing this project

README