https://github.com/mdegans/langsan

Last synced: 14 days ago
JSON representation

Host: GitHub
URL: https://github.com/mdegans/langsan
Owner: mdegans
License: mit
Created: 2024-10-16T19:00:25.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-10-17T21:41:48.000Z (9 months ago)
Last Synced: 2025-05-29T12:11:06.741Z (about 1 month ago)
Language: Rust
Size: 59.6 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# `langsan` is a sanitization library for language models

![Build Status](https://github.com/mdegans/langsan/actions/workflows/tests.yaml/badge.svg)
[![codecov](https://codecov.io/gh/mdegans/langsan/branch/main/graph/badge.svg)](https://codecov.io/gh/mdegans/langsan)

Out of a desire to be first to market, [many companies from OpenAI to Anthropic](https://arstechnica.com/security/2024/10/ai-chatbots-can-read-and-write-invisible-text-creating-an-ideal-covert-channel/) are releasing language models without proper input or output sanitization. This can lead to a variety of safety and security issues, including but not limited to human-invisible adversarial attacks, data leakage, and generation of harmful content.

`langsan` provides immutable string wrappers guaranteeing their contents are within restricted unicode ranges, generally those only officially supported by a particular language model. Almost all unicode code blocks are available as features (crates.io has a limit set at 300).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mdegans/langsan

Awesome Lists containing this project

README