https://github.com/mdegans/langsan
- Host: GitHub
- URL: https://github.com/mdegans/langsan
- Owner: mdegans
- License: MIT
- Created: 2024-10-16T19:00:25.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-10-17T21:41:48.000Z (7 months ago)
- Last Synced: 2025-01-05T04:34:19.724Z (5 months ago)
- Language: Rust
- Size: 59.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# `langsan` is a sanitization library for language models

Out of a desire to be first to market, [many companies from OpenAI to Anthropic](https://arstechnica.com/security/2024/10/ai-chatbots-can-read-and-write-invisible-text-creating-an-ideal-covert-channel/) are releasing language models without proper input or output sanitization. This can lead to a variety of safety and security issues, including but not limited to human-invisible adversarial attacks, data leakage, and generation of harmful content.
`langsan` provides immutable string wrappers that guarantee their contents fall within restricted Unicode ranges, generally only those officially supported by a particular language model. Almost all Unicode blocks are available as features (crates.io limits a crate to 300 features).
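
To make the idea of a range-restricted, immutable string wrapper concrete, here is a minimal standalone sketch using only the standard library. It is not `langsan`'s actual API: the names `RestrictedString`, `DisallowedChar`, `ALLOWED`, `new`, and `new_lossy` are hypothetical, and the allow-list (printable Basic Latin, the upper part of Latin-1 Supplement, plus tab and newline) is an assumption chosen purely for illustration.

```rust
use std::ops::Deref;

/// Hypothetical allow-list of inclusive code point ranges (an assumption
/// for this sketch, not the ranges langsan ships).
const ALLOWED: &[(char, char)] = &[
    ('\u{0020}', '\u{007E}'), // Basic Latin, printable characters
    ('\u{00C0}', '\u{00FF}'), // upper Latin-1 Supplement (mostly accented letters)
];

/// True if `c` is tab, newline, or inside one of the allowed ranges.
fn is_allowed(c: char) -> bool {
    c == '\n' || c == '\t' || ALLOWED.iter().any(|&(lo, hi)| lo <= c && c <= hi)
}

/// Error describing the first character that fell outside the allow-list.
#[derive(Debug, PartialEq, Eq)]
pub struct DisallowedChar {
    pub ch: char,
    pub byte_index: usize,
}

/// Hypothetical immutable wrapper: the inner `String` is private and there
/// is no mutable access, so the invariant checked at construction holds
/// for the lifetime of the value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RestrictedString(String);

impl RestrictedString {
    /// Strict constructor: rejects the input on the first disallowed character.
    pub fn new(s: &str) -> Result<Self, DisallowedChar> {
        match s.char_indices().find(|&(_, c)| !is_allowed(c)) {
            Some((byte_index, ch)) => Err(DisallowedChar { ch, byte_index }),
            None => Ok(Self(s.to_owned())),
        }
    }

    /// Lossy constructor: silently drops every disallowed character.
    pub fn new_lossy(s: &str) -> Self {
        Self(s.chars().filter(|&c| is_allowed(c)).collect())
    }

    /// Read-only view of the sanitized contents.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Read-only access to `str` methods; `DerefMut` is deliberately not implemented.
impl Deref for RestrictedString {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}
```

With this sketch, `RestrictedString::new("hi\u{E0041}")` returns an error pointing at the invisible Unicode tag character, while `RestrictedString::new_lossy("hi\u{E0041}!")` yields a wrapper containing `"hi!"`. Checking once at construction and exposing only read access is what lets downstream code treat the type itself as proof that the text is within the allowed ranges.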