https://github.com/user1342/Tomato

LLM steganography with minimum-entropy coupling - Hiding encrypted messages in natural language.

encryption hidden-message large-language-models llm machine-learning machine-learning-algorithms steganography steganography-algorithms

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/user1342/Tomato
Owner: user1342
License: mit
Created: 2024-09-08T08:02:55.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-09-09T06:14:37.000Z (10 months ago)
Last Synced: 2025-01-09T03:08:18.824Z (6 months ago)
Topics: encryption, hidden-message, large-language-models, llm, machine-learning, machine-learning-algorithms, steganography, steganography-algorithms
Language: Python
Homepage:
Size: 150 KB
Stars: 78
Watchers: 2
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

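StarryDivineSky - user1342/Tomato - Hides encrypted messages in natural language. How it works: LLM-generated cover text: the LLM generates coherent text from a prompt, as usual. Embedding with MEC: MEC couples the probability distribution of the hidden message (ciphertext) with the distribution of the LLM-generated covertext. This coupling minimizes the joint entropy, ensuring that the stegotext (covertext with the embedded message) retains the statistical properties of natural language, which makes the hidden message practically undetectable. Decoding process: during decoding, the LLM assists by providing a context-aware interpretation of the stegotext, and MEC is then used in reverse to separate the hidden message from the covertext. The process uses the same probability distributions as the embedding step, so the message is extracted accurately without affecting the integrity of the covertext. This method ensures that the hidden message is seamlessly integrated into the text and can later be retrieved securely and precisely, with minimal risk of detection. (A01_Text Generation_Text Dialogue / Other_Text Generation_Text Dialogue)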

README

🤖 Hide text within other natural language text 🍅

**Tomato is a proof-of-concept steganography tool that uses the minimum-entropy coupling (MEC) code provided by [ssokota](https://github.com/ssokota/mec/tree/master)! ⭐**

# 🧠 How It Works
- **LLM-Generated Cover Text:** The LLM generates coherent text from a prompt, as it normally would.
- **Embedding with MEC:** MEC couples the probability distribution of the hidden message (ciphertext) with the distribution of the LLM-generated covertext. The coupling minimizes the joint entropy while leaving both distributions unchanged, so the stegotext (covertext with the embedded message) retains the statistical properties of natural language and the hidden message is hard to detect.
- **Decoding Process:** During decoding, the LLM is run over the stegotext to recover the same probability distributions that were used during embedding. MEC is then applied in reverse to separate the hidden message from the covertext, so the message can be extracted accurately without altering the covertext.

This method ensures that the hidden message is seamlessly integrated into the text and can be securely and precisely retrieved later, with minimal risk of detection.
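To make the coupling step more concrete, here is a minimal sketch of a greedy heuristic for approximating a minimum-entropy coupling of two distributions. It is an illustration only: it is not the code from the `mec` library or from Tomato, and the function names (`greedy_coupling`, `entropy`) and the example distributions are made up for this sketch.

```python
import math


def greedy_coupling(p, q, tol=1e-12):
    """Approximate a minimum-entropy coupling of marginals p and q.

    Greedy heuristic (an assumption for illustration, not Tomato's code):
    repeatedly pair the largest remaining mass in p with the largest
    remaining mass in q. Returns a dict {(i, j): joint_mass}.
    """
    p, q = list(p), list(q)
    joint = {}
    while True:
        i = max(range(len(p)), key=p.__getitem__)
        j = max(range(len(q)), key=q.__getitem__)
        mass = min(p[i], q[j])
        if mass <= tol:  # nothing meaningful left to assign
            break
        joint[(i, j)] = joint.get((i, j), 0.0) + mass
        p[i] -= mass
        q[j] -= mass
    return joint


def entropy(masses):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(m * math.log2(m) for m in masses if m > 0)


# Ciphertext looks uniformly random: 4 equally likely values.
message_dist = [0.25, 0.25, 0.25, 0.25]
# Hypothetical next-token distribution from a language model.
token_dist = [0.5, 0.3, 0.2]

joint = greedy_coupling(message_dist, token_dist)
for (msg, tok), mass in sorted(joint.items()):
    print(f"message={msg} token={tok} mass={mass:.2f}")
print("joint entropy (bits):", round(entropy(joint.values()), 3))
```

Summing the joint masses over tokens gives back `message_dist`, and summing over messages gives back `token_dist`. Because the token marginal is preserved, tokens drawn through the coupling follow the same distribution as ordinary LLM sampling, while a low joint entropy means each token reveals as much as possible about the message.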

# 📙 Example

```python
from tomato import Encoder

encoder = Encoder()

plaintext = "hello"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)
```

Output:
```
Stegotext: After hours, I like to walk. Sometimes I will travel by train to a station I’ve never been, and walk from there in no particular direction. As I walk, I find the world reveals itself, in small, inexplicable ways. Tonight, a rabbit darted across the track ahead of the train with such urgency I thought, for a moment, it was a fox, or something more menacing. And when I pulled my phone out to
------
Decoded Plaintext: helloAAAAAAAAAA # The A's are padding up to the encryption key length
```
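The trailing `A` characters in the decoded output come from padding the plaintext to a fixed length before encryption. The sketch below shows one way this could work, assuming a simple XOR cipher with a random key of `cipher_len` bytes. The helper names (`pad_plaintext`, `xor_bytes`) and the choice of XOR are assumptions for illustration, not a description of Tomato's internals.

```python
import secrets


def pad_plaintext(plaintext: str, cipher_len: int, pad_char: str = "A") -> bytes:
    """Pad the UTF-8 encoded plaintext with pad_char up to cipher_len bytes."""
    data = plaintext.encode("utf-8")
    if len(data) > cipher_len:
        raise ValueError("plaintext is longer than cipher_len")
    return data + pad_char.encode("ascii") * (cipher_len - len(data))


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (assumed cipher, for illustration)."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


cipher_len = 15                           # the documented default
key_hex = secrets.token_hex(cipher_len)   # random key in hex, 2 chars per byte
key = bytes.fromhex(key_hex)

padded = pad_plaintext("hello", cipher_len)
ciphertext = xor_bytes(padded, key)       # this is what would be hidden
recovered = xor_bytes(ciphertext, key)    # receiver applies the same key

print(recovered.decode("utf-8"))  # helloAAAAAAAAAA
```

With a scheme like this, the receiver has to strip the padding itself, which is why the decoded plaintext above still ends in `A` characters.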

# ⚙️ Setup
Tomato requires Nvidia CUDA. Follow the steps below:
- Ensure your Nvidia drivers are up to date: https://www.nvidia.com/en-us/geforce/drivers/
- Install the appropriate dependencies from here: https://pytorch.org/get-started/locally/
- Validate that CUDA is installed correctly by running the following command, which should print a tensor on the CUDA device rather than an error: ```python -c "import torch; print(torch.rand(2,3).cuda())"```

Install the dependencies using:

```bash
pip install git+https://github.com/user1342/mec
```
```bash
git clone https://github.com/user1342/Tomato.git
cd Tomato
pip install -r requirements.txt
pip install -e .
```

# 🏃 Running
You can use the Tomato Encoder/Decoder Tool directly from the command line. Here are the available commands:

## Encode a Message
To encode a plaintext message into stegotext:

```bash
tomato-encode.exe "Your secret message here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."
```

Output:
```
Stegotext: [Your encoded message here]
```

## Decode a Message
To decode a stegotext back into its original plaintext:

```bash
tomato-decode.exe "Your stegotext here" --cipher_len 20 --shared_private_key 123abc... --prompt "Good evening."
```

Output:
```
Estimated Plaintext: [Your decoded plaintext]
Estimated Bytetext: [Your decoded bytetext]
```

## Programmatic Example
Check out the [example notebook](https://github.com/user1342/Tomato/blob/main/example.ipynb)! For a quick demonstration, you can try encoding and decoding a simple message using the following code snippet:

```python
from tomato import Encoder

encoder = Encoder()

plaintext = "I am a hidden code"
formatted_stegotext, stegotext = encoder.encode(plaintext)
estimated_plaintext, estimated_bytetext = encoder.decode(stegotext)

print(formatted_stegotext)
print("------")
print(estimated_plaintext)
```

# 🛡️ Customization Options
The Tomato Encoder/Decoder Tool offers several customizable parameters:

* cipher_len: Length of the cipher (default is 15).
* shared_private_key: Shared private key in hex format. If not provided, a random key will be generated.
* prompt: Prompt for the language model (default is "Good evening.").
* max_len: Maximum length of the covertext (default is 100).
* temperature: Sampling temperature for the language model (default is 1.0).
* k: The k parameter for the language model (default is 50).
* model_name: Name of the language model to be used (default is "unsloth/mistral-7b-instruct-v0.3-bnb-4bit").
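To give a feel for what `temperature` and `k` control, here is a minimal, library-free sketch of temperature scaling followed by top-k truncation of a next-token distribution. It assumes that `k` refers to top-k sampling, which the list above does not state explicitly. The function name `top_k_temperature` and the example logits are hypothetical.

```python
import math


def top_k_temperature(logits, k=50, temperature=1.0):
    """Return {token_index: probability} after temperature scaling and top-k.

    Assumes k means "keep only the k most likely tokens" (top-k sampling).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Indices of the k largest scaled logits.
    top = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)[:k]
    peak = max(scaled[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(scaled[i] - peak) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary

sharp = top_k_temperature(logits, k=2, temperature=1.0)
flat = top_k_temperature(logits, k=2, temperature=2.0)
print(sharp)
print(flat)
```

A higher temperature flattens the distribution and a larger `k` keeps more candidate tokens. Since decoding relies on the same probability distributions as encoding, both sides presumably need to use the same values for these settings.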

# 🙏 Contributions
Tomato is an open-source project and welcomes contributions from the community. If you would like to contribute to Tomato, please follow these guidelines:

- Fork the repository to your own GitHub account.
- Create a new branch with a descriptive name for your contribution.
- Make your changes and test them thoroughly.
- Submit a pull request to the main repository, including a detailed description of your changes and any relevant documentation.
- Wait for feedback from the maintainers and address any comments or suggestions (if any).
- Once your changes have been reviewed and approved, they will be merged into the main repository.

# ⚖️ Code of Conduct
Tomato follows the Contributor Covenant Code of Conduct. Please make sure to review and adhere to this code of conduct when contributing to Tomato.

# 🐛 Bug Reports and Feature Requests
If you encounter a bug or have a suggestion for a new feature, please open an issue in the GitHub repository. Please provide as much detail as possible, including steps to reproduce the issue or a clear description of the proposed feature. Your feedback is valuable and will help improve Tomato for everyone.

# 📜 License
MIT
