https://github.com/purestorage-openconnect/embedding-security

Sample scripts to simulate embedding security scenarios
https://github.com/purestorage-openconnect/embedding-security

Last synced: 4 months ago
JSON representation

Sample scripts to simulate embedding security scenarios

Host: GitHub
URL: https://github.com/purestorage-openconnect/embedding-security
Owner: PureStorage-OpenConnect
Created: 2025-08-19T10:31:48.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-08-19T10:55:53.000Z (6 months ago)
Last Synced: 2025-08-19T12:39:02.089Z (6 months ago)
Language: Python
Size: 6.84 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Embedding-Security

Embedding Inversion Simulation
This script demonstrates how a vector embedding, which may seem anonymous, can be reverse-engineered to reconstruct the original sensitive data it represents. It uses a real sentence-transformer model to generate embeddings and a text generation model (GPT-2) to simulate the reconstruction attack.

Requirements
Python 3.7+

pip (Python package installer)

Installation
Clone or download the repository/script.

Navigate to the script's directory in your terminal.

Install the required Python libraries using the provided requirements.txt file. Run the following command:

pip install -r requirements.txt

This will install all necessary packages, including numpy, torch, sentence-transformers, transformers, and scipy.

Running the Simulation
Once the installation is complete, you can run the simulation script directly from your terminal:

python3 simulation_embedding.py

The script will then execute the simulation, printing the original secret, the generated "anonymous" vector, the discovered semantic keywords, and the final reconstructed text to your console.

###########################################################

Second script demonstrate Data Poisoning the AI's knowledge base to skew its output or inject bias. Uploading malicious documents or compromising data feeds before they are vectorized.

python3 rag_poisoning.py

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/purestorage-openconnect/embedding-security

Awesome Lists containing this project

README