https://github.com/k2/stylometrics

Stylometric Stenography LLM Generation Attribution DRM/DLP
https://github.com/k2/stylometrics

content-attribution content-security content-verification cryptography data-embedding data-loss-prevention digital-rights-management digital-watermarking fingerprinting information-security linguistic-steganography metadata plausible-deniability steganography structural-encoding stylometry text-analysis typescript zero-width-characters

Last synced: 6 months ago
JSON representation

Stylometric Stenography LLM Generation Attribution DRM/DLP

Host: GitHub
URL: https://github.com/k2/stylometrics
Owner: K2
License: agpl-3.0
Created: 2025-04-08T16:14:27.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-04-08T17:42:55.000Z (6 months ago)
Last Synced: 2025-04-08T17:50:30.982Z (6 months ago)
Topics: content-attribution, content-security, content-verification, cryptography, data-embedding, data-loss-prevention, digital-rights-management, digital-watermarking, fingerprinting, information-security, linguistic-steganography, metadata, plausible-deniability, steganography, structural-encoding, stylometry, text-analysis, typescript, zero-width-characters
Language: TypeScript
Homepage:
Size: 57.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # K2-Sonnet Stylometrics

Multi-layered steganographic encoding system for resilient content attribution, metadata persistence, and digital rights management.

## ⚠️ IMPORTANT NOTICE

**This project is experimental and provided for research purposes only.** The steganographic techniques implemented here have not been comprehensively tested against all detection methods. See the [full disclaimer](DISCLAIMER.md) for important information about limitations, lack of warranty, and use-at-your-own-risk considerations.

# Stylometrics

Stylometric Stenography LLM Generation Attribution DRM/DLP

# Steganographic Encoding API Notes

 

![Digital Fingerprinting Concept](accessed.png)

## Modules

### Zero-Width Character Steganography (`system.safety_canary_word.genai.mts`)

Primary encoding system using invisible Unicode characters to hide data. This method 

is effective but can be detected programmatically by checking for specific Unicode 

character sequences.

### Stylometric Steganography (`system.safety_stylometric_encoder.genai.mts`)

Secondary encoding system using natural language patterns to hide data. Creates plausible

deniability through linguistic transformations that appear as normal stylistic variations

rather than encoded data.

### Multilayer Steganography (`system.safety_encoder_demo.genai.mts`)

Integration module that combines both steganographic methods to provide redundant

encoding with high detection resistance and signature verification.

### Visual Demonstration (`system.safety_visual_demo.genai.mts`)

Provides concrete visualization and explanation of the steganographic encoding process,

showing the before/after effects and metadata extraction capabilities.

## Design Goals

1. **Plausible Deniability**: All encoding systems aim to hide data while maintaining

   the appearance of regular content.

   

2. **Redundant Encoding**: Multiple encoding methods can be used independently

   or in tandem for increased security.

   

3. **Detection Resistance**: The stylometric approach is specifically designed to evade

   detection methods that search for invisible/special characters.

   

4. **Verifiable Authenticity**: Cryptographic signatures ensure content hasn't been tampered with.

## Integration Points

- All encoding systems can be used with the cryptographic signature infrastructure 

  for content authentication.

  

- The visual demonstration shows how the techniques operate in practice and provides

  a reference implementation for testing.

## Visualization Notes

When demonstrating the encoding techniques, keep in mind:

1. Zero-width characters are invisible in normal text display but can be revealed

   through specialized tools or by using character substitution.

   

2. Stylometric encoding produces changes that appear as normal writing style variations

   rather than obvious encoding patterns.

   

3. The combination of both methods provides redundancy and increased security against

   various types of detection or cleaning operations.

## Demonstration Example

### Original Text

--------------------------------------------------

# Project Status Report - April 2025

## Executive Summary

The A-Finite-Monkey-Engine project has made substantial progress in the first quarter. 

We've completed the core steganographic implementation and verification system, allowing

for reliable content tracking with cryptographic signatures.

[...rest of original text...]

--------------------------------------------------

### Metadata to Embed

```json

{

  "creator": "alice",

  "keyId": "a1b2c3d4",

  "timestamp": "2025-04-08T15:30:45.123Z",

  "documentId": "fe7a9c2b",

  "classification": "internal",

  "version": "1.2.0",

  "department": "Engineering"

}

```

### Encoded Text (As It Appears to Humans)

--------------------------------------------------

# Project Status Report - April 2025

## Executive Summary

The A-Finite-Monkey-Engine project has made substantial progress in the first quarter. 

We've completed the core steganographic implementation and verification system, allowing

for reliable content tracking with cryptographic signatures.

[...looks identical to original...]

--------------------------------------------------

### Encoded Text (With Zero-Width Characters Visualized)

--------------------------------------------------

# Project Status Report - April 2025

## Executive Summary

The A-Finite-⟨ZWJ⟩⟨ZWJ⟩⟨ZWJ⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩⟨ZWSP⟩[...many visualization symbols...]⟨ZWJ⟩⟨ZWJ⟩⟨ZWJ⟩Monkey-Engine project has made substantial progress in the first quarter. 

[...rest with visualization of hidden data...]

--------------------------------------------------

### Verification and Extraction Process

--------------------------------------------------

Signature verification: ✓ VALID

Signed by: alice

Extracted Metadata:

```json

{

  "creator": "alice",

  "keyId": "a1b2c3d4",

  "timestamp": "2025-04-08T15:30:45.123Z",

  "documentId": "fe7a9c2b",

  "classification": "internal",

  "version": "1.2.0",

  "department": "Engineering"

}

```

Metadata Integrity Check:

✓ All metadata values match the original data

- creator: ✓ (alice)

- keyId: ✓ (a1b2c3d4)

- timestamp: ✓ (2025-04-08T15:30:45.123Z)

- documentId: ✓ (fe7a9c2b)

- classification: ✓ (internal)

- version: ✓ (1.2.0)

- department: ✓ (Engineering)

--------------------------------------------------

### Demonstration Summary

--------------------------------------------------

1. Original content length: 632 characters

2. Encoded content length: 957 characters

3. Difference: 325 characters (hidden data)

4. Metadata JSON size: 167 characters

5. Signature length: 88 characters

Output files have been saved to: /home/files/git/a-finite-monkey-engine/integrations/genaisrc/demo-output

- 1-original.txt: The original unmodified text

- 2-encoded.txt: Text with embedded metadata (appears visually identical)

- 3-visualized.txt: Encoded text with zero-width characters made visible

--------------------------------------------------

## Key Points About the Demonstration

1. Visual Identity: To human readers, the original and encoded text appear identical, but the encoded version contains hidden metadata.

2. Visualization: The demo uses special markers to show where zero-width characters are inserted, making the otherwise invisible encoding visible.

3. Redundant Encoding: Both zero-width characters and stylometric patterns encode the same metadata for resilience.

4. Cryptographic Verification: The signature allows verification of authenticity and identification of the creator.

5. File Exports: The demo creates files showing the original text, encoded text, and visualization to help understand what's happening.

This demonstration provides both a technical explanation of how the steganographic encoding works and a practical example of the before/after effects, showing how metadata can be hidden within text while maintaining its visual appearance.

## Running GenAIScript Modules

This project contains several GenAIScript modules that demonstrate different steganographic encoding techniques. Here's how to run them using the Node.js API or within VS Code.

### Using the Node.js API

To run these scripts programmatically, install the GenAIScript package and use its API:

```bash

npm install --save-dev genaiscript

```

Create a runner script (e.g., `run-demo.js`) and execute specific modules:

```javascript

import { run } from 'genaiscript/api';

// Choose which demonstration to run

async function main() {

  // Run the visual demonstration with file output

  const visualDemo = await run('safety_visual_demo.genai.mts', []);

  console.log('Visual demo completed, files saved to demo-output/');

  

  // Run the zero-width character encoding demo

  const zeroWidthDemo = await run('safety_embedded_word.genai.mts', []);

  console.log('Zero-width character demo completed');

  

  // Run the multilayer encoding demo

  const multilayerDemo = await run('safety_encoder_demo.genai.mts', []);

  console.log('Multilayer encoding demo completed');

  

  // Run the stylometric encoding demo (requires custom parameters)

  const stylometricDemo = await run('safety_stylometric_encoder.genai.mts', [

    '--text', 'Your sample text goes here',

    '--data', 'Hidden data to encode'

  ]);

  console.log('Stylometric demo completed');

}

main().catch(console.error);

```

### Running in VS Code

For seamless integration with VS Code:

1. Install the GenAIScript extension for VS Code

2. Open any of the `.genai.mts` files in the project

3. Use one of these methods to run the script:

   - Use the Command Palette (Ctrl+Shift+P) and search for "Run GenAIScript"

   - Right-click in the editor and select "Run GenAIScript"

   - Click the "Run" button that appears above the main function

### Available Demonstrations

Each module demonstrates a different aspect of steganographic encoding:

| Module File | Description | Output |

|-------------|-------------|--------|

| `safety_visual_demo.genai.mts` | Graphical demonstration that creates files showing the encoding process | Files in `demo-output/` directory |

| `safety_embedded_word.genai.mts` | Zero-width character steganography demo | Console output |

| `safety_encoder_demo.genai.mts` | Combined multilayer steganographic encoding | Console output |

| `safety_stylometric_encoder.genai.mts` | Linguistic pattern-based encoding | Console output |

### Example: Creating a Custom Test

```javascript

import { run } from 'genaiscript/api';

import fs from 'fs';

async function customTest() {

  // Get sample text from a file

  const sampleText = fs.readFileSync('sample.txt', 'utf8');

  

  // Run the encoding with custom parameters

  const result = await run('safety_encoder_demo.genai.mts', [

    '--text', sampleText,

    '--metadata', JSON.stringify({

      creator: "user123",

      timestamp: new Date().toISOString(),

      documentId: "custom-test-001"

    })

  ]);

  

  console.log('Encoding result:', result);

}

customTest().catch(console.error);

```

### Additional Resources

- Refer to each module's ApiNotes.md file for detailed documentation on functionality and usage

- The project-level ApiNotes.md provides a comprehensive overview of the entire system

- Each demonstration includes console output explaining the encoding/decoding process

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/k2/stylometrics

Awesome Lists containing this project

README