https://github.com/blues/padlock
A post-quantum encoding utility for backups & border-crossings, vibe-coded by Ray Ozzie and his Mom.
- Host: GitHub
- URL: https://github.com/blues/padlock
- Owner: blues
- Created: 2025-04-21T11:09:13.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-04-29T01:59:24.000Z (about 1 year ago)
- Last Synced: 2025-12-26T17:50:43.322Z (6 months ago)
- Language: Go
- Size: 83.6 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Padlock: A post-quantum secure secret-splitting utility for backups & border-crossings
**Padlock** is a high-performance, single-pass K-of-N data encoding and decoding utility that implements a threshold one-time-pad scheme for secure data archiving and border-crossings. It splits data into encrypted output collections that can be archived or transferred. Only a subset of those collections is then required to recover the original content. By relying solely on secure random number generation and XOR operations, Padlock achieves high security while remaining straightforward and fully streamable.
Except for this section in README.md written by @rayozzie, everything in this repo - from this document through all code and other files - was 100% vibe-coded using a mixture-of-models over the course of several days in April 2025 to test the limits of tools & models at this moment in time.
By way of background, years ago - in a pre-internet peer-to-peer era - my colleagues and I found ourselves needing to physically carry media containing confidential source code across borders. I was being advised by my exec security that my hotel rooms were certainly being bugged and penetrated, and that, in two countries I was visiting, an 'evil maid' attack was not out of the question - and that I should always travel to these countries with a fresh, disposable laptop.
Back in 1985, in Lotus Notes, we'd implemented a K-of-N scheme for protecting administrators' keychains that contained the private keys and certs used for managing enterprise deployments as a team. This was just a few years after Ron Rivest had turned me on to Shamir's secret sharing work. But rather than using his algorithms, we used a much more crude (but wonderfully effective) system in which we just stored all permutations of the ciphertext. After all, K and N and the data size were all very small.
For decades I'd frequently pondered how great it would be to have a simple zip-like utility that could split my data securely into N copies, with K being required for rehydration. I'd physically carry some on my laptop or phone as collections of JPGs. Others would be sent to in-country colleagues by physical mail over a series of months. And then, at the right time, when I had access to K of them, reassembly would be possible.
And so, I decided several weeks ago to try my own "CEO vibe coding" experiment, to see just how far I could get before I had to touch a line of code.
For better or worse, this involved many hundreds of written instructions issued to multiple models in several tools over many hours, with correction upon correction and rewrite after rewrite. Yes, I was coding, but instead of via emacs, it was 100% through narrative and through chat.
Although I started with a fairly concise description of what I wanted, I found that I could only get the design and implementation 'correct' through a mixture-of-models technique, moving the project from model to model whenever one got stuck, as they all did. I could only make real progress by having them check and re-do each other's work. A simple observation is that for each and all of them, once the context started to get too long they started to make desperate changes, sometimes making hash of the code. And so, I got into the habit of starting fresh and training it in my goals all over again, with the current code as its new starting point.
The worst behavior - and it was quite bad - was when one of the models started doing what a truly junior developer under pressure would do when it was frustrated: A tar byte stream was getting corrupted and it wasn't clear why, and so after many iterations it started desperately 'rounding up', adding unnecessary padding, and taking all sorts of desperate measures while rationalizing to me (literally!) that 'sometimes these things are necessary to cover edge cases' even though it couldn't explain why. Ultimately, I had to help the model past this problem by telling it to stop, by reverting the repo, and by switching models.
In one case, a single line of code happened to be generated by o3-mini-high that had blocked progress. Not a single one of the models was able to successfully recognize and fix the issue even though eventually, through careful inspection, I could see what the problem was: a golang "buf = append(pad[n:m], input...)" was overwriting the pad itself because that pad slice had ample capacity. Yes, subtle. When I discovered this I told the model to fix that line of code, and we moved on together. That said, I sense that the model's feelings must have been hurt because it soon rewrote that section of code during a gratuitous refactor - which itself happened all too often.
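For readers who have not hit this Go pitfall, here is a minimal sketch of it. Only the `append(pad[n:m], input...)` shape comes from the description above; the slice contents, the indices, and the function names `aliasedAppend` and `safeAppend` are made up for illustration, and the full slice expression shown is one common fix, not necessarily the one used in this repo.

```go
package main

import "fmt"

// aliasedAppend reproduces the pitfall: pad[n:m] still has spare capacity in
// pad's backing array, so append writes input over pad[m:] in place.
func aliasedAppend(pad, input []byte, n, m int) []byte {
	return append(pad[n:m], input...)
}

// safeAppend uses a full slice expression to cap the capacity at m-n, which
// forces append to allocate a new backing array instead of reusing pad's.
func safeAppend(pad, input []byte, n, m int) []byte {
	return append(pad[n:m:m], input...)
}

func main() {
	input := []byte{0xAA, 0xBB}

	pad := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	buf := aliasedAppend(pad, input, 0, 2)
	fmt.Println(buf, pad) // pad[2] and pad[3] have been overwritten

	pad = []byte{1, 2, 3, 4, 5, 6, 7, 8}
	buf = safeAppend(pad, input, 0, 2)
	fmt.Println(buf, pad) // pad is unchanged
}
```

The bug is silent because the result of the aliased call looks correct; only a later read of the pad reveals the damage.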
While at first I began with Devin, I soon moved to ChatGPT o3-mini-high, and eventually kept swapping back and forth between that model and Sonnet 3.7 in Claude Code. Gemini Advanced 2.5 Pro helped from time to time. Codex came late.
Grand conclusion? It was far, far too much work to rely upon these junior AIs to bring even this project to completion. With the simplicity of the project's architecture and code, combined with golang's wonderful standard libraries, I could have easily written this in an evening rather than the days it took for the AI to help. But it did a nice job at paving the overall structure from my description. Nonetheless, there is great potential here, and it portends quite an amazing future as it turns 10x developers into 100x developers. I do worry for 1x devs.
## Download
Pre-built binaries are available for the following platforms:
| Platform | Architecture | Download | SHA256 |
|----------|-------------|----------|--------|
| macOS | ARM64 | [padlock](https://github.com/blues/padlock/raw/refs/heads/master/bin/macos-arm64/padlock) | [SHA256](bin/macos-arm64/padlock.sha256.txt) |
| macOS | AMD64 | [padlock](https://github.com/blues/padlock/raw/refs/heads/master/bin/macos-amd64/padlock) | [SHA256](bin/macos-amd64/padlock.sha256.txt) |
| Windows | ARM64 | [padlock.exe](https://github.com/blues/padlock/raw/refs/heads/master/bin/windows-arm64/padlock.exe) | [SHA256](bin/windows-arm64/padlock.exe.sha256.txt) |
| Windows | AMD64 | [padlock.exe](https://github.com/blues/padlock/raw/refs/heads/master/bin/windows-amd64/padlock.exe) | [SHA256](bin/windows-amd64/padlock.exe.sha256.txt) |
| Linux | ARM64 | [padlock](https://github.com/blues/padlock/raw/refs/heads/master/bin/linux-arm64/padlock) | [SHA256](bin/linux-arm64/padlock.sha256.txt) |
| Linux | AMD64 | [padlock](https://github.com/blues/padlock/raw/refs/heads/master/bin/linux-amd64/padlock) | [SHA256](bin/linux-amd64/padlock.sha256.txt) |
| Linux | ARMv7 | [padlock](https://github.com/blues/padlock/raw/refs/heads/master/bin/linux-armv7/padlock) | [SHA256](bin/linux-armv7/padlock.sha256.txt) |
## Key Features
- **Threshold Security:**
The data is split into N collections, where at least K collections (with 2 ≤ K ≤ N ≤ 26) are needed to reconstruct the original content. With fewer than K collections, no information is revealed.
- **Stream-Pipelined Processing:**
Both the encoding and decoding processes operate as fully streaming pipelines, processing the data chunk-by-chunk without needing to load the entire dataset into memory. This makes Padlock ideal for large-scale or real-time applications.
- **Information-Theoretic Security:**
Instead of computational cryptography, Padlock uses a one-time-pad threshold scheme based on information theory. For each input chunk:
- For each permutation of K collections, the system:
- Generates K-1 random pads and XORs them with the plaintext to create a ciphertext
- Distributes the random pads and the ciphertext across the K collections in that permutation
- Each collection contains multiple pieces from different permutations
- When K or more collections are combined, the original data can be reconstructed
- With fewer than K collections, no information about the original data can be recovered
- **Flexible Output Formats:**
Data chunks are stored as individual files in one of two formats:
- **PNG Files:** Files are named using the pattern
`IMG<collection>_<chunk number>.PNG`
(for example, if the collection directory is "3C5", the first chunk file is named `IMG3C5_00001.PNG`).
- **Raw Binary Files (.bin):** Files are named with the format
`<collection>_<chunk number>.bin`
- **User-Friendly Messaging and Error Handling:**
Messages intended for users (such as summaries and error notifications) are always displayed. Detailed trace and debug messages, with component-specific prefixes (like "PADLOCK:", "FILE:", etc.), appear only when the `-verbose` flag is set.
## How It Works
### Overview
1. **Encoding Process:**
- **Archive & Compress:**
The input directory is archived using tar and optionally compressed using gzip.
- **Chunking:**
The compressed stream is divided into chunks of a specified maximum size.
- **Threshold Encryption:**
For each chunk, the system:
- Generates random one-time pads for each permutation of K collections
- XORs the input data with these pads to create ciphertexts
- Distributes the data across collections according to combinatorial mathematics
- **Collection Organization:**
Collections can be stored as directories or as ZIP archives. Each collection is named with a pattern that includes the required number (K), a collection letter, and the total number of copies (N) - for example, "3A5" for the first collection in a 3-of-5 scheme.
2. **Decoding Process:**
- **Collection Discovery:**
The available collection directories or ZIP files are identified. ZIP files are automatically extracted to a temporary directory for processing. The collection names (containing the required copies and total copies) are parsed to extract important parameters.
- **Permutation Selection:**
The system determines which permutation to use based on the available collections. If fewer than K collections are present, an error is reported since reconstruction is mathematically impossible.
- **Data Reconstruction:**
For each chunk, the appropriate permutation is used to combine pieces from K collections. The XOR operation reconstructs the original data from the distributed pieces.
- **Extraction:**
The reassembled data is decompressed (if needed) and untarred to rebuild the original directory structure and files.
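As an illustration of the collection naming convention described above ("3A5" for the first collection in a 3-of-5 scheme), the following sketch builds and parses such names. The function names `collectionName` and `parseCollectionName` and the regular expression are hypothetical and not taken from the Padlock source; the bounds check uses the 2 ≤ K ≤ N ≤ 26 limits stated under Key Features.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// namePattern matches names such as "3A5": required count, letter, total count.
var namePattern = regexp.MustCompile(`^(\d{1,2})([A-Z])(\d{1,2})$`)

// collectionName builds the name for the collection at a zero-based index.
func collectionName(required, index, total int) string {
	return fmt.Sprintf("%d%c%d", required, 'A'+index, total)
}

// parseCollectionName extracts K, the letter, and N, rejecting names that
// violate 2 <= K <= N <= 26 or whose letter lies outside the first N letters.
func parseCollectionName(name string) (required int, letter byte, total int, err error) {
	m := namePattern.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, 0, fmt.Errorf("%q is not a collection name", name)
	}
	// Atoi cannot fail here: the pattern admits only one or two ASCII digits.
	required, _ = strconv.Atoi(m[1])
	total, _ = strconv.Atoi(m[3])
	letter = m[2][0]
	if required < 2 || required > total || total > 26 || int(letter-'A') >= total {
		return 0, 0, 0, fmt.Errorf("%q has inconsistent parameters", name)
	}
	return required, letter, total, nil
}

func main() {
	fmt.Println(collectionName(3, 0, 5)) // 3A5

	k, letter, n, err := parseCollectionName("3C5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d %c %d\n", k, letter, n) // 3 C 5
}
```

Because every name carries both K and N, a decoder can reject a mixed set of collections before it reads any chunk data.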
## Security
- **Perfect Secrecy:**
As long as a new one-time pad is generated securely for each chunk and is never reused, the encryption provides information-theoretic (perfect) secrecy.
- **Threshold Assurance:**
The design guarantees that without access to at least the required number K of collections, no useful information about the original data is revealed, regardless of the computational power available to an attacker.
- **Defense in Depth:**
The random number generation system combines multiple independent sources of entropy to ensure high-quality randomness even if some sources are compromised.
### Security Analysis
#### Random Number Generation
Padlock implements a robust defense-in-depth approach to random number generation, which is critical for one-time pad security:
1. **Multi-Source RNG Architecture**
- `MultiRNG` combines five independent random sources through XOR operations
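- Because the sources are XORed together, the combined output is at least as unpredictable as the strongest uncompromised source, provided the sources are independent of one another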
- Even if multiple sources are compromised, data remains secure as long as at least one source remains uncompromised
- Implementation includes:
- `CryptoRand`: OS entropy pool (primary source)
- `MathRand`: Securely seeded PRNG
- `ChaCha20Rand`: Stream cipher with random key/nonce
- `PCG64Rand`: High-quality statistical PRNG
- `MT19937Rand`: Mersenne Twister with secure seed
2. **Randomness Quality Validation**
- Comprehensive test suite validates statistical properties:
- Frequency testing of bit distribution
- Runs test for sequential patterns
- Byte distribution uniformity verification
- Shannon entropy measurement
- Autocorrelation testing
- Chi-square testing
- All RNG providers use mutex locks to ensure thread safety
- Detailed error handling prevents the use of low-quality randomness
#### K-of-N Implementation
Padlock uses a mathematical approach to K-of-N threshold security:
1. **Combinatorial Design**
- `UniqueSortedCombinations` function generates all possible combinations of K elements from N elements
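- Each collection participates in multiple combinations (called "permutations" elsewhere in this document, although the order of collections within one does not matter)
- For each input chunk and each combination, K-1 random pads are generated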
- XOR operations distribute data across collections so any K can reconstruct the original
2. **Information-Theoretic Security**
- With fewer than K collections, no information about the original data is revealed
- The system provides perfect secrecy under the one-time pad model
- Security relies on mathematical properties rather than computational hardness assumptions
- Each collection appears completely random when viewed in isolation
#### One-Time Pad Generation and Usage
1. **Pad Generation**
- Each chunk generates unique random pads for every permutation
- Pad sizes match the input data exactly
- Pads are never reused across chunks or collections
- The `encodeOneChunk` function handles the core cryptographic operations
2. **XOR-Based Cryptography**
- Simple XOR operations provide mathematically provable security
- Implementation is straightforward and auditable
- Avoids complex cryptographic primitives that could introduce vulnerabilities
- The approach is quantum-resistant by design
#### Data Formats and Error Handling
1. **Storage Formats**
- Binary (.bin) format for efficiency
- PNG (.PNG) format for steganographic storage with CRC validation
- PNG implementation includes data integrity checks via CRC32
2. **Error Detection**
- Chunk headers contain collection names and sizes for verification
- Collection naming convention provides self-verification
- Format-specific integrity checks during decoding
- Detailed error reporting for troubleshooting
#### Handling Incorrect or Corrupted Data
1. **Collection Verification**
- System verifies collection names, required copies, and total copies
- Mismatched parameters trigger explicit errors during decoding
- Collections can be provided in any order during decoding
2. **Corruption Handling**
- PNG format includes CRC32 validation to detect modifications
- If fewer than K collections are provided, decoding fails because reconstruction is mathematically impossible
- If collections are modified or corrupted:
- Header or CRC checks fail, producing errors
- Corruption that escapes those checks carries through the XOR into the reconstructed data, because each altered bit flips the corresponding output bit; the XOR step itself does not detect this
#### Security Boundaries
1. **Limitations**
- Security depends entirely on the quality of random number generation
- Physical security of collections becomes the primary concern
- No verification of original data integrity beyond successful reconstruction
2. **Threat Model Considerations**
- Designed to protect against computational threats including quantum computers
- Does not protect against insider threats with access to K or more collections
- No protection against side-channel attacks during encoding/decoding operations
This implementation achieves information-theoretic security through a clean, auditable design that relies on well-understood mathematical principles rather than complex cryptographic primitives.
### Mathematical Foundations
#### Combinatorial Security Architecture
The K-of-N threshold scheme is built on rigorous combinatorial mathematics:
1. **Combinatorial Distribution**
- For N collections where any K are needed, there are C(N,K) = N!/(K!(N-K)!) possible combinations
- Each collection participates in exactly C(N-1,K-1) different permutations
- With N=5, K=3, there are 10 unique permutations, and each collection appears in 6 permutations
- This mathematical structure guarantees that any K collections contain at least one complete permutation
2. **XOR Properties Leveraged**
- XOR is commutative: A ⊕ B = B ⊕ A
- XOR is associative: (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C)
- XOR with the same value twice cancels out: A ⊕ B ⊕ B = A
- XOR with random data produces random data: If B is uniformly random and independent of A, then A ⊕ B is uniformly random and reveals nothing about A
3. **Perfect Reconstruction Properties**
- For a permutation involving K collections (e.g., ABC):
- Collection A stores random pad P_A
- Collection B stores random pad P_B
- Collection C stores C_data = D ⊕ P_A ⊕ P_B (where D is original data)
- During decoding: P_A ⊕ P_B ⊕ C_data = P_A ⊕ P_B ⊕ (D ⊕ P_A ⊕ P_B) = D
- XOR operations perfectly cancel out, leaving only the original data
#### Information-Theoretic Security Analysis
1. **Mathematical Proof of Threshold Properties**
- With K-1 or fewer collections, the system of equations is underdetermined
- For each missing piece, there are 2^n possible values (for n-bit data), all equally likely
- This creates perfect statistical independence between available and missing pieces
- The proof follows Claude Shannon's original work on information theory and perfect secrecy
2. **Statistical Independence**
- Each collection in isolation appears completely random
- No correlation exists between collections when viewed separately
- The XOR of random data with any fixed data produces statistically random output
- This guarantees that partial collection sets reveal zero information about the original data
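The uniformity claim can be checked exhaustively at small scale. The sketch below, with the hypothetical helpers `ciphertextCounts` and `isUniform`, enumerates all 256 one-byte pads for a given data byte and tallies the resulting ciphertexts. It demonstrates the one-byte case only and is not a proof for the full scheme.

```go
package main

import "fmt"

// ciphertextCounts tallies d XOR p over all 256 possible one-byte pads p.
func ciphertextCounts(d byte) [256]int {
	var counts [256]int
	for p := 0; p < 256; p++ {
		counts[d^byte(p)]++
	}
	return counts
}

// isUniform reports whether every ciphertext value occurred exactly once.
func isUniform(counts [256]int) bool {
	for _, c := range counts {
		if c != 1 {
			return false
		}
	}
	return true
}

func main() {
	// Whatever the data byte is, each ciphertext value appears exactly once,
	// so observing the ciphertext alone says nothing about the data.
	fmt.Println(isUniform(ciphertextCounts('A')), isUniform(ciphertextCounts(0x00))) // true true
}
```

The distribution is the same for every data byte because XOR with a fixed value is a bijection on the 256 possible pads.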
#### Deep Algorithm Analysis
1. **Encoding Process Mechanics**
```
For each chunk of data D:
    For each permutation P of K collections (e.g., ABC):
        Generate K-1 random pads R_1, R_2, ..., R_(K-1)
        Compute ciphertext C = D ⊕ R_1 ⊕ R_2 ⊕ ... ⊕ R_(K-1)
        Distribute C, R_1, R_2, ..., R_(K-1) across the K collections
```
2. **Permutation Generation Process**
- Uses recursive backtracking to generate all K-sized combinations from N elements
- Creates a deterministic mapping between collections and permutations
- Ensures each collection has precisely the correct pieces for reconstruction
- Runtime complexity is O(C(N,K)), which is polynomial for fixed K
3. **Chunking Security Benefits**
- Enables efficient streaming processing of arbitrary-sized inputs
- Provides natural boundaries for error containment
- Ensures independence between chunks (compromise of one doesn't affect others)
- Allows for piece-wise verification during reconstruction
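The recursive backtracking described under "Permutation Generation Process" can be sketched as below. The function `combinations` is a hypothetical stand-in written for this example; it is not the signature of the repo's `UniqueSortedCombinations`, and it labels collections with letters starting at "A" purely for readability.

```go
package main

import "fmt"

// combinations returns every k-element subset of the first n collection
// letters, in sorted order, using recursive backtracking.
func combinations(n, k int) []string {
	var out []string
	cur := make([]byte, 0, k)
	var walk func(start int)
	walk = func(start int) {
		if len(cur) == k {
			out = append(out, string(cur)) // string() copies the bytes
			return
		}
		for i := start; i < n; i++ {
			cur = append(cur, byte('A'+i)) // choose letter i
			walk(i + 1)                    // recurse on later letters only
			cur = cur[:len(cur)-1]         // backtrack
		}
	}
	walk(0)
	return out
}

func main() {
	combos := combinations(5, 3)
	fmt.Println(len(combos), combos) // 10 combinations for a 3-of-5 scheme
}
```

For N=5 and K=3 this yields the 10 combinations from ABC to CDE, and each letter appears in 6 of them, matching the C(N,K) and C(N-1,K-1) counts given earlier.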
The mathematical elegance of this system lies in its perfect balance between redundancy and security. With exactly K-1 collections, an attacker gains absolutely zero information about the data - not just computational difficulty, but mathematical impossibility. This property holds regardless of computing power, including theoretical quantum computers, making it a future-proof security approach for protecting critical data.
### Documentation (courtesy of Devin)
- [Overview](docs/wiki/Overview.md) - High-level overview of the Padlock system
- [Architecture](docs/wiki/Architecture.md) - Technical architecture details
- [Usage Guide](docs/wiki/Usage-Guide.md) - Instructions for using Padlock
- [Security Model](docs/wiki/Security-Model.md) - Security principles and implementation
- [Implementation Details](docs/wiki/Implementation-Details.md) - Code organization and design
## Installation and Usage
### Requirements
- Go (version 1.23 or later is recommended)
- A standard Go build environment
### Building Padlock
To build the utility, run the following command in your terminal. (Simply copy and paste the command as-is.)
```bash
go build -o padlock cmd/padlock/main.go
```
### Command-Line Usage
- **Encode:**
`padlock encode <inputDir> <outputDir> -copies 5 -required 3 -format png -chunk 2097152 [-clear] [-verbose] [-files] [-dryrun]`
- `<inputDir>`: Directory containing the data to be archived and encoded.
- `<outputDir>`: Destination directory for the generated collection subdirectories.
- `-copies`: Number of collections to create (must be between 2 and 26).
- `-required`: Minimum number of collections required for reconstruction.
- `-format`: Output format, either "bin" or "png".
- `-chunk`: Maximum chunk size in bytes.
- `-clear`: (Optional) Clears the output directory before encoding.
- `-verbose`: (Optional) Enables detailed trace/debug messages.
- `-files`: (Optional) Creates individual files for each collection instead of ZIP archives.
- `-dryrun`: (Optional) Calculate and display size information without writing output files.
- **Decode:**
`padlock decode <inputDir> <outputDir> [-clear] [-verbose] [-dryrun]`
- `<inputDir>`: Root directory containing the collection subdirectories or ZIP files.
- `<outputDir>`: Destination directory where the original data will be restored.
- `-clear`: (Optional) Clears the output directory before decoding.
- `-verbose`: (Optional) Enables detailed trace/debug messages.
- `-dryrun`: (Optional) Calculate and display size information without writing output files.
**Important:**
Do not place the output directory within the input directory to avoid recursive processing. Also, ensure that the number of available collections meets or exceeds the required threshold; otherwise, an error will be displayed.
## Implementation Details
- **Source File Organization:**
- **cmd/padlock/main.go:** The command-line interface entry point.
- **pkg/padlock/padlock.go:** Coordinates the encoding and decoding processes, integrating the various components.
- **pkg/file/:** Contains modules for file and directory operations:
- **format.go:** Implementations for working with different file formats (BIN and PNG).
- **directory.go:** Directory validation and management.
- **zip.go:** ZIP file creation and extraction.
- **collection.go:** Collection directory operations.
- **serialize.go:** Directory serialization/deserialization to/from tar streams.
- **compress.go:** Stream compression/decompression using gzip.
- **pkg/pad/pad.go:** Core implementation of the one-time pad threshold scheme.
- **pkg/pad/rng.go:** Provides secure random number generation by combining multiple entropy sources.
- **pkg/trace/trace.go:** Context-based logging system for debug and trace information.
## Disclaimer
Padlock is a demonstration of a secure, threshold-based method for splitting and encrypting data using a one-time pad and XOR operations without relying on additional cryptographic algorithms. Users must ensure that one-time pads are never reused and that configuration parameters are correctly set to achieve the intended level of security.
## License
MIT License