https://github.com/bitbynik/substitution_cipher

SIL765 Assignment-1
cryptanalysis decipher iitd security substitution-cipher


Host: GitHub
URL: https://github.com/bitbynik/substitution_cipher
Owner: BitByNIK
Created: 2025-01-09T08:59:41.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-01-19T10:44:34.000Z (about 1 year ago)
Last Synced: 2025-01-19T11:29:22.387Z (about 1 year ago)
Topics: cryptanalysis, decipher, iitd, security, substitution-cipher
Language: Python
Homepage:
Size: 2.23 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Substitution Cipher Decoder

This project is a Python-based implementation of a substitution cipher decoder. It uses **hill climbing**, a search optimization algorithm, to iteratively improve decryption keys. By analyzing patterns in the ciphertext and comparing them with English language characteristics, the script deciphers encrypted text.

---

## How It Works

A substitution cipher replaces each plaintext letter with a fixed ciphertext character (e.g., a number, a symbol, or another letter). To decrypt such a cipher, the script searches for the mapping from ciphertext characters back to plaintext letters.

### Steps in the Decryption Process

1. **Initial Setup:**

- The script starts with a **random key**, mapping ciphertext characters to plaintext letters.
- A key represents how each ciphertext character translates into a plaintext letter.
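
As a minimal sketch of this setup, a key can be held as a dictionary from ciphertext symbols to plaintext letters. The names `random_key` and `decrypt` are illustrative and not taken from the repository, and the sketch assumes plaintext letters are lowercase `a`-`z`.

```python
import random
import string


def random_key(cipher_symbols, rng=random):
    """Map each distinct ciphertext symbol to a distinct random plaintext letter."""
    symbols = sorted(set(cipher_symbols))
    if len(symbols) > len(string.ascii_lowercase):
        raise ValueError("more ciphertext symbols than plaintext letters")
    letters = rng.sample(list(string.ascii_lowercase), len(symbols))
    return dict(zip(symbols, letters))


def decrypt(ciphertext, key):
    """Translate ciphertext with the key; characters not in the key pass through."""
    return "".join(key.get(ch, ch) for ch in ciphertext)


ciphertext = "1@3 1@3"
key = random_key(ch for ch in ciphertext if not ch.isspace())
print(decrypt(ciphertext, key))  # three random letters, repeated twice
```

Leaving spaces and punctuation out of the key keeps word boundaries intact, which is an assumption about the input rather than something the README states.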

2. **Fitness Calculation:**

- The script evaluates how good the current key is by calculating a **fitness score**.
- This score measures how similar the decrypted text is to English using the frequency of groups of four letters (called **quadgrams**).
- Formula used for fitness:
$$\text{Fitness} = \sum_{\text{quadgram}} \log(\text{QF}[\text{quadgram}])$$
- **QF** represents the frequency of quadgrams in standard English text.
- Summing logarithms instead of multiplying raw frequencies avoids numerical underflow, because individual quadgram frequencies can be very small.
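
The fitness formula could be sketched as follows. The function names are hypothetical, and the floor value used for quadgrams that never occur in the reference text is an assumption of this sketch; the README does not say how unseen quadgrams are handled.

```python
import math
from collections import Counter


def _letters(text):
    """Keep only the letters a-z, lowercased."""
    return "".join(ch for ch in text.lower() if "a" <= ch <= "z")


def quadgram_log_freqs(reference_text):
    """Return (table of log QF[quadgram], floor for unseen quadgrams)."""
    letters = _letters(reference_text)
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("reference text needs at least four letters")
    table = {quad: math.log(n / total) for quad, n in counts.items()}
    # Assumed floor: lower than the log-frequency of any observed quadgram.
    floor = math.log(0.01 / total)
    return table, floor


def fitness(text, table, floor):
    """Sum of log quadgram frequencies over all overlapping quadgrams in text."""
    letters = _letters(text)
    return sum(table.get(letters[i:i + 4], floor) for i in range(len(letters) - 3))
```

Scores are negative and closer to zero for more English-like text, so a higher score means a better key.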

3. **Heuristic-Based Key Selection:**

- A **heuristic function** is used to prioritize which parts of the key to adjust.
- The heuristic identifies ciphertext characters whose frequencies differ the most from expected English frequencies.
- Formula for the heuristic:
$$H(b) = \max(1, |\text{FC}(b) - \text{FE}(\text{E}^{-1}(b))|)$$
Where:

- $$H(b)$$: Heuristic score for a ciphertext letter $$b$$.
- $$\text{FC}(b)$$: Rank of $$b$$ in ciphertext frequencies.
- $$\text{FE}(a)$$: Rank of the plaintext letter $$a = \text{E}^{-1}(b)$$ in English frequencies, where $$\text{E}^{-1}(b)$$ is the letter that $$b$$ decrypts to under the current key.

- A ciphertext letter $$b$$ with a large heuristic value is **more likely to be swapped** because it is far from its expected frequency rank. The goal is to quickly minimize these differences.
- Once a character $$b$$ is selected, it is swapped with another randomly chosen character in the key.
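
One way to read the heuristic is as a weight for randomly choosing which symbol to swap. The sketch below makes that assumption explicit: it draws $$b$$ with probability proportional to $$H(b)$$, which the README implies but does not spell out. `ENGLISH_ORDER` is a commonly quoted letter ordering used here as a stand-in; the real order depends on the reference corpus. The function names are illustrative.

```python
import random
from collections import Counter

# Assumed English letter order, most to least frequent (corpus-dependent).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def heuristic_weights(ciphertext, key, english_order=ENGLISH_ORDER):
    """H(b) = max(1, |FC(b) - FE(E^-1(b))|) for each ciphertext symbol b in key."""
    counts = Counter(ch for ch in ciphertext if ch in key)
    # Rank symbols by descending count; ties are broken by the symbol itself.
    ranked = sorted(key, key=lambda b: (-counts[b], b))
    fc = {b: rank for rank, b in enumerate(ranked)}
    fe = {a: rank for rank, a in enumerate(english_order)}
    return {b: max(1, abs(fc[b] - fe[key[b]])) for b in key}


def propose_swap(ciphertext, key, rng=random):
    """Pick b weighted by H(b), then swap it with a uniformly chosen other symbol.

    Requires a key with at least two symbols.
    """
    weights = heuristic_weights(ciphertext, key)
    symbols = list(weights)
    b = rng.choices(symbols, weights=[weights[s] for s in symbols], k=1)[0]
    other = rng.choice([s for s in symbols if s != b])
    new_key = dict(key)
    new_key[b], new_key[other] = new_key[other], new_key[b]
    return new_key
```

Because of the `max(1, ...)` term, every symbol keeps a nonzero weight, so even well-placed letters can still be swapped occasionally.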

4. **Key Evaluation:**

- The script decrypts the text using the modified key.
- If the fitness score improves, the new key is kept. Otherwise, it is discarded.

5. **Stopping Condition:**
- If the fitness score does not improve after $$T$$ iterations, the process stops.
- The entire procedure is repeated multiple times, and the best key across all attempts is chosen.
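
Steps 4 and 5 together form a hill-climbing loop with random restarts, which might look roughly like this. All names are hypothetical, `patience` plays the role of $$T$$, and `random_swap` is a uniform placeholder for the heuristic-guided choice from step 3. The fitness function is passed in as a callable so the sketch stays self-contained.

```python
import random


def random_swap(key, rng):
    """Swap two randomly chosen key entries (placeholder for the heuristic choice)."""
    a, b = rng.sample(sorted(key), 2)
    new_key = dict(key)
    new_key[a], new_key[b] = new_key[b], new_key[a]
    return new_key


def hill_climb(ciphertext, start_key, fitness, propose=random_swap,
               patience=1000, rng=random):
    """Accept a proposed key only if it scores higher; stop after
    `patience` consecutive proposals without improvement."""
    def decrypt(key):
        return "".join(key.get(ch, ch) for ch in ciphertext)

    best_key = dict(start_key)
    best_score = fitness(decrypt(best_key))
    stale = 0
    while stale < patience:
        candidate = propose(best_key, rng)
        score = fitness(decrypt(candidate))
        if score > best_score:
            best_key, best_score, stale = candidate, score, 0
        else:
            stale += 1
    return best_key, best_score


def solve(ciphertext, make_start_key, fitness, restarts=10,
          patience=1000, rng=random):
    """Run several independent climbs and return the best (key, score) pair."""
    runs = [hill_climb(ciphertext, make_start_key(rng), fitness,
                       random_swap, patience, rng)
            for _ in range(restarts)]
    return max(runs, key=lambda result: result[1])
```

Restarts matter because a single climb can settle on a key that no single swap improves, even though a better key exists elsewhere.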

---

## Why Use a Heuristic?

The heuristic focuses on ciphertext letters whose observed frequencies differ the most from expected English frequencies. This:

- **Speeds Up Convergence:** Quickly minimizes large frequency mismatches.
- **Improves Accuracy:** Guides the algorithm to better keys, especially for longer ciphertexts where frequency distributions are more reliable.

---

## Features

- **Hill Climbing Optimization:** Iteratively improves the decryption key using fitness scoring.
- **Quadgram-Based Scoring:** Evaluates how "English-like" the decrypted text is.
- **Heuristic Guidance:** Targets the most impactful key adjustments to improve efficiency.

---

## Requirements

- **Python 3.6 or Higher**
- A file containing a large sample of English text, used to calculate the letter and quadgram frequencies. This project uses `big.txt`, which should be a plain text file with diverse English content (e.g., books, articles). Decryption quality improves with a larger and more representative reference text.
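
As a rough illustration of how the reference file might be turned into letter-frequency ranks, the sketch below reads the text and orders the alphabet by count. The function names, the UTF-8 encoding, and the tie-breaking rule are assumptions, not details from the repository.

```python
import string
from collections import Counter


def letter_order(text):
    """Return the letters a-z ordered from most to least frequent in text."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Ties (including letters that never occur) fall back to alphabetical order.
    return "".join(sorted(string.ascii_lowercase, key=lambda c: (-counts[c], c)))


def load_reference(path="big.txt", encoding="utf-8"):
    """Read the reference corpus; undecodable bytes are replaced, not fatal."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return handle.read()
```

A typical call would be `letter_order(load_reference())`, with `big.txt` in the working directory.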

### Dependencies:

- Built-in Python libraries: `collections`, `math`, `random`.

---
