https://github.com/tommygrammar/dual-channel-vector-quantization

A prototype library demonstrating a dual‐channel compression scheme built on hierarchical vector quantisation (HVQ), which via attached per‐cluster transformation functions yields multiple distinct reconstructions (meanings) from a single compressed code sequence.

compression-algorithm signal-processing vector

Host: GitHub
URL: https://github.com/tommygrammar/dual-channel-vector-quantization
Owner: tommygrammar
License: mit
Created: 2025-05-30T08:16:47.000Z (about 1 year ago)
Default Branch: blackgrammar-projects
Last Pushed: 2025-05-30T08:17:47.000Z (about 1 year ago)
Last Synced: 2025-06-18T03:46:32.695Z (about 1 year ago)
Topics: compression-algorithm, signal-processing, vector
Language: Python
Homepage:
Size: 7.81 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: Readme.MD
- License: LICENSE

# Dual Channel Vector Quantisation

A prototype library demonstrating a **dual‐channel compression scheme** built on hierarchical vector quantisation (HVQ), which via attached per‐cluster transformation functions yields **multiple distinct reconstructions** (meanings) from a single compressed code sequence.

---

## Table of Contents

1. [Motivation](#motivation)

2. [Core Concept](#core-concept)

3. [Project Layout](#project-layout)

4. [Installation](#installation)

5. [Getting Started](#getting-started)

   1. [Fitting & Compressing](#fitting--compressing)

   2. [Dual Decoding](#dual-decoding)

   3. [Storage & Quality Reports](#storage--quality-reports)

6. [Empirical Results](#empirical-results)

7. [Under the Hood](#under-the-hood)

   1. [Macro + Micro Codebooks](#macro--micro-codebooks)

   2. [Per-Cluster Transforms](#per-cluster-transforms)

   3. [Primary vs. Secondary Reconstruction](#primary-vs-secondary-reconstruction)

   4. [Byte‐Size Accounting](#byte-size-accounting)

8. [Scaling to More Channels](#scaling-to-more-channels)

9. [Applications](#applications)

10. [Future Directions](#future-directions)

11. [License](#license)

---

## Motivation

Traditional compression targets a **single best reconstruction** of data. Yet many domains require **multiple semantic views** of the same base signal, for instance:

* A raw sensor reading plus a noise‐corrected variant.

* An image and a style-transformed version.

* A time series under different hypothetical scenarios.

Storing or transmitting a separate compressed stream for each view multiplies the data footprint. Dual Channel Vector Quantisation explores embedding **two (or more) coherent decodings** in a **single** compressed code via **per-cluster transforms**, so each additional view costs only a small block of transform parameters per cluster rather than a second code stream.

---

## Core Concept

1. **Hierarchical Vector Quantisation (HVQ)**

   * A two‐level VQ:

     * **Macro** clusters capture global structure.

     * **Micro** clusters refine residuals within each macro.

   * Produces a discrete code per sample: $(m,u)$.

2. **Per-Cluster Transform Functions**

   * Each macro cluster $m$ stores a small parameter block $T_m$.

   * These transforms generate **secondary meanings** from the primary decode.

3. **Dual Decoding Channels**

   * **Primary decoder** $D_1$ reconstructs a baseline approximation.

   * **Secondary decoder** $D_2$ applies $T_m$ to $D_1$’s output, yielding a distinct variant.

4. **Storage Efficiency**

   * **Codes**: two byte-sized indices per sample ($m,u$).

   * **Transforms**: one float vector per macro cluster.

   * Combined storage remains **far smaller** than raw data while supporting **two full reconstructions**.

---

## Project Layout

```

DualChannelVQ/
├── main.py       # Core DualChannelVQ implementation
├── example.py    # Demonstration script: synthetic data → fit → compress → decode → report
└── Readme.MD     # This document

```

* **`main.py`**

  * Implements the `DualHVQ` class (rename it to `DualChannelVQ` in code if desired) with methods:

    * `fit(X)`: learns macro & micro centroids and per-cluster transforms.

    * `compress(X)`: maps data to `(macro_idx, micro_idx)` codes.

    * `decompress_primary(codes)`: baseline decode.

    * `decompress_secondary(codes)`: transform-augmented decode.

    * `report_sizes(X, codes)`: byte-size breakdown.

* **`example.py`**

  * Builds synthetic 2D data.

  * Fits the dual‐channel VQ model.

  * Compresses, reconstructs both channels, and prints size/error comparisons.

---

## Installation

1. **Clone** the repo:

   ```bash

   git clone https://github.com/tommygrammar/dual-channel-vector-quantization.git
   cd dual-channel-vector-quantization

   ```

2. **Install** dependencies (Python 3.6+):

   ```bash

   pip install numpy scipy

   ```

3. **Verify** by running the example:

   ```bash

   python3 example.py

   ```

---

## Getting Started

### Fitting & Compressing

```python

import numpy as np

from main import DualHVQ

# Generate or load your data
X = np.random.randn(500, 8)  # 500 samples, 8 features

# Instantiate model: choose number of macros & micros
model = DualHVQ(n_macro=9, n_micro=36, random_state=0)

# Learn codebooks + transforms
model.fit(X)

# Compress into (macro, micro) codes
codes = model.compress(X)  # shape: (500, 2), dtype: uint8

```

### Dual Decoding

```python

# Primary reconstruction (hierarchical VQ)
X_primary = model.decompress_primary(codes)

# Secondary reconstruction (after per-cluster transform)
X_secondary = model.decompress_secondary(codes)

```

### Storage & Quality Reports

```python

# Byte-size breakdown
sizes = model.report_sizes(X, codes)
print("Primary codes only:",    sizes['code_bytes'])
print("Transform params size:", sizes['param_bytes'])
print("Total compressed size:", sizes['total_compressed_bytes'])

# Reconstruction quality (mean squared L2 error per sample)
mse1 = np.mean(np.linalg.norm(X - X_primary,   axis=1)**2)
mse2 = np.mean(np.linalg.norm(X - X_secondary, axis=1)**2)
print("Primary MSE:",   mse1)
print("Secondary MSE:", mse2)

# Distinctness of channels
dist = np.mean(np.linalg.norm(X_primary - X_secondary, axis=1))
print("Mean inter-channel distance:", dist)

```

---

## Empirical Results

On **synthetic 2D Gaussian data** (300 samples × 2 dims), with 3 macros and 10 micros:

| Metric                                   | Value         |
| ---------------------------------------- | ------------- |
| Raw data size (float64)                  | 4 800 bytes   |
| Primary codes only                       | 600 bytes     |
| Transform parameters                     | 48 bytes      |
| **Total compressed footprint**           | **648 bytes** |
| Compression savings                      | ≈ 86.5 %      |
| Primary reconstruction error (mean L₂)   | 0.3175        |
| Secondary reconstruction error (mean L₂) | 0.4089        |
| Mean inter-channel distance              | 0.0914        |

On this small synthetic example, the results indicate:

* **High compression**: 4 800 / 648 ≈ 7.4× smaller than the raw data.

* **Two distinct reconstructions** from a single code stream (mean inter-channel distance 0.0914).
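
The headline figures follow from the byte counts in the table. A quick arithmetic check, using only the numbers reported above (the variable names are illustrative):

```python
raw_bytes = 4800      # 300 samples x 2 dims x 8 bytes (float64)
code_bytes = 600      # 300 samples x 2 uint8 indices
param_bytes = 48      # transform parameters, as reported in the table

total_bytes = code_bytes + param_bytes
savings = 1 - total_bytes / raw_bytes   # fraction of raw size saved
ratio = raw_bytes / total_bytes         # compression ratio

print(total_bytes)        # 648
print(f"{savings:.1%}")   # 86.5%
print(f"{ratio:.1f}x")    # 7.4x
```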

---

## Under the Hood

### Macro + Micro Codebooks

* **Macro**: k-means on the raw dataset → $n_{\text{macro}}$ centroids $\mu_m$.

* **Micro**: within each macro cluster, run k-means on residuals → $n_{\text{micro}}$ centroids $\nu_{m,u}$.

A data point $x$ maps to code $z=(m,u)$.
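
The two-level fit described above can be sketched with NumPy alone. This is a minimal illustration, not the code in `main.py`: the helper names `kmeans`, `fit_hvq`, and `encode` are hypothetical, and a plain Lloyd's k-means with random initialisation stands in for whatever clustering routine the library actually uses.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)

def fit_hvq(X, n_macro, n_micro, seed=0):
    """Macro k-means on X, then micro k-means on residuals inside each macro."""
    mu, m = kmeans(X, n_macro, seed=seed)
    nu = np.zeros((n_macro, n_micro, X.shape[1]))
    for j in range(n_macro):
        residuals = X[m == j] - mu[j]
        k = min(n_micro, len(residuals))  # small clusters get fewer codewords
        if k:
            cents, _ = kmeans(residuals, k, seed=seed)
            nu[j, :k] = cents
    return mu, nu

def encode(X, mu, nu):
    """Map each row of X to a code (m, u) stored as two uint8 indices."""
    m = ((X[:, None, :] - mu[None]) ** 2).sum(axis=2).argmin(axis=1)
    residuals = X - mu[m]
    u = ((residuals[:, None, :] - nu[m]) ** 2).sum(axis=2).argmin(axis=1)
    return np.stack([m, u], axis=1).astype(np.uint8)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
mu, nu = fit_hvq(X, n_macro=3, n_micro=10)
codes = encode(X, mu, nu)
print(codes.shape, codes.dtype)  # (300, 2) uint8
```

Storing each index as `uint8` caps both `n_macro` and `n_micro` at 256 in this sketch.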

### Per-Cluster Transforms

* For each macro $m$, generate a small parameter vector $T_m$ (e.g., gain & offset, or shift).

* Store these in float32 for minimal overhead.
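
As a rough illustration of what such a parameter block might look like, the sketch below draws one gain vector and one offset vector per macro cluster. The function name `make_transforms`, the random initialisation, and the `scale` parameter are all assumptions made for this example; the README does not say how the library generates $T_m$.

```python
import numpy as np

def make_transforms(n_macro, dim, scale=0.1, seed=0):
    """Hypothetical generator: one (gain, offset) pair per macro cluster."""
    rng = np.random.default_rng(seed)
    # Gains near 1 and offsets near 0 keep the secondary view close to the primary
    gain = 1.0 + scale * rng.standard_normal((n_macro, dim))
    offset = scale * rng.standard_normal((n_macro, dim))
    # float32 halves the parameter storage relative to float64
    return gain.astype(np.float32), offset.astype(np.float32)

gain, offset = make_transforms(n_macro=3, dim=2)
print(gain.shape, gain.dtype)        # (3, 2) float32
print(gain.nbytes + offset.nbytes)   # 48
```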

### Primary vs. Secondary Reconstruction

* **Primary Decode** $D_1$:

  $$
  x' = \mu_m + \nu_{m,u}.
  $$

* **Secondary Decode** $D_2$:

  $$
  x'' = \text{Transform}_m(x') \quad\bigl(\text{e.g. } g_m \odot x' + o_m\bigr).
  $$

Both derive from the same discrete code $(m,u)$.
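
The two decoders can be written as a pair of array lookups. The sketch below assumes the gain-and-offset form $g_m \odot x' + o_m$ given as an example above; the function names and the tiny hand-made codebooks are illustrative, not taken from `main.py`.

```python
import numpy as np

def decode_primary(codes, mu, nu):
    """D1: x' = mu[m] + nu[m, u]."""
    m, u = codes[:, 0], codes[:, 1]
    return mu[m] + nu[m, u]

def decode_secondary(codes, mu, nu, gain, offset):
    """D2: x'' = gain[m] * x' + offset[m] (elementwise, one transform per macro)."""
    m = codes[:, 0]
    return gain[m] * decode_primary(codes, mu, nu) + offset[m]

# Toy codebooks: 2 macros, 2 micros each, 2 dimensions
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
nu = np.array([[[1.0, 0.0], [0.0, 1.0]],
               [[-1.0, 0.0], [0.0, -1.0]]])
gain = np.array([[1.0, 1.0], [2.0, 2.0]])
offset = np.array([[0.5, 0.5], [0.0, 0.0]])

codes = np.array([[0, 1], [1, 0]], dtype=np.uint8)
print(decode_primary(codes, mu, nu).tolist())
# [[0.0, 1.0], [9.0, 10.0]]
print(decode_secondary(codes, mu, nu, gain, offset).tolist())
# [[0.5, 1.5], [18.0, 20.0]]
```

Note that the secondary channel reads only the macro index to select its transform, so it adds no per-sample storage.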

### Byte-Size Accounting

* **Raw**: $8 \times \text{num\_elements}$ bytes (float64).

* **Codes**: 2 bytes/sample (uint8 × 2).

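* **Transforms**: $\text{dim} \times n_{\text{macro}} \times 4$ bytes (float32) per stored parameter vector; a transform that keeps both a gain and an offset stores two such vectors per macro, doubling this term.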
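
These three terms can be tallied in a few lines. The helper below is a sketch, not the library's `report_sizes`: the name `size_report`, the `raw_bytes` key, and the `vectors_per_macro` parameter (added to cover transforms that store more than one vector per macro, such as gain plus offset) are assumptions. The other dictionary keys mirror those used in the Getting Started example.

```python
import numpy as np

def size_report(n_samples, dim, n_macro, vectors_per_macro=1):
    """Byte counts for raw float64 data, uint8 codes, and float32 transforms."""
    raw = n_samples * dim * np.dtype(np.float64).itemsize
    codes = n_samples * 2 * np.dtype(np.uint8).itemsize
    params = vectors_per_macro * n_macro * dim * np.dtype(np.float32).itemsize
    return {
        "raw_bytes": raw,
        "code_bytes": codes,
        "param_bytes": params,
        "total_compressed_bytes": codes + params,
    }

one_vector = size_report(n_samples=300, dim=2, n_macro=3)
print(one_vector["raw_bytes"], one_vector["code_bytes"], one_vector["param_bytes"])
# 4800 600 24

gain_and_offset = size_report(n_samples=300, dim=2, n_macro=3, vectors_per_macro=2)
print(gain_and_offset["param_bytes"], gain_and_offset["total_compressed_bytes"])
# 48 648
```

The tally covers only the terms listed above; it does not include storage for the macro and micro codebooks themselves.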

---

## Scaling to More Channels

* Attach $K$ transform sets $T_m^{(k)}$ per macro → yields $K$ distinct decodings.

* Overhead grows linearly with $K$, but code payload remains unchanged.

* Experimentally, with $K=4$, total compressed size rises modestly, while providing four alternative reconstructions.
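
A sketch of the $K$-channel case, assuming gain-and-offset transforms stacked along a leading channel axis. The function name `decode_channels`, the array layout `(K, n_macro, dim)`, and the placeholder primary decode are choices made for this example rather than the library's API.

```python
import numpy as np

def decode_channels(primary, macro_idx, gains, offsets):
    """Apply K per-macro transforms to one primary decode.

    gains, offsets: shape (K, n_macro, dim). Returns shape (K, n_samples, dim).
    """
    return gains[:, macro_idx] * primary[None] + offsets[:, macro_idx]

K, n_macro, dim, n_samples = 4, 3, 2, 300
rng = np.random.default_rng(0)
primary = rng.normal(size=(n_samples, dim))          # stand-in for D1's output
macro_idx = rng.integers(0, n_macro, size=n_samples)

# Illustrative transforms: unit gain, channel k shifted by 0.1 * k
gains = np.ones((K, n_macro, dim), dtype=np.float32)
offsets = np.zeros((K, n_macro, dim), dtype=np.float32)
offsets += 0.1 * np.arange(K, dtype=np.float32)[:, None, None]

views = decode_channels(primary, macro_idx, gains, offsets)
print(views.shape)                     # (4, 300, 2)
print(gains.nbytes + offsets.nbytes)   # 192 bytes of transform parameters
print(2 * n_samples)                   # 600 bytes of codes, independent of K
```

In this layout the parameter overhead is $2 \times K \times n_{\text{macro}} \times \text{dim} \times 4$ bytes, which grows linearly in $K$ while the code payload stays fixed.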

---

## Applications

* **Bandwidth-constrained streaming**: send base + alternate views in one stream.

* **Sensor networks**: raw readings + calibrated corrections from one code stream.

* **Image & style transmission**: original + stylized images without doubling bit-rate.

* **Time-series what-if scenarios**: baseline + hypothetical regime outputs in a single package.

---

## Future Directions

1. **Learned transforms**: replace handcrafted shifts/gains with per-cluster regressors or small neural nets.

2. **Differentiable HVQ**: enable end-to-end training of centroids and transforms via backpropagation.

3. **Adaptive clustering**: auto-tune macro/micro counts by data locality or target error.

4. **Richer transforms**: explore full-matrix affine, polynomial, or kernel-based per-cluster mappings.

5. **Real-time pipelines**: integrate with streaming architectures for on-the-fly dual decoding.

---

## License

This project is released under the **MIT License**. See [LICENSE](LICENSE) for details.
