{"id":28796376,"url":"https://github.com/tommygrammar/dual-channel-vector-quantization","last_synced_at":"2025-07-25T02:34:51.268Z","repository":{"id":296361135,"uuid":"993105295","full_name":"tommygrammar/Dual-Channel-Vector-Quantization","owner":"tommygrammar","description":"A prototype library demonstrating a dual‐channel compression scheme built on hierarchical vector quantisation (HVQ), which via attached per‐cluster transformation functions yields multiple distinct reconstructions (meanings) from a single compressed code sequence.","archived":false,"fork":false,"pushed_at":"2025-05-30T08:17:47.000Z","size":8,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"blackgrammar-projects","last_synced_at":"2025-06-18T03:46:32.695Z","etag":null,"topics":["compression-algorithm","signal-processing","vector"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tommygrammar.png","metadata":{"files":{"readme":"Readme.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-30T08:16:47.000Z","updated_at":"2025-06-01T00:39:19.000Z","dependencies_parsed_at":"2025-05-30T10:37:25.544Z","dependency_job_id":"6565cacf-a815-46e3-95d0-1298ddaf27ea","html_url":"https://github.com/tommygrammar/Dual-Channel-Vector-Quantization","commit_stats":null,"previous_names":["tommygrammar/dual-channel-vector-quantization"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tommygrammar/Dual-Channel-Vector-Quantization","repository_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/tommygrammar%2FDual-Channel-Vector-Quantization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tommygrammar%2FDual-Channel-Vector-Quantization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tommygrammar%2FDual-Channel-Vector-Quantization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tommygrammar%2FDual-Channel-Vector-Quantization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tommygrammar","download_url":"https://codeload.github.com/tommygrammar/Dual-Channel-Vector-Quantization/tar.gz/refs/heads/blackgrammar-projects","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tommygrammar%2FDual-Channel-Vector-Quantization/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266944689,"owners_count":24010488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression-algorithm","signal-processing","vector"],"created_at":"2025-06-18T03:39:54.074Z","updated_at":"2025-07-25T02:34:51.260Z","avatar_url":"https://github.com/tommygrammar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dual Channel Vector Quantisation\n\nA prototype library demonstrating a 
**dual‐channel compression scheme** built on hierarchical vector quantisation (HVQ), which via attached per‐cluster transformation functions yields **multiple distinct reconstructions** (meanings) from a single compressed code sequence.\n\n---\n\n## Table of Contents\n\n1. [Motivation](#motivation)\n2. [Core Concept](#core-concept)\n3. [Project Layout](#project-layout)\n4. [Installation](#installation)\n5. [Getting Started](#getting-started)\n\n   1. [Fitting \u0026 Compressing](#fitting--compressing)\n   2. [Dual Decoding](#dual-decoding)\n   3. [Storage \u0026 Quality Reports](#storage--quality-reports)\n6. [Empirical Results](#empirical-results)\n7. [Under the Hood](#under-the-hood)\n\n   1. [Macro + Micro Codebooks](#macro--micro-codebooks)\n   2. [Per-Cluster Transforms](#per-cluster-transforms)\n   3. [Primary vs. Secondary Reconstruction](#primary-vs-secondary-reconstruction)\n   4. [Byte‐Size Accounting](#byte-size-accounting)\n8. [Scaling to More Channels](#scaling-to-more-channels)\n9. [Applications](#applications)\n10. [Future Directions](#future-directions)\n11. [License](#license)\n\n---\n\n## Motivation\n\nTraditional compression targets a **single best reconstruction** of data. Yet many domains require **multiple semantic views** of the same base signal, for instance:\n\n* A raw sensor reading plus a noise‐corrected variant.\n* An image and a style-transformed version.\n* A time series under different hypothetical scenarios.\n\nStoring or transmitting separate compressed streams for each view multiplies the data footprint. Dual Channel Vector Quantisation explores embedding **two (or more) coherent decodings** in a **single** compressed code via **per-cluster transforms**, cutting overall size dramatically.\n\n---\n\n## Core Concept\n\n1. 
**Hierarchical Vector Quantisation (HVQ)**\n\n   * A two‐level VQ:\n\n     * **Macro** clusters capture global structure.\n     * **Micro** clusters refine residuals within each macro.\n   * Produces a discrete code per sample: $(m,u)$.\n\n2. **Per-Cluster Transform Functions**\n\n   * Each macro cluster $m$ stores a small parameter block $T_m$.\n   * These transforms generate **secondary meanings** from the primary decode.\n\n3. **Dual Decoding Channels**\n\n   * **Primary decoder** $D_1$ reconstructs a baseline approximation.\n   * **Secondary decoder** $D_2$ applies $T_m$ to $D_1$’s output, yielding a distinct variant.\n\n4. **Storage Efficiency**\n\n   * **Codes**: two byte-sized indices per sample ($m,u$).\n   * **Transforms**: one float vector per macro cluster.\n   * Combined storage remains **far smaller** than raw data while supporting **two full reconstructions**.\n\n---\n\n## Project Layout\n\n```\nDualChannelVQ/\n├── main.py       # Core DualChannelVQ implementation\n├── example.py    # Demonstration script: synthetic data → fit → compress → decode → report\n└── Readme.MD     # This document\n```\n\n* **`main.py`**\n\n  * Implements the `DualHVQ` class (renamed in code to `DualChannelVQ` if desired) with methods:\n\n    * `fit(X)`: learns macro \u0026 micro centroids and per-cluster transforms.\n    * `compress(X)`: maps data to `(macro_idx, micro_idx)` codes.\n    * `decompress_primary(codes)`: baseline decode.\n    * `decompress_secondary(codes)`: transform-augmented decode.\n    * `report_sizes(X, codes)`: byte-size breakdown.\n\n* **`example.py`**\n\n  * Builds synthetic 2D data.\n  * Fits the dual‐channel VQ model.\n  * Compresses, reconstructs both channels, and prints size/error comparisons.\n\n---\n\n## Installation\n\n1. **Clone** the repo:\n\n   ```bash\n   git clone https://github.com/tommygrammar/Dual-Channel-Vector-Quantization.git\n   cd Dual-Channel-Vector-Quantization\n   ```\n2. 
**Install** dependencies (Python 3.6+):\n\n   ```bash\n   pip install numpy scipy\n   ```\n3. **Verify** by running the example:\n\n   ```bash\n   python3 example.py\n   ```\n\n---\n\n## Getting Started\n\n### Fitting \u0026 Compressing\n\n```python\nfrom main import DualHVQ\nimport numpy as np\n\n# Generate or load your data\nX = np.random.randn(500, 8)  # 500 samples, 8 features\n\n# Instantiate model: choose number of macros \u0026 micros\nmodel = DualHVQ(n_macro=9, n_micro=36, random_state=0)\n\n# Learn codebooks + transforms\nmodel.fit(X)\n\n# Compress into (macro, micro) codes\ncodes = model.compress(X)  # shape: (500, 2), dtype: uint8\n```\n\n### Dual Decoding\n\n```python\n# Primary reconstruction (hierarchical VQ)\nX_primary = model.decompress_primary(codes)\n\n# Secondary reconstruction (after per-cluster transform)\nX_secondary = model.decompress_secondary(codes)\n```\n\n### Storage \u0026 Quality Reports\n\n```python\n# Byte‐size breakdown\nsizes = model.report_sizes(X, codes)\nprint(\"Primary codes only:\",     sizes['code_bytes'])\nprint(\"Transform params size:\", sizes['param_bytes'])\nprint(\"Total compressed size:\", sizes['total_compressed_bytes'])\n\n# Reconstruction quality\nmse1 = np.mean(np.linalg.norm(X - X_primary,   axis=1)**2)\nmse2 = np.mean(np.linalg.norm(X - X_secondary, axis=1)**2)\nprint(\"Primary MSE:\",   mse1)\nprint(\"Secondary MSE:\", mse2)\n\n# Distinctness of channels\ndist = np.mean(np.linalg.norm(X_primary - X_secondary, axis=1))\nprint(\"Mean inter-channel distance:\", dist)\n```\n\n---\n\n## Empirical Results\n\nOn **synthetic 2D Gaussian data** (300 samples × 2 dims), with 3 macros and 10 micros:\n\n| Metric                             | Value         |\n| ---------------------------------- | ------------- |\n| Raw data size                      | 4 800 bytes   |\n| Primary codes only                 | 600 bytes     |\n| Transform parameters               | 48 bytes      |\n| **Total compressed footprint**     | **648 
bytes** |\n| Compression savings                | ≈ 86.5 %      |\n| Primary reconstruction (mean L₂)   | 0.3175        |\n| Secondary reconstruction (mean L₂) | 0.4089        |\n| Mean inter-channel distance        | 0.0914        |\n\nThese results confirm:\n\n* **High compression** (7.4× smaller).\n* **Two semantically distinct reconstructions** from a single code stream.\n\n---\n\n## Under the Hood\n\n### 1. Macro + Micro Codebooks\n\n* **Macro**: k-means on the raw dataset → $n_{\\text{macro}}$ centroids $\\mu_m$.\n* **Micro**: within each macro cluster, run k-means on residuals → $n_{\\text{micro}}$ centroids $\\nu_{m,u}$.\n\nA data point $x$ maps to code $z=(m,u)$.\n\n### 2. Per-Cluster Transforms\n\n* For each macro $m$, generate a small parameter vector $T_m$ (e.g., gain \u0026 offset, or shift).\n* Store these in float32 for minimal overhead.\n\n### 3. Dual Reconstruction Channels\n\n* **Primary Decode** $D_1$:\n\n  $$\n    x' = \\mu_m + \\nu_{m,u}.\n  $$\n* **Secondary Decode** $D_2$:\n\n  $$\n    x'' = \\text{Transform}_m(x') \\quad\\bigl(\\text{e.g. } g_m \\odot x' + o_m\\bigr).\n  $$\n\nBoth derive from the same discrete code $(m,u)$.\n\n### 4. 
Byte-Size Accounting\n\n* **Raw**: $8 \\times \\text{num\\_elements}$ bytes (float64).\n* **Codes**: 2 bytes/sample (uint8 × 2).\n* **Transforms**: $\\text{dim} \\times n_{\\text{macro}} \\times 4$ bytes (float32).\n\n---\n\n## Scaling to More Channels\n\n* Attach $K$ transform sets $T_m^{(k)}$ per macro → yields $K$ distinct decodings.\n* Overhead grows linearly with $K$, but code payload remains unchanged.\n* Experimentally, with $K=4$, total compressed size rises modestly, while providing four alternative reconstructions.\n\n---\n\n## Applications\n\n* **Bandwidth-constrained streaming**: send base + alternate views in one stream.\n* **Sensor networks**: raw readings + calibrated corrections from one code stream.\n* **Image \u0026 style transmission**: original + stylized images without doubling bit-rate.\n* **Time-series what-if scenarios**: baseline + hypothetical regime outputs in a single package.\n\n---\n\n## Future Directions\n\n1. **Learned transforms**: replace handcrafted shifts/gains with per-cluster regressors or small neural nets.\n2. **Differentiable HVQ**: enable end-to-end training of centroids and transforms via backpropagation.\n3. **Adaptive clustering**: auto-tune macro/micro counts by data locality or target error.\n4. **Nonlinear transforms**: explore affine, polynomial, or kernel-based per-cluster mappings.\n5. **Real-time pipelines**: integrate with streaming architectures for on-the-fly dual decoding.\n\n---\n\n## License\n\nThis project is released under the **MIT License**. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftommygrammar%2Fdual-channel-vector-quantization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftommygrammar%2Fdual-channel-vector-quantization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftommygrammar%2Fdual-channel-vector-quantization/lists"}