{"id":49714283,"url":"https://github.com/weakknight/tinyastc","last_synced_at":"2026-05-08T19:04:03.607Z","repository":{"id":342298066,"uuid":"939853385","full_name":"WeakKnight/tinyASTC","owner":"WeakKnight","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-05T14:19:55.000Z","size":9246,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-05T17:25:45.757Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WeakKnight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-27T08:06:19.000Z","updated_at":"2026-03-05T14:19:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/WeakKnight/tinyASTC","commit_stats":null,"previous_names":["weakknight/tinyastc"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/WeakKnight/tinyASTC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeakKnight%2FtinyASTC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeakKnight%2FtinyASTC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeakKnight%2FtinyASTC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeakKnight%2FtinyASTC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/WeakKnight","download_url":"https://codeload.github.com/WeakKnight/tinyASTC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeakKnight%2FtinyASTC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32793488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"ssl_error","status_checked_at":"2026-05-08T08:22:45.650Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-08T19:04:00.123Z","updated_at":"2026-05-08T19:04:03.589Z","avatar_url":"https://github.com/WeakKnight.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tinyBC\n\n**Minimal GPU texture compression — two approaches in readable code.**\n\nThis repo contains three tiny, self-contained texture compressors (one classic, two neural) you can read in one sitting:\n\n| | **tinyBC** | **tinyMLP (Hash)** | **tinyLatent** |\n|---|---|---|---|\n| Approach | Classic block compression (BC7 Mode 6) | Neural: multi-res hash tables + SIREN | Neural: dense latent texture grid + SIREN |\n| Core idea | 2 endpoint colors + 16 weights per 4×4 block | Hash lookup → 32 features → decode | Bilinear-sampled 2D grid → 32 features → decode |\n| Runs on | GPU via [Slang](https://shader-slang.com/) compute shader | GPU 
via PyTorch | GPU via PyTorch |\n| PSNR | ~40 dB | ~39.5 dB | ~37.6 dB (two-scale + QAT) |\n| Stage-1 ratio | 4:1 (fixed) | 1.9× (default) | **7.2×** (uint8 grids + fp16 decoder) |\n| Stage-2 ratio | — | — | **26.9×** (+ JPEG compression of uint8 latent) |\n| Interactive | — | Real-time visualization | Real-time + optional latent panel |\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_comparison.png\" alt=\"Compression quality comparison\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n---\n\n## Table of Contents\n\n- [Quick Start](#quick-start)\n- [What is Block Compression?](#what-is-block-compression)\n  - [Why Block Compression Exists](#why-block-compression-exists)\n  - [The Core Idea](#the-core-idea-two-colors-and-a-recipe)\n  - [Geometric Insight](#geometric-insight-a-line-in-color-space)\n- [How tinyBC Works](#how-tinybc-works)\n  - [Step 1: PCA Initial Guess](#step-1-pca-initial-guess)\n  - [Step 2: Nelder-Mead Refinement](#step-2-nelder-mead-refinement)\n  - [The Pipeline](#the-full-pipeline)\n- [tinyMLP: Neural Image Compression](#tinymlp-neural-image-compression)\n  - [The Idea](#the-idea)\n  - [Architecture: Hash Encoding + SIREN](#architecture-hash-encoding--siren)\n  - [Latent Texture: a Different Compression Primitive](#latent-texture-a-different-compression-primitive)\n  - [Interactive Demo](#interactive-demo)\n- [Results](#results)\n- [Code Walkthrough](#code-walkthrough)\n- [License](#license)\n\n---\n\n## Quick Start\n\n### tinyBC — Block Compression\n\n**Prerequisites:** Python 3.10+, a GPU with Vulkan support, [SlangPy](https://shader-slang.com/slang-python/).\n\n```bash\npip install slangpy sgl numpy\n```\n\n```bash\npython tinybc.py                          # compress sample.jpg, print PSNR\npython tinybc.py -i photo.png -o out.png  # custom input, save decoded output\npython tinybc.py -b                       # benchmark mode (1000 iterations)\n```\n\n### tinyMLP — Neural Compression (Hash Encoding)\n\n**Prerequisites:** Python 
3.10+, PyTorch, OpenCV (optional, for interactive window).\n\n```bash\npip install torch opencv-python\n```\n\n```bash\npython tinyMLP.py                                    # train on sample.png, watch it learn\npython tinyMLP.py -i photo.png                       # custom input\npython tinyMLP.py --log2_T 10 --hidden 32 --depth 1  # tiny model, ~16x compression\npython tinyMLP.py --save model.pth                   # save trained weights\n```\n\n**Controls:** `Q`/`ESC` quit, `Space` pause/resume, `S` save snapshot.\n\n### tinyLatent — Neural Compression (Latent Texture)\n\n```bash\npython tinyLatent.py                                          # single-scale (default)\npython tinyLatent.py --ch_lo 16 --ch_hi 2                    # two-scale (~104K params)\npython tinyLatent.py --ch_lo 16 --ch_hi 2 --qat_bits 8       # two-scale + QAT int8\npython tinyLatent.py --activation gelu                        # experiment: GELU decoder\npython tinyLatent.py --vis_latent                             # show latent in 3rd panel\npython tinyLatent.py --ch_lo 16 --ch_hi 2 --qat_bits 8 \\\n    --save_latent latent.npz                                  # save uint8 artifact\n```\n\n---\n\n## What is Block Compression?\n\n### Why Block Compression Exists\n\nWhen a GPU renders a 3D scene, it reads texture data _millions_ of times per frame. Unlike JPEG or PNG, the GPU can't afford to decompress an entire image first — it needs **random access** to any pixel at any time.\n\nThis rules out most image compression formats. JPEG uses variable-length coding that requires sequential decoding. PNG uses an LZ-based stream. Neither lets you jump to pixel (423, 871) without decoding everything before it.\n\n**Block compression** solves this by dividing the image into small, independent tiles — typically **4×4 pixels** — each compressed to a **fixed-size** bit string (128 bits for BC7). 
The GPU can decode any block in O(1) without touching any other block.\n\n| Property | JPEG/PNG | Block Compression |\n|---|---|---|\n| Random access | No | **Yes** |\n| Decode unit | Entire image | Single 4×4 block |\n| GPU-friendly | No (CPU decode) | **Yes** (hardware decoder) |\n| Compression ratio | Very high | Moderate (~4:1) |\n| Use case | Storage, web | **Real-time rendering** |\n\n### The Core Idea: Two Colors and a Recipe\n\nEvery 4×4 block is encoded as:\n\n1. **Two endpoint colors** (Color A and Color B)\n2. **16 weights** — one per pixel, each saying \"how much of A vs B\"\n\nTo decode a pixel, just interpolate: `pixel = lerp(A, B, weight)`.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_block_concept.png\" alt=\"Block compression concept\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\nThe total cost: ~28 bits for each endpoint + ~64 bits for 16 weights + a few mode bits = **128 bits per block**, or **1 byte per pixel** (128 bits ÷ 16 pixels = 8 bits). Uncompressed RGBA would cost 16 × 32 = 512 bits per block, so this is a 4× saving.\n\n\u003e **Key insight:** We're betting that within any tiny 4×4 region of an image, all the colors can be _reasonably approximated_ as a blend of just two colors. For natural images, this bet pays off surprisingly well.\n\n### Geometric Insight: A Line in Color Space\n\nHere's another way to think about it. Each pixel is a point in RGB color space (a 3D cube). Block compression finds the **best-fit line** through these 16 points, then projects each pixel onto that line.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_color_line.png\" alt=\"Color line in RGB space\" width=\"55%\"/\u003e\n\u003c/p\u003e\n\nThe two endpoints define where the line starts and ends. The per-pixel weight records where each pixel falls along this line. 
All the compression error comes from the **perpendicular distance** between each pixel and the line — the off-axis detail that gets lost.\n\n---\n\n## How tinyBC Works\n\n### Step 1: PCA Initial Guess\n\nFinding the \"best\" two endpoints is an optimization problem. A brute-force search over all possible RGBA endpoint pairs would be astronomically expensive (each endpoint lives in a continuous 4D space).\n\ntinyBC starts with a fast, classic trick: **Principal Component Analysis (PCA)**.\n\n1. Compute the **mean color** of the 16 pixels.\n2. Find the **dominant direction** of color variation (the axis of greatest spread).\n3. Project all pixels onto this axis.\n4. The two extremes become the initial endpoint colors.\n\nThis is cheap — just a few dot products — and gives a surprisingly good initial answer. For many blocks, it's already good enough (loss below a threshold), and we skip straight to output.\n\n### Step 2: Nelder-Mead Refinement\n\nFor harder blocks (strong color variation, edges, mixed content), the PCA solution can be noticeably off. tinyBC then applies **Nelder-Mead simplex optimization** to refine the endpoints.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_simplex_intuition.png\" alt=\"Nelder-Mead simplex intuition\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\nNelder-Mead is a derivative-free optimizer. It works by maintaining a **simplex** (a geometric shape with N+1 vertices in N dimensions). Since our search space is 8-dimensional (4 components × 2 endpoints), the simplex has 9 vertices. 
At each iteration, it:\n\n| Operation | What happens |\n|---|---|\n| **Reflection** | Mirror the worst vertex through the centroid of the rest — try the \"opposite direction\" |\n| **Expansion** | If reflection found a great point, push even further in that direction |\n| **Contraction** | If reflection didn't help, pull the worst vertex closer to the centroid |\n| **Shrink** | If nothing works, shrink the entire simplex toward the best vertex |\n\nAfter up to 64 iterations, the simplex converges to a local (often global) minimum. The best vertex gives our refined endpoints.\n\n\u003e **Why Nelder-Mead?** It's derivative-free (our loss landscape is non-smooth due to weight quantization), simple to implement in a shader, and converges quickly in low dimensions. Perfect for a GPU compute kernel where each thread independently optimizes one 4×4 block.\n\n### The Full Pipeline\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_pipeline.png\" alt=\"Encoding pipeline\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n1. **Load** the input texture onto the GPU.\n2. **Dispatch** one compute thread per 4×4 block.\n3. Each thread runs **PCA** to get initial endpoints.\n4. If loss \u003e threshold (0.004), run **Nelder-Mead** (64 iterations max).\n5. Compute final per-pixel weights and write the decoded block to the output texture.\n6. **Compute PSNR** on the CPU by comparing input vs decoded.\n\n---\n\n## tinyMLP: Neural Image Compression\n\n### The Idea\n\nWhat if, instead of hand-crafted block partitions, we let a **neural network** learn to compress the image?\n\ntinyMLP takes a radically different approach: train a small MLP (multi-layer perceptron) to map pixel coordinates to colors:\n\n```\ninput: (x, y)  →  MLP  →  output: (r, g, b)\n```\n\nThe \"compressed file\" is just the **network weights**. 
A 101K-parameter network stored as fp32 weighs ~400 KB; for a 512×512 RGB image (768 KB) that's roughly **2× compression** at ~39 dB PSNR with no block artifacts.\n\n| What's stored | Block compression (tinyBC) | Neural compression (tinyMLP) |\n|---|---|---|\n| Per-block data | 2 endpoints + 16 weights | — |\n| Global model | — | MLP weights (~397 KB fp32 at the default size) |\n| Decoding | `lerp(A, B, w)` per pixel | Forward pass through network |\n| Artifacts | Block boundaries | Smooth, frequency-dependent blur |\n| Compression ratio | Fixed 4:1 | Tunable via model size |\n\n### Architecture: Hash Encoding + SIREN\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_mlp_architecture.png\" alt=\"tinyMLP architecture\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\ntinyMLP uses two modern components that, combined, clearly outperform the earlier Fourier+GELU baseline (+11.5 dB PSNR on the default config):\n\n**Multi-resolution Hash Encoding** (Müller et al. 2022, Instant-NGP):  \nFor each coordinate `(x, y)`, we query L=16 spatial grids at geometrically spaced resolutions (coarse 16px → fine 512px). At each level, the 4 surrounding grid corners are looked up in a small learnable hash table and bilinearly interpolated. The 16 interpolated feature vectors (2 features per level) are concatenated into a 32-dim descriptor. This is fast, differentiable, and inherently multi-scale — coarse levels capture global structure, fine levels capture detail.\n\n```\n(x, y) → [query 16 grids] → [hash lookup + bilinear interp] → 32-dim features\n```\n\n**SIREN decoder** (Sitzmann et al. 2020):  \nThe 32 hash features are decoded by 2 hidden layers using `sin(ω₀ · Wx + b)` activations rather than ReLU/GELU. 
Sinusoidal activations are theoretically well-suited for representing continuous signals and their derivatives — they avoid the spectral bias that makes ReLU networks converge slowly on high-frequency content.\n\n```\n32 features → [sin(ω₀·Wx)] × depth → Linear → Sigmoid → (r, g, b)\n```\n\n**Two separate learning rates** ensure stable training: the hash tables update fast (`lr=3e-2`), the SIREN decoder updates slowly (`lr=1e-3`).\n\n### Latent Texture: a Different Compression Primitive\n\n`tinyLatent.py` swaps the **sparse hash tables** for a **dense 2D feature grid** — the _latent texture_ — while keeping the SIREN decoder identical.  This makes the comparison orthogonal: only the encoder changes.\n\n```\nHashEncoding  →  32 features  ─┐\n                                ├─ same SIREN decoder ─ (r, g, b)\nLatentTexture →  32 features  ─┘\n```\n\n**How the latent texture works:**  \nA learnable parameter tensor of shape `C × (H/scale) × (W/scale)` (e.g. `32 × 64 × 64` for a 512×512 image with `--scale 8`) is sampled at the query coordinate using `F.grid_sample` (bilinear, border padding).  The downscaled grid acts as a compressible intermediate representation:\n\n- **Spatial coherence** is enforced by construction — nearby pixels read similar features.\n- **The grid can be saved as float16 `.npy`** and further compressed with standard image codecs (JPEG, PNG), enabling true two-stage compression.\n- **No hash collisions** — every feature occupies an explicit spatial slot. 
This trades capacity for interpretability: you can literally visualise what the network remembers.\n\n#### Two-Scale Latent\n\nThe single-scale latent's weakness — no multi-resolution structure — can be addressed by adding a second, finer grid, directly mirroring [RTXNTC](https://github.com/NVIDIA-RTX/RTXNTC)'s dual-resolution latent shape:\n\n```\nLo grid  (ch_lo channels, scale_lo=8 → 64×64)   — global structure\nHi grid  (ch_hi channels, scale_hi=4 → 128×128)  — fine detail\n         ↓ concat → 32-dim total → SIREN decoder (unchanged)\n```\n\n```bash\npython tinyLatent.py --ch_lo 16 --ch_hi 2   # two-scale, ~104K params\n```\n\n#### Quantization-Aware Training (QAT)\n\nWith `--qat_bits 8`, the latent values are **fake-quantised** during training using a Straight-Through Estimator (STE): the decoder learns to tolerate discrete integer values, so at save time the latent is stored as real `uint8`. This is exactly what JPEG and PNG expect — making the two-stage compression path far more efficient.\n\n```bash\npython tinyLatent.py --ch_lo 16 --ch_hi 2 --qat_bits 8   # QAT int8\npython tinyLatent.py --ch_lo 16 --ch_hi 2 --qat_bits 8 --save_latent latent.npz\n# latent.npz contains per-channel uint8 data + float32 min/max metadata\n```\n\nThe STE trick: in the forward pass, values are rounded to 256 levels per channel. 
In the backward pass, gradients flow through the rounding as if it were identity — so standard Adam can still train through the quantisation.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_latent_architecture.png\" alt=\"tinyLatent two-scale + QAT architecture\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n#### Orthogonal Comparison\n\nAll four variants use the same `hidden=64, depth=2` SIREN decoder, trained for the same steps on lossless `sample.png`:\n\n| Encoder | Params | Storage (stage-1) | @500 steps | @2000 steps | @5000 steps |\n|---|---|---|---|---|---|\n| Hash Default (`log2_T=12`) | 101,399 | 397 KB fp32 (1.9×) | 34.9 dB | 37.9 dB | **39.5 dB** |\n| Latent Single-scale (`ch=32, scale=8`) | 137,539 | 550 KB fp32 (1.4×) | 30.4 dB | 35.3 dB | 37.5 dB |\n| Latent Two-scale (`ch_lo=16/s8 + ch_hi=2/s4`) | 103,875 | 185 KB fp16 (4.2×) | 32.0 dB | 35.9 dB | 37.7 dB |\n| **Two-scale + QAT int8** | 103,875 | **28.6 KB** (7.2× → **26.9×** w/ JPEG) | 32.0 dB | 36.0 dB | **37.6 dB** |\n\n**Key observations:**\n- Hash encoding still converges fastest owing to its 16-level multi-scale design.\n- Two-scale latent outperforms single-scale by +0.2 dB at 5000 steps while using **25% fewer parameters** (104K vs 138K) — multi-resolution structure helps even with just two levels.\n- **QAT int8** costs only ~0.2 dB vs non-quantised but unlocks a two-stage path: uint8 grids compress to 17.7 KB under JPEG, and together with the 10.9 KB fp16 decoder the total artifact is **28.6 KB — 26.9× smaller** than the 768 KB raw image.\n- The latent grid's unique advantage remains: it is a **concrete, spatial artifact** — visualisable, quantisable, and compressible with standard image codecs.\n\n#### Decoder Activation: SIREN vs GELU vs SiLU\n\nAlthough bilinear `F.grid_sample` produces spatially smooth feature vectors, swapping SIREN for GELU or SiLU costs **~5 dB** at the same budget. 
Sinusoidal activations provide a far richer non-linear basis for the 18D→RGB mapping than piecewise-smooth activations can with only 2 layers and 64 hidden units. Use `--activation gelu` for experiments where training stability matters more than peak quality.\n\n| Decoder | @500 steps | @2000 steps | @5000 steps |\n|---|---|---|---|\n| SIREN (default) | 32.0 dB | 36.1 dB | **38.0 dB** |\n| GELU MLP | 28.5 dB | 31.3 dB | 32.8 dB |\n| SiLU MLP | 28.0 dB | 31.0 dB | 32.1 dB |\n\n*(Two-scale latent, `ch_lo=16 + ch_hi=2`, `hidden=64 depth=2`, 5000 steps)*\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_latent_vs_hash_grid.png\" alt=\"Hash vs Latent quality grid\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_latent_vs_hash_curves.png\" alt=\"PSNR curves: Hash vs Latent\" width=\"75%\"/\u003e\n\u003c/p\u003e\n\n#### Inside the Latent Grid\n\nThe figure below shows (left to right): the original image crop, the first three latent channels rendered as RGB, the final reconstruction, and the per-pixel error heatmap.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_latent_visualization.png\" alt=\"Latent texture internals\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\nThe latent channel preview shows a blurry, compressed-looking version of the scene — the network has \"pre-decoded\" the image into a coarse feature map, leaving the SIREN decoder to hallucinate fine-grained texture from position alone.\n\n#### Two-Stage Compression Potential\n\nBecause the latent grid is a regular array, it can be compressed a second time with standard image codecs:\n\n```bash\n# fp16 path (no QAT) — save as float16 .npy\npython tinyLatent.py --save_latent latent.npy\n# Latent fp16 (.npy): ~256 KB for 32×64×64; further JPEG: ~30–50 KB savings\n\n# uint8 path (QAT) — save as per-channel uint8 .npz  ← much more compressible\npython tinyLatent.py --ch_lo 16 --ch_hi 2 --qat_bits 8 --save_latent latent.npz\n# 
latent.npz: ~100 KB; JPEG of uint8 latent: typically 3–5× smaller than fp16\n```\n\nFor the two-scale + QAT int8 configuration on a 512×512 image (768 KB uncompressed), the breakdown is:\n\n| Component | Raw size | After JPEG Q=85 |\n|---|---|---|\n| Lo grid uint8 (16×64×64) | 64.0 KB | **12.2 KB** (5.2×) |\n| Hi grid uint8 (2×128×128) | 32.0 KB | **5.5 KB** (5.9×) |\n| Decoder fp16 | 10.9 KB | 10.9 KB (no codec needed) |\n| **Total** | 106.9 KB (7.2× stage-1) | **28.6 KB (26.9× stage-2)** |\n\nThe uint8 latent grids are spatially smooth (bilinear sampling enforces spatial coherence), so JPEG achieves 5–6× compression on them, far better than on random float data. This is the same two-stage architecture used by [RTXNTC](https://github.com/NVIDIA-RTX/RTXNTC) in production.\n\n### Interactive Demo\n\nRun `python tinyMLP.py` to watch the network learn an image in real-time:\n\n1. The window shows **Original** (left) and **MLP Reconstruction** (right) side by side.\n2. Within the first seconds, a clean low-frequency image appears — hash tables learn the broad structure fast.\n3. Within 1–2 minutes, fine-grained texture and edges snap into focus.\n4. 
The status bar tracks step, loss, PSNR, compression ratio, and elapsed time live.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_mlp_demo.png\" alt=\"tinyMLP interactive demo screenshot\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n### Quality vs Steps vs Model Size\n\nThe table below shows measured PSNR for three architectures at three checkpoints on the 512×512 test image:\n\n| Architecture | Params | Size | Compression | 500 steps | 2000 steps | 5000 steps |\n|---|---|---|---|---|---|---|\n| Tiny `--log2_T 10 --hidden 32 --depth 1` | 12,317 | 48 KB | **16x** | 25.6 dB | 27.0 dB | 27.5 dB |\n| Default `--log2_T 12 --hidden 64 --depth 2` | 101,399 | 397 KB | **1.9x** | 34.8 dB | 37.9 dB | **39.4 dB** |\n| Large `--log2_T 14 --hidden 128 --depth 3` | 322,145 | 1.26 MB | 0.6x | 39.4 dB | 43.4 dB | **46.0 dB** |\n\nCompare to the old Fourier+GELU baseline: Default config went from 27.9 dB → **39.4 dB** (+11.5 dB), and Tiny from 24.3 dB → **27.5 dB** (+3.2 dB).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_mlp_comparison.png\" alt=\"tinyMLP quality comparison grid\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_mlp_psnr_curve.png\" alt=\"PSNR vs training steps\" width=\"75%\"/\u003e\n\u003c/p\u003e\n\nThe hash encoding enables rapid early convergence — the Default config already hits 34.8 dB at step 500. 
The PSNR curves still haven't fully flattened at 5000 steps, so more training continues to help.\n\n---\n\n## Results\n\n### tinyBC — Block Compression\n\n| Metric | PCA Only | PCA + Nelder-Mead |\n|---|---|---|\n| **PSNR** | 38.46 dB | **40.05 dB** |\n| Max per-pixel error | 0.1897 | **0.1601** |\n\nThe Nelder-Mead refinement adds ~1.6 dB PSNR — a meaningful improvement, especially on blocks with complex color distributions (edges, specular highlights, mixed materials).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/fig_comparison.png\" alt=\"Quality comparison with error maps\" width=\"100%\"/\u003e\n\u003c/p\u003e\n\nThe error maps (bottom row) use the `inferno` colormap — brighter means more error. Notice how the Nelder-Mead version has fewer bright spots, especially around the grille bars and headlight edges where color variation is highest.\n\n### tinyMLP — Neural Compression (Hash Encoding + SIREN)\n\n| Model | Params | Size | Compression | PSNR @ 5000 steps |\n|---|---|---|---|---|\n| Large `--log2_T 14 --hidden 128 --depth 3` | 322K | 1.26 MB | 0.6x | **46.0 dB** |\n| Default `--log2_T 12 --hidden 64 --depth 2` | 101K | 397 KB | 1.9x | **39.4 dB** |\n| Tiny `--log2_T 10 --hidden 32 --depth 1` | 12K | 48 KB | 16x | **27.5 dB** |\n\nSwitching from Fourier+GELU to Hash Encoding+SIREN boosted Default config by **+11.5 dB PSNR**. Neural compression still trades decoding throughput for smooth, artifact-free quality — but at these PSNR levels the reconstructions are visually nearly indistinguishable from the original.\n\n### tinyLatent — Neural Compression (Latent Texture + SIREN)\n\nSame SIREN decoder as tinyMLP (identical `hidden=64, depth=2`). 
Only the encoder changes — sparse hash tables → dense bilinear-sampled grid.\n\n| Encoder | Params | Compression | 500 steps | 2000 steps | 5000 steps |\n|---|---|---|---|---|---|\n| Hash Default (`log2_T=12`) | 101,399 | 1.9× (stage-1) | 34.9 dB | 37.9 dB | **39.5 dB** |\n| Latent Single-scale (`ch=32, scale=8`) | 137,539 | 1.4× (stage-1) | 30.4 dB | 35.3 dB | 37.5 dB |\n| Latent Two-scale (`ch_lo=16/s8 + ch_hi=2/s4`) | 103,875 | 4.2× (stage-1) | 32.0 dB | 35.9 dB | **37.7 dB** |\n| Two-scale + QAT int8 (`--qat_bits 8`) | 103,875 | **7.2×** stage-1 / **26.9×** stage-2 | 32.0 dB | 36.0 dB | **37.6 dB** |\n\nTwo-scale latent roughly matches Hash Default's parameter budget (104K vs 101K) and edges out single-scale (+0.2 dB at 5000 steps) with about 25% fewer parameters, though it still trails Hash Default by about 1.8 dB. QAT int8 adds negligible quality cost (−0.2 dB) but unlocks a two-stage compression path: the JPEG-compressed uint8 grids plus the fp16 decoder total 28.6 KB, a **26.9×** compression ratio on a 512×512 image at 37.6 dB. See the two-stage compression section above for the full breakdown.\n\n---\n\n## Code Walkthrough\n\n### `tinybc.slang` — The GPU Block Compressor (~379 lines)\n\n| Section | Lines | What it does |\n|---|---|---|\n| `compute_unorm_end_point_and_unorm_weight` | 53–149 | PCA-based initial endpoint + weight estimation |\n| `compute_weights` | 152–172 | Given endpoints, project all 16 pixels onto the endpoint line to get weights |\n| `compute_loss` | 175–188 | MSE between original pixels and their endpoint-interpolated reconstructions |\n| `sort_simplex` | 191–206 | Bubble sort the 9 simplex vertices by loss (ascending) |\n| `compute_centroid` | 209–217 | Mean of the 8 best vertices (excluding the worst) |\n| `encoder` | 219–378 | Main entry point: load block → PCA → (optional) Nelder-Mead → write output |\n\n### `tinybc.py` — The BC7 Python Driver (~72 lines)\n\nLoads the input texture via `sgl.TextureLoader`, creates an output texture, dispatches the `encoder` kernel over all 4×4 tiles, and computes PSNR.\n\n### `tinyMLP.py` — Hash 
Encoding Neural Compressor (~280 lines)\n\n| Class / function | What it does |\n|---|---|\n| `HashEncoding` | Multi-resolution hash tables; bilinear interp + spatial hash lookup at 16 resolutions |\n| `SirenLayer` | `sin(ω₀ · Wx + b)` with SIREN weight initialization for stable deep networks |\n| `ImageMLP` | Combines `HashEncoding` → `SirenLayer` stack → `Linear + Sigmoid` head |\n| `render_full` | Chunked full-image forward pass (avoids OOM on large images) |\n| `main` | Arg parsing, dual-LR Adam optimizer, cosine LR schedule, OpenCV/mpl display loop |\n\n### `tinyLatent.py` — Latent Texture Neural Compressor (~350 lines)\n\n| Class / function | What it does |\n|---|---|\n| `quantize_ste_perchannel` | Per-channel fake quantisation with Straight-Through Estimator; enables QAT training |\n| `save_latent_uint8` | Saves a float32 latent as per-channel uint8 `.npz` (actual compressed artifact for QAT mode) |\n| `LatentTexture` | Two-scale dense grids (`lo` + optional `hi`); bilinear `F.grid_sample`; QAT applied in `forward` via `set_step` |\n| `LatentImageMLP` | Combines `LatentTexture` → `SirenLayer` stack → `Linear + Sigmoid` (identical decoder to tinyMLP) |\n| `estimate_jpeg_size` / `estimate_jpeg_size_uint8` | Estimate JPEG-compressed latent size for two-stage compression potential reporting |\n| `main` | Single-LR Adam, cosine schedule, `--vis_latent` 3rd panel, `--save_latent` fp16/uint8 export |\n\n### `generate_mlp_figures.py` — Figure Generator\n\n| Function | Output |\n|---|---|\n| `fig_mlp_architecture` | `images/fig_mlp_architecture.png` — Hash Encoding + SIREN architecture diagram |\n| `train_snapshots` | Headless Hash model training; returns PSNR snapshots + curve |\n| `fig_mlp_comparison` | `images/fig_mlp_comparison.png` + `fig_mlp_psnr_curve.png` — Hash model grid |\n| `train_latent_snapshots` | Headless Latent model training; returns PSNR snapshots + curve |\n| `fig_latent_vs_hash` | `images/fig_latent_vs_hash_grid.png` + 
`fig_latent_vs_hash_curves.png` — orthogonal comparison |\n| `fig_latent_visualization` | `images/fig_latent_visualization.png` — 4-panel latent internals figure |\n\n```\ntinyASTC/\n├── tinybc.slang             # GPU compute shader (block compressor)\n├── tinybc.py                # BC7 driver script\n├── tinyMLP.py               # Neural compression — Hash Encoding + SIREN\n├── tinyLatent.py            # Neural compression — Latent Texture + SIREN\n├── sample.png               # Test input image (lossless)\n├── generate_figures.py      # Generate tinyBC educational figures\n├── generate_mlp_figures.py  # Generate all MLP comparison figures\n├── images/                  # All generated figures\n└── LICENSE                  # MIT\n```\n\n---\n\n## License\n\nMIT License. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweakknight%2Ftinyastc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fweakknight%2Ftinyastc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweakknight%2Ftinyastc/lists"}