{"id":50715244,"url":"https://github.com/nshkrdotcom/ml_musings","last_synced_at":"2026-06-09T18:32:16.644Z","repository":{"id":360507917,"uuid":"1249668824","full_name":"nshkrdotcom/ml_musings","owner":"nshkrdotcom","description":"Foundations: A premium, hands-on educational curriculum exploring high-dimensional geometry, measure concentration, linear probing, self-attention routing, PEFT (LoRA and SVD surgery), evolution strategies, and sparse Mixture of Experts (MoE) gating. Written from scratch in Numerical Elixir (Nx) and compiled via EXLA.","archived":false,"fork":false,"pushed_at":"2026-05-26T17:12:21.000Z","size":158,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-26T19:09:29.175Z","etag":null,"topics":["artificial-intelligence","attention-mechanism","automatic-differentiation","black-box-optimization","deep-learning","dimension-reduction","education","elixir","elixir-nx","evolution-strategy","exla","gpu-acceleration","gradient-descent","high-dimensional-geometry","linear-probing","lora","machine-learning","mixture-of-experts","neural-networks","nshkr-research"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nshkrdotcom.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-26T00:02:43.000Z","updated_at":"2026-05-26T17:13:05.000Z","dependencies_parsed_at":null,"dependency_jo
b_id":null,"html_url":"https://github.com/nshkrdotcom/ml_musings","commit_stats":null,"previous_names":["nshkrdotcom/ml_musings"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/nshkrdotcom/ml_musings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshkrdotcom%2Fml_musings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshkrdotcom%2Fml_musings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshkrdotcom%2Fml_musings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshkrdotcom%2Fml_musings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nshkrdotcom","download_url":"https://codeload.github.com/nshkrdotcom/ml_musings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshkrdotcom%2Fml_musings/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34121021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","attention-mechanism","automatic-differentiation","black-box-optimization","deep-learning","dimension-reduction","education","elixir","elixir-nx","evolution-strategy","exla","gpu-accelerat
ion","gradient-descent","high-dimensional-geometry","linear-probing","lora","machine-learning","mixture-of-experts","neural-networks","nshkr-research"],"created_at":"2026-06-09T18:32:16.359Z","updated_at":"2026-06-09T18:32:16.639Z","avatar_url":"https://github.com/nshkrdotcom.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/ml_musings.svg\" width=\"200\" height=\"200\" alt=\"ML Musings Logo\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/nshkrdotcom/ml_musings\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/stars/nshkrdotcom/ml_musings?style=for-the-badge\u0026logo=github\u0026color=38bdf8\u0026logoColor=fff\" alt=\"GitHub Repository\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/license-MIT-purple?style=for-the-badge\u0026color=a855f7\" alt=\"MIT License\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n# TRINITY Foundations: High-Dimensional Geometry, Attention, PEFT, and Black-Box Optimization\n\nThis repository houses a comprehensive, hands-on educational curriculum designed to explore the mathematical, geometric, and computational foundations undergirding modern AI models and coordination systems like **TRINITY**.\n\nAll implementations are written from scratch in **Numerical Elixir (`Nx`)** and compiled natively to your hardware via **EXLA (Elixir XLA Compiler)**. 
The scripts are configured to run with native **CUDA GPU acceleration** (tested on the **NVIDIA GeForce RTX 5060 Ti**).\n\n---\n\n## 📚 Curriculum Structure \u0026 File Catalog\n\nThe curriculum is structured into 6 core lessons plus a capstone (Lesson 7), progressing from core high-dimensional tensor mechanics through black-box optimization and MoE-style routing.\n\n### 📐 Lesson 1: The Geometry of High-Dimensional Spaces (Quasi-Orthogonality)\nThis lesson investigates the unique geometric properties of the spaces where LLM vector representations (embeddings and hidden states) reside. It validates the phenomenon of **Quasi-Orthogonality** (often called the \"Blessing of Dimensionality\"), which explains how models store independent semantic concepts in nearly perpendicular directions.\n\n*   **[01_list_math.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/01_list_math.exs)**: *CPU Linked-List Baseline*. Demonstrates why standard Elixir lists (singly-linked lists with pointer-chasing overhead and no memory contiguity) are computationally prohibitive for machine learning.\n*   **[02_tensor_math.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/02_tensor_math.exs)**: *Eager Device Tensors*. Benchmarks contiguous, binary-backed tensors on CPU and GPU, highlighting the orders-of-magnitude speedup achieved by hardware acceleration.\n*   **[03_compiler.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/03_compiler.exs)**: *JIT Graph Compilation*. Explains the difference between eager Elixir tensor evaluation and JIT-compiled computation graphs (`defn` with the `EXLA` backend), which XLA lowers to native code (LLVM IR on CPU, PTX on GPU).\n*   **[04_dot_product.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/04_dot_product.exs)**: *Cosine Similarity Mechanics*. 
Implements vector L2-normalization and dot-product similarity, showing how vector angles represent semantic alignment.\n*   **[quasi_orthogonality.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/quasi_orthogonality.exs)**: *Empirical Geometry Simulation*. Randomly samples pairs of unit vectors across varying dimensions (from $D=2$ to $D=8192$) and plots their similarity distributions using a custom **ASCII 95% Dispersion Bar**, showing the distribution concentrating around $0.0$ (perpendicularity) as $D$ grows.\n*   **[hoeffding_bound.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/hoeffding_bound.exs)**: *Mathematical Bound Validation*. Compares empirical vector similarity distributions against the theoretical exponential decay bounds calculated via **Hoeffding's Inequality**.\n*   **[lesson_1_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_1_notes.txt)**: *Theoretical Notes*. Mathematical derivations of quasi-orthogonality, measure concentration, and the Curse vs. Blessing of Dimensionality.\n\n---\n\n### 🔎 Lesson 2: Linear Probing and Representational Geometry\nExplores how semantic information is represented inside an LLM's activation space, and how we can extract that information using a **Linear Probe**, a simple, non-destructive classifier that learns a separating hyperplane.\n\n*   **[05_linear_probe.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/05_linear_probe.exs)**: *Differentiable Logistic Classifier*. Implements a binary linear probe from scratch. Uses **Automatic Differentiation (`Nx.Defn.grad/2`)** and batch gradient descent on the GPU to learn a separating hyperplane that extracts concept vectors (classifying \"Math\" vs. \"Writing\" embeddings). 
The latest run reaches ~99.6% validation accuracy (Wilson 95% CI [98.55%, 99.89%], N=500); see `05_linear_probe.exs` for the exact printed output.\n*   **[lesson_2_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_2_notes.txt)**: *Conceptual Notes*. Deep dive into embedding lookups, the Manifold Hypothesis in deep learning, coordinate systems, and the geometric meaning of separating hyperplanes.\n\n---\n\n### 🔄 Lesson 3: The Self-Attention Mechanism (Queries, Keys, and Values)\nDe-magic-ifies the core engine of modern Transformers by implementing a fully functional, differentiable Self-Attention head from scratch, framing it as an **Information Routing System**.\n\n*   **[06_self_attention.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/06_self_attention.exs)**: *Attention Routing Head*. Implements the Query, Key, and Value ($Q, K, V$) projections, the scaled dot-product similarity matrix, stable softmax, and attention routing weights to update token representations dynamically.\n*   **[07_softmax_collapse.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/07_softmax_collapse.exs)**: *Softmax Entropy \u0026 Numerical Stability*. Illustrates why the division by the scaling factor $\\sqrt{D_k}$ is mathematically necessary. Geometrically demonstrates **Softmax Collapse** (vanishing gradients/over-saturation) in high-dimensional attention and evaluates numerical overflow behaviors under extreme logit distributions.\n*   **[lesson_3_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_3_notes.txt)**: *Attention Physics*. Detailed notes on semantic vector shifting, entropy dynamics, and mathematical proofs of softmax shift-invariance.\n\n---\n\n### 📉 Lesson 4: Parameter-Efficient Adaptation (PEFT, Rank, SVD, and LoRA)\nExplores how to fine-tune massive pre-trained model weights efficiently. 
We perform surgery on weight matrices using **Singular Value Decomposition (SVD)** and implement a low-rank bypass adapter (LoRA) from scratch.\n\n*   **[08_lora_and_svd.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/08_lora_and_svd.exs)**: *SVD Surgery \u0026 LoRA Layer*. Decomposes a redundant weight matrix on the GPU, truncates it to Rank-1, and implements the parallel forward pass of a **Low-Rank Adaptation (LoRA)** layer. The code uses row-batched inputs (one token per row of $X$), so the implementation expresses the Hu et al. (2021) update $h = W_0 x + (\\alpha/r) \\, B A x$ as $Y = X W_0^\\top + (\\alpha/r) \\, X A^\\top B^\\top$.\n*   **[09_non_redundant_compression.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/09_non_redundant_compression.exs)**: *Lossy Compression Benchmark*. Generates a full-rank, non-redundant random matrix, collapses it down to Rank-1 using SVD, and calculates exact information loss metrics using **Mean Squared Error (MSE)** and **Frobenius Norm** reconstruction errors on the device.\n*   **[lesson_4_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_4_notes.txt)**: *Comparative Notes*. Compares LoRA's low-rank coordinate bypass with **Singular Value Fine-Tuning (SVF)**. Explains why adapting *only* diagonal singular values ($\\Sigma$) achieves zero inference computational overhead.\n\n---\n\n### 🎲 Lesson 5: Black-Box Optimization (The Evolution Strategy)\nExplores optimization landscapes where backpropagation is mathematically impossible, such as coordinating and routing between external, isolated APIs (Claude, Gemini, GPT-4) across non-differentiable network boundaries.\n\n*   **[10_evolution_strategy.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/10_evolution_strategy.exs)**: *GPU-Compiled Sphere Optimizer*. Implements an **Evolution Strategy (ES)** from scratch in Elixir. 
Optimizes a noisy, 2D Sphere landscape $f(x_1, x_2) = x_1^2 + x_2^2 + \\text{noise}$ to find the origin $[0.0, 0.0]$ using a population of stochastic \"scout\" mutations and weighted recombinations, completely bypassing gradients.\n*   **[11_rosenbrock_es.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/11_rosenbrock_es.exs)**: *Rosenbrock Curved Valley Solver*. Tests the ES optimizer on the highly non-separable, curved **Rosenbrock function** (the \"banana function\") to demonstrate how population-level average scout vectors navigate steep, narrow valleys.\n*   **[lesson_5_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_5_notes.txt)**: *API Boundary Coordination Notes*. Documents the three factors that block backpropagation across third-party models (Weight Isolation, Discrete Token Discontinuity, and Network Socket Barriers). Explores the mathematical architecture of **`sep-CMA-ES`** used in TRINITY and why diagonal covariance scaling achieves $O(D)$ linear complexity.\n\n---\n\n### 🧩 Lesson 6: Mixture of Experts (MoE) \u0026 Gating Load Balancing\nThis lesson addresses efficient model scaling using sparse parallel \"expert\" neural networks. It covers routing activations and introduces load-balancing techniques to prevent GPU bottlenecks and coordinate API delegation.\n\n*   **[12_moe_gating.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/12_moe_gating.exs)**: *MoE Router \u0026 Load Loss*. Implements a Top-1 sparse routing gating projection and compiles an **Auxiliary Load-Balancing Loss** from scratch on the GPU, evaluating collapsed vs. balanced routing distributions.\n*   **[13_loss_curve.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/13_loss_curve.exs)**: *Gating Imbalance Penalty Curve*. 
Simulates multiple token load distributions across 4 parallel experts and calculates simulated auxiliary losses on the GPU, printing a custom ASCII penalty bar chart to visualize the skew penalty.\n*   **[lesson_6_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_6_notes.txt)**: *API System Balance Notes*. Compares intra-model FFN block routing with TRINITY's external API coordination, documenting the critical operational, financial, and rate-limiting impacts of Expert Collapse in multi-agent environments.\n\n---\n\n### 🏛️ Lesson 7: Capstone - The Mini-TRINITY Framework\nThis capstone project compiles all foundational components developed in Lessons 1–6 into a stateful, governed execution loop substrate. It implements a closed-loop control write-path routing and executing intents with runtime coordinate warping for dynamic model escalation.\n\n*   **[14_mini_trinity.exs](file:///home/home/p/g/n/ml_musings/14_mini_trinity.exs)**: *The Mini-TRINITY Substrate*. 
Integrates JIT-compiled routing projections (`defn` + `stable_softmax`), diverse mock expert models, custom semantic verification sensors, and control-loop representation warping to adjust routing vectors on expert execution failures.\n\n---\n\n\n## ⚡ Prerequisites \u0026 System Installation\n\nEnsure Erlang, Elixir, and the required CUDA drivers are installed on your Linux system.\n\n```bash\n# Update Ubuntu package lists and install Elixir + Erlang BEAM VM\nsudo apt-get update\nsudo apt-get install -y erlang elixir\n\n# Verify Elixir installation\nelixir --version\n```\n\n### Hex Package Management\nAll scripts use Elixir's runtime package installer (`Mix.install/2`) to pull Numerical Elixir and its XLA compiler bindings directly from Hex, pinned to the latest compatible 0.12.x releases:\n*   `nx ~\u003e 0.12.0`\n*   `exla ~\u003e 0.12.0`\n\n---\n\n## 🚀 Execution Guide\n\nRun any of the curriculum scripts directly from your bash terminal.\n\n### Running with Native NVIDIA CUDA GPU Acceleration\nTo compile computation graphs directly into optimized CUDA GPU kernels on your **NVIDIA GeForce RTX 5060 Ti**, set the `XLA_TARGET` environment variable before executing:\n\n```bash\n# Execute Lesson 1 empirical simulations\nXLA_TARGET=cuda12 elixir quasi_orthogonality.exs\nXLA_TARGET=cuda12 elixir hoeffding_bound.exs\n\n# Run the JIT compiler benchmark\nXLA_TARGET=cuda12 elixir 03_compiler.exs\n\n# Train the Linear Probe classifier\nXLA_TARGET=cuda12 elixir 05_linear_probe.exs\n\n# Run the Self-Attention Head implementation\nXLA_TARGET=cuda12 elixir 06_self_attention.exs\n\n# Execute LoRA low-rank adaptation\nXLA_TARGET=cuda12 elixir 08_lora_and_svd.exs\n\n# Run Lesson 5 Evolution Strategy Optimizers\nXLA_TARGET=cuda12 elixir 10_evolution_strategy.exs\nXLA_TARGET=cuda12 elixir 11_rosenbrock_es.exs\n\n# Run Lesson 6 Mixture of Experts Router \u0026 Loss Curve\nXLA_TARGET=cuda12 elixir 12_moe_gating.exs\nXLA_TARGET=cuda12 elixir 13_loss_curve.exs\n\n# Run Lesson 7 Capstone 
Mini-TRINITY Framework\nXLA_TARGET=cuda12 elixir 14_mini_trinity.exs\n```\n\n### Running on CPU (Fallback)\nIf a CUDA GPU is not available, EXLA will automatically fall back to CPU compilation; you can also run on the CPU explicitly by omitting `XLA_TARGET`:\n\n```bash\nelixir 10_evolution_strategy.exs\nelixir 14_mini_trinity.exs\n```\n\n\n---\n\n## 🔬 Key Pedagogical Highlights\n\n1.  **Hardware-Level Contiguity**: Understand the transition from Elixir's heap-allocated linked lists to flat, binary-backed hardware buffers, speeding up basic matrix multiplications from **30+ ms** to **0.5 ms**.\n2.  **No Arbitrary Placeholders**: Every script is self-contained, using real statistical distributions, empirical bounds checking, and rigorous loss metrics.\n3.  **Automatic Differentiation**: See how `Nx.Defn.grad` automatically traverses computation graphs, compiling backpropagation steps to parallelized hardware kernels.\n4.  **Black-Box Robustness**: Witness how Evolution Strategies filter out heavy evaluation noise ($\\sigma = 0.1$ random noise) through population-level mean recombinations, converging reliably where traditional gradient descent would stall.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnshkrdotcom%2Fml_musings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnshkrdotcom%2Fml_musings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnshkrdotcom%2Fml_musings/lists"}