https://github.com/nshkrdotcom/ml_musings
Foundations: A premium, hands-on educational curriculum exploring high-dimensional geometry, measure concentration, linear probing, self-attention routing, PEFT (LoRA and SVD surgery), evolution strategies, and sparse Mixture of Experts (MoE) gating. Written from scratch in Numerical Elixir (Nx) and compiled via EXLA.
https://github.com/nshkrdotcom/ml_musings
artificial-intelligence attention-mechanism automatic-differentiation black-box-optimization deep-learning dimension-reduction education elixir elixir-nx evolution-strategy exla gpu-acceleration gradient-descent high-dimensional-geometry linear-probing lora machine-learning mixture-of-experts neural-networks nshkr-research
Last synced: 8 days ago
JSON representation
Foundations: A premium, hands-on educational curriculum exploring high-dimensional geometry, measure concentration, linear probing, self-attention routing, PEFT (LoRA and SVD surgery), evolution strategies, and sparse Mixture of Experts (MoE) gating. Written from scratch in Numerical Elixir (Nx) and compiled via EXLA.
- Host: GitHub
- URL: https://github.com/nshkrdotcom/ml_musings
- Owner: nshkrdotcom
- License: mit
- Created: 2026-05-26T00:02:43.000Z (23 days ago)
- Default Branch: main
- Last Pushed: 2026-05-26T17:12:21.000Z (22 days ago)
- Last Synced: 2026-05-26T19:09:29.175Z (22 days ago)
- Topics: artificial-intelligence, attention-mechanism, automatic-differentiation, black-box-optimization, deep-learning, dimension-reduction, education, elixir, elixir-nx, evolution-strategy, exla, gpu-acceleration, gradient-descent, high-dimensional-geometry, linear-probing, lora, machine-learning, mixture-of-experts, neural-networks, nshkr-research
- Language: Elixir
- Homepage:
- Size: 154 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# TRINITY Foundations: High-Dimensional Geometry, Attention, PEFT, and Black-Box Optimization
This repository houses a comprehensive, hands-on educational curriculum designed to explore the mathematical, geometric, and computational foundations undergirding modern AI models and coordination systems like **TRINITY**.
All implementations are written from scratch in **Numerical Elixir (`Nx`)** and compiled natively to your hardware via **EXLA (Elixir XLA Compiler)**. The scripts are fully configured to run with native **CUDA GPU acceleration** (specifically tested on the **NVIDIA GeForce RTX 5060 Ti**), extracting absolute peak performance from the silicon.
---
## π Curriculum Structure & File Catalog
The curriculum is structured into 6 distinct lessons, progressing from core high-dimensional tensor mechanics through MoE-style routing and black-box optimization.
### π Lesson 1: The Geometry of High-Dimensional Spaces (Quasi-Orthogonality)
This lesson investigates the unique geometric properties of the spaces where LLM vector representations (embeddings and hidden states) reside. It validates the phenomenon of **Quasi-Orthogonality**βoften called the "Blessing of Dimensionality"βwhich explains how models store independent semantic concepts in perpendicular directions.
* **[01_list_math.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/01_list_math.exs)**: *CPU Linked-List Baseline*. Demonstrates why standard Elixir lists (singly-linked lists with pointer-chasing overhead and lack of memory contiguity) are computationally prohibitive for machine learning.
* **[02_tensor_math.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/02_tensor_math.exs)**: *Eager Device Tensors*. Benchmarks contiguous, binary-backed tensors on CPU and GPU, highlighting the orders-of-magnitude speedup achieved by hardware acceleration.
* **[03_compiler.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/03_compiler.exs)**: *JIT Graph Compilation*. Explains the difference between eager Elixir tensor evaluation and JIT-compiled computation graphs (`defn` with the `EXLA` backend) which compile directly into LLVM/PTX assembly.
* **[04_dot_product.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/04_dot_product.exs)**: *Cosine Similarity Mechanics*. Implements vector L2-normalization and dot-product similarity, showing how vector angles represent semantic alignment.
* **[quasi_orthogonality.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/quasi_orthogonality.exs)**: *Empirical Geometry Simulation*. Randomly samples pairs of unit vectors across varying dimensions (from $D=2$ to $D=8192$) and plots their similarity distributions using a custom **ASCII 95% Dispersion Bar** to visually witness the contraction of similarity density towards exactly $0.0$ (perpendicularity).
* **[hoeffding_bound.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/hoeffding_bound.exs)**: *Mathematical Bound Validation*. Compares empirical vector similarity distributions against the theoretical exponential decay bounds calculated via **Hoeffding's Inequality**.
* **[lesson_1_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_1_notes.txt)**: *Theoretical Notes*. Mathematical derivations of quasi-orthogonality, measure concentration, and the Curse vs. Blessing of Dimensionality.
---
### π Lesson 2: Linear Probing and Representational Geometry
Explores how semantic information is represented inside an LLM's activation space, and how we can extract that information using a **Linear Probe**βa simple, non-destructive classifier that learns a separating hyperplane.
* **[05_linear_probe.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/05_linear_probe.exs)**: *Differentiable Logistic Classifier*. Implements a binary linear probe from scratch. Uses **Automatic Differentiation (`Nx.grad/2`)** and batch gradient descent on the GPU to learn a separating hyperplane that extracts concept vectors (classifying "Math" vs. "Writing" embeddings). The latest run reaches ~99.6% validation accuracy (Wilson 95% CI [98.55%, 99.89%], N=500); see `05_linear_probe.exs` for the exact print.
* **[lesson_2_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_2_notes.txt)**: *Conceptual Notes*. Deep dive into embedding lookups, the Manifold Hypothesis in deep learning, coordinate systems, and the geometric meaning of separating hyperplanes.
---
### π Lesson 3: The Self-Attention Mechanism (Queries, Keys, and Values)
De-magic-ifies the core engine of modern Transformers by implementing a fully functional, differentiable Self-Attention head from scratch, framing it as an **Information Routing System**.
* **[06_self_attention.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/06_self_attention.exs)**: *Attention Routing Head*. Implements the Query, Key, and Value ($Q, K, V$) projections, the scaled dot-product similarity matrix, stable softmax, and attention routing weights to update token representations dynamically.
* **[07_softmax_collapse.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/07_softmax_collapse.exs)**: *Softmax Entropy & Numerical Stability*. Illustrates why the division by the scaling factor $\sqrt{D_k}$ is mathematically necessary. Geometrically demonstrates **Softmax Collapse** (vanishing gradients/over-saturation) in high-dimensional attention and evaluates numerical overflow behaviors under extreme logit distributions.
* **[lesson_3_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_3_notes.txt)**: *Attention Physics*. Detailed notes on semantic vector shifting, entropy dynamics, and mathematical proofs of softmax scale-invariance.
---
### π Lesson 4: Parameter-Efficient Adaptation (PEFT, Rank, SVD, and LoRA)
Explores how to fine-tune massive pre-trained model weights efficiently. We perform surgery on weight matrices using **Singular Value Decomposition (SVD)** and implement a low-rank bypass adapter (LoRA) from scratch.
* **[08_lora_and_svd.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/08_lora_and_svd.exs)**: *SVD Surgery & LoRA Layer*. Decomposes a redundant weight matrix on the GPU, truncates it to Rank-1, and implements the parallel forward pass of a **Low-Rank Adaptation (LoRA)** layer. The code uses row-batched inputs (one token per row of $X$), so the implementation expresses the Hu et al. (2021) update $h = W_0 x + (\alpha/r) \, B A x$ as $Y = X W_0^\top + (\alpha/r) \, X A^\top B^\top$.
* **[09_non_redundant_compression.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/09_non_redundant_compression.exs)**: *Lossy Compression Benchmark*. Generates a full-rank, non-redundant random matrix, collapses it down to Rank-1 using SVD, and calculates exact information loss metrics using **Mean Squared Error (MSE)** and **Frobenius Norm** reconstruction errors on the device.
* **[lesson_4_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_4_notes.txt)**: *Comparative Notes*. Compares LoRA's low-rank coordinate bypass with **Singular Value Fine-Tuning (SVF)**. Explains why adapting *only* diagonal singular values ($\Sigma$) achieves zero inference computational overhead.
---
### π² Lesson 5: Black-Box Optimization (The Evolution Strategy)
Explores optimization landscapes where backpropagation is mathematically impossible, such as coordinating and routing between external, isolated APIs (Claude, Gemini, GPT-4) across non-differentiable network boundaries.
* **[10_evolution_strategy.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/10_evolution_strategy.exs)**: *GPU-Compiled Sphere Optimizer*. Implements an **Evolution Strategy (ES)** from scratch in Elixir. Optimizes a noisy, 2D Sphere landscape $f(x_1, x_2) = x_1^2 + x_2^2 + \text{noise}$ to find the origin $[0.0, 0.0]$ using a population of stochastic "scout" mutations and weighted recombinations, completely bypassing gradients.
* **[11_rosenbrock_es.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/11_rosenbrock_es.exs)**: *Rosenbrock Curved Valley Solver*. Tests the ES optimizer on the highly non-separable, curved **Rosenbrock function** (the "banana function") to demonstrate how population-level average scout vectors navigate steep, narrow valleys.
* **[lesson_5_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_5_notes.txt)**: *API Boundary Coordination Notes*. Documents the three factors that block backpropagation across third-party models (Weight Isolation, Discrete Token Discontinuity, and Network Socket Barriers). Explores the mathematical architecture of **`sep-CMA-ES`** used in TRINITY and why diagonal covariance scaling achieves $O(D)$ linear complexity.
### π§© Lesson 6: Mixture of Experts (MoE) & Gating Load Balancing
This lesson addresses efficient model scaling using sparse parallel "expert" neural networks. It covers routing activations and introduces load-balancing techniques to prevent GPU bottlenecks and coordinate API delegation.
* **[12_moe_gating.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/12_moe_gating.exs)**: *MoE Router & Load Loss*. Implements a Top-1 sparse routing gating projection and compiles an **Auxiliary Load-Balancing Loss** from scratch on the GPU, evaluating collapsed vs. balanced routing distributions.
* **[13_loss_curve.exs](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/13_loss_curve.exs)**: *Gating Imbalance Penalty Curve*. Simulates multiple token load distributions across 4 parallel experts and calculates simulated auxiliary losses on the GPU, printing a custom ASCII penalty bar chart to visualize the skew penalty.
* **[lesson_6_notes.txt](file:///home/home/p/g/n/ml_musings/trinity_foundations_elixir/lesson_6_notes.txt)**: *API System Balance Notes*. Compares intra-model FFN block routing with TRINITY's external API coordination, documenting the critical operational, financial, and rate-limiting impacts of Expert Collapse in multi-agent environments.
---
### ποΈ Lesson 7: Capstone - The Mini-TRINITY Framework
This capstone project compiles all foundational components developed in Lessons 1β6 into a stateful, governed execution loop substrate. It implements a closed-loop control write-path routing and executing intents with runtime coordinate warping for dynamic model escalation.
* **[14_mini_trinity.exs](file:///home/home/p/g/n/ml_musings/14_mini_trinity.exs)**: *The Mini-TRINITY Substrate*. Integrates JIT-compiled routing projections (`defn` + `stable_softmax`), diverse mock expert models, custom semantic verification sensors, and control-loop representation warping to adjust routing vectors on expert execution failures.
---
## β‘ Prerequisites & System Installation
Ensure Erlang, Elixir, and the required CUDA drivers are installed on your Linux system.
```bash
# Update Ubuntu package lists and install Elixir + Erlang BEAM VM
sudo apt-get update
sudo apt-get install -y erlang elixir
# Verify Elixir installation
elixir --version
```
### Hex Package Management
All scripts leverage Elixir's runtime package installer (`Mix.install/2`) to pull the absolute latest stable versions of Numerical Elixir and its XLA compiler bindings directly from Hex:
* `nx ~> 0.12.0`
* `exla ~> 0.12.0`
---
## π Execution Guide
Run any of the curriculum scripts directly from your bash terminal.
### Running with Native NVIDIA CUDA GPU Acceleration
To compile computation graphs directly into optimized CUDA GPU kernels on your **NVIDIA GeForce RTX 5060 Ti**, set the `XLA_TARGET` environment variable before executing:
```bash
# Execute Lesson 1 empirical simulations
XLA_TARGET=cuda12 elixir quasi_orthogonality.exs
XLA_TARGET=cuda12 elixir hoeffding_bound.exs
# Run the JIT compiler benchmark
XLA_TARGET=cuda12 elixir 03_compiler.exs
# Train the Linear Probe classifier
XLA_TARGET=cuda12 elixir 05_linear_probe.exs
# Run the Self-Attention Head implementation
XLA_TARGET=cuda12 elixir 06_self_attention.exs
# Execute LoRA low-rank adaptation
XLA_TARGET=cuda12 elixir 08_lora_and_svd.exs
# Run Lesson 5 Evolution Strategy Optimizers
XLA_TARGET=cuda12 elixir 10_evolution_strategy.exs
XLA_TARGET=cuda12 elixir 11_rosenbrock_es.exs
# Run Lesson 6 Mixture of Experts Router & Loss Curve
XLA_TARGET=cuda12 elixir 12_moe_gating.exs
XLA_TARGET=cuda12 elixir 13_loss_curve.exs
# Run Lesson 7 Capstone Mini-TRINITY Framework
XLA_TARGET=cuda12 elixir 14_mini_trinity.exs
```
### Running on CPU (Fallback)
If a CUDA GPU is not available, EXLA will automatically fall back to CPU compilation, or you can run using standard native CPU execution by omitting `XLA_TARGET`:
```bash
elixir 10_evolution_strategy.exs
elixir 13_mini_trinity.exs
```
---
## π¬ Key Pedagogical Highlights
1. **Hardware-Level Contiguity**: Understand the transition from Elixir's heap-allocated linked lists to flat, binary-backed hardware buffers, speeding up basic matrix multiplications from **30+ ms** to **0.5 ms**.
2. **No Arbitrary Placeholders**: Every script is self-contained, using real statistical distributions, empirical bounds checking, and rigorous loss metrics.
3. **Automatic Differentiation**: See how `Nx.grad` automatically traverses computation graphs, compiling backpropagation steps to parallelized hardware kernels.
4. **Black-Box Robustness**: Witness how Evolution Strategies filter out heavy evaluation noise ($\sigma = 0.1$ random variance) through population-level mean recombinations, converging reliably where traditional gradient descent would stall.