{"id":50979530,"url":"https://github.com/jayemscript/llm-systems-from-scratch","last_synced_at":"2026-06-19T12:34:30.561Z","repository":{"id":361770766,"uuid":"1255750683","full_name":"jayemscript/llm-systems-from-scratch","owner":"jayemscript","description":"A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.","archived":false,"fork":false,"pushed_at":"2026-06-01T07:29:41.000Z","size":15,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T08:21:54.999Z","etag":null,"topics":["ai-systems","autograd","c-language","cpp","cuda","educational-project","high-performance-computing","inference-engine","machine-learning","neural-networks-from-scratch","pybind11","tensor-library","tokenization","transformers","wasm"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jayemscript.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T06:23:33.000Z","updated_at":"2026-06-01T07:29:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jayemscript/llm-systems-from-scratch","commit_stats":null,"previous_names":["jayemscript/llm-systems-fro
m-scratch"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/jayemscript/llm-systems-from-scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayemscript%2Fllm-systems-from-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayemscript%2Fllm-systems-from-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayemscript%2Fllm-systems-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayemscript%2Fllm-systems-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jayemscript","download_url":"https://codeload.github.com/jayemscript/llm-systems-from-scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jayemscript%2Fllm-systems-from-scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34532256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-systems","autograd","c-language","cpp","cuda","educational-project","high-performance-computing","inference-engine","machine-learning","neural-networks-from-scratch","pybind11","tensor-library","tokenization","transformers","wasm"
],"created_at":"2026-06-19T12:34:29.951Z","updated_at":"2026-06-19T12:34:30.550Z","avatar_url":"https://github.com/jayemscript.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Systems From Scratch\n\n\u003e **This is an open, educational repository.**\n\u003e It is not a software product. It is not a deployable system.\n\u003e It is a structured, hands-on learning project — open to everyone.\n\nThis repository exists to help **students, engineers, and developers** deeply understand how Large Language Models work — not by using them, but by building every core function and algorithm they depend on, from scratch, in **C++ and Rust**.\n\nNo cloud. No APIs. No black boxes. Just code and fundamentals.\n\n---\n\n## Who This Is For\n\n- Students studying machine learning, systems programming, or AI\n- Engineers who want to understand what actually happens inside an LLM\n- Developers learning C++ or Rust through a meaningful, real-world problem domain\n- Anyone curious enough to go deeper than the surface\n\n---\n\n## What This Is Not\n\n- This is **not** a language model\n- This is **not** production software\n- This is **not** a framework or library meant for deployment\n- This is **not** affiliated with any AI company or product\n\n---\n\n## What This Is\n\nA **complete map of every core module an LLM is built on** — implemented one phase at a time, from the lowest level up.\n\nEvery phase = one core module. 
Every module = one concept that a real language model cannot exist without.\n\nBy the end, you will have built from scratch:\n\n```\nTensors → Math → Autograd → Neural Layers → Embeddings →\nTokenizer → Positional Encoding → Attention → Normalization →\nFeed-Forward → Transformer → Sampling → Inference Engine →\nPython Bindings → Rust Memory Layer\n```\n\n---\n\n## Language Stack\n\n| Layer | Language | Purpose |\n|---|---|---|\n| Core engine | C++ | All fundamental algorithms |\n| Memory safety layer | Rust (via FFI) | Specific subsystems plugged into C++ |\n| Bindings | Python / JavaScript | Expose C++ engine for scripting and web |\n\n\u003e Rust does **not** replace C++. It extends it. Rust is used for specific components — memory management, safe concurrent pipelines, inference serving — and plugged into the C++ engine via FFI (Foreign Function Interface).\n\n---\n\n## Core Modules — Phase by Phase\n\nEach phase introduces one core module. That module is a real, named concept inside a real language model architecture.\n\n---\n\n### Phase 1 — Tensor\n\n**Module: Tensor Operations**\n\u003e How LLMs store and compute on data\n\nEverything in a language model is a number in a multidimensional array. This phase builds the foundational data structure.\n\n- N-dimensional array (Tensor class)\n- Shape and stride tracking\n- Element-wise operations: add, subtract, multiply\n- Memory layout (row-major)\n\n---\n\n### Phase 2 — Linear Algebra Engine\n\n**Module: Matrix Multiplication**\n\u003e The single most-used operation in all of deep learning\n\nEvery layer in a transformer — attention, projection, feed-forward — is a matrix multiplication. 
This phase builds and optimizes it.\n\n- Naive matrix multiplication (baseline)\n- Cache-optimized version (loop tiling)\n- Multithreaded version (`std::thread` / OpenMP)\n\n---\n\n### Phase 3 — Autograd\n\n**Module: Automatic Differentiation**\n\u003e How models learn — the algorithm behind every weight update\n\nWithout autograd, there is no training. This phase builds the computation graph and the backward pass.\n\n- Computation graph (nodes and edges)\n- Forward pass execution\n- Backward pass — gradient computation via chain rule\n\n---\n\n### Phase 4 — Neural Network Layers\n\n**Module: Layer Functions**\n\u003e The building blocks stacked inside every neural network\n\nThis phase builds composable, trainable layers on top of tensors and autograd.\n\n- Dense (fully connected) layer\n- Activation functions: ReLU, tanh, sigmoid, GELU\n- Loss functions: MSE, cross-entropy\n- Test: XOR problem, simple classification\n\n---\n\n### Phase 5 — Embeddings\n\n**Module: Embedding Layer**\n\u003e How LLMs convert token IDs into meaning\n\nTokens are integers. Embeddings turn those integers into vectors that carry semantic meaning — this is the first thing a transformer does with its input.\n\n- Embedding table (vocabulary × dimension)\n- Lookup operation\n- Gradient flow through embeddings\n\n---\n\n### Phase 6 — Tokenizer\n\n**Module: Tokenization**\n\u003e How LLMs read text — converting raw strings into token sequences\n\nLanguage models do not read characters. They read tokens — integer IDs from a fixed vocabulary. This phase builds the algorithm that produces them.\n\n- Whitespace tokenizer (baseline)\n- Byte Pair Encoding (BPE) — the core algorithm used in most real tokenizers\n- Vocabulary building\n- Encode and decode functions\n\n---\n\n### Phase 7 — Positional Encoding\n\n**Module: Positional Encoding**\n\u003e How LLMs know the order of tokens\n\nAttention has no built-in sense of position. 
This phase implements the algorithms that inject order information into the input sequence.\n\n- Sinusoidal positional encoding (original transformer)\n- Learned positional embeddings\n- Rotary Positional Encoding (RoPE) — used in modern architectures\n\n---\n\n### Phase 8 — Attention\n\n**Module: Attention Mechanism**\n\u003e The core algorithm of transformer-based architectures\n\nThis is the mechanism that allows a model to relate any token to any other token in a sequence, regardless of distance.\n\n- Q, K, V projections (linear transforms)\n- Scaled dot-product attention\n- Softmax\n- Causal (masked) attention — for autoregressive generation\n- Multi-head attention\n\n---\n\n### Phase 9 — Normalization\n\n**Module: Layer Normalization**\n\u003e How LLMs stay stable during training and inference\n\nWithout normalization, deep networks become numerically unstable. This phase implements the normalization used inside every transformer block.\n\n- Layer normalization (LayerNorm)\n- RMS normalization (RMSNorm) — used in modern architectures\n- Learnable scale and shift parameters (gamma, beta)\n\n---\n\n### Phase 10 — Feed-Forward Network\n\n**Module: Feed-Forward Block**\n\u003e The computation that follows attention in every transformer layer\n\nEvery transformer block has two parts: attention and a feed-forward network. 
This phase builds the FFN.\n\n- Two-layer MLP with activation\n- GELU activation function\n- Projection in → hidden → projection out\n- Role in transformer block (context: runs after attention)\n\n---\n\n### Phase 11 — Transformer Block\n\n**Module: Transformer Block**\n\u003e One complete unit of a transformer — assembled from all prior modules\n\nThis phase assembles the full transformer block from the components built in previous phases.\n\n- Pre-norm architecture (LayerNorm → Attention → residual)\n- Feed-forward sublayer (LayerNorm → FFN → residual)\n- Residual connections\n- Stacking N blocks into a full transformer backbone\n\n---\n\n### Phase 12 — Sampling \u0026 Output Head\n\n**Module: Sampling Algorithms**\n\u003e How LLMs decide what token to generate next\n\nAfter the transformer runs, a probability distribution over the vocabulary is produced. This phase implements the algorithms that select the next token from it.\n\n- Linear projection to vocabulary (logits)\n- Softmax to probability distribution\n- Greedy decoding\n- Temperature scaling\n- Top-k sampling\n- Top-p (nucleus) sampling\n\n---\n\n### Phase 13 — Inference Engine\n\n**Module: Inference Runtime**\n\u003e The execution engine — how a trained model runs efficiently\n\nThis phase builds the runtime that loads weights and executes the full forward pass, without any training overhead.\n\n- Weight serialization and loading from file\n- Full forward pass pipeline\n- KV cache — stores past key/value pairs to avoid recomputation\n- Token generation loop (autoregressive decoding)\n\n---\n\n### Phase 14 — Language Bindings\n\n**Module: Python \u0026 JavaScript Bindings**\n\u003e Exposing the C++ engine to higher-level languages\n\nThis phase wraps the C++ engine so it can be used from Python scripts and web environments.\n\n**Python (pybind11):**\n```cpp\n// C++ side\nTensor matmul(Tensor a, Tensor b);\n```\n```python\n# Python side\nimport llmsys\nllmsys.matmul(a, b)\n```\n\n**JavaScript / Web:**\n- 
Compile C++ to WebAssembly (WASM) for in-browser execution\n- Or expose a lightweight REST API\n\n---\n\n### Phase 15 — Rust Memory Layer\n\n**Module: Rust FFI Components**\n\u003e Precision components in Rust, plugged into the C++ engine\n\nRust does not replace C++. This phase builds specific subsystems in Rust where its ownership model and memory safety give a concrete advantage — then connects them to the C++ engine via FFI.\n\n- Safe memory allocator (plugged into C++ tensor operations)\n- Concurrent data pipeline (safe multithreaded loading)\n- Inference server (async, no GC pauses, production-grade serving)\n- FFI interface layer between Rust and C++\n\nExplore: [`burn`](https://github.com/tracel-ai/burn), [`candle`](https://github.com/huggingface/candle), [`tch-rs`](https://github.com/LaurentMazare/tch-rs)\n\n---\n\n## Full Module Map\n\n| Phase | Module | Concept in LLM |\n|---|---|---|\n| 1 | Tensor | Data storage and computation |\n| 2 | Matrix Multiplication | Core math operation |\n| 3 | Autograd | How models learn |\n| 4 | Neural Network Layers | Trainable building blocks |\n| 5 | Embeddings | Token → vector conversion |\n| 6 | Tokenizer | Text → token IDs |\n| 7 | Positional Encoding | Token order awareness |\n| 8 | Attention | Token-to-token relationships |\n| 9 | Normalization | Training and inference stability |\n| 10 | Feed-Forward Network | Per-token computation after attention |\n| 11 | Transformer Block | Full assembled unit |\n| 12 | Sampling \u0026 Output Head | Next-token selection |\n| 13 | Inference Engine | Runtime execution |\n| 14 | Language Bindings | Python / JS interface |\n| 15 | Rust Memory Layer | Safe, high-performance components via FFI |\n\n---\n\n## Open Collaboration\n\nThis repository is **open for collaboration**. 
Everyone is welcome to contribute — whether you are a student, a researcher, a hobbyist, or a professional engineer.\n\nWays to contribute:\n\n- Implement a module in a phase\n- Write tests for an existing implementation\n- Improve documentation or add explanations\n- Add alternative implementations (different algorithms, different approaches)\n- Fix bugs or improve performance\n\nPlease read [CONTRIBUTING.md](./CONTRIBUTING.md) before opening a pull request.\nPlease read [CODE_OF_CONDUCT.md](./CODE_OF_CONDUCT.md) for community standards.\n\n---\n\n## Requirements\n\n- C++17 or later\n- Rust (stable toolchain) — for Phase 15\n- No cloud accounts required\n- No paid APIs required\n- No GPU required (CPU-only by design for learning purposes)\n\n---\n\n## License\n\nMIT — free to use, study, fork, and build upon.\n\n---\n\n## A Note on Purpose\n\nThis project exists because understanding AI at the source level matters.\n\nThe goal is not to compete with existing frameworks. The goal is to make the internals legible — so that anyone who works through these phases comes away with a genuine understanding of what a language model is made of, one algorithm at a time.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayemscript%2Fllm-systems-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjayemscript%2Fllm-systems-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjayemscript%2Fllm-systems-from-scratch/lists"}