{"id":46613018,"url":"https://github.com/michaelellis003/lmt","last_synced_at":"2026-03-07T19:01:45.540Z","repository":{"id":304889538,"uuid":"1017716521","full_name":"michaelellis003/LMT","owner":"michaelellis003","description":"PyTorch implementation of transformer-based language models (GPT) for pretraining and fine-tuning","archived":false,"fork":false,"pushed_at":"2026-03-06T03:31:19.000Z","size":1219,"stargazers_count":0,"open_issues_count":7,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-06T05:33:09.193Z","etag":null,"topics":["deep-learning","fine-tuning","gpt","language-model","machine-learning","pretraining","python","pytorch","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michaelellis003.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-11T01:52:37.000Z","updated_at":"2026-03-06T03:31:23.000Z","dependencies_parsed_at":"2025-09-17T03:14:16.537Z","dependency_job_id":"19e3b11b-4e45-4995-b5b5-e45bfe0a5b74","html_url":"https://github.com/michaelellis003/LMT","commit_stats":null,"previous_names":["michaelellis003/languagemodeling","michaelellis003/lmt"],"tags_count":13,"template":false,"template_full_name":"LikeliLab/python-package-template","purl":"pkg:github/michaelellis003/LMT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelellis003%2FLMT","tags_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelellis003%2FLMT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelellis003%2FLMT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelellis003%2FLMT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michaelellis003","download_url":"https://codeload.github.com/michaelellis003/LMT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelellis003%2FLMT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30226766,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T19:01:10.287Z","status":"ssl_error","status_checked_at":"2026-03-07T18:59:58.103Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","fine-tuning","gpt","language-model","machine-learning","pretraining","python","pytorch","transformer"],"created_at":"2026-03-07T19:01:41.612Z","updated_at":"2026-03-07T19:01:45.528Z","avatar_url":"https://github.com/michaelellis003.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Language Modeling using Transformers 
(LMT)\n\n[![CI](https://github.com/michaelellis003/LMT/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/michaelellis003/LMT/actions/workflows/ci-cd.yml)\n[![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)\n[![PyTorch](https://img.shields.io/badge/PyTorch-2.7%2B-red.svg)](https://pytorch.org/)\n[![License](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)\n[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)\n\nAn educational PyTorch library for understanding how modern transformer architectures work -- from attention mechanisms to full language models. Every component is written to be **understood**, with clear code, detailed docstrings, and mathematical notation that maps directly to the papers.\n\n## Features\n\n**Attention Mechanisms**\n\n| Component | Description | Paper |\n|-----------|-------------|-------|\n| Multi-Head Attention | Standard scaled dot-product attention | Vaswani et al., 2017 |\n| Grouped Query Attention | Shared KV heads for efficiency | Ainslie et al., 2023 |\n| Sliding Window Attention | Local attention with fixed window | Beltagy et al., 2020 |\n| Multi-Head Latent Attention | KV compression + decoupled RoPE | DeepSeek-AI, 2024 |\n\n**Feed-Forward Networks**\n\n| Component | Description |\n|-----------|-------------|\n| SwiGLU | Gated FFN with Swish activation (LLaMA, Mixtral) |\n| Mixture of Experts | Top-k sparse routing with load balancing loss |\n\n**Other Components**: RMSNorm, Rotary Position Embedding (RoPE)\n\n**Model Architectures**\n\n| Model | Key Components |\n|-------|---------------|\n| GPT | Multi-head attention + GELU FFN + learned position embeddings |\n| LLaMA | RMSNorm + RoPE + SwiGLU + GQA |\n| Mixtral | LLaMA + MoE FFN + sliding window attention |\n\n## Installation\n\n```bash\npip install pylmt\n```\n\nOr install from source for development:\n\n```bash\ngit clone 
https://github.com/michaelellis003/LMT.git\ncd LMT\npip install uv\nuv sync\n```\n\n## Quick Start\n\n```python\nimport torch\nfrom lmt.models.config import ModelConfig\nfrom lmt.models.llama import LLaMA\n\nconfig = ModelConfig(\n    vocab_size=32000,\n    embed_dim=512,\n    num_heads=8,\n    num_kv_heads=4,     # GQA: 4 KV heads shared across 8 query heads\n    num_layers=6,\n    context_length=1024,\n    dropout=0.0,\n)\n\nmodel = LLaMA(config)\nx = torch.randint(0, config.vocab_size, (1, 128))\nlogits = model(x)  # [1, 128, 32000]\n```\n\n### Using Individual Layers\n\n```python\nfrom lmt.layers.attention import GroupedQueryAttention\nfrom lmt.layers.ffn import SwiGLU\nfrom lmt.layers.normalization import RMSNorm\n\nnorm = RMSNorm(d_model=512)\nattn = GroupedQueryAttention(config)\nffn = SwiGLU(d_model=512)\n\nx = torch.randn(1, 64, 512)\nx = x + attn(norm(x))  # Pre-norm attention\nx = x + ffn(norm(x))   # Pre-norm FFN\n```\n\n### Mixture of Experts\n\n```python\nfrom lmt.models.mixtral import Mixtral\n\nconfig = ModelConfig(\n    vocab_size=32000, embed_dim=512, num_heads=8,\n    num_kv_heads=4, num_layers=8,\n    context_length=2048, window_size=256, dropout=0.0,\n)\n\nmodel = Mixtral(config, num_experts=8, top_k=2)\nx = torch.randint(0, config.vocab_size, (1, 128))  # token IDs, not the float activations from the previous example\nlogits = model(x)\naux_loss = model.aux_loss  # load balancing loss for training\n```\n\n## Project Structure\n\n```\nsrc/lmt/\n  layers/\n    attention/     # MHA, GQA, Sliding Window, MLA\n    ffn/           # SwiGLU, MoE (Router + Experts)\n    normalization/ # RMSNorm\n    positional/    # RoPE\n  models/\n    gpt/           # GPT (original decoder-only transformer)\n    llama/         # LLaMA (RMSNorm + RoPE + SwiGLU + GQA)\n    mixtral/       # Mixtral (LLaMA + MoE + sliding window)\n  training/        # Trainer, configs, dataloaders\n  tokenizer/       # BPE, naive tokenizers\n```\n\n## Development\n\n```bash\nuv run pytest tests/ -v       # Run tests (171 passing)\nuv run ruff check src/ tests/ # Lint\nuv run ruff format src/ tests/ # 
Format\nuv run pyright src/            # Type check\nuv run mkdocs serve            # Local docs server\n```\n\n## License\n\nApache License 2.0. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelellis003%2Flmt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichaelellis003%2Flmt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelellis003%2Flmt/lists"}