{"id":31238628,"url":"https://github.com/tekaratzas/RustGPT","last_synced_at":"2025-09-22T19:02:34.878Z","repository":{"id":314687508,"uuid":"1056341464","full_name":"tekaratzas/RustGPT","owner":"tekaratzas","description":"A transformer-based LLM, written completely in Rust","archived":false,"fork":false,"pushed_at":"2025-09-14T04:41:52.000Z","size":62,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-14T05:43:01.288Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tekaratzas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-13T22:05:55.000Z","updated_at":"2025-09-14T04:41:55.000Z","dependencies_parsed_at":"2025-09-14T05:43:06.059Z","dependency_job_id":"966b1802-e43c-4acc-9915-57bb91404aab","html_url":"https://github.com/tekaratzas/RustGPT","commit_stats":null,"previous_names":["tekaratzas/rustgpt"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/tekaratzas/RustGPT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tekaratzas%2FRustGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tekaratzas%2FRustGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tekaratzas%2FRustGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t
ekaratzas%2FRustGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tekaratzas","download_url":"https://codeload.github.com/tekaratzas/RustGPT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tekaratzas%2FRustGPT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276457863,"owners_count":25646016,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-22T02:00:08.972Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-22T19:01:24.481Z","updated_at":"2025-09-22T19:02:34.866Z","avatar_url":"https://github.com/tekaratzas.png","language":"Rust","funding_links":[],"categories":["Rust","A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"# 🦀 Rust LLM from Scratch\n\n[![Rust](https://github.com/tekaratzas/RustGPT/actions/workflows/rust.yml/badge.svg)](https://github.com/tekaratzas/RustGPT/actions/workflows/rust.yml)\n\nhttps://github.com/user-attachments/assets/ec4a4100-b03a-4b3c-a7d6-806ea54ed4ed\n\nA complete **Large Language Model implementation in pure Rust** with no external ML frameworks. 
Built from the ground up using only `ndarray` for matrix operations.\n\n## 🚀 What This Is\n\nThis project demonstrates how to build a transformer-based language model from scratch in Rust, including:\n- **Pre-training** on factual text completion\n- **Instruction tuning** for conversational AI\n- **Interactive chat mode** for testing\n- **Full backpropagation** with gradient clipping\n- **Modular architecture** with clean separation of concerns\n\n## ❌ What This Isn't\n\nThis is not a production-grade LLM. It is nowhere near the capability of larger models.\n\nThis is just a toy project that demonstrates how these models work under the hood.\n\n## 🔍 Key Files to Explore\n\nStart with these two core files to understand the implementation:\n\n- **[`src/main.rs`](src/main.rs)** - Training pipeline, data preparation, and interactive mode\n- **[`src/llm.rs`](src/llm.rs)** - Core LLM implementation with forward/backward passes and training logic\n\n## 🏗️ Architecture\n\nThe model uses a **transformer-based architecture** with the following components:\n\n```\nInput Text → Tokenization → Embeddings → Transformer Blocks → Output Projection → Predictions\n```\n\n### Project Structure\n\n```\nsrc/\n├── main.rs              # 🎯 Training pipeline and interactive mode\n├── llm.rs               # 🧠 Core LLM implementation and training logic\n├── lib.rs               # 📚 Library exports and constants\n├── transformer.rs       # 🔄 Transformer block (attention + feed-forward)\n├── self_attention.rs    # 👀 Multi-head self-attention mechanism\n├── feed_forward.rs      # ⚡ Position-wise feed-forward networks\n├── embeddings.rs        # 📊 Token embedding layer\n├── output_projection.rs # 🎰 Final linear layer for vocabulary predictions\n├── vocab.rs            # 📝 Vocabulary management and tokenization\n├── layer_norm.rs       # 🧮 Layer normalization\n└── adam.rs             # 🏃 Adam optimizer implementation\n\ntests/\n├── llm_test.rs         # Tests for core LLM functionality\n├── 
transformer_test.rs # Tests for transformer blocks\n├── self_attention_test.rs # Tests for attention mechanisms\n├── feed_forward_test.rs # Tests for feed-forward layers\n├── embeddings_test.rs  # Tests for embedding layers\n├── vocab_test.rs       # Tests for vocabulary handling\n├── adam_test.rs        # Tests for optimizer\n└── output_projection_test.rs # Tests for output layer\n```\n\n## 🧪 What The Model Learns\n\nThe implementation includes two training phases:\n\n1. **Pre-training**: Learns basic world knowledge from factual statements\n   - \"The sun rises in the east and sets in the west\"\n   - \"Water flows downhill due to gravity\"\n   - \"Mountains are tall and rocky formations\"\n\n2. **Instruction Tuning**: Learns conversational patterns\n   - \"User: How do mountains form? Assistant: Mountains are formed through tectonic forces...\"\n   - Handles greetings, explanations, and follow-up questions\n\n## 🚀 Quick Start\n\n```bash\n# Clone and run\ngit clone https://github.com/tekaratzas/RustGPT.git\ncd RustGPT\ncargo run\n\n# The model will:\n# 1. Build vocabulary from training data\n# 2. Pre-train on factual statements (100 epochs)\n# 3. Instruction-tune on conversational data (100 epochs)\n# 4. 
Enter interactive mode for testing\n```\n\n## 🎮 Interactive Mode\n\nAfter training, test the model interactively:\n\n```\nEnter prompt: How do mountains form?\nModel output: Mountains are formed through tectonic forces or volcanism over long geological time periods\n\nEnter prompt: What causes rain?\nModel output: Rain is caused by water vapor in clouds condensing into droplets that become too heavy to remain airborne\n```\n\n## 🧮 Technical Implementation\n\n### Model Configuration\n- **Vocabulary Size**: Dynamic (built from training data)\n- **Embedding Dimension**: 128 (defined by `EMBEDDING_DIM` in `src/lib.rs`)\n- **Hidden Dimension**: 256 (defined by `HIDDEN_DIM` in `src/lib.rs`)\n- **Max Sequence Length**: 80 tokens (defined by `MAX_SEQ_LEN` in `src/lib.rs`)\n- **Architecture**: 3 Transformer blocks + embeddings + output projection\n\n### Training Details\n- **Optimizer**: Adam with gradient clipping\n- **Pre-training LR**: 0.0005 (100 epochs)\n- **Instruction Tuning LR**: 0.0001 (100 epochs)\n- **Loss Function**: Cross-entropy loss\n- **Gradient Clipping**: L2 norm capped at 5.0\n\n### Key Features\n- **Custom tokenization** with punctuation handling\n- **Greedy decoding** for text generation\n- **Gradient clipping** for training stability\n- **Modular layer system** with clean interfaces\n- **Comprehensive test coverage** for all components\n\n## 🔧 Development\n\n```bash\n# Run all tests\ncargo test\n\n# Test specific components\ncargo test --test llm_test\ncargo test --test transformer_test\ncargo test --test self_attention_test\n\n# Build optimized version\ncargo build --release\n\n# Run with verbose output\ncargo test -- --nocapture\n```\n\n## 🧠 Learning Resources\n\nThis implementation demonstrates key ML concepts:\n- **Transformer architecture** (attention, feed-forward, layer norm)\n- **Backpropagation** through neural networks\n- **Language model training** (pre-training + fine-tuning)\n- **Tokenization** and vocabulary management\n- **Gradient-based 
optimization** with Adam\n\nPerfect for understanding how modern LLMs work under the hood!\n\n## 📊 Dependencies\n\n- `ndarray` - N-dimensional arrays for matrix operations\n- `rand` + `rand_distr` - Random number generation for initialization\n\nNo PyTorch, TensorFlow, or Candle - just pure Rust and linear algebra!\n\n## 🤝 Contributing\n\nContributions are welcome! This project is perfect for learning and experimentation.\n\n### High Priority Features Needed\n- **🏪 Model Persistence** - Save/load trained parameters to disk (currently all in-memory)\n- **⚡ Performance optimizations** - SIMD, parallel training, memory efficiency\n- **🎯 Better sampling** - Beam search, top-k/top-p, temperature scaling\n- **📊 Evaluation metrics** - Perplexity, benchmarks, training visualizations\n\n### Areas for Improvement\n- **Advanced architectures** (multi-head attention, positional encoding, RoPE)\n- **Training improvements** (different optimizers, learning rate schedules, regularization)\n- **Data handling** (larger datasets, tokenizer improvements, streaming)\n- **Model analysis** (attention visualization, gradient analysis, interpretability)\n\n### Getting Started\n1. Fork the repository\n2. Create a feature branch: `git checkout -b feature/model-persistence`\n3. Make your changes and add tests\n4. Run the test suite: `cargo test`\n5. Submit a pull request with a clear description\n\n### Code Style\n- Follow standard Rust conventions (`cargo fmt`)\n- Add comprehensive tests for new features\n- Update documentation and README as needed\n- Keep the \"from scratch\" philosophy - avoid heavy ML dependencies\n\n### Ideas for Contributions\n- 🚀 **Beginner**: Model save/load, more training data, config files\n- 🔥 **Intermediate**: Beam search, positional encodings, training checkpoints\n- ⚡ **Advanced**: Multi-head attention, layer parallelization, custom optimizations\n\nQuestions? 
Open an issue or start a discussion!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftekaratzas%2FRustGPT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftekaratzas%2FRustGPT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftekaratzas%2FRustGPT/lists"}