# 🦅 WronAI - Complete Documentation for Building an LLM

## 📋 Table of Contents
1. [Introduction and Architecture](#wprowadzenie)
2. [Process Diagrams](#diagramy)
3. [Step-by-Step Implementation](#implementation)
4. [Data Sources for ARM/Edge](#data-sources)
5. [Edge Computing Optimizations](#edge-optimizations)
6. [Golang Integration](#golang-integration)
7. [Monitoring and Deployment](#monitoring)

---

## 🎯 Introduction and Architecture {#wprowadzenie}

### WronAI Goals
- **Language**: Specialization in Polish
- **Platform**: Optimized for ARM (Raspberry Pi, Jetson)
- **Code**: Support for Python, Bash, SQL, Go, DSL
- **Size**: 50M-500M parameters (edge-friendly)
- **Latency**: <100ms inference on an RPi 4

### Key Architectural Decisions

| Aspect | Choice | Rationale |
|--------|--------|-----------|
| **Architecture** | Decoder-only Transformer | The standard choice for LLMs |
| **Size** | 50M-500M parameters | Quality/performance balance for edge |
| **Tokenizer** | SentencePiece BPE | Works well for Polish + code |
| **Precision** | FP16/INT4 | Memory optimization |
| **Context** | 2048-4096 tokens | Sufficient for most tasks |

---

## 📊 Process Diagrams {#diagramy}

### 1. Overall LLM Development Pipeline

```mermaid
graph TD
    A[📚 Data Collection] --> B[🔧 Preprocessing]
    B --> C[🎯 Tokenization]
    C --> D[🏗️ Model Architecture]
    D --> E[⚡ Pre-training]
    E --> F[🎨 Fine-tuning SFT]
    F --> G[🏆 RLHF/DPO]
    G --> H[📦 Quantization]
    H --> I[🚀 Deployment]

    subgraph "Source Data"
    A1[Polish Corpus<br/>OSCAR, mC4]
    A2[Source Code<br/>GitHub, The Stack]
    A3[Instructions<br/>Alpaca, OASST]
    end

    A1 --> A
    A2 --> A
    A3 --> A

    subgraph "Target Platforms"
    I1[🍇 Raspberry Pi]
    I2[🤖 Jetson Nano]
    I3[💻 ARM Mac]
    I4[☁️ Cloud ARM]
    end

    I --> I1
    I --> I2
    I --> I3
    I --> I4
```
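The 50M-500M range from the table above can be sanity-checked with a back-of-envelope parameter count for a decoder-only transformer. The sketch below is illustrative only: `estimate_params` and the example dimensions are not part of WronAI's code.

```python
def estimate_params(vocab_size: int, d_model: int, n_layers: int,
                    intermediate_size: int, tied_embeddings: bool = True) -> int:
    """Rough parameter count for a decoder-only transformer (biases/norms ignored)."""
    embed = vocab_size * d_model                            # token embeddings
    attn = 4 * d_model * d_model                            # Q, K, V, O projections (MHA)
    ffn = 3 * d_model * intermediate_size                   # SwiGLU: gate, up, down
    head = 0 if tied_embeddings else vocab_size * d_model   # tied output head is "free"
    return embed + n_layers * (attn + ffn) + head

# Example: a small edge-friendly configuration (illustrative numbers)
n = estimate_params(vocab_size=32000, d_model=512, n_layers=8, intermediate_size=1367)
print(f"{n / 1e6:.1f}M parameters")
```

With tied embeddings, the vocabulary contributes only once, which matters at this scale: the 16M-parameter embedding table is a large fraction of a ~40M-parameter model.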
### 2. Transformer Architecture for Edge

```mermaid
graph TB
    subgraph "Input Layer"
    TOK[Token Embeddings<br/>32k vocab]
    POS[Positional Embeddings<br/>RoPE/ALiBi]
    end

    TOK --> ADD1[Add & Norm]
    POS --> ADD1

    subgraph "Transformer Blocks (6-12x)"
    ADD1 --> ATTN[Multi-Head Attention<br/>GQA/MQA]
    ATTN --> ADD2[Add & Norm]
    ADD2 --> FFN[Feed Forward<br/>SwiGLU]
    FFN --> ADD3[Add & Norm]
    end

    ADD3 --> LN[Final LayerNorm]
    LN --> OUT[Output Projection<br/>Weight Tied]
    OUT --> SOFT[Softmax]

    subgraph "Edge Optimizations"
    OPT1[📉 Grouped Query Attention]
    OPT2[🔄 KV-Cache]
    OPT3[⚡ Flash Attention]
    OPT4[📦 INT4 Quantization]
    end
```

### 3. Data Pipeline

```mermaid
flowchart LR
    subgraph "Raw Data Sources"
    WIKI[📖 Wikipedia PL]
    NEWS[📰 Polish News]
    BOOKS[📚 Literature]
    CODE[💻 GitHub Code]
    DOCS[📄 Documentation]
    end

    subgraph "Preprocessing"
    CLEAN[🧹 Text Cleaning]
    FILTER[🔍 Quality Filter]
    DEDUP[🔄 Deduplication]
    end

    subgraph "Tokenization"
    SENT[📝 SentencePiece]
    CHUNK[✂️ Chunking]
    PACK[📦 Packing]
    end

    subgraph "Training Ready"
    TRAIN[🎯 Training Set]
    VALID[✅ Validation Set]
    TEST[🧪 Test Set]
    end

    WIKI --> CLEAN
    NEWS --> CLEAN
    BOOKS --> CLEAN
    CODE --> CLEAN
    DOCS --> CLEAN

    CLEAN --> FILTER
    FILTER --> DEDUP
    DEDUP --> SENT
    SENT --> CHUNK
    CHUNK --> PACK

    PACK --> TRAIN
    PACK --> VALID
    PACK --> TEST
```
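Two of the edge optimizations in the architecture diagram above, GQA and the KV-cache, interact directly: the cache holds one K and one V tensor per layer, so fewer KV heads mean a proportionally smaller cache. A rough FP16 estimate (the function and dimensions below are illustrative, not WronAI specifics):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one sequence: 2 tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# MHA with 8 KV heads vs GQA with 2 KV heads -> 4x smaller cache
mha = kv_cache_bytes(n_layers=8, n_kv_heads=8, head_dim=64, seq_len=2048)
gqa = kv_cache_bytes(n_layers=8, n_kv_heads=2, head_dim=64, seq_len=2048)
print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB")
```

On a 512MB RPi Zero, tens of MiB of cache per sequence is significant, which is why the cache grows linearly with sequence length and needs the "efficient mgmt" the constraints mindmap calls out.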
### 4. Edge Computing Considerations

```mermaid
mindmap
  root((Edge Computing<br/>Constraints))
    Memory
      RAM Limit
        512MB RPi Zero
        1GB RPi Zero 2W
        4-8GB RPi 4/5
      Model Size
        50M params = ~200MB
        500M params = ~2GB
      KV-Cache
        Grows with sequence
        Need efficient mgmt
    Compute
      CPU Only
        ARM Cortex-A72
        Limited SIMD
        No GPU acceleration
      Power Efficiency
        <5W total system
        Thermal throttling
      Inference Speed
        Target <100ms
        Batch size = 1
    Storage
      SD Card I/O
        Sequential better
        Random access slow
      Model Loading
        GGUF format
        Memory mapping
      Caching Strategy
        Frequently used layers
```

---

## 🚀 Step-by-Step Implementation {#implementation}

### Step 1: Environment Setup

```bash
# 1.1 Base tooling
sudo apt update && sudo apt upgrade -y
sudo apt install git python3-pip cmake build-essential

# 1.2 Python environment
python3 -m venv wronai_env
source wronai_env/bin/activate

# 1.3 PyTorch for ARM (CPU wheels)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# 1.4 ML libraries
pip install transformers datasets tokenizers sentencepiece
pip install accelerate wandb huggingface_hub
pip install numpy pandas scikit-learn matplotlib seaborn

# 1.5 Optional optimizations
pip install onnx onnxruntime  # for ONNX inference
pip install bitsandbytes      # for quantization (if ARM is supported)
```

### Step 2: Data Collection and Preparation

#### 2.1 Polish text corpora
```python
from datasets import load_dataset

# OSCAR - the largest Polish web corpus
oscar_pl = load_dataset("oscar-corpus/OSCAR-2301", "pl",
                        split="train", streaming=True)

# mC4 - Common Crawl
mc4_pl = load_dataset("mc4", "pl", split="train", streaming=True)
# Polish Wikipedia
wiki_pl = load_dataset("wikipedia", "20231101.pl", split="train")

# Polish literature
polish_lit = load_dataset("allegro/polish-literature", split="train")

# Polish news
polish_news = load_dataset("clarin-pl/polemo2-official", split="train")
```

#### 2.2 Source code (Python, Bash, SQL, Go)
```python
# The Stack - code from GitHub
stack_python = load_dataset("bigcode/the-stack",
                            data_dir="data/python",
                            split="train", streaming=True)

stack_shell = load_dataset("bigcode/the-stack",
                           data_dir="data/shell",
                           split="train", streaming=True)

stack_sql = load_dataset("bigcode/the-stack",
                         data_dir="data/sql",
                         split="train", streaming=True)

stack_go = load_dataset("bigcode/the-stack",
                        data_dir="data/go",
                        split="train", streaming=True)

# CodeSearchNet
code_search = load_dataset("code_search_net",
                           languages=["python", "go"],
                           split="train")

# Deduplicated GitHub code
github_code = load_dataset("codeparrot/github-code-clean",
                           languages=["Python", "Shell", "Go", "SQL"],
                           split="train", streaming=True)
```

#### 2.3 Instruction data
```python
# Polish Alpaca
polish_alpaca = load_dataset("mikechatgpt/polish_alpaca", split="train")

# OpenAssistant, filtered to Polish
oasst_pl = load_dataset("OpenAssistant/oasst1", split="train")
oasst_pl = oasst_pl.filter(lambda x: x['lang'] == 'pl')

# Code Alpaca for programming
code_alpaca = load_dataset("sahil2801/CodeAlpaca-20k", split="train")

# SQL instructions
sql_instruct = load_dataset("b-mc2/sql-create-context", split="train")
```
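When mixing these sources into one pre-training stream, streamed datasets are typically interleaved with explicit sampling weights. A minimal, dependency-free sketch of that idea (the `interleave` helper and the 3:1 mixture below are illustrative, not WronAI's actual recipe; `datasets.interleave_datasets` offers similar functionality for real use):

```python
import random
from itertools import islice
from typing import Dict, Iterator

def interleave(streams: Dict[str, Iterator], weights: Dict[str, float], seed: int = 0):
    """Yield items from several streams in proportion to the given weights."""
    rng = random.Random(seed)
    names = list(streams)
    probs = [weights[n] for n in names]
    while names:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield next(streams[name])
        except StopIteration:
            # Drop exhausted streams and keep sampling from the rest
            i = names.index(name)
            names.pop(i)
            probs.pop(i)

# Toy example: roughly 3:1 mix of Polish text and code samples
mix = interleave({"polish": iter(["pl1", "pl2"]), "code": iter(["c1"])},
                 {"polish": 0.75, "code": 0.25})
print(sorted(islice(mix, 3)))
```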
### Step 3: Preprocessing and Quality Control

```python
import re
from typing import List, Dict, Optional

class WronDataProcessor:
    def __init__(self, min_length=50, max_length=10000):
        self.min_length = min_length
        self.max_length = max_length

    def clean_text(self, text: str) -> Optional[str]:
        """Clean text of common artifacts."""
        if not text or len(text) < self.min_length:
            return None

        # Remove control characters
        text = re.sub(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]', '', text)

        # Normalize whitespace: collapse runs of spaces/tabs (but keep
        # newlines, so the line-based filtering below still works)
        text = re.sub(r'[ \t]+', ' ', text)
        text = re.sub(r'\n\s*\n\s*\n+', '\n\n', text)

        # Drop overlong lines (likely spam or minified content)
        lines = text.split('\n')
        filtered_lines = [line for line in lines if len(line) < 1000]
        text = '\n'.join(filtered_lines)

        if len(text) > self.max_length:
            return None

        return text.strip()

    def filter_polish_content(self, text: str) -> bool:
        """Heuristic check that the text is Polish."""
        if not text:
            return False
        polish_chars = 'ąćęłńóśźż'
        polish_count = sum(1 for c in text.lower() if c in polish_chars)
        return polish_count / len(text) > 0.01  # 1% threshold

    def classify_content_type(self, text: str) -> str:
        """Classify the content type."""
        if re.search(r'def\s+\w+\s*\(|import\s+\w+|class\s+\w+', text):
            return 'python'
        elif re.search(r'#!/bin/bash|#!/bin/sh|\$\{.*\}', text):
            return 'bash'
        elif re.search(r'SELECT|INSERT|UPDATE|DELETE|CREATE TABLE', text, re.IGNORECASE):
            return 'sql'
        elif re.search(r'package\s+main|func\s+\w+\s*\(|import\s*\(', text):
            return 'golang'
        elif self.filter_polish_content(text):
            return 'polish_text'
        else:
            return 'other'
```

### Step 4: Tokenizer Training

```python
import sentencepiece as spm
from pathlib import Path

def train_wron_tokenizer():
    """Train a custom tokenizer for WronAI."""

    # Samples per data type
    samples = {
        'polish_text': 1000000,    # 1M Polish text samples
        'python': 200000,          # 200k Python samples
        'bash': 50000,             # 50k Bash samples
        'sql': 30000,              # 30k SQL samples
        'golang': 100000,          # 100k Go samples
    }

    # Write the training corpus file
    with open('wron_corpus.txt', 'w', encoding='utf-8') as f:
        for data_type, count in samples.items():
            print(f"Collecting {count} samples of {data_type}...")
            # Sample-collection logic over the datasets goes here
            # f.write(sample_text + '\n')

    # Train SentencePiece
    spm.SentencePieceTrainer.train(
        input='wron_corpus.txt',
        model_prefix='wronai_tokenizer',
        vocab_size=32000,
        character_coverage=0.9995,
        model_type='bpe',
        max_sentence_length=4192,
        shuffle_input_sentence=True,

        # Special tokens
        user_defined_symbols=[
            # Chat markers
            '<user>', '</user>', '<assistant>', '</assistant>',
            # Code markers
            '<code>', '</code>', '<python>', '</python>',
            '<bash>', '</bash>', '<sql>', '</sql>', '<go>', '</go>',
            # Special tokens
            '<think>', '</think>', '<result>', '</result>',
            # Function tokens
            '<func>', '</func>', '<class>', '</class>',
            # Error handling
            '<error>', '</error>', '<warning>', '</warning>'
        ],

        # Normalization (handles Polish diacritics)
        normalization_rule_name='nmt_nfkc_cf',
        split_by_unicode_script=True,
        split_by_whitespace=True,
        split_by_number=True,

        # BPE parameters
        split_digits=True,
        allow_whitespace_only_pieces=True,
        byte_fallback=True,
    )

    print("✅ Tokenizer trained successfully!")
    return 'wronai_tokenizer.model'
```

### Step 5: Model Architecture

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from typing import Optional, Tuple

class WronAIConfig:
    """WronAI model configuration."""
    def __init__(self,
                 vocab_size: int = 32000,
                 d_model: int = 512,
                 n_layers: int = 8,
                 n_heads: int = 8,
                 n_kv_heads: Optional[int] = None,  # for GQA
                 max_seq_len: int = 2048,
                 intermediate_size: Optional[int] = None,
                 dropout: float = 0.0,
                 rope_theta: float = 10000.0,
                 layer_norm_eps: float = 1e-5):

        self.vocab_size = vocab_size
        self.d_model = d_model
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads or n_heads  # default to MHA
        self.max_seq_len = max_seq_len
        self.intermediate_size = intermediate_size or int(d_model * 2.67)  # SwiGLU ratio
        self.dropout = dropout
        self.rope_theta = rope_theta
        self.layer_norm_eps = layer_norm_eps

        # Validate GQA configuration
        assert self.n_heads % self.n_kv_heads == 0, "n_heads must be divisible by n_kv_heads"
```
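The derived values in the configuration are worth checking by hand. The snippet below replicates `WronAIConfig`'s arithmetic standalone (plain variables, no torch) for the default 512-dim, 8-head, 2-KV-head setup:

```python
# Standalone replica of WronAIConfig's derived-value arithmetic (illustrative).
d_model, n_heads, n_kv_heads = 512, 8, 2

head_dim = d_model // n_heads              # per-head width: 64
intermediate_size = int(d_model * 2.67)    # SwiGLU hidden width, ~2.67x d_model
kv_groups = n_heads // n_kv_heads          # query heads sharing each KV head

assert n_heads % n_kv_heads == 0           # same validation as WronAIConfig
print(head_dim, intermediate_size, kv_groups)
```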
```python
class RMSNorm(nn.Module):
    """Root Mean Square Layer Normalization."""
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)

class RotaryPositionalEmbedding(nn.Module):
    """Rotary Position Embedding (RoPE)."""
    def __init__(self, dim: int, max_seq_len: int, theta: float = 10000.0):
        super().__init__()
        self.dim = dim
        self.max_seq_len = max_seq_len
        self.theta = theta

        # Precompute inverse frequencies
        inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer('inv_freq', inv_freq, persistent=False)

    def forward(self, x: torch.Tensor, seq_len: int) -> Tuple[torch.Tensor, torch.Tensor]:
        t = torch.arange(seq_len, device=x.device, dtype=self.inv_freq.dtype)
        freqs = torch.outer(t, self.inv_freq)

        # Duplicate the frequencies so cos/sin span the full head_dim,
        # matching rotate_half below (shape: [seq_len, dim])
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()

def apply_rotary_pos_emb(q: torch.Tensor, k: torch.Tensor,
                         cos: torch.Tensor, sin: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Apply rotary position embedding to query and key tensors."""
    def rotate_half(x):
        x1, x2 = x[..., :x.shape[-1]//2], x[..., x.shape[-1]//2:]
        return torch.cat((-x2, x1), dim=-1)

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)

    return q_embed, k_embed
```
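Per feature pair, the `rotate_half` formulation above is just an ordinary 2D rotation. A torch-free illustration of the property RoPE relies on (the `rotate_pair` helper is hypothetical, not part of the model code):

```python
import math

def rotate_pair(x1: float, x2: float, theta: float) -> tuple:
    """RoPE applies a 2D rotation to each (even, odd) feature pair of q and k."""
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))

q = (0.8, -0.6)                    # one feature pair, |q| = 1.0
rq = rotate_pair(*q, theta=1.3)    # rotated by a position-dependent angle

# Rotations preserve norms, so RoPE changes only the angle between q and k,
# which is what makes attention scores depend on relative position.
print(round(math.hypot(*q), 6), round(math.hypot(*rq), 6))
```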
```python
class WronAttention(nn.Module):
    """Multi-Head Attention with Grouped Query Attention (GQA)."""
    def __init__(self, config: WronAIConfig):
        super().__init__()
        self.config = config
        self.hidden_size = config.d_model
        self.num_heads = config.n_heads
        self.num_kv_heads = config.n_kv_heads
        self.head_dim = self.hidden_size // self.num_heads
        self.num_key_value_groups = self.num_heads // self.num_kv_heads

        # Linear projections
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(self.hidden_size, self.num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)

        # RoPE
        self.rotary_emb = RotaryPositionalEmbedding(
            self.head_dim, config.max_seq_len, config.rope_theta
        )

    def forward(self, hidden_states: torch.Tensor,
                attention_mask: Optional[torch.Tensor] = None,
                kv_cache: Optional[Tuple[torch.Tensor, torch.Tensor]] = None
                ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:

        batch_size, seq_len, _ = hidden_states.size()

        # Project to Q, K, V
        query_states = self.q_proj(hidden_states)
        key_states = self.k_proj(hidden_states)
        value_states = self.v_proj(hidden_states)

        # Reshape for multi-head attention
        query_states = query_states.view(batch_size, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        key_states = key_states.view(batch_size, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        value_states = value_states.view(batch_size, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)

        # Apply RoPE
        cos, sin = self.rotary_emb(query_states, seq_len)
        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)

        # Handle KV cache
        if kv_cache is not None:
            cache_k, cache_v = kv_cache
            key_states = torch.cat([cache_k, key_states], dim=2)
            value_states = torch.cat([cache_v, value_states], dim=2)

        # Cache K/V *before* GQA expansion, so the cache keeps only
        # num_kv_heads heads instead of the repeated num_heads
        present_kv = (key_states, value_states)

        # Repeat KV heads for GQA
        if self.num_key_value_groups > 1:
            key_states = key_states.repeat_interleave(self.num_key_value_groups, dim=1)
            value_states = value_states.repeat_interleave(self.num_key_value_groups, dim=1)

        # Scaled dot-product attention
        attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

        # Apply causal mask
        if attention_mask is not None:
            attn_weights = attn_weights + attention_mask

        attn_weights = F.softmax(attn_weights, dim=-1)
        attn_output = torch.matmul(attn_weights, value_states)

        # Reshape and project output
        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, seq_len, self.hidden_size)
        attn_output = self.o_proj(attn_output)

        return attn_output, present_kv
```
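In the attention code above, `repeat_interleave` expands the KV heads so that each group of consecutive query heads reads the same KV head. A standalone sketch of that mapping (the helper is illustrative, not part of WronAI):

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """With repeat_interleave, consecutive query heads share one KV head."""
    group = n_heads // n_kv_heads
    return q_head // group

# 8 query heads, 2 KV heads: heads 0-3 read KV head 0, heads 4-7 read KV head 1
print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```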
```python
class WronMLP(nn.Module):
    """SwiGLU feed-forward network."""
    def __init__(self, config: WronAIConfig):
        super().__init__()
        self.config = config
        self.hidden_size = config.d_model
        self.intermediate_size = config.intermediate_size

        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = F.silu(self.gate_proj(x))  # SiLU activation
        up = self.up_proj(x)
        return self.down_proj(gate * up)

class WronDecoderLayer(nn.Module):
    """Single transformer decoder layer (pre-norm)."""
    def __init__(self, config: WronAIConfig):
        super().__init__()
        self.hidden_size = config.d_model

        self.self_attn = WronAttention(config)
        self.mlp = WronMLP(config)
        self.input_layernorm = RMSNorm(config.d_model, eps=config.layer_norm_eps)
        self.post_attention_layernorm = RMSNorm(config.d_model, eps=config.layer_norm_eps)

    def forward(self, hidden_states: torch.Tensor,
                attention_mask: Optional[torch.Tensor] = None,
                kv_cache: Optional[Tuple[torch.Tensor, torch.Tensor]] = None):

        # Self-attention with residual connection
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        hidden_states, new_kv_cache = self.self_attn(hidden_states, attention_mask, kv_cache)
        hidden_states = residual + hidden_states

        # Feed-forward with residual connection
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = self.mlp(hidden_states)
        hidden_states = residual + hidden_states

        return hidden_states, new_kv_cache

class WronAIModel(nn.Module):
    """WronAI language model."""
    def __init__(self, config: WronAIConfig):
        super().__init__()
        self.config = config
        self.vocab_size = config.vocab_size

        # Token embeddings
        self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model)

        # Transformer layers
        self.layers = nn.ModuleList([
            WronDecoderLayer(config) for _ in range(config.n_layers)
        ])

        # Final normalization
        self.norm = RMSNorm(config.d_model, eps=config.layer_norm_eps)

        # Output projection (weight tied with embeddings)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)
        self.lm_head.weight = self.embed_tokens.weight  # weight tying

        # Initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Embedding):
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def forward(self, input_ids: torch.Tensor,
                attention_mask: Optional[torch.Tensor] = None,
                kv_caches: Optional[list] = None):

        batch_size, seq_len = input_ids.shape

        # Token embeddings
        hidden_states = self.embed_tokens(input_ids)

        # Create causal mask
        if attention_mask is None:
            attention_mask = torch.triu(
                torch.full((seq_len, seq_len), float('-inf'), device=input_ids.device),
                diagonal=1
            )[None, None, :, :]

        # Initialize KV caches if not provided
        if kv_caches is None:
            kv_caches = [None] * len(self.layers)

        new_kv_caches = []

        # Pass through transformer layers
        for layer, kv_cache in zip(self.layers, kv_caches):
            hidden_states, new_kv_cache = layer(hidden_states, attention_mask, kv_cache)
            new_kv_caches.append(new_kv_cache)

        # Final normalization
        hidden_states = self.norm(hidden_states)

        # Output projection
        logits = self.lm_head(hidden_states)

        return logits, new_kv_caches
```
```python
# Example usage
def create_wronai_model(model_size: str = "mini") -> WronAIModel:
    """Create a WronAI model from a predefined configuration."""

    configs = {
        "nano": WronAIConfig(
            vocab_size=32000,
            d_model=384,
            n_layers=6,
            n_heads=6,
            n_kv_heads=2,  # GQA 3:1 ratio
            max_seq_len=2048,
        ),
        "micro": WronAIConfig(
            vocab_size=32000,
            d_model=512,
            n_layers=8,
            n_heads=8,
            n_kv_heads=2,  # GQA 4:1 ratio
            max_seq_len=2048,
        ),
        "mini": WronAIConfig(
            vocab_size=32000,
            d_model=768,
            n_layers=12,
            n_heads=12,
            n_kv_heads=4,  # GQA 3:1 ratio
            max_seq_len=4096,
        ),
    }

    config = configs[model_size]
    model = WronAIModel(config)

    # Print model statistics
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

    print(f"✅ WronAI-{model_size} created:")
    print(f"  📊 Total parameters: {total_params:,}")
    print(f"  🎯 Trainable parameters: {trainable_params:,}")
    print(f"  💾 Estimated size (FP16): {total_params * 2 / 1024**2:.1f} MB")
    print(f"  🏗️ Architecture: {config.n_layers}L-{config.d_model}H-{config.n_heads}A")
    print(f"  🔄 GQA Ratio: {config.n_heads}:{config.n_kv_heads}")

    return model
```
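Once a model exists, generation is a loop over the forward pass shown above. The torch-free sketch below shows only the greedy decoding logic, with a stub `next_token_logits` callable standing in for the real model (the stub and helper names are hypothetical; in WronAI this would call `WronAIModel.forward` and thread the KV caches through):

```python
from typing import Callable, List

def greedy_decode(next_token_logits: Callable[[List[int]], List[float]],
                  prompt: List[int], eos_id: int, max_new_tokens: int = 16) -> List[int]:
    """Greedy autoregressive decoding: repeatedly append the argmax token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)   # in WronAI: model forward + KV cache
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Stub "model" over a 4-token vocabulary: always prefers token (last + 1),
# so decoding walks 0 -> 1 -> 2 -> 3, where 3 acts as EOS
stub = lambda toks: [1.0 if i == min(toks[-1] + 1, 3) else 0.0 for i in range(4)]
print(greedy_decode(stub, prompt=[0], eos_id=3))
```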
---

## 📊 Data Sources for ARM/Edge Computing {#data-sources}

### Specialized Datasets for Edge/ARM

#### 1. ARM and Embedded Systems Documentation
```python
def collect_arm_embedded_data():
    """Collect data related to ARM and embedded systems."""

    sources = {
        # Official documentation
        "arm_docs": [
            "ARM Architecture Reference Manual",
            "Cortex-A Series Programming Guide",
            "NEON Programmer's Guide",
            "ARM Assembly Language Documentation"
        ],

        # Raspberry Pi specific
        "rpi_docs": [
            "https://www.raspberrypi.org/documentation/",
            "RPi GPIO Programming",
            "BCM2835 ARM Peripherals Guide",
            "VideoCore IV Programming"
        ],

        # Performance optimization
        "optimization": [
            "ARM NEON optimization guides",
            "Cache optimization for ARM",
            "Power management ARM Cortex",
            "Thermal management embedded systems"
        ],

        # Real-world projects
        "projects": [
            "GitHub repos tagged: raspberry-pi, arm, embedded",
            "IoT project documentation",
            "Edge computing case studies",
            "ARM assembly optimizations"
        ]
    }

    return sources

# Automated documentation scraping
import requests
from bs4 import BeautifulSoup
import time

class ARMDocScraper:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (compatible; WronAI-DataCollector/1.0)'
        })

    def scrape_arm_official_docs(self):
        """Scrape official ARM documentation."""
        base_urls = [
            "https://developer.arm.com/documentation/",
            "https://developer.arm.com/tools-and-software/",
        ]

        collected_docs = []

        for url in base_urls:
            try:
                response = self.session.get(url, timeout=30)
                soup = BeautifulSoup(response.content, 'html.parser')

                # Extract documentation links
                doc_links = soup.find_all('a', href=True)
                for link in doc_links:
                    if any(keyword in link.get('href', '').lower()
                           for keyword in ['cortex', 'neon', 'assembly', 'optimization']):
                        collected_docs.append({
                            'url': link['href'],
                            'title': link.text.strip(),
                            'source': 'arm_official'
                        })

                time.sleep(1)  # rate limiting

            except Exception as e:
                print(f"Error scraping {url}: {e}")

        return collected_docs
```

#### 2. ARM-Optimized Source Code
```python
def collect_arm_optimized_code():
    """Collect code optimized for ARM."""

    # GitHub repositories with ARM optimizations
    arm_repos = [
        # SIMD/NEON libraries
        "ARM-software/ComputeLibrary",
        "libjpeg-turbo/libjpeg-turbo",  # NEON optimizations
        "madler/zlib",                  # ARM assembly
        "opencv/opencv",                # ARM NEON optimizations

        # Embedded/IoT frameworks
        "ARMmbed/mbed-os",
        "zephyrproject-rtos/zephyr",
        "espressif/esp-idf",

        # Machine learning for ARM
        "tensorflow/tensorflow",        # TensorFlow Lite
        "pytorch/pytorch",              # mobile optimizations
        "apache/tvm",                   # tensor compiler
        "ARM-software/ML-zoo",

        # System programming
        "torvalds/linux",               # ARM kernel code
        "u-boot/u-boot",                # bootloader
        "buildroot/buildroot",          # embedded Linux
    ]

    code_patterns = [
        # ARM assembly patterns
        r'\.arm\s+|\.thumb\s+',
        r'vld1\.|vst1\.|vadd\.|vmul\.',  # NEON instructions
        r'#ifdef\s+__ARM_NEON',
        r'arm_neon\.h',

        # Performance optimizations
        r'__builtin_prefetch',
        r'likely\(|unlikely\(',
        r'__attribute__.*aligned',
        r'cache_line_size',

        # ARM-specific defines
        r'CONFIG_ARM|ARM_ARCH',
        r'__aarch64__|__arm__',
        r'cortex[_-]a\d+',
    ]

    return arm_repos, code_patterns

# Data collector implementation
from datasets import Dataset
import subprocess
import os
import fnmatch

class ARMCodeCollector:
    def __init__(self, output_dir="arm_code_data"):
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def clone_and_extract(self, repo_url, target_extensions=None):
        """Clone a repository and extract the relevant files."""
        if target_extensions is None:
            target_extensions = ['.c', '.cpp', '.h', '.hpp', '.s', '.S', '.py', '.go']

        repo_name = repo_url.split('/')[-1]
        repo_path = os.path.join(self.output_dir, repo_name)

        try:
            # Clone repository (shallow)
            subprocess.run([
                'git', 'clone', '--depth', '1',
                f'https://github.com/{repo_url}', repo_path
            ], check=True, capture_output=True)

            # Extract relevant files
            code_files = []
            for root, dirs, files in os.walk(repo_path):
                # Skip .git and build directories
                dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ['build', 'obj']]

                for file in files:
                    if any(file.endswith(ext) for ext in target_extensions):
                        file_path = os.path.join(root, file)
                        try:
                            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                                content = f.read()

                            # Check if ARM-related
                            if self.is_arm_related(content):
                                code_files.append({
                                    'repo': repo_url,
                                    'file_path': file_path.replace(repo_path, ''),
                                    'content': content,
                                    'language': self.detect_language(file),
                                    'size': len(content)
                                })
                        except Exception:
                            # Skip unreadable files
                            continue

            # Clean up the cloned repo
            subprocess.run(['rm', '-rf', repo_path], check=True)

            return code_files

        except Exception as e:
            print(f"Error processing {repo_url}: {e}")
            return []

    def is_arm_related(self, content):
        """Check if code is ARM-related."""
        arm_indicators = [
            'arm', 'neon', 'cortex', 'aarch64', '__arm__',
            'vld1', 'vst1', 'vadd', 'vmul',  # NEON
            'raspberry', 'rpi', 'bcm2835',   # RPi specific
            'embedded', 'microcontroller',
            '__builtin_prefetch', 'cache_line'
        ]

        content_lower = content.lower()
        return any(indicator in content_lower for indicator in arm_indicators)

    def detect_language(self, filename):
        """Detect programming language from the file extension."""
        lang_map = {
            '.c': 'c', '.h': 'c',
            '.cpp': 'cpp', '.cc': 'cpp', '.cxx': 'cpp',
            '.hpp': 'cpp', '.hxx': 'cpp',
            '.py': 'python',
            '.go': 'go',
            '.s': 'assembly', '.S': 'assembly',
            '.sh': 'bash', '.bash': 'bash',
            '.sql': 'sql',
            '.rs': 'rust',
            '.js': 'javascript',
            '.ts': 'typescript'
        }

        ext = os.path.splitext(filename)[1].lower()
        return lang_map.get(ext, 'unknown')
```

#### 3. Domain-Specific Languages (DSLs)
```python
def collect_dsl_data():
    """Collect data for DSLs used in edge computing."""

    dsl_sources = {
        # Configuration DSLs
        "config_dsls": [
            "YAML configurations (Docker, K8s, CI/CD)",
            "TOML configs (Rust, Hugo)",
            "JSON configs (package.json, tsconfig)",
            "INI files (systemd, git config)",
            "HCL (Terraform, Packer)",
        ],

        # Build & deploy DSLs
        "build_dsls": [
            "Dockerfile instructions",
            "Makefile recipes",
            "CMakeLists.txt",
            "Bazel BUILD files",
            "GitHub Actions YAML",
            "Ansible playbooks",
        ],

        # Query DSLs
        "query_dsls": [
            "SQL dialects (PostgreSQL, MySQL, SQLite)",
            "PromQL (Prometheus queries)",
            "JQ expressions (JSON processing)",
            "XPath expressions",
            "GraphQL schemas and queries",
        ],

        # Hardware description
        "hardware_dsls": [
            "Device Tree Source (.dts)",
            "SystemVerilog/Verilog",
            "VHDL",
            "OpenCL kernels",
            "CUDA kernels",
        ],

        # Embedded-specific
        "embedded_dsls": [
            "Zephyr device tree overlays",
            "PlatformIO configurations",
            "Arduino IDE configurations",
            "FreeRTOS config files",
            "Yocto recipes (.bb files)",
        ]
    }
    return dsl_sources

# Collector implementation for DSLs
class DSLDataCollector:
    def __init__(self):
        self.dsl_patterns = {
            'dockerfile': r'FROM\s+|RUN\s+|COPY\s+|ADD\s+',
            'makefile': r'^[A-Za-z][^:]*:\s*$|^\t',
            'cmake': r'cmake_minimum_required|add_executable|target_link_libraries',
            'yaml': r'^---$|^\s*[-\w]+:\s*',
            'sql': r'SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP',
            'devicetree': r'/dts-v1/|compatible\s*=|reg\s*=',
            'promql': r'rate\(|increase\(|histogram_quantile',
            'jq': r'\.\w+|\[\]|\|',
        }

    def extract_dsl_samples(self, text, dsl_type):
        """Extract DSL code samples from text."""
        import re

        if dsl_type not in self.dsl_patterns:
            return []

        pattern = self.dsl_patterns[dsl_type]
        matches = []

        lines = text.split('\n')
        current_block = []
        in_block = False

        for line in lines:
            if re.search(pattern, line, re.IGNORECASE):
                if not in_block:
                    in_block = True
                    current_block = [line]
                else:
                    current_block.append(line)
            elif in_block:
                if line.strip() == '' or line.startswith(' ') or line.startswith('\t'):
                    current_block.append(line)
                else:
                    # End of block
                    if len(current_block) > 2:  # Minimum viable block
                        matches.append('\n'.join(current_block))
                    current_block = []
                    in_block = False

        # Handle final block
        if in_block and len(current_block) > 2:
            matches.append('\n'.join(current_block))

        return matches
```

### Step 6: Training Pipeline with Optimizations

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torch.cuda.amp import autocast, GradScaler
import wandb
from tqdm import tqdm
import math
import os

class WronDataset(Dataset):
    """Dataset class for WronAI training"""
    def __init__(self, tokenized_texts, max_length=2048):
        self.texts = tokenized_texts
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        tokens = self.texts[idx]

        # Truncate if too long
        if len(tokens) > self.max_length:
            tokens = tokens[:self.max_length]

        # Shift by one token: inputs predict the next token
        input_ids = torch.tensor(tokens[:-1], dtype=torch.long)
        labels = torch.tensor(tokens[1:], dtype=torch.long)

        return {
            'input_ids': input_ids,
            'labels': labels
        }

def collate_fn(batch):
    """Custom collate function with padding"""
    max_len = max(len(item['input_ids']) for item in batch)

    input_ids = []
    labels = []

    for item in batch:
        # Pad sequences
        pad_len = max_len - len(item['input_ids'])

        padded_input = F.pad(item['input_ids'], (0, pad_len), value=0)
        padded_labels = F.pad(item['labels'], (0, pad_len), value=-100)  # -100 ignored in loss

        input_ids.append(padded_input)
        labels.append(padded_labels)

    return {
        'input_ids': torch.stack(input_ids),
        'labels': torch.stack(labels)
    }

class WronTrainer:
    """Advanced trainer for WronAI with edge optimizations"""

    def __init__(self, model, tokenizer, config):
        self.model = model
        self.tokenizer = tokenizer
        self.config = config

        # Optimizer with weight decay
        self.optimizer = torch.optim.AdamW(
            model.parameters(),
            lr=config.learning_rate,
            betas=(0.9, 0.95),
            weight_decay=config.weight_decay,
            eps=1e-8
        )

        # Learning rate scheduler
        self.scheduler = self.create_scheduler()

        # Mixed precision scaler
        self.scaler = GradScaler() if config.use_amp else None

        # Monitoring
        self.step = 0
        self.best_loss = float('inf')

        # Initialize wandb
        if config.use_wandb:
            wandb.init(
                project="wronai-training",
                config=vars(config),
                name=f"wronai-{config.model_size}-{config.run_name}"
            )

    def create_scheduler(self):
        """Create learning rate scheduler"""
        if self.config.scheduler_type == 'cosine':
            return torch.optim.lr_scheduler.CosineAnnealingLR(
                self.optimizer,
                T_max=self.config.max_steps,
                eta_min=self.config.learning_rate * 0.1
            )
        elif self.config.scheduler_type == 'linear_warmup':
            return self.create_linear_warmup_scheduler()
        else:
            return torch.optim.lr_scheduler.StepLR(self.optimizer, step_size=1000, gamma=0.95)

    def create_linear_warmup_scheduler(self):
        """Create linear warmup + cosine decay scheduler"""
        def lr_lambda(step):
            if step < self.config.warmup_steps:
                return step / self.config.warmup_steps
            else:
                progress = (step - self.config.warmup_steps) / (self.config.max_steps - self.config.warmup_steps)
                return 0.5 * (1 + math.cos(math.pi * progress))

        return torch.optim.lr_scheduler.LambdaLR(self.optimizer, lr_lambda)

    def train_step(self, batch):
        """Single training step with optimizations"""
        self.model.train()

        input_ids = batch['input_ids'].to(self.config.device)
        labels = batch['labels'].to(self.config.device)

        # Forward pass with optional mixed precision
        if self.config.use_amp and self.scaler:
            with autocast():
                logits, _ = self.model(input_ids)
                loss = F.cross_entropy(
                    logits.view(-1, logits.size(-1)),
                    labels.view(-1),
                    ignore_index=-100
                )
        else:
            logits, _ = self.model(input_ids)
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)),
                labels.view(-1),
                ignore_index=-100
            )

        # Backward pass
        if self.config.use_amp and self.scaler:
            self.scaler.scale(loss).backward()

            # Gradient clipping (unscale first so the norm is computed on true grads)
            self.scaler.unscale_(self.optimizer)
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.max_grad_norm)

            self.scaler.step(self.optimizer)
            self.scaler.update()
        else:
            loss.backward()
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.max_grad_norm)
            self.optimizer.step()

        self.scheduler.step()
        self.optimizer.zero_grad()

        return loss.item()

    def train(self, train_loader, eval_loader=None):
        """Main training loop"""
        print(f"🚀 Starting training for {self.config.max_steps} steps...")

        progress_bar = tqdm(total=self.config.max_steps, desc="Training")

        while self.step < self.config.max_steps:
            epoch_loss = 0
            num_batches = 0

            for batch in train_loader:
                loss = self.train_step(batch)
                epoch_loss += loss
                num_batches += 1
                self.step += 1

                # Logging
                if self.step % self.config.log_interval == 0:
                    avg_loss = epoch_loss / num_batches
                    lr = self.scheduler.get_last_lr()[0]

                    progress_bar.set_postfix({
                        'loss': f'{avg_loss:.4f}',
                        'lr': f'{lr:.2e}',
                        'step': self.step
                    })

                    if self.config.use_wandb:
                        wandb.log({
                            'train_loss': avg_loss,
                            'learning_rate': lr,
                            'step': self.step
                        })

                # Evaluation
                if eval_loader and self.step % self.config.eval_interval == 0:
                    eval_loss = self.evaluate(eval_loader)
                    print(f"\n📊 Step {self.step} - Eval Loss: {eval_loss:.4f}")

                    if self.config.use_wandb:
                        wandb.log({'eval_loss': eval_loss, 'step': self.step})

                    # Save best model
                    if eval_loss < self.best_loss:
                        self.best_loss = eval_loss
                        self.save_model("best_model")

                # Save checkpoint
                if self.step % self.config.save_interval == 0:
                    self.save_model(f"checkpoint_step_{self.step}")

                progress_bar.update(1)

                if self.step >= self.config.max_steps:
                    break

        progress_bar.close()
        print("✅ Training completed!")

        # Final save
        self.save_model("final_model")

    def evaluate(self, eval_loader):
        """Evaluation loop"""
        self.model.eval()
        total_loss = 0
        num_batches = 0

        with torch.no_grad():
            for batch in eval_loader:
                input_ids = batch['input_ids'].to(self.config.device)
                labels = batch['labels'].to(self.config.device)

                if self.config.use_amp:
                    with autocast():
                        logits, _ = self.model(input_ids)
                        loss = F.cross_entropy(
                            logits.view(-1, logits.size(-1)),
                            labels.view(-1),
                            ignore_index=-100
                        )
                else:
                    logits, _ = self.model(input_ids)
                    loss = F.cross_entropy(
                        logits.view(-1, logits.size(-1)),
                        labels.view(-1),
                        ignore_index=-100
                    )

                total_loss += loss.item()
                num_batches += 1

        self.model.train()
        return total_loss / num_batches

    def save_model(self, name):
        """Save model checkpoint"""
        save_dir = os.path.join(self.config.output_dir, name)
        os.makedirs(save_dir, exist_ok=True)

        # Save model state
        torch.save({
            'model_state_dict': self.model.state_dict(),
            'optimizer_state_dict': self.optimizer.state_dict(),
            'scheduler_state_dict': self.scheduler.state_dict(),
            'step': self.step,
            'best_loss': self.best_loss,
            'config': self.config
        }, os.path.join(save_dir, 'pytorch_model.pt'))

        # Save config
        import json
        with open(os.path.join(save_dir, 'config.json'), 'w') as f:
            json.dump(vars(self.config), f, indent=2)
        print(f"💾 Model saved to {save_dir}")

class TrainingConfig:
    """Training configuration class"""
    def __init__(self):
        # Model config
        self.model_size = "mini"
        self.vocab_size = 32000

        # Training hyperparameters
        self.learning_rate = 3e-4
        self.weight_decay = 0.1
        self.max_grad_norm = 1.0
        self.batch_size = 8
        self.gradient_accumulation_steps = 4
        self.max_steps = 100000

        # Scheduler
        self.scheduler_type = "linear_warmup"
        self.warmup_steps = 2000

        # Mixed precision
        self.use_amp = True

        # Logging and saving
        self.log_interval = 100
        self.eval_interval = 2000
        self.save_interval = 5000
        self.use_wandb = True
        self.run_name = "v1"

        # Paths
        self.output_dir = "./wronai_output"
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

# Training execution
def run_training():
    """Execute full training pipeline"""

    # Configuration
    config = TrainingConfig()

    # Create model
    model = create_wronai_model(config.model_size)
    model.to(config.device)

    # Load tokenizer (placeholder - you'd load your trained tokenizer)
    # tokenizer = spm.SentencePieceProcessor()
    # tokenizer.load('wronai_tokenizer.model')

    # Prepare datasets (placeholder - you'd prepare your actual data)
    # train_dataset = WronDataset(train_texts)
    # eval_dataset = WronDataset(eval_texts)

    # train_loader = DataLoader(
    #     train_dataset,
    #     batch_size=config.batch_size,
    #     shuffle=True,
    #     collate_fn=collate_fn,
    #     num_workers=4
    # )

    # eval_loader = DataLoader(
    #     eval_dataset,
    #     batch_size=config.batch_size,
    #     shuffle=False,
    #     collate_fn=collate_fn,
    #     num_workers=4
    # )

    # Initialize trainer
    # trainer = WronTrainer(model, tokenizer, config)

    # Start training
    # trainer.train(train_loader, eval_loader)

    print("🎯 Training pipeline setup complete!")

if __name__ == "__main__":
    run_training()
```
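The warmup-plus-cosine schedule built by `create_linear_warmup_scheduler` is easy to sanity-check in isolation before a long run. A minimal sketch, reusing the `warmup_steps`/`max_steps` defaults from `TrainingConfig`; the standalone `lr_lambda` below is a copy for illustration, not part of the trainer:

```python
import math

# Example values taken from TrainingConfig above
warmup_steps, max_steps = 2000, 100000

def lr_lambda(step):
    """LR multiplier: linear ramp over warmup, cosine decay afterwards."""
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * (1 + math.cos(math.pi * progress))

# The multiplier ramps 0 → 1 during warmup, then decays back to 0
print(round(lr_lambda(0), 3))        # 0.0
print(round(lr_lambda(1000), 3))     # 0.5  (halfway through warmup)
print(round(lr_lambda(2000), 3))     # 1.0  (warmup complete, peak LR)
print(round(lr_lambda(100000), 3))   # 0.0  (end of training)
```

Multiplying these values by `learning_rate` (3e-4) gives the actual LR at each step, which is exactly what `LambdaLR` does internally.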
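The substring heuristic used by `ARMCodeCollector.is_arm_related` can also be exercised on its own. A minimal sketch with a representative subset of the indicator list (not the full set from the class); note that bare substrings such as `'arm'` also fire on unrelated words like "alarm", so the filter trades precision for recall:

```python
# Representative subset of the arm_indicators list used above
ARM_INDICATORS = ['arm', 'neon', 'cortex', 'aarch64', 'vld1', 'bcm2835']

def is_arm_related(content: str) -> bool:
    # Case-insensitive substring match against the indicator list
    content_lower = content.lower()
    return any(indicator in content_lower for indicator in ARM_INDICATORS)

print(is_arm_related('#include <arm_neon.h>'))                    # True
print(is_arm_related('int add(int a, int b) { return a + b; }'))  # False
print(is_arm_related('set an alarm'))                             # True (false positive)
```

The false positives are usually acceptable at this stage, since the goal is a broad first-pass filter; stricter matching (e.g. word boundaries via regex) can be layered on later in the pipeline.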