{"id":48197591,"url":"https://github.com/synapticore-io/torch-cuda","last_synced_at":"2026-04-04T18:16:37.672Z","repository":{"id":295968968,"uuid":"990873574","full_name":"synapticore-io/torch-cuda","owner":"synapticore-io","description":"PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.","archived":false,"fork":false,"pushed_at":"2026-03-24T14:20:15.000Z","size":853,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-25T18:31:32.740Z","etag":null,"topics":["cuda","gpu","project-template","python","pytorch"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/synapticore-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-26T19:18:30.000Z","updated_at":"2026-03-24T14:20:14.000Z","dependencies_parsed_at":"2025-05-28T10:20:43.474Z","dependency_job_id":"1ec07ce7-8c0d-451a-b2d5-cdcd970d68cc","html_url":"https://github.com/synapticore-io/torch-cuda","commit_stats":null,"previous_names":["bjoernbethge/torch-cuda","synapticore-io/torch-cuda"],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/synapticore-io/torch-cuda","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapticore-io%2Ftorch-cuda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/synapticore-io%2Ftorch-cuda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapticore-io%2Ftorch-cuda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapticore-io%2Ftorch-cuda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/synapticore-io","download_url":"https://codeload.github.com/synapticore-io/torch-cuda/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/synapticore-io%2Ftorch-cuda/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31407912,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","gpu","project-template","python","pytorch"],"created_at":"2026-04-04T18:16:33.937Z","updated_at":"2026-04-04T18:16:36.980Z","avatar_url":"https://github.com/synapticore-io.png","language":"Python","funding_links":["https://ko-fi.com/N4N71WOHZ3"],"categories":[],"sub_categories":[],"readme":"# ⚡ PyTorch CUDA Template\r\n\r\n[![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/N4N71WOHZ3)\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n\r\n\u003c!-- Badges --\u003e\r\n[![Python 
Version](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://python.org)\r\n[![PyTorch](https://img.shields.io/badge/PyTorch-2.7%2B-red.svg)](https://pytorch.org)\r\n[![CUDA](https://img.shields.io/badge/CUDA-12.8-green.svg)](https://developer.nvidia.com/cuda-toolkit)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\r\n[![uv](https://img.shields.io/badge/uv-enabled-brightgreen.svg)](https://github.com/astral-sh/uv)\r\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\r\n\r\n\u003c!-- Logo/Header --\u003e\r\n\u003ch3\u003e🚀 A blazing-fast Python template for GPU-accelerated machine learning\u003c/h3\u003e\r\n\r\n*Harness the full power of modern PyTorch with CUDA 12.8 acceleration* 🔥\r\n\r\n\u003c/div\u003e\r\n\r\n---\r\n\r\n## 🌟 Overview\r\n\r\n**PyTorch CUDA Template** provides everything you need to jumpstart your GPU-accelerated machine learning projects. 
Built with modern Python packaging standards and optimized for **PyTorch 2.7+** with **CUDA 12.8** support, this template eliminates setup friction so you can focus on building amazing models.\r\n\r\n### 🎯 Key Features\r\n\r\n- 🔥 **Cutting-Edge PyTorch** - Latest PyTorch 2.7+ with optimized CUDA 12.8 support\r\n- ⚡ **GPU-Ready Architecture** - Pre-configured CUDA acceleration with intelligent CPU fallback\r\n- 🛠️ **Modern Development Stack** - Integrated linting, formatting, testing, and type checking\r\n- 📊 **ML Ops Ready** - MLflow experiment tracking and Polars for high-performance data processing\r\n- 🚀 **Lightning-Fast Setup** - Powered by `uv` for blazing-fast dependency resolution\r\n- 🏗️ **Production-Ready Structure** - Following modern Python packaging best practices\r\n\r\n---\r\n\r\n## 📋 Requirements\r\n\r\n- 🐍 **Python** ≥ 3.11\r\n- 🎮 **CUDA** 12.8 (for GPU acceleration)\r\n- 💻 **GPU** Compatible NVIDIA GPU (optional, gracefully falls back to CPU)\r\n- ⚡ **uv** Package manager (recommended for fastest installs)\r\n\r\n---\r\n\r\n## 🚀 Installation\r\n\r\n### ⚡ Lightning-Fast Setup\r\n\r\n```bash\r\n# Clone the template\r\ngit clone https://github.com/bjoernbethge/torch-cuda.git\r\ncd torch-cuda\r\n\r\n# Install everything with uv (recommended)\r\nuv sync\r\n```\r\n\r\n### 🎛️ Customized Installation\r\n\r\nChoose exactly what you need:\r\n\r\n```bash\r\n# 🔥 Basic PyTorch setup\r\nuv sync\r\n\r\n# 🧪 Development environment (testing, linting, formatting)\r\nuv sync --extra dev\r\n\r\n# 📊 ML Ops toolkit (MLflow, Polars, Plotly, profiling tools)\r\nuv sync --extra extras\r\n\r\n# 🌟 Everything included (the full experience)\r\nuv sync --extra all\r\n# Add new packages\r\nuv add torchvision\r\n```\r\n\r\n---\r\n\r\n## 🚀 Quick Start Guide\r\n\r\n### 1. 
🔍 Verify Your GPU Setup\r\n\r\n```python\r\nimport torch\r\n\r\nprint(f\"🔥 PyTorch version: {torch.__version__}\")\r\nprint(f\"⚡ CUDA available: {torch.cuda.is_available()}\")\r\nprint(f\"🎮 CUDA version: {torch.version.cuda}\")\r\nprint(f\"💻 GPU count: {torch.cuda.device_count()}\")\r\n\r\nif torch.cuda.is_available():\r\n    print(f\"🚀 Current GPU: {torch.cuda.get_device_name()}\")\r\n    print(f\"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB\")\r\n```\r\n\r\n### 2. 🧠 Create Your First Model\r\n\r\n```python\r\nimport torch\r\nimport torch.nn as nn\r\n\r\n# 🎯 Automatically detect best device\r\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\r\nprint(f\"🚀 Using device: {device}\")\r\n\r\n# 🧠 Build a neural network\r\nclass SimpleNet(nn.Module):\r\n    def __init__(self, input_size=784, hidden_size=256, num_classes=10):\r\n        super().__init__()\r\n        self.network = nn.Sequential(\r\n            nn.Linear(input_size, hidden_size),\r\n            nn.ReLU(),\r\n            nn.Dropout(0.2),\r\n            nn.Linear(hidden_size, hidden_size),\r\n            nn.ReLU(),\r\n            nn.Dropout(0.2),\r\n            nn.Linear(hidden_size, num_classes)\r\n        )\r\n    \r\n    def forward(self, x):\r\n        return self.network(x)\r\n\r\n# 🚀 Instantiate and move to GPU\r\nmodel = SimpleNet().to(device)\r\n\r\n# 📊 Model info\r\ntotal_params = sum(p.numel() for p in model.parameters())\r\nprint(f\"🧠 Model parameters: {total_params:,}\")\r\n\r\n# 🎯 Test forward pass\r\nsample_input = torch.randn(32, 784).to(device)\r\noutput = model(sample_input)\r\nprint(f\"📊 Input shape: {sample_input.shape}\")\r\nprint(f\"📈 Output shape: {output.shape}\")\r\n```\r\n\r\n### 3. 
🏋️ Train with MLflow Tracking\r\n\r\n```python\r\nimport torch\r\nimport torch.nn as nn\r\nimport torch.optim as optim\r\nfrom torch.utils.data import DataLoader, TensorDataset\r\nfrom tqdm import tqdm\r\nimport mlflow\r\nimport mlflow.pytorch\r\n\r\n# 📊 Initialize MLflow experiment\r\nmlflow.set_experiment(\"pytorch-cuda-training\")\r\nmlflow.start_run()\r\n\r\n# 🎯 Setup training environment\r\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\r\nmodel = SimpleNet().to(device)\r\noptimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)\r\ncriterion = nn.CrossEntropyLoss()\r\n\r\n# 📈 Log hyperparameters\r\nmlflow.log_params({\r\n    \"learning_rate\": 0.001,\r\n    \"batch_size\": 32,\r\n    \"epochs\": 10,\r\n    \"device\": str(device),\r\n    \"model_params\": sum(p.numel() for p in model.parameters())\r\n})\r\n\r\n# 📊 Create sample dataset\r\nX = torch.randn(1000, 784)\r\ny = torch.randint(0, 10, (1000,))\r\ndataset = TensorDataset(X, y)\r\ndataloader = DataLoader(\r\n    dataset, \r\n    batch_size=32, \r\n    shuffle=True,\r\n    num_workers=4,  # 🚀 Parallel data loading\r\n    pin_memory=True  # ⚡ Faster GPU transfer\r\n)\r\n\r\n# 🏋️ Training loop with MLflow logging\r\nmodel.train()\r\nfor epoch in range(10):\r\n    epoch_loss = 0\r\n    correct_predictions = 0\r\n    \r\n    pbar = tqdm(dataloader, desc=f\"🏋️ Epoch {epoch+1}/10\")\r\n    \r\n    for batch_x, batch_y in pbar:\r\n        batch_x, batch_y = batch_x.to(device), batch_y.to(device)\r\n        \r\n        optimizer.zero_grad()\r\n        outputs = model(batch_x)\r\n        loss = criterion(outputs, batch_y)\r\n        \r\n        loss.backward()\r\n        optimizer.step()\r\n        \r\n        epoch_loss += loss.item()\r\n        pred = outputs.argmax(dim=1)\r\n        correct_predictions += (pred == batch_y).sum().item()\r\n        \r\n        pbar.set_postfix({'Loss': f'{loss.item():.4f}'})\r\n    \r\n    # 📊 Log metrics to MLflow\r\n    avg_loss = 
epoch_loss / len(dataloader)\r\n    accuracy = correct_predictions / len(dataset)\r\n    \r\n    mlflow.log_metrics({\r\n        \"loss\": avg_loss,\r\n        \"accuracy\": accuracy,\r\n        \"epoch\": epoch + 1\r\n    })\r\n    \r\n    print(f\"🎯 Epoch {epoch+1}: Loss = {avg_loss:.4f}, Accuracy = {accuracy:.3f}\")\r\n\r\n# 💾 Save model\r\nmlflow.pytorch.log_model(model, \"model\")\r\nmlflow.end_run()\r\n```\r\n\r\n### 4. 📊 High-Performance Data Processing with Polars\r\n\r\n```python\r\nimport polars as pl\r\nimport torch\r\nfrom torch.utils.data import Dataset, DataLoader\r\n\r\n# 📊 Create and process data with Polars (much faster than pandas)\r\ndef create_sample_dataset():\r\n    \"\"\"Create a sample dataset using Polars for high-performance processing\"\"\"\r\n    \r\n    # 🚀 Generate sample data with Polars\r\n    df = pl.DataFrame({\r\n        \"feature_1\": pl.Series([i * 0.1 for i in range(10000)]),\r\n        \"feature_2\": pl.Series([i * 0.2 + 1 for i in range(10000)]),\r\n        \"feature_3\": pl.Series([i * 0.05 - 0.5 for i in range(10000)]),\r\n        \"target\": pl.Series([i % 3 for i in range(10000)])\r\n    })\r\n    \r\n    # 📈 High-performance data transformations\r\n    processed_df = (\r\n        df\r\n        .with_columns([\r\n            # 🔄 Feature engineering\r\n            ((pl.col(\"feature_1\") * pl.col(\"feature_2\")).alias(\"interaction_1\")),\r\n            (pl.col(\"feature_3\").pow(2).alias(\"feature_3_squared\")),\r\n            # 📊 Normalization\r\n            ((pl.col(\"feature_1\") - pl.col(\"feature_1\").mean()) / pl.col(\"feature_1\").std()).alias(\"feature_1_norm\"),\r\n            ((pl.col(\"feature_2\") - pl.col(\"feature_2\").mean()) / pl.col(\"feature_2\").std()).alias(\"feature_2_norm\")\r\n        ])\r\n        .filter(pl.col(\"feature_1\") \u003e 0.5)  # 🎯 Fast filtering\r\n    )\r\n    \r\n    print(f\"📊 Processed {len(processed_df)} samples\")\r\n    return processed_df\r\n\r\n# 🎯 Custom Dataset class for 
Polars integration\r\nclass PolarsDataset(Dataset):\r\n    def __init__(self, df: pl.DataFrame, feature_cols: list, target_col: str):\r\n        self.features = torch.tensor(df.select(feature_cols).to_numpy(), dtype=torch.float32)\r\n        self.targets = torch.tensor(df.select(target_col).to_numpy().flatten(), dtype=torch.long)\r\n    \r\n    def __len__(self):\r\n        return len(self.features)\r\n    \r\n    def __getitem__(self, idx):\r\n        return self.features[idx], self.targets[idx]\r\n\r\n# 🚀 Use the high-performance dataset\r\ndf = create_sample_dataset()\r\nfeature_cols = [\"feature_1_norm\", \"feature_2_norm\", \"feature_3_squared\", \"interaction_1\"]\r\n\r\ndataset = PolarsDataset(df, feature_cols, \"target\")\r\ndataloader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)\r\n\r\nprint(f\"✅ Created dataset with {len(dataset)} samples and {len(feature_cols)} features\")\r\n```\r\n\r\n### 5. 📈 Interactive Visualization with Plotly\r\n\r\n```python\r\nimport plotly.express as px\r\nimport plotly.graph_objects as go\r\nfrom plotly.subplots import make_subplots\r\nimport torch\r\nimport numpy as np\r\n\r\ndef visualize_training_metrics(losses, accuracies, gpu_utilization=None):\r\n    \"\"\"Create interactive training visualizations\"\"\"\r\n    \r\n    # 📊 Create subplots\r\n    fig = make_subplots(\r\n        rows=2, cols=2,\r\n        subplot_titles=('🏋️ Training Loss', '🎯 Accuracy', '⚡ GPU Utilization', '📈 Learning Curve'),\r\n        specs=[[{\"secondary_y\": False}, {\"secondary_y\": False}],\r\n               [{\"secondary_y\": False}, {\"secondary_y\": False}]]\r\n    )\r\n    \r\n    epochs = list(range(1, len(losses) + 1))\r\n    \r\n    # 📉 Loss curve\r\n    fig.add_trace(\r\n        go.Scatter(x=epochs, y=losses, mode='lines+markers', name='Loss', line=dict(color='red')),\r\n        row=1, col=1\r\n    )\r\n    \r\n    # 🎯 Accuracy curve\r\n    fig.add_trace(\r\n        go.Scatter(x=epochs, y=accuracies, 
mode='lines+markers', name='Accuracy', line=dict(color='green')),\r\n        row=1, col=2\r\n    )\r\n    \r\n    # ⚡ GPU utilization (if available)\r\n    if gpu_utilization:\r\n        fig.add_trace(\r\n            go.Scatter(x=epochs, y=gpu_utilization, mode='lines+markers', name='GPU %', line=dict(color='blue')),\r\n            row=2, col=1\r\n        )\r\n    \r\n    # 📈 Combined learning curve (loss divided by its maximum so it shares an axis with accuracy)\r\n    max_loss = max(losses, default=1.0) or 1.0  # Avoid dividing by zero\r\n    normalized_losses = [loss / max_loss for loss in losses]\r\n    fig.add_trace(\r\n        go.Scatter(x=epochs, y=normalized_losses, mode='lines', name='Loss (normalized)', line=dict(color='red', dash='dot')),\r\n        row=2, col=2\r\n    )\r\n    fig.add_trace(\r\n        go.Scatter(x=epochs, y=accuracies, mode='lines', name='Accuracy', line=dict(color='green')),\r\n        row=2, col=2\r\n    )\r\n    \r\n    # 🎨 Update layout\r\n    fig.update_layout(\r\n        title=\"🚀 PyTorch CUDA Training Dashboard\",\r\n        showlegend=True,\r\n        height=600\r\n    )\r\n    \r\n    return fig\r\n\r\n# 📊 Example usage\r\nsample_losses = [2.3, 1.8, 1.4, 1.1, 0.9, 0.7, 0.6, 0.5, 0.4, 0.35]\r\nsample_accuracies = [0.1, 0.3, 0.5, 0.65, 0.75, 0.82, 0.87, 0.91, 0.94, 0.96]\r\nsample_gpu_util = [85, 87, 90, 88, 92, 89, 91, 88, 90, 87]\r\n\r\nfig = visualize_training_metrics(sample_losses, sample_accuracies, sample_gpu_util)\r\nfig.show()  # 🎯 Interactive visualization in browser\r\n```\r\n\r\n### 6. 
⚡ Performance Monitoring with GPU Profiling\r\n\r\n```python\r\nimport torch\r\nfrom torch.profiler import profile, record_function, ProfilerActivity\r\nimport psutil\r\nimport time\r\n\r\ndef profile_training_step(model, data_loader, device):\r\n    \"\"\"Profile training performance with detailed GPU metrics\"\"\"\r\n    \r\n    # Create the optimizer once so its state persists across steps\r\n    optimizer = torch.optim.Adam(model.parameters())\r\n    \r\n    # 🔍 Start profiling\r\n    with profile(\r\n        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],\r\n        record_shapes=True,\r\n        profile_memory=True,\r\n        with_stack=True\r\n    ) as prof:\r\n        \r\n        model.train()\r\n        for i, (batch_x, batch_y) in enumerate(data_loader):\r\n            if i \u003e= 5:  # Profile first 5 batches\r\n                break\r\n                \r\n            batch_x, batch_y = batch_x.to(device), batch_y.to(device)\r\n            optimizer.zero_grad()  # Clear gradients from the previous batch\r\n            \r\n            with record_function(\"forward_pass\"):\r\n                outputs = model(batch_x)\r\n                loss = torch.nn.functional.cross_entropy(outputs, batch_y)\r\n            \r\n            with record_function(\"backward_pass\"):\r\n                loss.backward()\r\n            \r\n            with record_function(\"optimizer_step\"):\r\n                optimizer.step()\r\n    \r\n    # 📊 Print profiling results\r\n    print(\"🔥 GPU Profiling Results:\")\r\n    print(prof.key_averages().table(sort_by=\"cuda_time_total\", row_limit=10))\r\n    \r\n    # 💾 Export for visualization\r\n    prof.export_chrome_trace(\"trace.json\")\r\n    print(\"📈 Trace exported to trace.json - open in chrome://tracing\")\r\n\r\ndef monitor_system_resources():\r\n    \"\"\"Monitor CPU, memory, and GPU usage\"\"\"\r\n    \r\n    # 💻 System resources\r\n    cpu_percent = psutil.cpu_percent(interval=1)\r\n    memory = psutil.virtual_memory()\r\n    \r\n    print(f\"💻 CPU Usage: {cpu_percent}%\")\r\n    print(f\"💾 RAM Usage: {memory.percent}% ({memory.used / 1e9:.1f}GB / {memory.total / 1e9:.1f}GB)\")\r\n    \r\n    
# 🎮 GPU resources\r\n    if torch.cuda.is_available():\r\n        gpu_memory = torch.cuda.memory_allocated() / 1e9\r\n        gpu_cached = torch.cuda.memory_reserved() / 1e9\r\n        gpu_total = torch.cuda.get_device_properties(0).total_memory / 1e9\r\n        \r\n        print(f\"🎮 GPU Memory: {gpu_memory:.1f}GB allocated, {gpu_cached:.1f}GB cached, {gpu_total:.1f}GB total\")\r\n        print(f\"📊 GPU Memory Utilization: {(gpu_memory/gpu_total)*100:.1f}%\")\r\n\r\n# 🚀 Example usage (SimpleNet is defined in step 2 of the Quick Start Guide)\r\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\r\nmodel = SimpleNet().to(device)\r\n\r\n# Monitor during training\r\nmonitor_system_resources()\r\n```\r\n\r\n---\r\n\r\n## 🧪 Development Workflow\r\n\r\n### 🛠️ Set Up Development Environment\r\n\r\n```bash\r\n# 📦 Install all development tools\r\nuv sync --extra dev\r\n\r\n# 🪝 Set up pre-commit hooks for code quality\r\npre-commit install\r\n\r\n# 🧪 Verify everything works\r\npytest --version \u0026\u0026 black --version \u0026\u0026 mypy --version\r\n```\r\n\r\n### ✨ Code Quality Arsenal\r\n\r\n```bash\r\n# 🎨 Format your code beautifully\r\nblack src/ tests/\r\nisort src/ tests/\r\n\r\n# 🔍 Lint and catch issues\r\nruff check src/ tests/\r\n\r\n# 🎯 Type checking for better code\r\nmypy src/\r\n\r\n# 🧪 Run comprehensive tests\r\npytest\r\n\r\n# 📊 Test coverage analysis\r\npytest --cov=src --cov-report=html\r\n```\r\n\r\n---\r\n\r\n## 🚀 Performance Optimization Guide\r\n\r\n### ⚡ GPU Memory Optimization\r\n\r\n```python\r\n# 💾 Monitor GPU memory usage\r\ndef print_gpu_memory():\r\n    if torch.cuda.is_available():\r\n        allocated = torch.cuda.memory_allocated() / 1e9\r\n        cached = torch.cuda.memory_reserved() / 1e9\r\n        print(f\"💾 GPU Memory - Allocated: {allocated:.2f}GB, Cached: {cached:.2f}GB\")\r\n\r\n# 🧹 Memory cleanup strategies\r\ndef cleanup_gpu_memory():\r\n    \"\"\"Clean up GPU memory periodically\"\"\"\r\n    if torch.cuda.is_available():\r\n        torch.cuda.empty_cache()\r\n        
torch.cuda.synchronize()\r\n\r\n# 📊 Gradient accumulation for large effective batch sizes\r\naccumulation_steps = 4\r\nfor i, (batch_x, batch_y) in enumerate(dataloader):\r\n    batch_x, batch_y = batch_x.to(device), batch_y.to(device)\r\n    outputs = model(batch_x)\r\n    loss = criterion(outputs, batch_y) / accumulation_steps\r\n    loss.backward()\r\n    \r\n    if (i + 1) % accumulation_steps == 0:\r\n        optimizer.step()\r\n        optimizer.zero_grad()\r\n```\r\n\r\n### 🔥 Training Acceleration\r\n\r\n```python\r\nimport os\r\n\r\nimport torch\r\nimport torch.optim as optim\r\nfrom torch.utils.data import DataLoader\r\n\r\n# ⚡ DataLoader optimization\r\ndataloader = DataLoader(\r\n    dataset,\r\n    batch_size=64,\r\n    shuffle=True,\r\n    num_workers=min(8, os.cpu_count() or 1),  # Cap workers at the CPU count\r\n    pin_memory=True,  # Faster GPU transfer\r\n    persistent_workers=True,  # Keep workers alive\r\n    prefetch_factor=2  # Prefetch batches\r\n)\r\n\r\n# 🚀 Model compilation (PyTorch 2.0+)\r\nmodel = torch.compile(\r\n    model, \r\n    mode=\"max-autotune\",  # Maximum optimization\r\n    dynamic=False  # Static shapes for better optimization\r\n)\r\n\r\n# 💡 Learning rate scheduling\r\nnum_epochs = 10  # Same epoch count as the Quick Start training loop\r\nscheduler = optim.lr_scheduler.OneCycleLR(\r\n    optimizer,\r\n    max_lr=0.01,\r\n    steps_per_epoch=len(dataloader),\r\n    epochs=num_epochs,\r\n    pct_start=0.3,  # 30% warmup\r\n    anneal_strategy='cos'\r\n)\r\n```\r\n\r\n---\r\n\r\n## 🤝 Contributing\r\n\r\nWe welcome contributions from the community! Here's how to get involved:\r\n\r\n### 🛠️ Development Setup\r\n\r\n1. **🍴 Fork** the repository on GitHub\r\n2. **📥 Clone** your fork: `git clone https://github.com/yourusername/torch-cuda.git`\r\n3. **📦 Install** in development mode: `uv sync --extra dev`\r\n4. **🌿 Create** a feature branch: `git checkout -b feature/amazing-feature`\r\n5. **✨ Make** your changes and add comprehensive tests\r\n6. **🧪 Run** the test suite: `pytest`\r\n7. **🎨 Format** your code: `black . \u0026\u0026 isort .`\r\n8. **📝 Commit** your changes: `git commit -m 'Add amazing feature'`\r\n9. 
**🚀 Push** to your branch: `git push origin feature/amazing-feature`\r\n10. **🔄 Submit** a Pull Request\r\n\r\n---\r\n\r\n## 🆘 Troubleshooting\r\n\r\n### 🔥 Common CUDA Issues\r\n\r\n**❌ CUDA Out of Memory**\r\n```python\r\nimport torch\r\n\r\n# 💡 Solutions:\r\n# 1. Reduce batch size\r\nbatch_size = 16  # Instead of 64\r\n\r\n# 2. Use gradient accumulation\r\naccumulation_steps = 4\r\n\r\n# 3. Enable mixed precision (torch.cuda.amp.autocast is deprecated in recent PyTorch)\r\nwith torch.autocast(device_type=\"cuda\", dtype=torch.float16):\r\n    outputs = model(inputs)\r\n\r\n# 4. Clear cache periodically\r\ntorch.cuda.empty_cache()\r\n```\r\n\r\n**🐌 Slow Training Performance**\r\n```python\r\n# 💡 Performance boosters:\r\n# 1. Optimize DataLoader\r\ndataloader = DataLoader(\r\n    dataset,\r\n    num_workers=4,      # Parallel loading\r\n    pin_memory=True,    # Faster GPU transfer\r\n    persistent_workers=True  # Keep workers alive\r\n)\r\n\r\n# 2. Enable optimizations\r\ntorch.backends.cudnn.benchmark = True\r\nmodel = torch.compile(model)\r\n\r\n# 3. Use appropriate batch sizes\r\n# Sweet spot is usually 32-128 depending on model size\r\n```\r\n\r\n**🚫 Installation Issues**\r\n```bash\r\n# 🔄 Refresh installation\r\nuv sync --extra all\r\n\r\n# 🧹 Clean cache and reinstall\r\nuv cache clean \u0026\u0026 uv sync\r\n\r\n# 🎯 Inspect the resolved dependency tree\r\nuv tree\r\n```\r\n\r\n### 🆘 Getting Help\r\n\r\n- 🐛 **Issues**: Check our [GitHub Issues](https://github.com/bjoernbethge/torch-cuda/issues)\r\n- 📚 **Documentation**: [PyTorch Official Docs](https://pytorch.org/docs/)\r\n- 💬 **Community**: [PyTorch Forums](https://discuss.pytorch.org/)\r\n- 📧 **Contact**: [bjoern.bethge@gmail.com](mailto:bjoern.bethge@gmail.com)\r\n\r\n---\r\n\r\n## 📄 License\r\n\r\nThis project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.\r\n\r\n---\r\n\r\n## 🙏 Acknowledgments\r\n\r\n- 🔥 **PyTorch Team** - For creating the most amazing deep learning framework\r\n- 🎮 **NVIDIA** - For the CUDA toolkit and the GPU computing revolution  \r\n- ⚡ **Astral 
Team** - For the blazing-fast `uv` package manager\r\n- 📊 **Polars Team** - For lightning-fast data processing\r\n- 🌟 **Open Source Community** - For continuous inspiration and collaboration\r\n\r\n---\r\n\r\n## 📞 Connect \u0026 Links\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n\r\n[![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/bjoernbethge/torch-cuda)\r\n[![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge\u0026logo=gmail\u0026logoColor=white)](mailto:bjoern.bethge@gmail.com)\r\n\r\n**Made with ❤️ and ⚡ GPU acceleration**\r\n\r\n\u003c/div\u003e\r\n\r\n---\r\n\r\n\u003cdiv align=\"center\"\u003e\r\n\u003csub\u003eBuilt with 🔥 PyTorch • Accelerated by ⚡ CUDA • Powered by 🚀 uv \u0026 Modern Python\u003c/sub\u003e\r\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsynapticore-io%2Ftorch-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsynapticore-io%2Ftorch-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsynapticore-io%2Ftorch-cuda/lists"}