{"id":27510619,"url":"https://github.com/mysterionrise/tensorflow-metal-experiments","last_synced_at":"2026-05-19T09:08:11.377Z","repository":{"id":30984215,"uuid":"34542523","full_name":"MysterionRise/tensorflow-metal-experiments","owner":"MysterionRise","description":"Example of training NN based on Tensorflow Metal using ARM M chips from Apple ","archived":false,"fork":false,"pushed_at":"2026-02-04T13:21:48.000Z","size":7488,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-05T00:48:10.571Z","etag":null,"topics":["gpu","m1","m1-mac","m2","m2-mac","m3-mac","neural-network","neural-networks","tensorflow","tensorflow-gpu","tensorflow-tutorials"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MysterionRise.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-04-24T21:22:04.000Z","updated_at":"2026-02-04T13:21:52.000Z","dependencies_parsed_at":"2024-05-15T13:47:08.433Z","dependency_job_id":"0b59a34f-2cd5-4edd-8c78-e94c77c46ea2","html_url":"https://github.com/MysterionRise/tensorflow-metal-experiments","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MysterionRise/tensorflow-metal-experiments","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MysterionRise%2Ftensorflow-metal-experiments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MysterionRise%2Ftensorflow-metal-experiments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MysterionRise%2Ftensorflow-metal-experiments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MysterionRise%2Ftensorflow-metal-experiments/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MysterionRise","download_url":"https://codeload.github.com/MysterionRise/tensorflow-metal-experiments/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MysterionRise%2Ftensorflow-metal-experiments/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33209543,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-19T07:54:09.561Z","status":"ssl_error","status_checked_at":"2026-05-19T07:54:08.508Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpu","m1","m1-mac","m2","m2-mac","m3-mac","neural-network","neural-networks","tensorflow","tensorflow-gpu","tensorflow-tutorials"],"created_at":"2025-04-17T22:31:22.697Z","updated_at":"2026-05-19T09:08:11.371Z","avatar_url":"https://github.com/MysterionRise.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TensorFlow Metal Experiments\n\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![TensorFlow 2.18+](https://img.shields.io/badge/tensorflow-2.18+-orange.svg)](https://www.tensorflow.org/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n\nBenchmarking GPU vs CPU training performance across Apple Silicon, NVIDIA GPUs, and Intel CPUs using TensorFlow Metal and MLX.\n\n## Key Findings\n\n![VGG16 Benchmark](assets/vgg16_benchmark.png)\n\n**TL;DR: For large models, GPU acceleration provides 17x speedup on Apple Silicon and up to 120x on NVIDIA.**\n\n| Hardware | GPU Cores | VGG16 (s/epoch) | Speedup vs i7-8700 |\n|----------|-----------|-----------------|-------------------|\n| RTX 4070 Super | 7168 CUDA | 7s | 123x |\n| RTX 2070 | 2304 CUDA | 18s | 48x |\n| M1 Max | 32 GPU | 21s | 41x |\n| M4 Pro | 16 GPU | 26s | 33x |\n| M2 | 10 GPU | 64s | 13x |\n| i7-13700KF | - | 126s | 7x |\n| M1 Max (CPU only) | - | 368s | 2.3x |\n| i7-8700 | - | 863s | 1x (baseline) |\n\n### Apple Silicon GPU Speedup\n\n- **M1 Max**: 17.5x faster with Metal GPU vs CPU-only\n- **M2**: 8.3x faster with Metal GPU vs CPU-only\n- **M4 Pro**: See MLX vs TensorFlow comparison below\n\n## Project Structure\n\n```\ntensorflow-metal-experiments/\n├── notebooks/\n│   ├── tf_mnist_train.ipynb        # Simple CNN (93k params)\n│   ├── tf_fashion_mnist_train.ipynb # CNN with dropout (412k params)\n│   ├── tf_cifar100-train.ipynb     # VGG16-style (34M params)\n│   ├── mlx_comparison.ipynb        # MLX vs TensorFlow Metal (naive)\n│   ├── optimized_benchmark.ipynb   # Naive vs Optimized comparison\n│   └── benchmark_report.ipynb      # Generate benchmark charts\n├── src/utils/\n│   └── device_config.py            # Reusable GPU/CPU configuration\n├── benchmarks/\n│   └── results.json                # Structured benchmark data\n└── assets/\n    └── vgg16_benchmark.png         # Benchmark visualization\n```\n\n## Installation\n\n### Prerequisites: Install Python (macOS)\n\nIf you don't have Python installed, use Homebrew:\n\n```bash\n# Install Homebrew (if not installed)\n/bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n\n# Install Python 3.11+\nbrew install python@3.11\n\n# Verify installation\npython3.11 --version\n```\n\n### Apple Silicon Setup (M1/M2/M3/M4)\n\n```bash\n# Navigate to project directory\ncd tensorflow-metal-experiments\n\n# Create virtual environment\npython3.11 -m venv venv\n\n# Activate virtual environment\nsource venv/bin/activate\n\n# Upgrade pip\npip install --upgrade pip\n\n# Install dependencies (TF 2.18 is required for Metal compatibility)\npip install \"tensorflow\u003e=2.18,\u003c2.19\" tensorflow-metal mlx\npip install matplotlib seaborn pandas numpy jupyterlab\n\n# Verify TensorFlow sees the GPU\npython -c \"import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))\"\n# Should show: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n\n# Verify MLX\npython -c \"import mlx.core as mx; print(mx.default_device())\"\n# Should show: gpu\n```\n\n### Windows with NVIDIA GPU (WSL2)\n\n```bash\n# Create and activate venv\npython -m venv venv\nsource venv/bin/activate  # or: venv\\Scripts\\activate on Windows\n\n# Install dependencies\npip install tensorflow[and-cuda]\npip install matplotlib seaborn pandas numpy jupyterlab\n```\n\n### Run Experiments\n\n```bash\n# Make sure venv is activated\nsource venv/bin/activate\n\n# Start Jupyter\njupyter lab\n```\n\nOpen any notebook in `notebooks/` and run all cells.\n\n### Deactivate Environment\n\n```bash\ndeactivate\n```\n\n## Switching Between GPU and CPU\n\nEach notebook uses a device configuration helper:\n\n```python\nfrom utils.device_config import configure_device\n\n# Use GPU (Metal or CUDA)\ndevice = configure_device(use_gpu=True)\n\n# Force CPU only\ndevice = configure_device(use_gpu=False)\n```\n\n## Benchmarks\n\n### VGG16 on CIFAR-100 (34M Parameters)\n\nThis is the primary benchmark. Large models show the most significant GPU acceleration.\n\n| Hardware | Platform | GPU | Time/Epoch |\n|----------|----------|-----|------------|\n| RTX 4070 Super 12GB | Windows 11 | Yes | 7s |\n| RTX 2070 8GB | Windows 10 | Yes | 18s |\n| M1 Max 32-core GPU | macOS | Yes | 21s |\n| M2 10-core GPU | macOS | Yes | 64s |\n| i7-13700KF 3.4GHz | Windows 11 | No | 126s |\n| M1 Max 10-core CPU | macOS | No | 368s |\n| M2 8-core CPU | macOS | No | 528s |\n| i9 2.4GHz 8-core | macOS | No | 630s |\n| i7-8700 3.2GHz | Windows 10 | No | 863s |\n\n### Small Model Caveat\n\nFor small models (MNIST CNN, 93k params), CPU can sometimes match or beat GPU due to data transfer overhead. GPU acceleration is most beneficial for:\n- Models \u003e 1M parameters\n- Batch sizes \u003e= 64\n- Training runs with many epochs\n\n## Performance Optimization\n\n### Why GPU Utilization May Be Low (~40%)\n\nIf you observe low GPU utilization during training, these are the common causes:\n\n1. **NumPy array bottleneck** - Using `model.fit(x_train, y_train)` with NumPy arrays is a major bottleneck\n2. **Small batch sizes** - GPU dispatch overhead doesn't amortize for small batches\n3. **Model too small** - GPU parallelism not fully utilized for models \u003c 1M params\n4. **Data loading on CPU** - Pipeline not optimized for GPU\n\n### Optimization Tips\n\n1. **Use tf.data.Dataset API** instead of NumPy arrays:\n   ```python\n   # Instead of: model.fit(x_train, y_train)\n   dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n   dataset = dataset.batch(128).prefetch(tf.data.AUTOTUNE)\n   model.fit(dataset)\n   ```\n   This can achieve up to 5x acceleration and better GPU utilization.\n\n2. **Increase batch sizes** - Apple's unified memory allows larger batches (try 256, 512) without CPU-GPU transfer overhead\n\n3. **Use mixed precision** where supported:\n   ```python\n   tf.keras.mixed_precision.set_global_policy('mixed_float16')\n   ```\n\n4. **Monitor GPU power** to verify GPU is being utilized:\n   ```bash\n   sudo powermetrics --samplers gpu_power -i1000 -n1\n   ```\n\n5. **For MLX**: Use `mx.eval()` strategically to control lazy evaluation\n\nRun `notebooks/optimized_benchmark.ipynb` to see the impact of these optimizations with real benchmarks comparing naive vs optimized implementations for both TensorFlow and MLX.\n\n## MLX vs TensorFlow Metal\n\nThe `mlx_comparison.ipynb` notebook benchmarks Apple's MLX framework against TensorFlow Metal.\n\n### M4 Pro Benchmark Results\n\nBenchmarked on **M4 Pro (16-core GPU, 48GB RAM)** - Naive vs Optimized:\n\n| Model | Params | TF Naive | TF Optimized | MLX Naive | MLX Optimized | Best |\n|-------|--------|----------|--------------|-----------|---------------|------|\n| MNIST CNN | 93K | 77.2s | 24.8s | 16.4s | **11.6s** | MLX Opt |\n| Fashion CNN | 412K | 95.3s | 28.2s | 28.0s | **24.1s** | MLX Opt |\n\n### Optimization Impact\n\n| Framework | Optimization | MNIST Speedup | Fashion Speedup |\n|-----------|--------------|---------------|-----------------|\n| TensorFlow | tf.data + batch=256 | **3.1x faster** | **3.4x faster** |\n| MLX | eval per epoch + batch=256 | 1.4x faster | 1.2x faster |\n\n**Key Insights**:\n- **TensorFlow benefits most from optimization** - tf.data.Dataset provides 3x+ speedup\n- **MLX is fast out of the box** - Already optimized, less room for improvement\n- **MLX wins for small/medium models** - Even optimized TensorFlow can't catch up\n\n### When to Use Each Framework\n\n**When to use MLX:**\n- Small-to-medium models (\u003c 10M parameters) - fastest option\n- Rapid prototyping on Apple Silicon\n- Apple-native applications (Core ML integration)\n- When you want good performance without optimization work\n\n**When to use TensorFlow Metal:**\n- Cross-platform deployment requirements\n- Access to TensorFlow Hub / Keras ecosystem\n- Production pipelines with TensorFlow Serving\n- When you'll invest in tf.data optimization\n\n## Methodology\n\n- All benchmarks run 3 times, median reported\n- System was idle during benchmarks (no background tasks)\n- Same model architecture across all hardware\n- Data loading time excluded from measurements\n- Batch sizes kept consistent (64 for MNIST, 128 for CIFAR-100)\n\n## Contributing\n\n1. Run benchmarks on your hardware\n2. Add results to `benchmarks/results.json`\n3. Run `notebooks/benchmark_report.ipynb` to regenerate charts\n4. Submit a pull request\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmysterionrise%2Ftensorflow-metal-experiments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmysterionrise%2Ftensorflow-metal-experiments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmysterionrise%2Ftensorflow-metal-experiments/lists"}