{"id":34029892,"url":"https://github.com/maifeeulasad/accelera","last_synced_at":"2026-04-08T12:02:19.038Z","repository":{"id":317688428,"uuid":"1068431290","full_name":"maifeeulasad/Accelera","owner":"maifeeulasad","description":"A framework for performing large matrix operations on memory-constrained GPUs through intelligent chunking and CPU-GPU memory management.","archived":false,"fork":false,"pushed_at":"2025-10-04T09:33:02.000Z","size":90,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-21T22:54:06.282Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://pypi.org/project/accelera/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maifeeulasad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-02T11:26:43.000Z","updated_at":"2025-10-10T18:37:32.000Z","dependencies_parsed_at":"2025-10-02T13:30:00.832Z","dependency_job_id":"2197886a-bd0f-4f46-b01e-b11de03bbdc5","html_url":"https://github.com/maifeeulasad/Accelera","commit_stats":null,"previous_names":["maifeeulasad/accelera"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/maifeeulasad/Accelera","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maifeeulasad%2FAccelera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maifeeulasad%2FAccelera/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/maifeeulasad%2FAccelera/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maifeeulasad%2FAccelera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maifeeulasad","download_url":"https://codeload.github.com/maifeeulasad/Accelera/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maifeeulasad%2FAccelera/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31554110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T10:21:54.569Z","status":"ssl_error","status_checked_at":"2026-04-08T10:21:38.171Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-13T17:57:11.666Z","updated_at":"2026-04-08T12:02:19.015Z","avatar_url":"https://github.com/maifeeulasad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Accelera - Memory-Efficient Matrix Operations Framework\n\nA framework for performing large matrix operations on memory-constrained GPUs through intelligent chunking and CPU-GPU memory management.\n\n[![Python 
3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-red.svg)](https://pytorch.org/)\n[![CUDA](https://img.shields.io/badge/CUDA-required-green.svg)](https://developer.nvidia.com/cuda-toolkit)\n\n## 🚀 Problem Statement\n\nWhen working with large matrices on GPUs with limited VRAM, operations like matrix multiplication can cause **Out-of-Memory (OOM) errors**. Accelera solves this by:\n\n- **🧩 Breaking large operations into smaller chunks**\n- **💾 Intelligently offloading intermediate results to CPU/RAM**  \n- **🔄 Dynamically managing GPU memory**\n- **🎯 Providing a seamless API for large matrix operations**\n\n## ✨ Features\n\n- **🤖 Automatic chunking** for matrix operations\n- **🧠 Dynamic memory management** between GPU and CPU\n- **⚡ CUDA-optimized** for NVIDIA GPUs\n- **📊 Configurable chunk sizes** based on available VRAM\n- **📈 Progress tracking** for long-running operations\n- **📋 Memory usage monitoring**\n- **🔌 Multiple input types** (PyTorch tensors, NumPy arrays)\n\n## 🏃‍♂️ Quick Start\n\n```python\nimport accelera as acc\n\n# Initialize with automatic VRAM detection\nengine = acc.MatrixEngine(auto_detect_memory=True)\n\n# Perform large matrix multiplication that might cause OOM on small GPUs\nA = acc.Matrix.random((10000, 8000))  # 10k x 8k matrix (~305 MB)\nB = acc.Matrix.random((8000, 12000))  # 8k x 12k matrix (~366 MB)\n\n# This will automatically chunk and manage memory\nC = engine.matmul(A, B)  # Result: 10k x 12k matrix (~458 MB)\n\nprint(f\"✅ Success! 
Result shape: {C.shape}\")\n```\n\n### 🎯 Real-world Example\n\n```python\n# Scenario: Training a large neural network layer on a 4GB GPU\nimport accelera as acc\n\nengine = acc.MatrixEngine()\n\n# Large operands (~2.2 GB together with the output) can OOM on a 4GB GPU that is already in use\nweights = acc.Matrix.randn((20000, 15000))  # ~1.1 GB\ninputs = acc.Matrix.randn((15000, 8000))    # ~458 MB\n\n# Forward pass - automatically chunked if needed\noutput = engine.matmul(weights, inputs)     # ~610 MB result\n\n# Check memory usage\nmemory_info = engine.get_memory_info()\nprint(f\"GPU utilization: {memory_info['gpu_utilization']:.1f}%\")\n```\n\n## 📦 Installation\n\n### Requirements\n\n- **Python 3.8+**\n- **PyTorch 2.0+** with CUDA support\n- **NVIDIA GPU** with CUDA drivers\n- **Sufficient CPU RAM** for temporary storage\n\n### Install\n\n```bash\n# Clone the repository\ngit clone https://github.com/maifeeulasad/accelera\ncd accelera\n\n# Install dependencies\npip install -r requirements.txt\n\n# Install in development mode\npip install -e .\n\n# Verify installation\nmake verify\n```\n\n## 🛠️ Usage Examples\n\n### Basic Operations\n\n```python\nimport accelera as acc\nimport numpy as np\n\n# Initialize engine\nengine = acc.MatrixEngine(auto_detect_memory=True, enable_progress=True)\n\n# Matrix multiplication\nA = acc.Matrix.randn((5000, 4000))\nB = acc.Matrix.randn((4000, 6000))\nC = engine.matmul(A, B)\n\n# Element-wise operations\nX = acc.Matrix.randn((3000, 4000))\nY = acc.Matrix.randn((3000, 4000))\n\n# Addition\nZ1 = engine.add(X, Y)\n\n# Element-wise multiplication\nZ2 = engine.multiply(X, Y)\n\n# Works with NumPy arrays and PyTorch tensors too!\nA_np = np.random.randn(1000, 800).astype(np.float32)\nB_np = np.random.randn(800, 1200).astype(np.float32)\nC_from_numpy = engine.matmul(A_np, B_np)\n```\n\n### Advanced Configuration\n\n```python\n# Custom chunking strategy\nengine = acc.MatrixEngine(\n    chunking_strategy='adaptive',  # 'row', 'tile', 'adaptive'\n    chunk_size=1024,               # 
Manual chunk size\n    enable_progress=True           # Show progress bars\n)\n\n# Manual memory management\nengine.set_chunk_size(512)                    # Smaller chunks for limited memory\nengine.enable_auto_memory_detection(False)    # Disable auto-detection\nengine.cleanup()                              # Force GPU memory cleanup\n\n# Memory monitoring\nmemory_info = engine.get_memory_info()\nprint(f\"GPU Memory: {memory_info['gpu_available_gb']:.2f}GB available\")\nprint(f\"CPU Memory: {memory_info['cpu_available_gb']:.2f}GB available\")\n```\n\n## 📊 Performance Comparison\n\nRun the benchmark to see how Accelera performs on your system:\n\n```bash\n# Run full benchmark suite\nmake benchmark\n\n# Test specific matrix size\npython examples/benchmark.py --custom-size 4000 3000 5000\n\n# Quick demo\nmake demo\n```\n\n## 📁 Project Structure\n\n```\naccelera/\n├── accelera/                  # Core framework\n│   ├── __init__.py            # Main package exports\n│   ├── engine.py              # MatrixEngine - main API\n│   ├── matrix.py              # Matrix wrapper class\n│   ├── memory_manager.py      # GPU/CPU memory management\n│   ├── chunking.py            # Chunking strategies\n│   └── config.py              # Configuration and logging\n├── examples/                  # Usage examples\n│   ├── basic_usage.py         # Basic operations demo\n│   ├── advanced_usage.py      # Advanced features demo\n│   └── benchmark.py           # Performance benchmarking\n├── tests/                     # Unit tests\n│   └── test_accelera.py       # Comprehensive test suite\n├── DOCUMENTATION.md           # Detailed documentation\n├── requirements.txt           # Python dependencies\n├── setup.py                   # Package setup\n└── Makefile                   # Development commands\n```\n\n## 🧪 Running Examples\n\n```bash\n# Basic usage example\npython examples/basic_usage.py\n\n# Advanced features demonstration\npython examples/advanced_usage.py\n\n# Performance 
benchmarking\npython examples/benchmark.py\n\n# Or use make commands\nmake examples\nmake benchmark\n```\n\n## 🔧 Development\n\n```bash\n# Install development dependencies\nmake dev-install\n\n# Run tests\nmake test\n\n# Run linting\nmake lint\n\n# Format code\nmake format\n\n# Clean build artifacts\nmake clean\n```\n\n## 📖 Documentation\n\n- **[Complete Documentation](DOCUMENTATION.md)** - Detailed API reference and usage guide\n- **[Examples](examples/)** - Practical usage examples\n- **[Tests](tests/)** - Unit tests and integration tests\n\n## 🎯 Use Cases\n\n- **🧠 Deep Learning**: Training large neural networks on consumer GPUs\n- **🔬 Scientific Computing**: Large matrix operations in research\n- **📊 Data Processing**: Batch processing of large datasets\n- **🎮 Computer Graphics**: Large transformation matrices\n- **📈 Financial Modeling**: Risk calculations with large covariance matrices\n\n## ⚠️ System Requirements\n\n- **NVIDIA GPU** (optional)\n- **CUDA** (minimum supported version not yet documented)\n\n## 🤝 Contributing\n\nPlease follow the guidelines in [`claude.md`](claude.md):\n\n1. **Fork the repository**\n2. **Create a feature branch**: `git checkout -b feature-name`\n3. **Follow the coding standards**: Small commits, clear intent, boring solutions\n4. **Add tests** for new functionality\n5. **Submit a pull request** with a clear description\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **PyTorch** team for the excellent tensor library\n- **NVIDIA** for CUDA and GPU computing\n- **Community** feedback and contributions\n\n---\n\n**💡 Pro Tip**: Start with the basic example, then explore advanced features. 
The framework is designed to be simple by default but powerful when needed!\n\n```python\n# Get started in a few lines\nimport accelera as acc\nengine = acc.MatrixEngine()\nlarge_matrix_A = acc.Matrix.randn((10000, 8000))\nlarge_matrix_B = acc.Matrix.randn((8000, 12000))\nresult = engine.matmul(large_matrix_A, large_matrix_B)\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaifeeulasad%2Faccelera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaifeeulasad%2Faccelera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaifeeulasad%2Faccelera/lists"}