{"id":46103114,"url":"https://github.com/cai4cai/torchsparsegradutils","last_synced_at":"2026-03-01T20:04:56.527Z","repository":{"id":63630899,"uuid":"556799932","full_name":"cai4cai/torchsparsegradutils","owner":"cai4cai","description":"A collection of utility functions to work with PyTorch sparse tensors","archived":false,"fork":false,"pushed_at":"2026-01-12T17:32:34.000Z","size":21850,"stargazers_count":34,"open_issues_count":6,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-12T23:02:52.615Z","etag":null,"topics":["deep-learning","pytorch","sparse-ditributions","sparse-linear-systems","sparse-matrix"],"latest_commit_sha":null,"homepage":"https://torchsparsegradutils.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cai4cai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-10-24T14:35:31.000Z","updated_at":"2026-01-12T17:32:38.000Z","dependencies_parsed_at":"2023-12-21T12:07:10.140Z","dependency_job_id":"b586e8eb-0dee-4174-8356-17c4ef3f072a","html_url":"https://github.com/cai4cai/torchsparsegradutils","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/cai4cai/torchsparsegradutils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cai4cai%2Ftorchsparsegradutils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cai4cai%2Ftorchs
parsegradutils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cai4cai%2Ftorchsparsegradutils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cai4cai%2Ftorchsparsegradutils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cai4cai","download_url":"https://codeload.github.com/cai4cai/torchsparsegradutils/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cai4cai%2Ftorchsparsegradutils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29983122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T16:35:47.903Z","status":"ssl_error","status_checked_at":"2026-03-01T16:35:44.899Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","pytorch","sparse-ditributions","sparse-linear-systems","sparse-matrix"],"created_at":"2026-03-01T20:04:55.915Z","updated_at":"2026-03-01T20:04:56.518Z","avatar_url":"https://github.com/cai4cai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# torchsparsegradutils: Sparsity-preserving gradient utility tools for PyTorch\n\n[![PyPI](https://img.shields.io/pypi/v/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Python 
Versions](https://img.shields.io/pypi/pyversions/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) [![Downloads](https://img.shields.io/pypi/dm/torchsparsegradutils.svg)](https://pypi.org/project/torchsparsegradutils/) ![PyTorch 2.5+](https://img.shields.io/badge/PyTorch-2.5%2B-ee4c2c?logo=pytorch) ![Tested 2.5 / 2.9 / nightly](https://img.shields.io/badge/Tested-2.5%20|%202.9%20|%20nightly-ee4c2c?logo=pytorch) [![Build](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml/badge.svg)](https://github.com/cai4cai/torchsparsegradutils/actions/workflows/python-package.yml) [![Docs](https://readthedocs.org/projects/torchsparsegradutils/badge/?version=latest)](https://readthedocs.org/projects/torchsparsegradutils) [![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License](https://img.shields.io/github/license/cai4cai/torchsparsegradutils)](LICENSE) [![status](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d/status.svg)](https://joss.theoj.org/papers/6da0e92488d06f70c0a03d0a7cbfba7d)\n\nA comprehensive collection of utility functions to work with PyTorch sparse tensors, ensuring memory efficiency and supporting various sparsity-preserving tensor operations with automatic differentiation. 
This package addresses fundamental gaps in PyTorch's sparse tensor ecosystem, providing essential operations that preserve sparsity in gradients during backpropagation.\n\n## 🚀 Key Features\n\n### Core Sparse Operations with Sparse Gradient Support\n\n**Memory-Efficient Sparse Matrix Multiplication**\n- `sparse_mm`: Memory-efficient sparse matrix multiplication with batch support\n- Preserves sparsity in gradients during backpropagation\n- Workaround for [PyTorch issue #41128](https://github.com/pytorch/pytorch/issues/41128)\n- Supports both COO and CSR formats with optional batching\n\n**Sparse Linear System Solvers**\n- `sparse_triangular_solve`: Sparse triangular solver with batch support\n  -  Discussion reference: [PyTorch issue #87358](https://github.com/pytorch/pytorch/issues/87358)\n- `sparse_generic_solve`: Generic sparse linear solver with pluggable backends\n  - Tested and benchmarked with CG, BICGSTAB, LSMR and MINRES solvers\n\n- `sparse_solve_c4t`: Wrappers around [cupy sparse solvers](https://docs.cupy.dev/en/stable/reference/scipy_sparse_linalg.html#solving-linear-problems)\n  -  Discussion reference: [Pytorch issue #69538](https://github.com/pytorch/pytorch/issues/69538)\n  - Tested and benchmarked with: [CG](https://docs.cupy.dev/en/v9.6.0/reference/generated/cupyx.scipy.sparse.linalg.cg.html), [CGS](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.cgs.html#cupyx.scipy.sparse.linalg.cgs), [MINRES](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.minres.html#cupyx.scipy.sparse.linalg.minres), [GMRES](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.gmres.html#cupyx.scipy.sparse.linalg.gmres), [spsolve](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.spsolve.html#cupyx.scipy.sparse.linalg.spsolve) and 
[spsolve_triangular](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.spsolve_triangular.html#cupyx.scipy.sparse.linalg.spsolve_triangular) CuPy solvers\n- `tsgujax.sparse_solve_j4t`: Wrappers around [jax sparse solvers](https://jax.readthedocs.io/en/latest/jax.scipy.html#module-jax.scipy.sparse.linalg)\n  - Tested with: CG and BICGSTAB JAX solvers\n- `sparse_generic_lstsq`: Generic sparse linear least-squares solver\n\n### Built-in Iterative Solvers (No External Dependencies)\n\n**Pure PyTorch Implementations**\n- **BICGSTAB**: Biconjugate Gradient Stabilized method (ported from [pykrylov](https://github.com/PythonOptimizers/pykrylov))\n- **CG**: Conjugate Gradient method (ported from [cornellius-gp/linear_operator](https://github.com/cornellius-gp/linear_operator))\n- **LSMR**: Least Squares Minimal Residual method (ported from [pytorch-minimize](https://github.com/rfeinman/pytorch-minimize))\n- **MINRES**: Minimal Residual method (ported from [cornellius-gp/linear_operator](https://github.com/cornellius-gp/linear_operator))\n\n### Sparse Multivariate Normal Distributions\n\n- **SparseMultivariateNormal**: Structured Gaussian Distribution\n  - Implements reparameterised sampling (rsample)\n  - Supports leading batch dimension\n  - Supports COO and CSR sparse tensors\n  - Covariance or precision matrices with LL^T or LDL^T parameterisations.\n  - LDL^T parameterization offers numerical stability without SPD constraints\n- **SparseMultivariateNormalNative**:\n  - Implements reparameterised sampling (rsample)\n  - Uses native `torch.sparse.mm` only\n  - Only supports unbatched CSR tensors\n  - Covariance LL^T parameterization\n\n### Spatial Encoding Tools\n\n**Pairwise Encoder**\n- Encode local neighborhood relationships in nD spatial volumes\n- Multi-channel/class support\n- Configurable neighborhood radius and sparsity patterns\n- Outputs sparse unbatched/batched COO or CSR matrices for downstream processing\n- Optimised for medical 
imaging and volumetric data applications\n\n### Graph Neural Network Operations\n\n**Indexed Matrix Multiplication**\n- `segment_mm`: Segmented matrix multiplication compatible with DGL/PyG\n- `gather_mm`: Gather-based matrix multiplication for graph operations\n- Pure PyTorch implementations as alternatives to [`dgl.ops.segment_mm`](https://docs.dgl.ai/generated/dgl.ops.segment_mm.html), [`pyg_lib.ops.segment_matmul`](https://pyg-lib.readthedocs.io/en/latest/modules/ops.html#pyg_lib.ops.segment_matmul), and [`dgl.ops.gather_mm`](https://docs.dgl.ai/generated/dgl.ops.gather_mm.html)\n- Supports PyTorch \u003e= 2.4 with nested tensor operations\n\n\n\n## 🛠️ Installation\n\n### Basic Installation\n\nThe package can be installed using pip:\n\n```bash\npip install torchsparsegradutils\n```\n\n### Development Installation\n\nFor the latest features and development work:\n\n```bash\npip install git+https://github.com/cai4cai/torchsparsegradutils\n```\n\n### Optional Dependencies\n\nFor full functionality, install optional dependencies:\n\n```bash\n# For CuPy sparse solver support (GPU acceleration)\npip install cupy-cuda12x  # Replace with your CUDA version\n\n# For JAX sparse solver support\npip install \"jax[cpu]\"     # CPU version\npip install \"jax[cuda12]\"  # GPU version (replace with your CUDA version)\n\n# For benchmarking and testing\npip install scipy matplotlib pandas tqdm pytest\n```\n\n### Requirements\n\n- **Python**: ≥ 3.10\n- **PyTorch**: ≥ 2.5 (≥ 2.4 for indexed operations)\n- **Operating Systems**: Linux, macOS, Windows\n- **Hardware**: CPU and CUDA GPU support\n\n\n## 📊 Performance Benchmarks\n\nOur comprehensive benchmark suite demonstrates significant performance improvements across various sparse operations. All benchmarks were conducted on an NVIDIA GeForce RTX 4090 with PyTorch 2.9.0+cu130. 
Benchmarks are performed using the [Rothberg/cfd2](https://suitesparse-collection-website.herokuapp.com/Rothberg/cfd2) matrix from the [SuiteSparse Matrix Collection](https://suitesparse-collection-website.herokuapp.com/).\n\n![Sparse MM Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/sparse_mm_suite_performance_int32_float32_coo.png)\n\n![Sparse Triangular Solve Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/triangular_solve_suitesparse_performance_int32_float32_coo.png)\n\n![Sparse Generic Solve Suite Performance (int32/float32 COO)](torchsparsegradutils/benchmarks/benchmark_visualizations/sparse_solve_suite_performance_int32_float32_coo.png)\n\n## 🚀 Quick Start\n\n### Basic Sparse Matrix Multiplication\n\n```python\nimport torch\nfrom torchsparsegradutils import sparse_mm\n\n# Create sparse matrix in COO format\nindices = torch.tensor([[0, 1, 1], [2, 0, 2]], dtype=torch.int64)\nvalues = torch.tensor([3., 4., 5.], requires_grad=True)\nA = torch.sparse_coo_tensor(indices, values, (2, 3))\n\n# Dense matrix\nB = torch.randn(3, 4, requires_grad=True)\n\n# Memory-efficient sparse matrix multiplication with gradient support\nC = sparse_mm(A, B)\nloss = C.sum()\nloss.backward()  # Gradients preserved in sparse format\n\nprint(f\"A.grad: {A.grad}\")  # Sparse gradient\nprint(f\"B.grad: {B.grad}\")  # Dense gradient\n```\n\n### Sparse Linear System Solving\n\n```python\nimport torch\nfrom torchsparsegradutils import sparse_triangular_solve, sparse_generic_solve\nfrom torchsparsegradutils.utils import linear_cg\n\n# Create sparse triangular matrix\nA = create_sparse_triangular_matrix()  # Your sparse CSR matrix\nb = torch.randn(A.shape[0], requires_grad=True)\n\n# Triangular solve (fast for triangular systems)\nx1 = sparse_triangular_solve(A, b, upper=False)\n\n# Generic solve with different backends\nx2 = sparse_generic_solve(A, b, solve=linear_cg, tol=1e-6)\n\n# Using CuPy backend 
(if available)\nfrom torchsparsegradutils.cupy import sparse_solve_c4t\nx3 = sparse_solve_c4t(A, b, solve=\"cg\", tol=1e-6)\n```\n\n### Sparse Multivariate Normal Distribution\n\n```python\nimport torch\nfrom torchsparsegradutils.distributions import SparseMultivariateNormal\nfrom torchsparsegradutils.utils.random_sparse import rand_sparse_tri\n\n# Create parameters\nbatch_size, event_size = 2, 1000\nloc = torch.zeros(batch_size, event_size)\n\n# Example 1: LDL^T parameterization (numerically stable for precision matrices)\n# Create sparse lower triangular matrix (unit triangular, no diagonal)\nscale_tril = rand_sparse_tri(\n    (batch_size, event_size, event_size),\n    nnz=5000,  # 5000 non-zeros for 1M parameters (0.5% sparsity)\n    layout=torch.sparse_csr,\n    upper=False,\n    unit_triangular=True  # Unit triangular for LDL^T\n)\n\n# Diagonal component for LDL^T parameterization\ndiagonal = torch.ones(batch_size, event_size) * 0.5\n\n# Create distribution with LDL^T parameterization\ndist_ldlt = SparseMultivariateNormal(\n    loc=loc,\n    diagonal=diagonal,\n    scale_tril=scale_tril  # Unit lower triangular\n)\n\n# Example 2: LL^T parameterization (standard Cholesky)\nscale_tril_chol = rand_sparse_tri(\n    (batch_size, event_size, event_size),\n    nnz=5000,\n    layout=torch.sparse_csr,\n    upper=False,\n    unit_triangular=False  # Include diagonal for LL^T\n)\n\n# Create distribution with LL^T parameterization\ndist_chol = SparseMultivariateNormal(\n    loc=loc,\n    scale_tril=scale_tril_chol  # Lower triangular with diagonal\n)\n\n# Example 3: Precision matrix parameterization (more stable with LDL^T)\nprecision_tril = rand_sparse_tri(\n    (batch_size, event_size, event_size),\n    nnz=5000,\n    layout=torch.sparse_csr,\n    upper=False,\n    unit_triangular=True\n)\n\nprecision_diagonal = torch.ones(batch_size, event_size) * 2.0\n\ndist_precision = SparseMultivariateNormal(\n    loc=loc,\n    diagonal=precision_diagonal,\n    
precision_tril=precision_tril  # Unit triangular precision factor\n)\n\n# Sample with gradient support\nsamples = dist_ldlt.rsample((100,))  # 100 samples\n\n# Gradient computation preserves sparsity\nloss = samples.sum()\nloss.backward()\nprint(f\"Sparse gradient shape: {scale_tril.grad.shape}\")\nprint(f\"Sparse gradient nnz: {scale_tril.grad._nnz()}\")\nprint(f\"Using LDL^T parameterization: {dist_ldlt.is_ldlt_parameterization}\")\n```\n\n### Pairwise Voxel Encoding\n\n```python\nimport torch\nfrom torchsparsegradutils.encoders import PairwiseEncoder\n\n# Create 3D volume encoder (channels, height, depth, width)\nvolume_shape = (4, 64, 64, 64)  # 4 channels, 64x64x64 spatial\nencoder = PairwiseEncoder(\n    radius=2.0,\n    volume_shape=volume_shape,\n    layout=torch.sparse_csr\n)\n\n# Generate values for each spatial relationship offset\nnum_offsets = len(encoder.offsets)\nvalues = torch.randn(num_offsets, *volume_shape)\n\n# Generate sparse encoding matrix\nsparse_matrix = encoder(values)\n\nprint(f\"Encoded volume shape: {sparse_matrix.shape}\")\nprint(f\"Sparsity: {sparse_matrix._nnz() / sparse_matrix.numel():.3%}\")\nprint(f\"Number of spatial offsets: {num_offsets}\")\n\n# Use in sparse multivariate normal\nflat_size = 4 * 64 * 64 * 64  # Total flattened size\ndist = SparseMultivariateNormal(\n    loc=torch.zeros(flat_size),\n    scale_tril=sparse_matrix\n)\n```\n\n#### Spatial Relationship Visualization\n\nThe encoder creates sparse matrices that encode pairwise spatial relationships within a specified radius. 
Different channel relationship types affect how channels interact:\n\n- **`indep`**: Independent channels (only spatial neighbors within same channel)\n- **`intra`**: Intra-channel relationships (spatial neighbors within same channel)\n- **`inter`**: Inter-channel relationships (spatial neighbors across all channels)\n\n**3D Spatial Grid (3×3×3×3) with Different Channel Relations:**\n\n\u003cdiv align=\"center\"\u003e\n\n**Radius = 1.0**\n![Spatial Encodings Radius 1](torchsparsegradutils/tests/test_outputs/sparse_encodings_radius_1.png)\n\n**Radius = 2.0**\n![Spatial Encodings Radius 2](torchsparsegradutils/tests/test_outputs/sparse_encodings_radius_2.png)\n\u003c!--\n**Legend for Spatial Offsets:**\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"torchsparsegradutils/tests/test_outputs/legend_radius_1.png\" width=\"150\"/\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"torchsparsegradutils/tests/test_outputs/legend_radius_2.png\" width=\"150\"/\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"center\"\u003eRadius 1.0 Offsets\u003c/td\u003e\n\u003ctd align=\"center\"\u003eRadius 2.0 Offsets\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e --\u003e\n\n\u003c/div\u003e\n\nEach color represents a different spatial offset (relative position) in the 3D neighborhood. 
The sparse matrix encodes these relationships efficiently, enabling:\n\n- **Local spatial modeling** for volumetric data (medical imaging, 3D computer vision)\n- **Multi-channel feature interaction** in convolutional architectures\n- **Sparse graph construction** from regular grids\n- **Memory-efficient neighborhood encoding** for large volumes\n\n**Key Parameters:**\n- `radius`: Spatial neighborhood radius (1.0 = immediate neighbors, 2.0 = extended neighborhood)\n- `volume_shape`: `(channels, height, depth, width)` for 4D volumes\n- `channel_voxel_relation`: Controls cross-channel connectivity patterns\n- `layout`: Output sparse format (`torch.sparse_coo` or `torch.sparse_csr`)\n\n### Indexed Matrix Operations (Graph Neural Networks)\n\n```python\nimport torch\nfrom torchsparsegradutils import segment_mm, gather_mm\n\n# Segment matrix multiplication (compatible with DGL/PyG)\na = torch.randn(15, 10, requires_grad=True)  # Node features\nb = torch.randn(3, 10, 5, requires_grad=True)  # Edge type embeddings\nseglen_a = torch.tensor([5, 6, 4])  # Segment lengths\n\n# Performs: a[0:5] @ b[0], a[5:11] @ b[1], a[11:15] @ b[2]\nresult = segment_mm(a, b, seglen_a)\n\n# Gather matrix multiplication\nindices = torch.tensor([0, 0, 1, 1, 2])\na_gathered = torch.randn(5, 10, requires_grad=True)\nresult = gather_mm(a_gathered, b, indices)\n```\n\n### Statistical Distribution Validation\n\n```python\nimport torch\nfrom torch.distributions import MultivariateNormal\nfrom torchsparsegradutils.utils import mean_hotelling_t2_test, cov_nagao_test\n\n# Generate sample data from known distribution\ntorch.manual_seed(42)\ntrue_mean = torch.tensor([[0.0, 0.0]])\ntrue_cov = torch.eye(2).unsqueeze(0)\nn = 1000\n\n# Generate samples and compute statistics\ndist = MultivariateNormal(true_mean.squeeze(0), true_cov.squeeze(0))\nsamples = dist.sample((n,)).unsqueeze(1)\nsample_mean = samples.mean(0)\nsample_cov = torch.cov(samples.squeeze(1).T).unsqueeze(0)\n\n# Test if sample mean is 
consistent with hypothesized mean (should pass)\nresult, t2_stat, threshold = mean_hotelling_t2_test(\n    sample_mean, true_mean, sample_cov, n, confidence_level=0.95\n)\nprint(f\"Mean test passed: {result.item()}\")  # True\n\n# Test if sample covariance is consistent with hypothesized covariance (should pass)\nresult, t_n_stat, threshold = cov_nagao_test(\n    sample_cov, true_cov, n, confidence_level=0.95\n)\nprint(f\"Covariance test passed: {result.item()}\")  # True\n\n# Test against wrong parameters (should fail)\nwrong_mean = true_mean + 1.0  # Significantly different mean\nresult, _, _ = mean_hotelling_t2_test(\n    sample_mean, wrong_mean, sample_cov, n, confidence_level=0.95\n)\nprint(f\"Wrong mean test passed: {result.item()}\")  # False\n```\n\n## 🧪 Testing and Benchmarks\n\n### Running Tests\n\n```bash\n# Run all tests\npython -m pytest\n\n# Run specific test modules\npython -m pytest torchsparsegradutils/tests/test_sparse_matmul.py\npython -m pytest torchsparsegradutils/tests/test_distributions.py\n\n# Run with coverage\npython -m pytest --cov=torchsparsegradutils\n```\n\n### Running Benchmarks\n\nThe package includes comprehensive benchmarks for performance evaluation:\n\n```bash\n# Sparse matrix multiplication benchmarks\npython -m torchsparsegradutils.benchmarks.sparse_mm_rand\npython -m torchsparsegradutils.benchmarks.batched_sparse_mm_rand\n\n# Triangular solver benchmarks\npython -m torchsparsegradutils.benchmarks.sparse_triangular_solve_rand\n\n# Generic solver benchmarks\npython -m torchsparsegradutils.benchmarks.sparse_generic_solve_suite\n\n# SuiteSparse matrix benchmarks\npython -m torchsparsegradutils.benchmarks.sparse_mm_suite\n```\n\nResults are automatically saved to `torchsparsegradutils/benchmarks/results/` as CSV files.\n\n### Utility Functions\n\n#### `torchsparsegradutils.utils.random_sparse`\n\n**Sparse Random Matrix Generators**\n- **`rand_sparse(size, nnz, layout=torch.sparse_coo, **kwargs)`**: Generate random sparse matrices 
with specified layout and properties\n  - Supports COO and CSR\n  - Supports batch dimension\n- **`rand_sparse_tri(size, nnz, layout=torch.sparse_coo, upper=True, strict=False, **kwargs)`**: Generate random sparse triangular matrices\n  - Supports COO and CSR\n  - Supports batch dimension\n  - Strict triangular (no diagonal) or non-strict (with diagonal values)\n  - Option to produce well conditioned matrices and regulate diagonal values\n\n- **`make_spd_sparse(n, layout, value_dtype, index_dtype, device, sparsity_ratio=0.5, nz=None)`**: Generate sparse symmetric positive definite (SPD) matrices\n\n#### `torchsparsegradutils.utils.utils`\n\n**Sparse Matrix Operations**\n- **`sparse_block_diag(*sparse_tensors)`**: Create block diagonal sparse matrix from multiple sparse tensors\n- **`sparse_block_diag_split(sparse_block_diag_tensor, *shapes)`**: Split block diagonal sparse matrix into original sparse tensors\n- **`sparse_eye(size, layout=torch.sparse_coo, **kwargs)`**: Create batched or unbatched sparse identity matrices\n- **`stack_csr(tensors, dim=0)`**: Stack CSR tensors along batch dimension (like torch.stack for CSR)\n\n**Sparse Format Conversion**\n- **`convert_coo_to_csr_indices_values(coo_indices, num_rows, values=None)`**: Convert COO indices and values to CSR format, with support for batch dimension\n- **`convert_coo_to_csr(sparse_coo_tensor)`**: Convert COO sparse tensor to CSR format with batch support\n\n#### `torchsparsegradutils.utils.dist_stats_helpers`\n\n**Statistical Distribution Validation**\n- **`mean_hotelling_t2_test(sample_mean, true_mean, sample_cov, n, confidence_level=0.95)`**: One-sample Hotelling T² test for multivariate mean equality using confidence regions\n  - Tests whether hypothesized mean vector lies within confidence region around sample mean\n  - Uses F-distribution for threshold calculation with proper degrees of freedom\n  - Higher confidence levels create larger (more permissive) acceptance regions\n- 
**`cov_nagao_test(emp_cov, ref_cov, n, confidence_level=0.95)`**: Nagao's test for covariance matrix equality using confidence regions\n  - Tests whether hypothesized covariance matrix is consistent with empirical covariance\n  - Uses χ² distribution with appropriate degrees of freedom\n  - Standardizes covariance matrices for improved numerical stability\n\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see our contributing guidelines:\n\n1. **Issues**: Report bugs and request features via [GitHub Issues](https://github.com/cai4cai/torchsparsegradutils/issues)\n2. **Pull Requests**: Submit improvements via GitHub PRs\n3. **Testing**: Ensure all tests pass and add tests for new functionality\n4. **Documentation**: Update docstrings and examples for new features\n5. **Benchmarks**: Include performance benchmarks for new operations\n\n### Development Setup\n\n#### Option 1: Local Development\n\n```bash\ngit clone https://github.com/cai4cai/torchsparsegradutils\ncd torchsparsegradutils\npip install -e \".[dev]\"  # Install in development mode\npre-commit install       # Install pre-commit hooks\n```\n\n#### Option 2: Development Containers (Recommended)\n\nFor a consistent development environment with GPU support and all dependencies pre-installed, use VS Code Dev Containers:\n\n**Prerequisites:**\n- [Docker](https://docs.docker.com/get-docker/) with NVIDIA Container Toolkit (for GPU support)\n- [VS Code](https://code.visualstudio.com/) with the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)\n\n**Quick Start:**\n1. Clone the repository and open in VS Code:\n   ```bash\n   git clone https://github.com/cai4cai/torchsparsegradutils\n   cd torchsparsegradutils\n   code .\n   ```\n\n2. 
When prompted, click **\"Reopen in Container\"** or use the Command Palette:\n   - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS)\n   - Type \"Dev Containers: Reopen in Container\"\n\n**Available Configurations:**\n\n- **`.devcontainer/Dockerfile.stable`** (default): Uses stable PyTorch with CUDA 13.0 support\n- **`.devcontainer/Dockerfile.nightly`**: Uses nightly PyTorch builds for latest features\n\nTo switch configurations, modify the `dockerfile` field in `.devcontainer/devcontainer.json`:\n```json\n\"build\": {\n    \"dockerfile\": \"./Dockerfile.nightly\",  // or \"./Dockerfile.stable\"\n    \"context\": \".\"\n}\n```\n\n**What's Included:**\n- **CUDA 13.0**: Full GPU development support with NVIDIA drivers\n- **Pre-installed Dependencies**: PyTorch, CuPy, JAX, SciPy, and all development tools\n- **VS Code Extensions**: Python, Pylance, Jupyter, GitHub Copilot, and code formatting tools\n- **Development Tools**: pytest, black, flake8, pre-commit hooks\n- **Python Environment**: Python 3.10+ with all optional dependencies\n\n**Benefits:**\n- ✅ **Consistent Environment**: Same setup across different machines\n- ✅ **GPU Support**: Pre-configured CUDA environment\n- ✅ **Zero Setup**: All dependencies and tools pre-installed\n- ✅ **Isolated**: No conflicts with host system packages\n- ✅ **VS Code Integration**: Seamless debugging, IntelliSense, and testing\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **PyTorch Team**: For the foundational sparse tensor implementations\n- **SciPy/CuPy Teams**: For high-performance sparse linear algebra routines\n- **JAX Team**: For cross-platform sparse operations and XLA compilation\n- **Open Source Libraries**: We port and adapt algorithms from:\n  - [pykrylov](https://github.com/PythonOptimizers/pykrylov) (BICGSTAB)\n  - [cornellius-gp/linear_operator](https://github.com/cornellius-gp/linear_operator) (CG, MINRES)\n  - 
[pytorch-minimize](https://github.com/rfeinman/pytorch-minimize) (LSMR)\n\n## 📚 Citation\n\nIf you use this package in your research, please cite:\n\n```bibtex\n@software{torchsparsegradutils,\n  title={torchsparsegradutils: Sparsity-preserving gradient utility tools for PyTorch},\n  author={Barfoot, Theodore and Glocker, Ben and Vercauteren, Tom},\n  url={https://github.com/cai4cai/torchsparsegradutils},\n  year={2024}\n}\n```\n\n## ⚠️ Known Issues\n\n### PyTorch Sparse COO Index Dtype Conversion\n\n**Issue**: PyTorch automatically converts `int32` indices to `int64` when creating sparse COO tensors, but preserves `int32` for sparse CSR tensors. This affects memory usage and performance for algorithms that benefit from `int32` indices (such as `sparse_mm`).\n\n**Impact**:\n- **Memory**: `int64` indices use 2× more memory than `int32`\n- **Performance**: Some sparse operations may run faster with `int32` indices\n- **Cross-format consistency**: Different behavior between COO and CSR formats\n\n**Example**:\n```python\nimport torch\n\n# Demonstrate the issue\nindices_int32 = torch.tensor([[0, 1], [1, 0]], dtype=torch.int32)\nvalues = torch.tensor([1.0, 2.0])\n\nprint(f\"Original indices dtype: {indices_int32.dtype}\")  # torch.int32\n\n# COO: int32 -\u003e int64 conversion happens\ncoo_tensor = torch.sparse_coo_tensor(indices_int32, values, (2, 2)).coalesce()\nprint(f\"COO indices dtype: {coo_tensor.indices().dtype}\")  # torch.int64 (converted!)\n\n# CSR: int32 is preserved\ncrow_indices = torch.tensor([0, 1, 2], dtype=torch.int32)\ncol_indices = torch.tensor([1, 0], dtype=torch.int32)\ncsr_tensor = torch.sparse_csr_tensor(crow_indices, col_indices, values, (2, 2))\nprint(f\"CSR crow_indices dtype: {csr_tensor.crow_indices().dtype}\")  # torch.int32 (preserved!)\nprint(f\"CSR col_indices dtype: {csr_tensor.col_indices().dtype}\")    # torch.int32 (preserved!)\n```\n\n**Workarounds**:\n1. **Use CSR format** when `int32` indices are important for performance\n2. 
**Account for extra memory** when using COO format with large sparse matrices\n3. **Test performance** with both dtypes to determine if the conversion impacts your use case\n\n**Status**: This is a known PyTorch behavior. Our test suite documents and validates this behavior to catch any future changes in PyTorch's handling of sparse tensor index dtypes.\n\n### PairwiseEncoder CSR Memory Usage Issue\n\n**Issue**: CSR sparse tensors generated by `PairwiseEncoder` consume significantly more memory during backward passes compared to COO format, particularly in integration tests with `SparseMultivariateNormal`.\n\n**Impact**:\n- **Memory Consumption**: CSR integration tests can use 2-3x more memory than equivalent COO tests during `.backward()`\n- **Training Stability**: May cause out-of-memory errors during training with large spatial volumes\n- **Development**: Affects integration testing with large tensor configurations\n\n**Suspected Cause**: The issue may be related to CSR permutation operations within `PairwiseEncoder` that create additional intermediate tensors during gradient computation.\n\n**Current Status**: Under investigation. The memory spike occurs specifically during backpropagation through the sparse matrix operations.\n\n**Workarounds**:\n1. **Use COO format** for `PairwiseEncoder` when memory is constrained during training\n2. **Reduce batch sizes** or spatial dimensions when using CSR format\n3. 
**Monitor memory usage** carefully when integrating `PairwiseEncoder` with gradient-based optimization\n\n**Example**:\n```python\n# More memory-efficient approach for large tensors\nencoder = PairwiseEncoder(\n    radius=2.0,\n    volume_shape=(4, 64, 64, 64),\n    layout=torch.sparse_coo  # Use COO instead of CSR for memory efficiency\n)\n```\n\n### SparseMultivariateNormal LL^T Precision Parameterization Gradient Issues\n\n**Issue**: Large gradient magnitudes can occur when using LL^T parameterization with precision matrices in `SparseMultivariateNormal`, leading to training instability.\n\n**Impact**:\n- **Gradient Explosion**: Gradients can become extremely large (\u003e1e6) during backpropagation\n- **Training Instability**: May cause NaN values or divergent optimization\n- **Numerical Issues**: Poor conditioning of the precision matrix can amplify gradient problems\n\n**Affected Configurations**:\n- LL^T parameterization (`scale_tril` parameter) combined with precision matrix formulation\n- Both 2D and 3D spatial configurations show this behavior\n- More pronounced with larger spatial dimensions and higher sparsity\n\n**Root Cause**: The LL^T precision parameterization can lead to poor numerical conditioning, especially when the triangular matrix has small diagonal values or high condition number.\n\n**Recommended Solution**: Use LDL^T parameterization instead, which provides better numerical stability:\n\n```python\n# Problematic: LL^T precision parameterization\ndist_unstable = SparseMultivariateNormal(\n    loc=loc,\n    precision_tril=scale_tril  # LL^T with precision - can cause large gradients\n)\n\n# Better: LDL^T parameterization with separate diagonal\ndist_stable = SparseMultivariateNormal(\n    loc=loc,\n    diagonal=diagonal,  # Separate diagonal component for stability\n    precision_tril=unit_triangular_matrix  # Unit triangular (LDL^T)\n)\n```\n\n**Benefits of LDL^T Parameterization**:\n- **Numerical Stability**: Separates diagonal scaling 
from triangular structure\n- **Gradient Stability**: More stable gradients during backpropagation\n- **No SPD Constraints**: Doesn't require strict positive definiteness\n- **Better Conditioning**: Diagonal component can be controlled independently\n\n**Status**: This is a known limitation of the LL^T precision formulation. LDL^T parameterization is the recommended approach for precision matrices.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcai4cai%2Ftorchsparsegradutils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcai4cai%2Ftorchsparsegradutils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcai4cai%2Ftorchsparsegradutils/lists"}