{"id":30360342,"url":"https://github.com/amedar-asterisk/u-statistics-python","last_synced_at":"2025-08-19T13:19:29.839Z","repository":{"id":303003016,"uuid":"909649325","full_name":"Amedar-Asterisk/U-Statistics-python","owner":"Amedar-Asterisk","description":"A Python package for efficient computation of U-statistics via tensor contraction.","archived":false,"fork":false,"pushed_at":"2025-08-19T09:33:21.000Z","size":322,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-19T11:32:41.304Z","etag":null,"topics":["einsum","statistics","u-statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Amedar-Asterisk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-29T11:25:37.000Z","updated_at":"2025-08-19T09:33:24.000Z","dependencies_parsed_at":"2025-07-05T05:33:11.219Z","dependency_job_id":"6d6cce7f-308f-4db5-a132-f8cf6019e924","html_url":"https://github.com/Amedar-Asterisk/U-Statistics-python","commit_stats":null,"previous_names":["amedar-asterisk/u-statistics-python"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/Amedar-Asterisk/U-Statistics-python","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amedar-Asterisk%2FU-Statistics-python","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amedar-Asterisk%2FU-Statistics-python/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/Amedar-Asterisk%2FU-Statistics-python/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amedar-Asterisk%2FU-Statistics-python/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Amedar-Asterisk","download_url":"https://codeload.github.com/Amedar-Asterisk/U-Statistics-python/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amedar-Asterisk%2FU-Statistics-python/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271159423,"owners_count":24709268,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["einsum","statistics","u-statistics"],"created_at":"2025-08-19T13:19:27.746Z","updated_at":"2025-08-19T13:19:29.828Z","avatar_url":"https://github.com/Amedar-Asterisk.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# U-Statistics-python `u-stats`\n[![PyPI version](https://badge.fury.io/py/u-stats.svg)](https://badge.fury.io/py/u-stats)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Style 
Status](https://img.shields.io/github/actions/workflow/status/Amedar-Asterisk/U-Statistics-python/style_check.yml?branch=main\u0026label=Style)](https://github.com/Amedar-Asterisk/U-Statistics-python/actions)\n\n**U-statistics** are fundamental tools in statistics, probability theory, theoretical computer science, economics, statistical physics, and machine learning. Introduced by Wassily Hoeffding, U-statistics provide unbiased estimators for population parameters and form the foundation for many statistical tests and methods. However, computing U-statistics can be computationally demanding, especially for high-order cases where the number of index combinations grows exponentially with the order.\n\nThis package provides a high-performance, tensor-based implementation for computing U-statistics and V-statistics with significant computational advantages:\n\n- Leverages the underlying structure of kernel functions to significantly reduce computational complexity in many cases\n- Utilizes optimized einsum engines ([`numpy.einsum`](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) and [`torch.einsum`](https://pytorch.org/docs/stable/generated/torch.einsum.html)) to enable efficient computation on both CPU and GPU\n\nIf you find this library useful in your research, please consider citing our paper:\n\n- Title: On computing and the complexity of computing higher-order U-statistics, exactly\n- Authors: Xingyu Chen, Ruiqi Zhang, and Lin Liu\n- URL: https://arxiv.org/abs/2508.12627\n\n```bibtex\n@article{CZL2025,\n  title        = {On computing and the complexity of computing higher-order $U$-statistics, exactly},\n  author       = {Xingyu Chen and Ruiqi Zhang and Lin Liu},\n  year         = {2025},\n  archivePrefix= {arXiv},\n  eprint       = {2508.12627},\n  primaryClass = {stat.ML},\n  doi          = {10.48550/arXiv.2508.12627},\n  url          = {https://arxiv.org/abs/2508.12627},\n}\n```\n\n## Table of Contents\n\n1. [Installation](#1-installation)\n2. 
[Requirements](#2-requirements)\n3. [Example Usage](#3-example-usage)\n4. [API Reference](#4-api-reference)\n5. [Changelog](#5-changelog)\n6. [License](#6-license)\n7. [Contributing](#7-contributing) \n\n## 1. Installation\n\nInstall the package from PyPI:\n\n```bash\npip install u-stats\n```\n\nFor development installation:\n\n```bash\ngit clone https://github.com/Amedar-Asterisk/U-Statistics-python.git\ncd U-Statistics-python\npip install -e .\n```\n\n## 2. Requirements\n\n### Required Dependencies\n- **Python 3.11+**\n- **NumPy \u003e= 1.20.0**\n- **opt_einsum \u003e= 3.3.0**\n\n### Optional Dependencies\n- **torch \u003e= 1.9.0**: GPU acceleration and parallel CPU computation\n\n## 3. Example Usage\n\n### 3.1 Selection of Backend\n\nThe package supports two computation backends for different performance needs:\n\n```python\nfrom u_stats import set_backend, get_backend\n\n# Set backend to NumPy (default)\nset_backend(\"numpy\")  # CPU computation, deterministic results\n\n# Set backend to PyTorch (optional)\nset_backend(\"torch\")  # GPU acceleration and parallel CPU computation\n\n# Check current backend\ncurrent_backend = get_backend()\nprint(f\"Current backend: {current_backend}\")\n```\n\n### 3.2 Computing U-statistics\n\n[`ustat`](https://github.com/Amedar-Asterisk/U-Statistics-python/blob/main/src/u_stats/__init__.py#L92-L131) is the main function for computing U-statistics.\n\nHere we take a **7th-order U-statistic** with kernel function\n\n$$\nh(x_1,x_2,\\dots,x_7) = h_1(x_1, x_2) h_2(x_2, x_3) \\dots h_6(x_6, x_7)\n$$\n\nas an example. 
For samples $X = (X_1, \\dots, X_n)$, the U-statistic takes the form\n\n$$\nU = \\frac{1}{n(n-1)\\cdots (n-6)} \\sum_{(i_1,\\dots,i_7) \\in P_7}h_1(X_{i_1},X_{i_2})\\cdots h_6(X_{i_6},X_{i_7})\n$$\n\nwhere $P_7$ denotes the set of all 7-tuples of distinct indices in $\\{1, \\dots, n\\}$.\n\n#### 3.2.1 Tensor Assembly\n\nWe assume that the kernel function values on samples $X$ have been precomputed and assembled into matrices $H_1, H_2, \\dots, H_6 \\in \\mathbb{R}^{n \\times n}$:\n\n$$\nH_k[i,j] = h_k(X_i, X_j)\n$$\n\n#### 3.2.2 Expression Formats\n\nThe expression defines how kernel matrices are connected in the computation. We use this U-statistic as an example to explain how to construct an expression.\n\nTo express the structure of the kernel function $h(x_1, x_2, \\dots, x_7) = h_1(x_1, x_2) \\cdot h_2(x_2, x_3) \\cdots h_{6}(x_{6}, x_7)$, we assign a unique index to each distinct variable $x_1, x_2, \\dots, x_7$. For each factor $h_k(x_{k}, x_{k+1})$, we collect the indices of the variables it depends on into a pair. The pairs are then listed in the same order as the factors in the product. 
\n\nEither of the following notations can represent this structure:\n\n**Einstein Summation Notation:**\n```python\nexpression = \"ab,bc,cd,de,ef,fg-\u003e\"\n```\n\n**Nested List Notation:**\n```python\nexpression = [[1,2],[2,3],[3,4],[4,5],[5,6],[6,7]]\n```\n\n**Format Explanation:**\n- In Einstein notation: each letter represents an index, and each string like `\"ab\"` represents a factor of the kernel\n- In list notation: each sub-list `[i,j]` represents a factor of the kernel\n\nBoth formats are equivalent and specify the same computation pattern in our package.\n\n#### 3.2.3 Complete Example\n\n```python\nfrom u_stats import ustat, set_backend\nimport numpy as np\n\n# Choose computation backend\n# NumPy is the default; switch to \"torch\" if PyTorch is installed\nset_backend(\"numpy\")\n\n\n# Set number of samples\nn = 100\n\n# Create precomputed kernel matrices\n# In practice, these would be computed from your actual data\nH1 = np.random.rand(n, n)\nH2 = np.random.rand(n, n)\nH3 = np.random.rand(n, n)\nH4 = np.random.rand(n, n)\nH5 = np.random.rand(n, n)\nH6 = np.random.rand(n, n)\n\ntensors = [H1, H2, H3, H4, H5, H6]\nexpression = \"ab,bc,cd,de,ef,fg-\u003e\"\n\n# Compute the U-statistic\nresult = ustat(tensors=tensors, expression=expression, average=True)\nprint(f\"7th-order U-statistic: {result}\")\n\n# You can also compute the unaveraged sum\nsum_result = ustat(tensors=tensors, expression=expression, average=False)\nprint(f\"Sum (before averaging): {sum_result}\")\n```\n\n\n\n## 4. 
API Reference\n\n### 4.1 Main Functions\n\n#### `ustat(tensors, expression, average=True, optimize=\"greedy\", **kwargs)`\nCompute U-statistics from input tensors.\n\n**Parameters:**\n- `tensors` (List[np.ndarray]): List of input tensors (numpy arrays or torch tensors)\n- `expression` (str | List | Tuple): Tensor contraction expression\n  - String format: Einstein summation notation (e.g., \"ij,jk-\u003eik\")\n  - List format: Nested indices (e.g., [[1,2],[2,3]])\n- `average` (bool, default=True): Whether to compute average (True) or sum (False)\n- `optimize` (str, default=\"greedy\"): Optimization strategy for tensor contraction\n  - \"greedy\": Fast heuristic optimization\n  - \"optimal\": Exhaustive search for optimal contraction order\n  - \"dp\": Dynamic programming approach\n- `**kwargs`: Additional keyword arguments passed to `opt_einsum.contract`\n\n**Returns:** \n- `float`: Computed U-statistic value\n\n**Example:**\n```python\nresult = ustat([H1, H2], \"ij,jk-\u003e\", average=True, optimize=\"optimal\")\n```\n\n#### `vstat(tensors, expression, average=True, optimize=\"greedy\", **kwargs)`\nCompute V-statistics from input tensors.\n\n**Parameters:** Same as `ustat`\n\n**Returns:** Computed V-statistic value\n\n#### `u_stats_loop(tensors, expression)`\nReference implementation using explicit loops (for validation and small computations).\n\n**Note:** This function is primarily for testing and educational purposes. 
Use `ustat` for production code.\n\n#### `set_backend(backend_name)`\nSet the tensor computation backend.\n\n**Parameters:**\n- `backend_name` (str): Backend identifier\n  - `\"numpy\"`: Use NumPy backend\n  - `\"torch\"`: Use PyTorch backend\n\n**Example:**\n```python\nset_backend(\"torch\")  # Switch to PyTorch backend\n```\n\n#### `get_backend()`\nGet the current tensor computation backend.\n\n**Returns:** \n- `str`: Current backend name (\"numpy\" or \"torch\")\n\n### 4.2 Classes\n\n#### `UStats(expression)`\nClass-based interface for U-statistics computation with advanced features.\n\n**Parameters:**\n- `expression`: Tensor contraction expression (same format as function interface)\n\n**Methods:**\n- `compute(tensors, average=True, **kwargs)`: Compute U-statistic\n- `complexity_analysis()`: Analyze computational complexity\n- `get_contraction_path()`: Get optimized contraction path\n\n**Example:**\n```python\nustats_obj = UStats(\"ij,jk-\u003e\")\nresult = ustats_obj.compute([H1, H2], average=True)\ncomplexity = ustats_obj.complexity_analysis()\n```\n\n#### `VStats(expression)`  \nClass-based interface for V-statistics computation with advanced features.\n\n**Parameters and Methods:** Same as `UStats`\n\n## 5. Changelog\n\nSee [CHANGELOG.md](CHANGELOG.md) for detailed version history and changes.\n\n## 6. License\n\nThis project is licensed under the MIT License – see the [LICENSE](LICENSE) file for details.\n\n## 7. Contributing\n\nWe welcome contributions! 
Here's how you can help:\n\n### Reporting Issues\n- Use the [GitHub issue tracker](https://github.com/Amedar-Asterisk/U-Statistics-python/issues)\n- Include minimal reproducible examples\n- Specify your environment (Python version, OS, backend)\n\n### Development Setup\n```bash\n# Clone the repository\ngit clone https://github.com/Amedar-Asterisk/U-Statistics-python.git\ncd U-Statistics-python\n\n# Install in development mode\npip install -e \".[test]\"\n```\n\n### Pull Requests\n- Fork the repository and create a feature branch\n- Add tests for new functionality\n- Ensure all tests pass and type checking succeeds\n- Update documentation as needed\n- Follow the existing code style\n\nFor questions or discussions, feel free to open an issue or reach out to the maintainers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famedar-asterisk%2Fu-statistics-python","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famedar-asterisk%2Fu-statistics-python","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famedar-asterisk%2Fu-statistics-python/lists"}