{"id":24771065,"url":"https://github.com/pfcclab/paddle_scatter","last_synced_at":"2025-03-23T20:26:45.834Z","repository":{"id":272443588,"uuid":"916469513","full_name":"PFCCLab/paddle_scatter","owner":"PFCCLab","description":"Paddle Extension Library of Optimized Scatter Operations","archived":false,"fork":false,"pushed_at":"2025-03-03T06:01:50.000Z","size":368,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-03T07:19:25.848Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PFCCLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-14T06:46:30.000Z","updated_at":"2025-03-03T06:01:53.000Z","dependencies_parsed_at":"2025-01-14T13:51:57.984Z","dependency_job_id":"1eb5b7e1-af72-488d-a38c-c5653ad3fec5","html_url":"https://github.com/PFCCLab/paddle_scatter","commit_stats":null,"previous_names":["pfcclab/paddle_scatter"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PFCCLab%2Fpaddle_scatter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PFCCLab%2Fpaddle_scatter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PFCCLab%2Fpaddle_scatter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PFCCLab%2Fpaddle_scatter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PFCCLab
","download_url":"https://codeload.github.com/PFCCLab/paddle_scatter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245163972,"owners_count":20571033,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-29T03:57:45.382Z","updated_at":"2025-03-23T20:26:45.795Z","avatar_url":"https://github.com/PFCCLab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Paddle Scatter: Paddle Extension Library of Optimized Scatter Operations (Paddle backend)\n\n![paddle_logo](picture/paddle_logo.png)\n\n\u003e [!IMPORTANT]\n\u003e paddle_scatter 是基于 [Paddle 后端](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/windows-pip.html) 所开发的稀疏计算 API 拓展仓库，包括 scatter，segment，gather 三大类稀疏计算 API。仓库原型参照：[pytorch_scatter](https://github.com/rusty1s/pytorch_scatter)。拓展仓库中的稀疏计算 API 通过 Paddle 原生 python API 以及自定义 C++ 算子实现。\n\u003e\n\u003e 推荐在运行前安装 [**Paddle 3.0 或 develop**](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/develop/install/pip/linux-pip.html) 版本。\n\u003e\n\u003e 运行正确性在 Ubuntu 20.04.6 环境下经过验证。\n\n## 运行条件\n\n* Paddle 3.0\n\n\u003e cpu 版本或 gpu 版本均可\n\n## 安装说明\n\n* Package Build\n\n```sh\ncd paddle_scatter\npip install -v .\n```\n\n* Simple Example\n\n## 测试说明\n\n```sh\npip install pytest\ncd paddle_scatter/paddle_scatter\npytest -v ./tests\n```\n\n```py\nimport paddle\nfrom paddle_scatter import scatter_max\n\nsrc = paddle.to_tensor([[2, 0, 1, 4, 3], [0, 2, 1, 3, 4]])\nindex = paddle.to_tensor([[4, 5, 4, 2, 
3], [0, 0, 2, 2, 1]])\n\nout, argmax = scatter_max(src, index, dim=-1)\n\nprint(out)\n# Tensor(shape=[2, 6], dtype=int64, place=Place(gpu:0), stop_gradient=True,\n#        [[0, 0, 4, 3, 2, 0],\n#         [2, 4, 3, 0, 0, 0]])\n\nprint(argmax)\n# Tensor(shape=[2, 6], dtype=int64, place=Place(gpu:0), stop_gradient=True,\n#        [[5, 5, 3, 4, 0, 1],\n#         [1, 4, 3, 5, 5, 5]])\n```\n\n## Technical Architecture\n\n* First-level APIs:\nscatter, segment_coo, segment_csr, gather_coo, gather_csr\n\n* Second-level APIs:\nscatter_add, scatter_mean, scatter_mul, scatter_min, scatter_max;\nsegment_sum_coo, segment_add_coo, segment_mean_coo, segment_min_coo, segment_max_coo;\nsegment_sum_csr, segment_add_csr, segment_mean_csr, segment_min_csr, segment_max_csr\n\n* Composite APIs:\nscatter_softmax, scatter_log_softmax, scatter_logsumexp\n\n* Paddle custom C++ operator and extension documentation:\nCustom C++ operators: \u003chttps://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/new_cpp_op_cn.html\u003e\nCustom C++ extensions: \u003chttps://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/cpp_extension_cn.html\u003e\n\n## Brief Documentation\n\n### paddle_scatter.scatter\n\n\u003e scatter(src: paddle.Tensor, index: paddle.Tensor, dim: int = -1, out: Optional[paddle.Tensor] = None, dim_size: Optional[int] = None, reduce: Optional[str] = \"sum\")\n\nScatter computation: reduces `src` along the `dim` axis according to `index`, combining values with the `reduce` operation. If `out` is given, the result is written to `out`; if `dim_size` is given, the output has size `dim_size` along dimension `dim`.\n\nNotation:\n\n* `src` shape: $(x_{0}, ..., x_{i-1}, x_{i}, x_{i+1}, ..., x_{n-1})$ where $i$ = `dim`\n* `index` shape: $(x_0, ..., x_{i-1}, x_i, x_{i+1}, ..., x_{n-1})$ where $i$ = `dim`\n* `out` shape: $(x_0, ..., x_{i-1}, y, x_{i+1}, ..., x_{n-1})$\n* The values of `index` must lie in $[0, 1, ..., y-1]$ and may appear in any order\n\nThis API supports broadcasting of `index`, so the shape of `index` may also be $(x_i,)$ or $(d_0, d_1, ..., d_{i-1}, x_i)$, where each $d_k ,\\quad (k \u003c= i-1)$ is either $1$ or $x_k$\n\nFor the one-dimensional case with `reduce = \"sum\"`, the computation is:\n\n$$\n\\mathrm{out}_i = \\mathrm{out}_i + \\underset{j \\in \\lgroup j | \\mathrm{index}_j = i \\rgroup }{\\sum} \\mathrm{src}_j\n$$\n\n\u003cp 
align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"./picture/scatter_add.svg\" alt=\"scatter_add\"/\u003e\n\u003c/p\u003e\n\nParameters:\n\n* **src** (paddle.Tensor) - The source tensor.\n* **index** (paddle.Tensor) - The indices used for the scatter computation; see the notation above for the shape.\n* **dim** (int) - The dimension along which to scatter. Defaults to -1.\n* **out** (paddle.Tensor, optional) - The output tensor. Defaults to None.\n* **dim_size** (int, optional) - If `out` is not given, the output tensor has size `dim_size` along dimension `dim`; if `dim_size` is not given either, that size is set automatically to `index.max() + 1`. Defaults to None.\n* **reduce** (str, optional) - The reduction type; one of \"sum\", \"add\", \"mul\", \"mean\", \"min\", \"max\". Defaults to \"sum\".\n\nReturns:\nThe tensor produced by the scatter reduction.\n\nExample:\n\n```py\nimport paddle\nfrom paddle_scatter import scatter\n\nsrc = paddle.randn([10, 6, 64])\nindex = paddle.to_tensor([0, 1, 0, 1, 2, 1])\n\n# Broadcasting in the first and last dim\nout = scatter(src, index, dim=1, reduce=\"sum\")\nprint(out.shape)\n# [10, 3, 64]\n\n# Specify `dim_size`\nout = scatter(src, index, dim=1, dim_size=4, reduce=\"sum\")\nprint(out.shape)\n# [10, 4, 64]\n\n# Specify `out` (zero-initialized, because \"sum\" accumulates into it)\nout = paddle.zeros([10, 3, 64])\nscatter(src, index, dim=1, out=out, reduce=\"sum\")\nprint(out.shape)\n# [10, 3, 64]\n```\n\n### paddle_scatter.segment_coo\n\n\u003e segment_coo(src: paddle.Tensor, index: paddle.Tensor, out: Optional[paddle.Tensor] = None, dim_size: Optional[int] = None, reduce: Optional[str] = \"sum\")\n\nSegment computation in the coordinate (COO) sparse format: reduces `src` along the last dimension of `index`, grouping entries by the values of `index` and combining each group with the `reduce` operation. If `out` is given, the result is written to `out`; if `dim_size` is given, the output has size `dim_size` along the reduced dimension.\n\nNotation:\n\n* `src` shape: $(x_1, ..., x_{m-1}, x_m, x_{m+1}, ..., x_n)$\n* `index` shape: $(x_1, ..., x_{m-1}, x_m)$\n* `out` shape: $(x_1, ..., x_{m-1}, y, x_{m+1}, ..., x_n)$\n* The values of `index` must lie in $[0, 1, ..., y-1]$ and must be in ascending order\n\nThis API supports broadcasting of `index`, so the shape of `index` may also be $(d_1, d_2, ..., d_{m-1}, x_m)$, where each $d_k ,\\quad (k \u003c= m-1)$ is either $1$ or $x_k$\n\nFor the one-dimensional case with `reduce = \"sum\"`, the computation is:\n\n$$\n\\mathrm{out}_i = \\mathrm{out}_i + \\underset{j \\in \\lgroup j | \\mathrm{index}_j = i \\rgroup }{\\sum} \\mathrm{src}_j\n$$\n\n\u003cp 
align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"./picture/segment_coo_add.svg\" alt=\"segment_coo_add\"/\u003e\n\u003c/p\u003e\n\nParameters:\n\n* **src** (paddle.Tensor) - The source tensor.\n* **index** (paddle.Tensor) - The indices used for the segment computation; see the notation above for the shape.\n* **out** (paddle.Tensor, optional) - The output tensor. Defaults to None.\n* **dim_size** (int, optional) - If `out` is not given, the output tensor has size `dim_size` along the reduced dimension; if `dim_size` is not given either, that size is set automatically to `index.max() + 1`. Defaults to None.\n* **reduce** (str, optional) - The reduction type; one of \"sum\", \"add\", \"mean\", \"min\", \"max\". Defaults to \"sum\".\n\nReturns:\nThe tensor produced by the segment reduction in the coordinate (COO) sparse format.\n\nExample:\n\n```py\nimport paddle\nfrom paddle_scatter import segment_coo\n\nsrc = paddle.randn([10, 6, 64])\nindex = paddle.to_tensor([0, 0, 1, 1, 1, 2])\nindex = index.reshape([1, -1])  # Broadcasting in the first and last dim.\n\nout = segment_coo(src, index, reduce=\"sum\")\n\nprint(out.shape)\n# [10, 3, 64]\n```\n\n### paddle_scatter.segment_csr\n\n\u003e segment_csr(src: paddle.Tensor, indptr: paddle.Tensor, out: Optional[paddle.Tensor] = None, reduce: Optional[str] = \"sum\")\n\nSegment computation in the compressed sparse row (CSR) format: reduces `src` along the last dimension of `indptr`, combining each index range specified by `indptr` with the `reduce` operation. If `out` is given, the result is written to `out`.\n\nNotation:\n\n* `src` shape: $(x_1, ..., x_{m-1}, x_m, x_{m+1}, ..., x_n)$\n* `indptr` shape: $(x_1, ..., x_{m-1}, y)$, where $y$ may be any size\n* `out` shape: $(x_1, ..., x_{m-1}, y - 1, x_{m+1}, ..., x_n)$\n* The values of `indptr` must lie in $[0, 1, ..., x_m]$ and must be in ascending order\n\nThis API supports broadcasting of `indptr`, so the shape of `indptr` may also be $(d_1, d_2, ..., d_{m-1}, y)$, where each $d_k ,\\quad (k \u003c= m-1)$ is either $1$ or $x_k$\n\nFor the one-dimensional case with `reduce = \"sum\"`, the computation is:\n\n$$\n\\mathrm{out}_i = \\overset{{\\mathrm{indptr}[i+1]-1}}{\\underset{{j = \\mathrm{indptr}[i]}}{\\sum}} \\mathrm{src}_j\n$$\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"./picture/segment_csr_add.png\" alt=\"segment_csr_add\"/\u003e\n\u003c/p\u003e\n\nParameters:\n\n* **src** (paddle.Tensor) - The source tensor.\n* **indptr** (paddle.Tensor) - The index pointers used for the segment computation; see the notation above for the shape.\n* **out** (paddle.Tensor, optional) - The output tensor. Defaults to None.\n* 
**reduce** (str, optional) - The reduction type; one of \"sum\", \"add\", \"mean\", \"min\", \"max\". Defaults to \"sum\".\n\nReturns:\nThe tensor produced by the segment reduction in the compressed sparse row (CSR) format.\n\nExample:\n\n```py\nimport paddle\nfrom paddle_scatter import segment_csr\n\nsrc = paddle.randn([10, 6, 64])\nindptr = paddle.to_tensor([0, 2, 5, 6])\nindptr = indptr.reshape([1, -1])  # Broadcasting in the first and last dim.\n\nout = segment_csr(src, indptr, reduce=\"sum\")\n\nprint(out.shape)\n# [10, 3, 64]\n```\n\n### paddle_scatter.gather_coo\n\n\u003e gather_coo(src: paddle.Tensor, index: paddle.Tensor, out: Optional[paddle.Tensor] = None)\n\nGather in the coordinate (COO) sparse format: along the last dimension of `index`, picks the elements of `src` at the positions given by the values of `index`. If `out` is given, the result is written to `out`.\n\nNotation:\n\n* `src` shape: $(x_1, ..., x_{m-1}, x_m, x_{m+1}, ..., x_n)$\n* `index` shape: $(x_1, ..., x_{m-1}, y)$, where $y$ may be any size\n* `out` shape: $(x_1, ..., x_{m-1}, y, x_{m+1}, ..., x_n)$\n* The values of `index` must lie in $[0, 1, ..., x_m - 1]$ and must be in ascending order\n\nThis API supports broadcasting of `index`, so the shape of `index` may also be $(d_1, d_2, ..., d_{m-1}, y)$, where each $d_k ,\\quad (k \u003c= m-1)$ is either $1$ or $x_k$\n\nFor the one-dimensional case, the computation is:\n\n$$\n\\mathrm{out_{i}} = \\mathrm{src_{\\mathrm{index}_{i}}}\n$$\n\nParameters:\n\n* **src** (paddle.Tensor) - The source tensor.\n* **index** (paddle.Tensor) - The indices used for the sparse gather; see the notation above for the shape.\n* **out** (paddle.Tensor, optional) - The output tensor. Defaults to None.\n\nReturns:\nThe tensor gathered in the coordinate (COO) sparse format.\n\nExample:\n\n```py\nimport paddle\nfrom paddle_scatter import gather_coo\n\nsrc = paddle.to_tensor([1, 2, 3, 4])\nindex = paddle.to_tensor([0, 0, 1, 1, 1, 3])\n\nout = gather_coo(src, index)\n\nprint(out)\n# Tensor(shape=[6], dtype=int64, place=Place(cpu), stop_gradient=True,\n# [1, 1, 2, 2, 2, 4])\n```\n\n### paddle_scatter.gather_csr\n\n\u003e gather_csr(src: paddle.Tensor, indptr: paddle.Tensor, out: Optional[paddle.Tensor] = None)\n\nGather in the compressed sparse row (CSR) format: along the last dimension of `indptr`, picks elements of `src` according to the index ranges specified by `indptr`. If `out` is given, the result is written to `out`.\n\nNotation:\n\n* `src` shape: $(x_1, ..., x_{m-1}, x_m, x_{m+1}, ..., x_n)$\n* `indptr` shape: $(x_1, ..., x_{m-1}, y)$, where $y = x_m + 1$ must hold\n* `out` shape: $(x_1, ..., x_{m-1}, k, x_{m+1}, ..., 
x_n)$, where $k$ is the number of indices covered by the segments that `indptr` describes\n* The values of `indptr` must lie in $[0, 1, ..., x_m]$ and must be in ascending order\n\nThis API supports broadcasting of `indptr`, so the shape of `indptr` may also be $(d_1, d_2, ..., d_{m-1}, y)$, where each $d_k ,\\quad (k \u003c= m-1)$ is either $1$ or $x_k$\n\nFor the one-dimensional case, the computation is:\n\n$$\n\\mathrm{out}[i] = \\mathrm{src}[k]\n$$\n\n$$\nk = \\max \\lbrace j \\mid \\mathrm{indptr}[j] \\le i \\rbrace\n$$\n\nParameters:\n\n* **src** (paddle.Tensor) - The source tensor.\n* **indptr** (paddle.Tensor) - The index pointers used for the sparse gather; see the notation above for the shape.\n* **out** (paddle.Tensor, optional) - The output tensor. Defaults to None.\n\nReturns:\nThe tensor gathered in the compressed sparse row (CSR) format.\n\nExample:\n\n```py\nimport paddle\nfrom paddle_scatter import gather_csr\n\nsrc = paddle.to_tensor([1, 2, 3, 4])\nindptr = paddle.to_tensor([0, 2, 5, 5, 6])\n\nout = gather_csr(src, indptr)\n\nprint(out)\n# Tensor(shape=[6], dtype=int64, place=Place(cpu), stop_gradient=True,\n# [1, 1, 2, 2, 2, 4])\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpfcclab%2Fpaddle_scatter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpfcclab%2Fpaddle_scatter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpfcclab%2Fpaddle_scatter/lists"}