{"id":24912373,"url":"https://github.com/sangioai/torchpace","last_synced_at":"2026-04-25T16:33:31.098Z","repository":{"id":274996961,"uuid":"912987593","full_name":"sangioai/torchPACE","owner":"sangioai","description":"PyTorch CUDA/C++ extension of PACE: Transformer non-linearlity accelerator engine.","archived":false,"fork":false,"pushed_at":"2025-02-18T17:35:10.000Z","size":1856,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T04:18:42.297Z","etag":null,"topics":["cuda","pytorch","transformer"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sangioai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-06T19:57:55.000Z","updated_at":"2025-02-18T17:35:14.000Z","dependencies_parsed_at":"2025-01-30T16:46:07.801Z","dependency_job_id":null,"html_url":"https://github.com/sangioai/torchPACE","commit_stats":null,"previous_names":["sangioai/torchpace"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sangioai%2FtorchPACE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sangioai%2FtorchPACE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sangioai%2FtorchPACE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sangioai%2FtorchPACE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sangioai","
download_url":"https://codeload.github.com/sangioai/torchPACE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245966927,"owners_count":20701758,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","pytorch","transformer"],"created_at":"2025-02-02T05:19:26.407Z","updated_at":"2026-04-25T16:33:31.068Z","avatar_url":"https://github.com/sangioai.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# torchPACE\nPyTorch C++ and CUDA extension for PACE's Piecewise Polynomial Approximation (PwPA), a Transformer non-linearities acceleration engine.\n\n## Introduction\nThis extension integrates PwPA CUDA kernels for both AoS (array-of-structures) and SoA (structure-of-arrays) coefficient layouts, using a simple unrolling technique.\u003c/br\u003e\nMore details [here](extra/README.md).\n\n## Setup\nBuilt with [PyPA/Build](https://github.com/pypa/build), but you can use pip or similar.\n\nTo build: \u003c/br\u003e\n```text\npython -m build -n\n```\n    \nTo install:  \u003c/br\u003e\n```text\npip install dist\\\u003cbuilt_extension_file.whl\u003e\n```\n\nTo test:  \u003c/br\u003e\n```text\npython test\\extension_test.py\n```\n\n\nTo use:  \u003c/br\u003e\n```python\nimport torch_pace\n...\n# base kernel\ny = torch_pace.ops._pwpa(x, coeffs, partition_points, AoS=True)\n# optimized kernel\ny = torch_pace.ops.pwpa(x, coeffs, partition_points, AoS=True)\n# AoS to SoA coefficients rearrangement\ncoeffs_soa = torch_pace.ops.aos2soa(coeffs, degree)\n# optimized kernel with SoA 
coefficients' data structure\ny = torch_pace.ops.pwpa(x, coeffs_soa, partition_points, AoS=False)\n```\n\n\u003e [!Important]\n\u003e Requirements: \n\u003e    - torch\u003e=2.4 with CUDA enabled (mine is 2.5.1+cu118)\n\u003e    - CUDA toolkit (mine is 11.7)\n\u003e    - Python\u003e=3.8 (mine is 3.12.8)\n\n## Examples\n\nThis is the output of running [approximation_test.py](test/approximation_test.py):\n![image](https://github.com/user-attachments/assets/01ecdbec-d232-4e9e-99f5-f5d38cadfeb3)\n\n\u003e [!Note]\n\u003e [approximation_test.py](test/approximation_test.py) uses a simple uniform partitioning, which divides the X-value range into equal parts.\u003c/br\u003e\n\u003e More sophisticated partitioning strategies may account for slope trends, yielding more accurate approximations where the function changes more rapidly.\n\n## ToDo\nA brief list of things to do or fix in this extension:\n- [x] PyTorch Half type support\n- [ ] Extension Benchmark on non-linearities in plain CUDA code\n- [ ] Extension Benchmark on PyTorch non-linearities\n- [ ] ILP (Instruction-Level Parallelism) integration\n- [x] aos2soa function\n- [ ] soa2aos function\n- [ ] CUDA SIMD intrinsics analysis for float16 (PyTorch Half) type  \n- [ ] PyTorch neural net example\n\n## Credits\n\nExtension backbone inspired by [this tutorial](https://github.com/pytorch/extension-cpp).\n\n## Authors\n\n[Marco Sangiorgi](https://github.com/SangioAI)\n\u003c/br\u003e\n*2025©*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsangioai%2Ftorchpace","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsangioai%2Ftorchpace","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsangioai%2Ftorchpace/lists"}