{"id":13415280,"url":"https://github.com/facebookresearch/TensorComprehensions","last_synced_at":"2025-03-14T22:33:15.465Z","repository":{"id":41555492,"uuid":"120494252","full_name":"facebookresearch/TensorComprehensions","owner":"facebookresearch","description":"A domain specific language to express machine learning workloads.","archived":true,"fork":false,"pushed_at":"2023-04-28T21:51:23.000Z","size":38455,"stargazers_count":1760,"open_issues_count":90,"forks_count":211,"subscribers_count":108,"default_branch":"master","last_synced_at":"2024-12-17T01:38:05.254Z","etag":null,"topics":["domain-specific-language","machine-learning"],"latest_commit_sha":null,"homepage":"https://facebookresearch.github.io/TensorComprehensions/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CodeOwners.md","security":null,"support":null,"governance":null}},"created_at":"2018-02-06T17:11:07.000Z","updated_at":"2024-11-22T20:05:25.000Z","dependencies_parsed_at":"2022-08-10T02:50:24.703Z","dependency_job_id":"e60c3f21-636b-4d9f-a761-e000d9e8018e","html_url":"https://github.com/facebookresearch/TensorComprehensions","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FTensorComprehensions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FTensorComprehensions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FTensorComprehensions/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FTensorComprehensions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/TensorComprehensions/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243658057,"owners_count":20326459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["domain-specific-language","machine-learning"],"created_at":"2024-07-30T21:00:46.468Z","updated_at":"2025-03-14T22:33:15.459Z","avatar_url":"https://github.com/facebookresearch.png","language":"C++","readme":"# ![Tensor Comprehensions](docs/source/_static/img/tc-logo-full-color-with-text-2.png)\n\nTensor Comprehensions (TC) is a fully-functional C++ library to *automatically* synthesize high-performance machine learning kernels using [Halide](https://github.com/halide/Halide), [ISL](http://isl.gforge.inria.fr/) and NVRTC or LLVM. TC additionally provides basic integration with Caffe2 and PyTorch. 
We provide more details in our paper on [arXiv](https://arxiv.org/abs/1802.04730).\n\nThis library is designed to be highly portable, machine-learning-framework agnostic and only requires a simple tensor library with memory allocation, offloading and synchronization capabilities.\n\nFor now, we have integrated TC with [Caffe2](https://github.com/caffe2/caffe2) and [PyTorch](https://github.com/pytorch/pytorch/).\n\n# A simple example\n\nThe following illustrates a short but powerful feature of the library: the capacity to JIT-compile high-performance machine learning kernels on demand, for specific sizes.\n\n```python\nimport tensor_comprehensions as tc\nimport torch\nlang = \"\"\"\ndef tensordot(float(N, C1, C2, H, W) I0, float(N, C2, C3, H, W) I1) -\u003e (O) {\n    O(n, c1, c3, h, w) +=! I0(n, c1, c2, h, w) * I1(n, c2, c3, h, w)\n}\n\"\"\"\nN, C1, C2, C3, H, W = 32, 512, 8, 2, 28, 28\ntensordot = tc.define(lang, name=\"tensordot\")\nI0, I1 = torch.randn(N, C1, C2, H, W).cuda(), torch.randn(N, C2, C3, H, W).cuda()\nbest_options = tensordot.autotune(I0, I1, cache=True)\nout = tensordot(I0, I1, options=best_options)\n```\n\nAfter a few generations of `autotuning` on a 2-GPU P100 system, we see results resembling:\n\n![Autotuning Sample](docs/source/_static/img/autotuning.png)\n\nIn C++ a minimal autotuning example resembles the [following](tc/examples/tensordot.cc):\n```cpp\nTEST(TensorDot, SimpleAutotune) {\n  // 1. Define and setup the TC compilation unit with CUDA memory\n  // management backed by ATen tensors.\n  std::string tc = R\"TC(\ndef tensordot(float(N, C1, C2, H, W) I0,\n              float(N, C2, C3, H, W) I1)  -\u003e (O)\n{\n    O(n, c1, c3, h, w) +=! I0(n, c1, r_c2, h, w) * I1(n, r_c2, c3, h, w)\n}\n  )TC\";\n\n  // 2. Allocate tensors with random data.\n  at::Tensor I0 = at::CUDA(at::kFloat).rand({32,  8, 16, 17, 25});\n  at::Tensor I1 = at::CUDA(at::kFloat).rand({32, 16, 2, 17, 25});\n\n  // 3. 
Run autotuning with evolutionary search starting from a naive option.\n  auto naiveOptions = Backend::MappingOptionsType::makeNaiveMappingOptions();\n  tc::aten::ATenAutotuner\u003ctc::CudaBackend, tc::autotune::GeneticSearch\u003e\n      geneticAutotuneATen(tc);\n  auto bestOption =\n      geneticAutotuneATen.tune(\"tensordot\", {I0, I1}, {naiveOptions});\n\n  // 4. Compile and run the TC with the best option after allocating output\n  //    tensors.\n  auto pExecutor =\n      tc::aten::compile\u003cBackend\u003e(tc, \"tensordot\", {I0, I1}, bestOption[0]);\n  auto outputs = tc::aten::prepareOutputs(tc, \"tensordot\", {I0, I1});\n  auto timings = tc::aten::profile(*pExecutor, {I0, I1}, outputs);\n  std::cout \u003c\u003c \"tensordot size I0: \" \u003c\u003c I0.sizes() \u003c\u003c \", \"\n            \u003c\u003c \"size I1: \" \u003c\u003c I1.sizes()\n            \u003c\u003c \" ran in: \" \u003c\u003c timings.kernelRuntime.toMicroSeconds() \u003c\u003c \"us\\n\";\n}\n```\n\nNote that we only need to **autotune a TC once** to obtain reasonable mapping options\nthat can translate to other problem sizes for a given TC as the following snippet\nillustrates:\n```cpp\n// 5. 
Reuse bestOptions from autotuning on another kernel\nfor (auto sizes : std::vector\u003cstd::pair\u003cat::IntList, at::IntList\u003e\u003e{\n         {{4, 9, 7, 16, 14}, {4, 7, 3, 16, 14}},\n         {{8, 5, 11, 10, 10}, {8, 11, 16, 10, 10}},\n     }) {\n  at::Tensor I0 = makeATenTensor\u003cBackend\u003e(sizes.first);\n  at::Tensor I1 = makeATenTensor\u003cBackend\u003e(sizes.second);\n  auto pExecutor =\n      tc::aten::compile\u003cBackend\u003e(tc, \"tensordot\", {I0, I1}, bestOption[0]);\n  auto outputs = tc::aten::prepareOutputs(tc, \"tensordot\", {I0, I1});\n  auto timings = tc::aten::profile(*pExecutor, {I0, I1}, outputs);\n  std::cout \u003c\u003c \"tensordot size I0: \" \u003c\u003c I0.sizes() \u003c\u003c \", \"\n            \u003c\u003c \"size I1: \" \u003c\u003c I1.sizes()\n            \u003c\u003c \" ran in: \" \u003c\u003c timings.kernelRuntime.toMicroSeconds()\n            \u003c\u003c \"us\\n\";\n}\n```\n\nPutting it all together, one may see:\n```shell\n\u003e build$ ./examples/example_simple\n[==========] Running 1 test from 1 test case.\n[----------] Global test environment set-up.\n[----------] 1 test from TensorDot\n[ RUN      ] TensorDot.SimpleAutotune\nGeneration 0    Jobs(Compiled, GPU)/total  (10, 10)/10   (best/median/worst)us: 226/4238/7345\nGeneration 1    Jobs(Compiled, GPU)/total  (10, 10)/10   (best/median/worst)us: 220/221/233\nGeneration 2    Jobs(Compiled, GPU)/total  (10, 10)/10   (best/median/worst)us: 220/221/234\ntensordot size I0: [16, 8, 16, 17, 25], size I1: [16, 16, 2, 17, 25] ran in: 239us\ntensordot size I0: [4, 9, 7, 16, 14], size I1: [4, 7, 3, 16, 14] ran in: 56us\ntensordot size I0: [8, 5, 11, 10, 10], size I1: [8, 11, 16, 10, 10] ran in: 210us\n[       OK ] TensorDot.SimpleAutotune (27812 ms)\n[----------] 1 test from TensorDot (27812 ms total)\n\n[----------] Global test environment tear-down\n[==========] 1 test from 1 test case ran. 
(27812 ms total)\n[  PASSED  ] 1 test.\n```\n\nWe have not yet characterized the precise fraction of peak performance we obtain, but it is not uncommon to obtain 80%+ of peak shared memory bandwidth after autotuning. Solid register-level optimizations are still in the works, but TC in its current form already addresses the productivity gap between the needs of research and the needs of production, which is why we are excited to share it with the entire community and develop this collaborative effort in the open.\n\n# Documentation\n\n**General**: You can find detailed information about Tensor Comprehensions [here](https://facebookresearch.github.io/TensorComprehensions/).\n\n**C++ API**: We also provide documentation for our C++ API, which can be found [here](https://facebookresearch.github.io/TensorComprehensions/api/).\n\n# Installation\n\n## Binaries\n\nWe provide a conda package to make it easy to install and use the TC binaries. Please refer to our documentation\n[here](https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html) for instructions.\n\n## From Source\n\nYou can find documentation [here](https://facebookresearch.github.io/TensorComprehensions/) with instructions for building TC via Docker, conda packages, or in a non-conda environment.\n\n# Communication\n\n* **Email**: tensorcomp@fb.com\n* **GitHub issues**: bug reports, feature requests, install issues, RFCs, thoughts, etc.\n\n# Code of Conduct\nSee the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file for more details.\n\n# License\nTensor Comprehensions is distributed under a permissive Apache v2.0 license; see the [LICENSE](LICENSE) file for more details.\n\n# Contributing\nSee the [CONTRIBUTING.md](CONTRIBUTING.md) file for more details.\n","funding_links":[],"categories":["C++","Machine Learning"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FTensorComprehensions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FTensorComprehensions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FTensorComprehensions/lists"}