{"id":17882560,"url":"https://github.com/prigoyal/tensorcomprehensions","last_synced_at":"2025-09-23T14:30:41.673Z","repository":{"id":78438028,"uuid":"120939638","full_name":"prigoyal/TensorComprehensions","owner":"prigoyal","description":"PyTorch Framework Integration for Tensor Comprehensions","archived":false,"fork":false,"pushed_at":"2018-03-04T18:44:09.000Z","size":5415,"stargazers_count":14,"open_issues_count":0,"forks_count":3,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-13T17:33:22.517Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prigoyal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-09T18:14:39.000Z","updated_at":"2023-02-28T03:02:04.000Z","dependencies_parsed_at":"2023-03-07T01:45:37.373Z","dependency_job_id":null,"html_url":"https://github.com/prigoyal/TensorComprehensions","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prigoyal%2FTensorComprehensions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prigoyal%2FTensorComprehensions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prigoyal%2FTensorComprehensions/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prigoyal%2FTensorComprehensions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/prigoya
l","download_url":"https://codeload.github.com/prigoyal/TensorComprehensions/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233981144,"owners_count":18760779,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-28T12:56:02.335Z","updated_at":"2025-09-23T14:30:36.210Z","avatar_url":"https://github.com/prigoyal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ![Tensor Comprehensions](docs/source/_static/img/tc-logo-full-color-with-text-2.png)\n\n# Using Tensor Comprehensions with PyTorch\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Examples and Documentation](#examples-and-documentation)\n- [Going through basics: by example](#going-through-basics-by-example)\n- [Layers that can't be expressed right now](#layers-that-cant-be-expressed-right-now)\n- [Note about performance / tuning](#note-about-performance--tuning)\n- [Communication](#communication)\n\nA blogpost on Tensor Comprehensions can be read [here](https://research.fb.com/announcing-tensor-comprehensions/).\n\nWe provide integration of Tensor Comprehensions (TC) with PyTorch for both training\nand inference purposes. Using TC, you can express an operator using [Einstein\nnotation](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/)\nand get the fast CUDA code for that layer with a few lines of code. 
By providing\nTC integration with PyTorch, we hope to make it even easier to write new\noperations with TC.\n\nHere is what the PyTorch-TC package provides:\n\n- Inputs and outputs to functions are `torch.*Tensor`s\n- Integration with PyTorch `autograd`: if you specify forward and backward functions, you get an autograd function that takes `Variable` as input and returns `Variable` as output. Here's an [example](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_convolution_train-py).\n- Autotuner results can be cached to a file (for reuse)\n\nTo make it easy to use TC, we provide conda packages for it. Follow the instructions\nbelow to install the conda package.\nBuilding from source is not easy because of large dependencies like LLVM, so using the conda package is ideal.\n\n## Installation\nYou will need Anaconda to install the conda packages of TC. If you don't have it, follow the next step; otherwise, verify that conda is in your **$PATH** and proceed to Step 2.\n\n### **Step 1:** Anaconda3\nInstall Anaconda3 by running the commands below:\n\n```Shell\ncd $HOME \u0026\u0026 wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh -O anaconda3.sh\nchmod +x anaconda3.sh \u0026\u0026 ./anaconda3.sh -b -p $HOME/anaconda3 \u0026\u0026 rm anaconda3.sh\n```\n\nNow add anaconda3 to your PATH so that you can use it:\n\n```Shell\nexport PATH=$HOME/anaconda3/bin:$PATH\n```\n\nNow, verify that conda is found on your PATH:\n\n```Shell\nwhich conda\n```\n\nThis command should print the path of your conda bin. 
If it doesn't, make sure conda is\nin your $PATH.\n\n### **Step 2**: Conda Install Tensor Comprehensions\n\nNow, go ahead and install Tensor Comprehensions by running the following command:\n\n```Shell\nconda install -y -c pytorch -c https://conda.anaconda.org/t/oJuz1IosRLQ5/prigoyal tensor_comprehensions\n```\n\n## Examples and documentation\n\nThere are a few helpful resources to get started with Tensor Comprehensions (TC):\n\n1. We provide **examples** of TC definitions covering a wide range of deep learning layers.\n\nThe examples we provide are: [avgpool](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_avgpool_autotune-py), [maxpool](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_maxpool-py), [matmul](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_matmul-py), [matmul - give output buffers](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_matmul_reuse_outputs-py), [batch-matmul](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_batchmatmul-py), [convolution](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_convolution-py), [strided-convolution](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_convolution_strided-py), [batchnorm](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_batchnorm-py), [copy](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_copy-py), [cosine similarity](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_cosine_similarity-py), [Fully-connected](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_fc-py), [fused FC + ReLU](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_fusion_fcrelu-py), 
[group-convolutions](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_group_convolution-py), [strided group-convolutions](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_group_convolution_strided-py), [indexing](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_indexing-py), [Embedding (lookup table)](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_lookup_table-py), [small-mobilenet](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_small_mobilenet-py), [softmax](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_softmax-py), [tensordot](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_tensordot-py), [transpose](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_transpose-py)\n\n2. Tensor Comprehensions are based on **Einstein notation** which is very well explained [here](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/). This notation is\nalso widely used in Numpy. If you don't know the notation, we recommend doing a 5 minute read of the above link.\n\n3. [TC Documentation](https://facebookresearch.github.io/TensorComprehensions/index.html)\nis a very helpful resource to understand how Tensor Comprehensions are expressed. The sections on\n[introduction](https://facebookresearch.github.io/TensorComprehensions/introduction.html),\n[range inference](https://facebookresearch.github.io/TensorComprehensions/inference.html),\n[semantics](https://facebookresearch.github.io/TensorComprehensions/semantics.html)\nare particularly helpful to get insights into writing Tensor Comprehensions.\n\n4. 
**Autotuner**: TC provides an algorithm based on evolutionary search to automatically tune the kernel.\nYou can read briefly about the autotuner [here](https://facebookresearch.github.io/TensorComprehensions/autotuner.html) and look at various [examples of autotuning](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_autotuner-py).\n\n5. To construct a TC autograd function, [here](https://gist.github.com/anonymous/dc0cd7de343922a8c0c0636ccc4889a9#file-test_convolution_train-py) is one self-descriptive example.\n\n## Going through basics: by example\n\nLet's look at a few examples of the features Tensor Comprehensions has and what you can do to get started. I'll pick a simple layer, `matmul`,\nfor the examples and start by describing the reduction operator we will use.\n\n**Reduction Operator**:\n\nThe `!` in the `+=!` operator means that the output tensor is initialized to the reduction identity, i.e. `0` for `+`.\n\nNow, let's cover the basics:\n\n1. New Tensor Comprehension:\n\n```python\nimport torch\nimport tensor_comprehensions as tc\n\nlang = \"\"\"\ndef matmul(float(M,N) A, float(N,K) B) -\u003e (output) {\n  output(i, j) +=! A(i, kk) * B(kk, j)\n}\n\"\"\"\nmatmul = tc.define(lang, name=\"matmul\")\nmat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()\nout = matmul(mat1, mat2)\n```\n\n2. New Tensor Comprehension, Autotune it, run it:\n\n```python\nimport torch\nimport tensor_comprehensions as tc\n\nlang = \"\"\"\ndef matmul(float(M,N) A, float(N,K) B) -\u003e (output) {\n  output(i, j) +=! A(i, kk) * B(kk, j)\n}\n\"\"\"\nmatmul = tc.define(lang, name=\"matmul\")\nmat1, mat2 = torch.randn(100, 400).cuda(), torch.randn(400, 500).cuda()\nbest_options = matmul.autotune(mat1, mat2, **tc.autotuner_default_options)\nout = matmul(mat1, mat2, options=best_options)\n```\n\n3. New Tensor Comprehension, Autotune it, save cache:\n\n```python\nimport tensor_comprehensions as tc\n\nlang = \"\"\"\ndef matmul(float(M,N) A, float(N,K) B) -\u003e (output) {\n  output(i, j) +=! 
A(i, kk) * B(kk, j)\n}\n\"\"\"\nmatmul = tc.define(lang, name=\"matmul\")\nmatmul.autotune((3, 4), (4, 5), cache=\"matmul_345.tc\", **tc.small_size_autotuner_options)\nmatmul.autotune((100, 400), (400, 500), cache=\"matmul_100400500.tc\", **tc.autotuner_default_options)\n```\n\n**The big advantage of specifying `cache` is that the next time you run the program, the cached autotuned values are used.**\nBeware that if you move to a significantly different type of GPU, you might want to tune again for maximum performance.\n\n4. Train a layer with TC, Autotune it and run it:\n\n```python\nimport torch\nimport tensor_comprehensions as tc\nfrom torch.autograd import Variable\nfrom torch.nn.parameter import Parameter\n\nlang = \"\"\"\ndef KRU(float(D2, N2) W2, float(M, N0, N1, N2) X) -\u003e (XW2) {\n   XW2(m, n0, n1, d2)   +=! X(m, n0, n1, n2_red) * W2(d2, n2_red)\n}\ndef KRU_grad(float(D2, N2) W2, float(M, N0, N1, N2) X, float(M, N0, N1, D2) XW2_grad) -\u003e (W2_grad, X_grad)\n{\n   W2_grad(d2, n2)   +=! XW2_grad(m_red, n0_red, n1_red, d2) * X(m_red, n0_red, n1_red, n2)\n   X_grad(m, n0, n1, n2) +=! XW2_grad(m, n0, n1, d2_red) * W2(d2_red, n2)\n}\n\"\"\"\nKRU = tc.define(lang, training=True, name=\"KRU\", backward=\"KRU_grad\")\nX = Variable(torch.randn(256, 16, 16, 16).cuda(), requires_grad=True)\nW2 = Parameter(torch.randn(32, 16)).cuda()\noptions = KRU.autotune(W2, X, **tc.autotuner_default_options)\nout = KRU(W2, X, options=options)\nout[0].sum().backward()\n```\n\n5. Dump out generated CUDA code (for fun?):\n\n```python\nimport torch\nimport tensor_comprehensions as tc\n\ntc.GlobalDebugInit([\"tc\", \"--dump_cuda=true\"])\n\nlang = \"\"\"\ndef matmul(float(M,N) A, float(N,K) B) -\u003e (output) {\n  output(i, j) +=! A(i, kk) * B(kk, j)\n}\n\"\"\"\nmatmul = tc.define(lang, name=\"matmul\")\nmat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()\nout = matmul(mat1, mat2)\n```\n\n6. 
Inject your own CUDA code and run it (because you might have faster code):\n\n```python\nimport torch\nimport tensor_comprehensions as tc\n\nlang = \"\"\"\ndef add(float(N) A, float(N) B) -\u003e (output) {\n    output(i) = A(i) + B(i) + 1\n}\n\"\"\"\n\ncuda_code = \"\"\"\nextern \"C\"{\n__global__ void my_add(float* __restrict__ output, const float* __restrict__ A, const float* __restrict__ B) {\n    int t = threadIdx.x;\n    output[t] = A[t] + B[t];\n}\n}\n\"\"\"\n\nadd = tc.define(lang, name=\"add\", inject_kernel=\"my_add\", cuda_code=cuda_code)\na, b = torch.randn(100).cuda(), torch.randn(100).cuda()\nout = add(a, b, grid=[1, 1, 1], block=[100, 1, 1])    # change grid/block to adjust kernel performance\n```\n\n## Layers that can't be expressed right now\n\n1. Reshaping Tensors inside the language\n2. Dropout : RNGs are not supported inside the TC language, because TC doesn't do internal allocations\n3. Strided \"tensors\" : input Tensors have to be contiguous. If they are not contiguous, they are made contiguous before being passed to the TC backend.\n4. RNNs : the TC language doesn't have loops yet. You can write them unrolled if you want :)\n\n**We are actively working on these and many more features. If there is a feature that would be very helpful\nto you, please send your request our way.**\n\n## Note about performance / tuning\n\nTensor Comprehensions has an autotuner that uses evolutionary search to find faster kernels.\nHere is what you should know about the polyhedral exploration / evolutionary search:\n\n### Static sizes for autotuning\n\n- The autotuner needs static input sizes (for now). 
You cannot tune a kernel for, say, a batch size between `16` and `32`.\n  - You can autotune `avgpool2x2` for input shape `(16, 32, 24, 23)`:\n    ```\n    avgpool.autotune((16, 32, 24, 23), **tc.small_size_autotuner_options, cache=\"16x32x24x23.tc\")\n    ```\n  - If you want to target multiple input shapes, run multiple autotune calls:\n    ```\n    avgpool.autotune((16, 32, 24, 23), **tc.small_size_autotuner_options, cache=\"mysize1.tc\")\n    avgpool.autotune((32, 1, 128, 128), **tc.small_size_autotuner_options, cache=\"mysize2.tc\")\n    ```\n  - The more static we make the sizes, the better and faster the search procedure. Hence, we made the trade-off of only supporting static sizes in the initial release.\n\n### Autotuning options primer\n\nBy **default**, `tc.autotuner_default_options` is:\n\n```\noptions = {\n    \"threads\": 32, \"generations\": 2, \"pop_size\": 10, \"number_elites\": 1\n}\n```\n\nThis is good for quick autotuning (2 generations finish quickly).\n\n**Good default that runs for a bit longer (maybe in exchange for better performance)**\n\n```\noptions = {\n    \"threads\": 32, \"generations\": 5, \"pop_size\": 10, \"number_elites\": 1\n}\n```\n\n**Good default that runs for a LOT longer**\n\n```\noptions = {\n    \"threads\": 32, \"generations\": 25, \"pop_size\": 100, \"number_elites\": 10\n}\n```\n\n\n**Brief explanation**\n\n- `threads` - set this to the number of CPU cores available.\n- `generations` - 5 to 10 generations is a good number.\n- `pop_size` - 10 is usually reasonable. You can try 10 to 20.\n- `number_elites` - number of candidates preserved intact between generations. `1` is usually sufficient.\n- `min_launch_total_threads` - if you have really small input sizes, set this to `1`.\n- `gpus` - the GPUs to use for autotuning. Default value is \"0\". 
Set this to \"0,1\" if you wish to use two gpus.\n\nLook at [docs](https://facebookresearch.github.io/TensorComprehensions/autotuner.html) for more details\n\n## Communication\n\n* **Email**: prigoyal@fb.com\n* **[GitHub](https://github.com/facebookresearch/TensorComprehensions/) issues**: bug reports, feature requests, install issues, RFCs, thoughts, etc.\n* **Slack**: For discussion around framework integration, build support, collaboration, etc. join our slack channel https://tensorcomprehensions.herokuapp.com/.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprigoyal%2Ftensorcomprehensions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprigoyal%2Ftensorcomprehensions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprigoyal%2Ftensorcomprehensions/lists"}