{"id":20089541,"url":"https://github.com/bassoy/ttv","last_synced_at":"2025-10-25T05:32:33.976Z","repository":{"id":43042963,"uuid":"188558496","full_name":"bassoy/ttv","owner":"bassoy","description":"C++ Header-Only Library for High-Performance Tensor-Vector Multiplication","archived":false,"fork":false,"pushed_at":"2024-04-19T09:05:51.000Z","size":6178,"stargazers_count":19,"open_issues_count":2,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-04-19T19:11:47.140Z","etag":null,"topics":["arrays","blas","c-plus-plus","fast","high-performance","multidimensional","multilinear-algebra","tensor","tensor-contraction","tensor-library","tensor-times-vector","tensor-vector-multiplication","tensor-vector-multiplications"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bassoy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-05-25T11:35:23.000Z","updated_at":"2024-04-23T12:46:33.474Z","dependencies_parsed_at":"2024-04-23T12:46:28.639Z","dependency_job_id":"42e1945d-218b-4ad2-8acb-eb2b162a3f00","html_url":"https://github.com/bassoy/ttv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bassoy%2Fttv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bassoy%2Fttv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bassoy%2Fttv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/bassoy%2Fttv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bassoy","download_url":"https://codeload.github.com/bassoy/ttv/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224480880,"owners_count":17318335,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrays","blas","c-plus-plus","fast","high-performance","multidimensional","multilinear-algebra","tensor","tensor-contraction","tensor-library","tensor-times-vector","tensor-vector-multiplication","tensor-vector-multiplications"],"created_at":"2024-11-13T16:18:18.964Z","updated_at":"2025-10-25T05:32:28.938Z","avatar_url":"https://github.com/bassoy.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"High-Performance Tensor-Vector Multiplication Library (TTV)\n=====\n[![Language](https://img.shields.io/badge/C%2B%2B-17-blue.svg)](https://en.wikipedia.org/wiki/C%2B%2B#Standardization)\n[![License](https://img.shields.io/badge/license-GPL-blue.svg)](https://github.com/bassoy/ttv/blob/master/LICENSE)\n[![Wiki](https://img.shields.io/badge/ttv-wiki-blue.svg)](https://github.com/bassoy/ttv/wiki)\n[![Discussions](https://img.shields.io/badge/ttv-discussions-blue.svg)](https://github.com/bassoy/ttv/discussions)\n[![Build Status](https://github.com/bassoy/ttv/actions/workflows/test.yml/badge.svg)](https://github.com/bassoy/ttv/actions)\n\n## Summary\n**TTV** is a high-performance C++ **header-only library** for tensor-vector multiplication.\nIt provides free C++ functions for the parallel computation of the 
**mode-`q` tensor-times-vector product** of the general form\n\n$$\n\\underline{\\mathbf{C}} = \\underline{\\mathbf{A}} \\times_q \\mathbf{b} \\quad :\\Leftrightarrow \\quad\n\\underline{\\mathbf{C}} (i_1, \\dots, i_{q-1}, i_{q+1}, \\dots, i_p) = \\sum_{i_q=1}^{n_q} \\underline{\\mathbf{A}}({i_1, \\dots, i_q,  \\dots, i_p}) \\cdot \\mathbf{b}({i_q}).\n$$\n\nwhere $q$ is the contraction mode, $\\underline{\\mathbf{A}}$ and $\\underline{\\mathbf{C}}$ are tensors of order $p$ and $p-1$ with shapes $\\mathbf{n}\\_a= (n\\_1,\\dots,n\\_{q-1},n\\_q ,n\\_{q+1},\\dots,n\\_p)$ and $\\mathbf{n}\\_c = (n\\_1,\\dots,n\\_{q-1},n\\_{q+1},\\dots,n\\_p)$, respectively. $\\mathbf{b}$ is a vector of length $n\\_{q}$.\n\nAll function implementations are based on the Loops-Over-GEMM (LOG) approach and utilize high-performance `GEMV` or `DOT` routines of a `BLAS` implementation such as OpenBLAS or Intel MKL.\nImplementation details and runtime behavior of the tensor-vector multiplication functions are described in the [research paper](https://link.springer.com/chapter/10.1007/978-3-030-22734-0_3).\n\nPlease have a look at the [wiki](https://github.com/bassoy/ttv/wiki) page for more information about the **usage**, **function interfaces** and **setting parameters**.\n\n## Key Features\n\n### Flexibility\n* Contraction mode `q`, tensor order `p`, tensor extents `n` and tensor layout `pi` can be chosen at runtime\n* Supports any non-hierarchical storage format including the first-order and last-order storage layouts\n* Offers two high-level interfaces and one C-like low-level interface for calling the tensor-times-vector multiplication\n* Implemented independently of a tensor data structure (can be used with `std::vector` and `std::array`)\n* Supports float, double, complex and double complex data types (and more if a BLAS library is not used)\n\n### Performance\n* Multi-threading support with OpenMP\n* Can be used with and without a BLAS implementation\n* Performs in-place operations 
without transposing the tensor - no extra memory needed\n* Reaches peak matrix-times-vector performance for large tensors\n\n### Requirements\n* Requires the tensor elements to be contiguously stored in memory\n* The element type must be an arithmetic type supporting the multiplication and addition operators\n\n## Experiments\n\nThe experiments were carried out on an Intel Core i9-7900X processor with 10 cores and 20 hardware threads running at 3.3 GHz.\nThe source code has been compiled with GCC v7.3 using the highest optimization level `-Ofast` together with `-march=native`, `-pthread` and `-fopenmp`.\nParallel execution has been accomplished using GCC's implementation of the OpenMP v4.5 specification.\nWe have used the `dot` and `gemv` implementations of the OpenBLAS library v0.2.20.\nThe benchmark results of each of the following functions are the average of 10 runs.\n\nThe comparison includes three state-of-the-art libraries that implement three different approaches.\n* [TCL](https://github.com/springer13/tcl) (v0.1.1) implements the TTGT approach. 
\n* [TBLIS](https://github.com/devinamatthews/tblis) (v1.0.0) implements the GETT approach.\n* [EIGEN](https://bitbucket.org/eigen/eigen/src/default/) (v3.3.90) sequentially executes the tensor-times-vector in-place.\n\nThe experiments have been carried out with asymmetrically-shaped and symmetrically-shaped tensors in order to provide comprehensive test coverage.\nThe tensor elements are stored according to the first-order storage format.\nThe tensor orders of the asymmetrically- and symmetrically-shaped tensors have been varied from `2` to `10` and from `2` to `7`, respectively.\nThe contraction mode `q` has also been varied from `1` to the tensor order `p`.\n\n### Symmetrically-Shaped Tensors\n\n**TTV** has been executed with the parameters `tlib::execution::blas`, `tlib::slicing::large` and `tlib::loop_fusion::all`.\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/symmetric_throughput_single_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/symmetric_speedup_single_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e \n\u003ctd\u003e \u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/symmetric_throughput_double_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003ctd\u003e \u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/symmetric_speedup_double_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n### Asymmetrically-Shaped Tensors\n\n**TTV** has been executed with the parameters `tlib::execution::blas`, `tlib::slicing::small` and `tlib::loop_fusion::all`.\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/nonsymmetric_throughput_single_precision.png\" 
alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003ctd\u003e\u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/nonsymmetric_speedup_single_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e \n\u003ctd\u003e \u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/nonsymmetric_throughput_double_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003ctd\u003e \u003cimg src=\"https://github.com/bassoy/ttv/blob/master/misc/nonsymmetric_speedup_double_precision.png\" alt=\"Drawing\" style=\"width: 250px;\"/\u003e \u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n\n## Example \n```cpp\n/*main.cpp*/\n#include \u003cvector\u003e\n#include \u003cnumeric\u003e\n#include \u003ciostream\u003e\n#include \u003ctlib/ttv.h\u003e\n\n\nint main()\n{\n  const auto q = 2ul; // contraction mode\n  \n  auto A = tlib::tensor\u003cfloat\u003e( {4,3,2} ); \n  auto B = tlib::tensor\u003cfloat\u003e( {3,1}   );\n  std::iota(A.begin(),A.end(),1);\n  std::fill(B.begin(),B.end(),1);\n\n/*\n  A =  { 1  5  9  | 13 17 21\n         2  6 10  | 14 18 22\n         3  7 11  | 15 19 23\n         4  8 12  | 16 20 24 };\n\n  B =   { 1 1 1 } ;\n*/\n\n  // computes mode-2 tensor-times-vector product with C(i,j) = A(i,k,j) * B(k)\n  auto C1 = A (q)* B; \n  \n/*\n  C =  { 1+5+ 9 | 13+17+21\n         2+6+10 | 14+18+22\n         3+7+11 | 15+19+23\n         4+8+12 | 16+20+24 };\n*/\n}\n```\nCompile with `g++ -I../include/ -std=c++17 -Ofast -fopenmp main.cpp -o main` and additionally `-DUSE_OPENBLAS` or `-DUSE_INTELBLAS`  for fast execution.\n\n# Citation\n\nIf you want to refer to TTV as part of a research paper, please cite the article [Design of a High-Performance Tensor-Vector Multiplication with BLAS](https://link.springer.com/chapter/10.1007/978-3-030-22734-0_3)\n\n```\n@inproceedings{ttv:bassoy:2019,\n  author=\"Bassoy, Cem\",\n  editor=\"Rodrigues, Jo{\\~a}o M. F. 
and Cardoso, Pedro J. S. and Monteiro, J{\\^a}nio and Lam, Roberto and Krzhizhanovskaya, Valeria V. and Lees, Michael H. and Dongarra, Jack J. and Sloot, Peter M.A.\",\n  title=\"Design of a High-Performance Tensor-Vector Multiplication with BLAS\",\n  booktitle=\"Computational Science -- ICCS 2019\",\n  year=\"2019\",\n  publisher=\"Springer International Publishing\",\n  address=\"Cham\",\n  pages=\"32--45\",\n  isbn=\"978-3-030-22734-0\"\n}\n``` \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbassoy%2Fttv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbassoy%2Fttv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbassoy%2Fttv/lists"}