{"id":13833145,"url":"https://github.com/springer13/hptt","last_synced_at":"2025-07-09T20:31:26.856Z","repository":{"id":19539133,"uuid":"87284463","full_name":"springer13/hptt","owner":"springer13","description":"High-Performance Tensor Transpose library","archived":false,"fork":false,"pushed_at":"2023-05-13T20:01:14.000Z","size":838,"stargazers_count":185,"open_issues_count":18,"forks_count":42,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-20T16:39:38.992Z","etag":null,"topics":["high-performance-computing","multidimensional-arrays","tensor","tensor-transposition","tensors","transposition"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/springer13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-04-05T08:18:19.000Z","updated_at":"2024-11-10T09:30:55.000Z","dependencies_parsed_at":"2023-01-11T20:29:56.155Z","dependency_job_id":"b34785ab-e97e-4157-96e1-f43eee696d62","html_url":"https://github.com/springer13/hptt","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/springer13/hptt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/springer13%2Fhptt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/springer13%2Fhptt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/springer13%2Fhptt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/springer13%2Fhptt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/s
pringer13","download_url":"https://codeload.github.com/springer13/hptt/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/springer13%2Fhptt/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264502650,"owners_count":23618664,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["high-performance-computing","multidimensional-arrays","tensor","tensor-transposition","tensors","transposition"],"created_at":"2024-08-04T11:00:40.894Z","updated_at":"2025-07-09T20:31:21.843Z","avatar_url":"https://github.com/springer13.png","language":"C++","readme":"# High-Performance Tensor Transpose library #\n\nHPTT is a high-performance C++ library for out-of-place tensor transpositions of the general form: \n\n![hptt](https://github.com/springer13/hptt/blob/master/misc/equation.png)\n\nwhere A and B respectively denote the input and output tensor;\n\u003cimg src=https://github.com/springer13/hptt/blob/master/misc/pi.png height=16px/\u003e represents the user-specified\ntransposition; \n\u003cimg src=https://github.com/springer13/hptt/blob/master/misc/alpha.png height=14px/\u003e and\n\u003cimg src=https://github.com/springer13/hptt/blob/master/misc/beta.png height=16px/\u003e are scalars\n(i.e., setting \u003cimg src=https://github.com/springer13/hptt/blob/master/misc/beta.png height=16px/\u003e != 0 enables the user to update the output tensor B).\n\n# Key Features\n\n* Multi-threading support\n* Explicit vectorization\n* Auto-tuning (akin to FFTW)\n    * Loop order\n    * Parallelization\n* Multi-
architecture support\n    * Explicitly vectorized kernels for AVX and ARM\n* Supports float, double, complex and double complex data types\n* Supports both column-major and row-major data layouts\n\nHPTT now also offers C and Python interfaces (see below).\n\n# Requirements\n\nYou must have a working C++ compiler with C++11 support. I have tested HPTT with:\n\n* Intel's ICPC 15.0.3, 16.0.3, 17.0.2\n* GNU g++ 5.4, 6.2, 6.3\n* clang++ 3.8, 3.9\n\n\n# Install\n\nClone the repository into a desired directory and change to that location:\n\n    git clone https://github.com/springer13/hptt.git\n    cd hptt\n    export CXX=\u003cdesired compiler\u003e\n\nNow you have several options to build the desired version of the library:\n\n    make avx\n    make arm\n    make scalar\n\nUsing CMake:\n\n    mkdir build \u0026\u0026 cd build\n    cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++\n    # Optionally one of [-DENABLE_ARM=ON -DENABLE_AVX=ON -DENABLE_IBM=ON]\n\nThis should create 'libhptt.so' inside the ./lib folder.\n\n\n# Getting Started\n\nPlease have a look at the provided benchmark.cpp.\n\nIn general HPTT is used as follows:\n\n    #include \u003chptt.h\u003e\n\n    // allocate tensors\n    float *A = ...\n    float *B = ...\n\n    // specify permutation and size\n    const int dim = 6;\n    int perm[dim] = {5,2,0,4,1,3};\n    int size[dim] = {48,28,48,28,28,28};\n\n    // scaling factors and thread count\n    float alpha = 1.0f, beta = 0.0f;\n    int numThreads = 1;\n\n    // create a plan (shared_ptr)\n    auto plan = hptt::create_plan( perm, dim,\n                                   alpha, A, size, NULL,\n                                   beta,  B, NULL,\n                                   hptt::ESTIMATE, numThreads);\n\n    // execute the transposition\n    plan-\u003eexecute();\n\nThe example above does not use any auto-tuning but solely relies on HPTT's\nperformance model. To activate auto-tuning, please use hptt::MEASURE or\nhptt::PATIENT instead of hptt::ESTIMATE.\n\n\n## C-Interface\n\nHPTT also offers a C-interface. 
This interface is less expressive than its C++\ncounterpart since it does not expose control over the plan.\n\n    void sTensorTranspose( const int *perm, const int dim,\n            const float alpha, const float *A, const int *sizeA, const int *outerSizeA, \n            const float beta,        float *B,                   const int *outerSizeB, \n            const int numThreads, const int useRowMajor);\n\n    void dTensorTranspose( const int *perm, const int dim,\n            const double alpha, const double *A, const int *sizeA, const int *outerSizeA, \n            const double beta,        double *B,                   const int *outerSizeB, \n            const int numThreads, const int useRowMajor);\n    ...\n\n## Python-Interface\n\nHPTT now also offers a Python interface. The functionality offered by HPTT is comparable to [numpy.transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html),\nwith the difference being that HPTT can also update the output tensor.\n\n    tensorTransposeAndUpdate( perm, alpha, A, beta, B, numThreads=-1)\n\n    tensorTranspose( perm, alpha, A, numThreads=-1)\n\nSee the docstrings for additional information. Based on these, the following drop-in replacements for ``numpy`` functions are also available:\n\n    hptt.transpose(A, axes)\n    hptt.ascontiguousarray(A)\n    hptt.asfortranarray(A)\n\nInstallation should be straightforward via:\n\n    cd ./pythonAPI\n    python setup.py install\n\nor\n\n    pip install -U .\n\nif you want a ``pip``-managed install. 
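The update performed by tensorTransposeAndUpdate (B = alpha * A^perm + beta * B, with the output-to-input axis mapping following numpy.transpose) can be sketched as a plain-Python reference, which is handy for validating results. This helper, its name, and the flat row-major layout are illustrative only, not part of HPTT's API:

```python
import itertools

def tensor_transpose_update(perm, alpha, A, size, beta, B):
    """Reference semantics (illustrative, not HPTT's API):
    B[i[perm[0]], ..., i[perm[d-1]]] = alpha * A[i[0], ..., i[d-1]] + beta * B[...]
    A and B are flat row-major lists; size is A's shape; B is updated in place."""
    dim = len(size)
    # row-major strides of A
    strideA = [1] * dim
    for k in range(dim - 2, -1, -1):
        strideA[k] = strideA[k + 1] * size[k + 1]
    # B has the permuted shape (axis k of B is axis perm[k] of A)
    sizeB = [size[p] for p in perm]
    strideB = [1] * dim
    for k in range(dim - 2, -1, -1):
        strideB[k] = strideB[k + 1] * sizeB[k + 1]
    # visit every element of A and scatter it into B
    for idx in itertools.product(*[range(s) for s in size]):
        offA = sum(i * s for i, s in zip(idx, strideA))
        offB = sum(idx[perm[k]] * strideB[k] for k in range(dim))
        B[offB] = alpha * A[offA] + beta * B[offB]
    return B

# transpose a 2x3 matrix: beta = 0.0 means B is overwritten
B = [0.0] * 6
tensor_transpose_update([1, 0], 1.0, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [2, 3], 0.0, B)
# B is now [0.0, 3.0, 1.0, 4.0, 2.0, 5.0], the 3x2 transpose in row-major order
```

With beta != 0 the previous contents of B contribute to the result, which is the behavior that distinguishes tensorTransposeAndUpdate from numpy.transpose.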
At this point you should be able to import the 'hptt' package within your Python scripts.\n\nThe Python interface also supports:\n\n* Single and double precision\n* Column-major and row-major data layouts\n* Multi-threading (by default, HPTT utilizes all cores of the system)\n\n### Python Benchmark\n\nYou can find an elaborate example under ./pythonAPI/benchmark/benchmark.py (see its --help output).\n\n* Multi-threaded 2x Intel Haswell-EP E5-2680 v3 (24 threads)\n  * Comparison against [numpy.transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)\n\n![hptt](https://github.com/springer13/hptt/blob/master/misc/hptt_vs_numpy.png)\n\n# Documentation\n\nYou can generate the Doxygen documentation via:\n\n    make doc\n\n\n# Benchmark\n\nThe benchmark is the same as the original [TTC benchmark for tensor transpositions](https://github.com/HPAC/TTC/blob/master/benchmark).\n\nYou can compile the benchmark via:\n\n    cd benchmark\n    make\n\nBefore running the benchmark, please modify the number of threads and the thread\naffinity within the benchmark.sh file. 
To run the benchmark, just use:\n\n    ./benchmark.sh\n\nThis will create the hptt_benchmark.dat file, containing all the runtime information\nof HPTT and the reference implementation.\n\n# Performance Results\n\n![hptt](https://github.com/springer13/hptt/blob/master/benchmark/bw.png)\n\nSee the [paper (pdf)](https://arxiv.org/abs/1704.04374) for details.\n\n# TODOs\n\n* Add explicit vectorization for IBM Power\n* Add explicit vectorization for complex types\n\n\n# Related Projects\n\n* Shared-Memory Tensor Contractions: \n    * [TCL](https://github.com/springer13/tcl)\n    * [TBLIS](https://github.com/devinamatthews/tblis)\n* Distributed-Memory Tensor Contractions:\n    * [CTF](https://github.com/cyclops-community/ctf)\n    * [libtensor](https://github.com/epifanovsky/libtensor)\n* Tensor network codes:\n    * [ITensor](http://itensor.org/)\n    * [Uni10](http://yingjerkao.github.io/uni10/)\n\n# Citation\n\nIf you want to refer to HPTT in a research paper, please cite the following\narticle [(pdf)](https://arxiv.org/abs/1704.04374):\n```\n@inproceedings{hptt2017,\n author = {Springer, Paul and Su, Tong and Bientinesi, Paolo},\n title = {{HPTT}: {A} {H}igh-{P}erformance {T}ensor {T}ransposition {C}++ {L}ibrary},\n booktitle = {Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming},\n series = {ARRAY 2017},\n year = {2017},\n isbn = {978-1-4503-5069-3},\n location = {Barcelona, Spain},\n pages = {56--62},\n numpages = {7},\n url = {http://doi.acm.org/10.1145/3091966.3091968},\n doi = {10.1145/3091966.3091968},\n acmid = {3091968},\n publisher = {ACM},\n address = {New York, NY, USA},\n keywords = {High-Performance Computing, autotuning, multidimensional transposition, tensor transposition, tensors, vectorization},\n}\n```\n","funding_links":[],"categories":["numerical 
tools"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspringer13%2Fhptt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspringer13%2Fhptt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspringer13%2Fhptt/lists"}