{"id":17108768,"url":"https://github.com/podgorskiy/tensor4","last_synced_at":"2025-04-13T02:56:57.269Z","repository":{"id":57474120,"uuid":"149834427","full_name":"podgorskiy/tensor4","owner":"podgorskiy","description":"tensor4 - pytorch to C++ convertor using lightweight templated tensor library","archived":false,"fork":false,"pushed_at":"2019-12-18T04:04:23.000Z","size":336,"stargazers_count":28,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-13T02:55:56.114Z","etag":null,"topics":["cpp","deep-learning","deep-neural-networks","generator","python","pytorch","tensor"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/podgorskiy.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-22T01:14:19.000Z","updated_at":"2023-10-11T00:32:51.000Z","dependencies_parsed_at":"2022-09-12T19:30:21.171Z","dependency_job_id":null,"html_url":"https://github.com/podgorskiy/tensor4","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/podgorskiy%2Ftensor4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/podgorskiy%2Ftensor4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/podgorskiy%2Ftensor4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/podgorskiy%2Ftensor4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/podgorskiy","download_url":"https://codeload.github.com/podgorskiy/ten
sor4/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248657873,"owners_count":21140844,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","deep-learning","deep-neural-networks","generator","python","pytorch","tensor"],"created_at":"2024-10-14T16:06:12.154Z","updated_at":"2025-04-13T02:56:57.247Z","avatar_url":"https://github.com/podgorskiy.png","language":"C++","readme":"tensor4 - single-header, lightweight tensor library for C++. PyTorch trace to tensor4 converter included.\n==============================================================================================================\n\nThis project was born as a fun experiment and can be useful because it is extremely lightweight.\nIt can be used as a standalone C++ tensor library, as well as with a converter that translates PyTorch traces to C++ code.\n\nFeatures:\n * Single-header library\n * Uses a PyTorch trace to generate C++ code that defines the network.\n * No dependencies\n * Inference only, no gradients.\n * Easy to use, simple to embed.\n * CPU only\n * Can be compiled to WebAssembly\n * ~2k lines of code.\n \nWhat it can do:\n * Convert some PyTorch graphs to C++ code\n * Can run *DenseNet*, *ResNet*, *AlexNet*, *Vgg16*.\n * Produces a very small binary footprint on the executable. 
An executable that can run *DenseNet* is about 100 KB.\n\nIt can work in three modes:\n * No dependencies, single thread\n * No dependencies except OpenMP, for threading.\n * MKL + OpenMP\n \nWhen not using MKL, the internal implementation of GEMM is used.\n\n**DCGAN** web demo: http://podgorskiy.com/static/dcgan/dcgan.html\n\n**StyleGAN** web demo: http://podgorskiy.com/static/stylegan/stylegan.html\n\n\nTODO:\n * Add support for ONNX instead of parsing the PyTorch trace.\n \nExample:\n\n.. code:: python\n    \n    alexnet = torchvision.models.alexnet(pretrained=True)\n    alexnet.eval()\n \n    ...\n    \n    out = tensor4.generate(alexnet, args=(im,))  # im is a test tensor of the same type/size as expected for the input\n    \nThis will produce a header:\n\n.. code:: cpp\n\n  #include \"tensor4.h\"\n\n\n  struct AlexNet\n  {\n   t4::tensor4f features_0_weight;\n   t4::tensor1f features_0_bias;\n   t4::tensor4f features_3_weight;\n   t4::tensor1f features_3_bias;\n   t4::tensor4f features_6_weight;\n   t4::tensor1f features_6_bias;\n   t4::tensor4f features_8_weight;\n   t4::tensor1f features_8_bias;\n   t4::tensor4f features_10_weight;\n   t4::tensor1f features_10_bias;\n   t4::tensor2f classifier_1_weight;\n   t4::tensor1f classifier_1_bias;\n   t4::tensor2f classifier_4_weight;\n   t4::tensor1f classifier_4_bias;\n   t4::tensor2f classifier_6_weight;\n   t4::tensor1f classifier_6_bias;\n  };\n\n\n  AlexNet AlexNetLoad(const char* filename);\n\n  t4::tensor2f AlexNetForward(const AlexNet\u0026 ctx, t4::tensor4f x0);\n\nand a C++ file with definitions of the forward-pass function and the weight-loading function:\n\n.. 
code:: cpp\n\n    #include \"AlexNet.h\"\n\n\n    AlexNet AlexNetLoad(const char* filename)\n    {\n     AlexNet ctx;\n     t4::model_dict dict = t4::load(filename);\n     dict.load(ctx.features_0_weight, \"features.0.weight\", 64, 3, 11, 11);\n     dict.load(ctx.features_0_bias, \"features.0.bias\", 64);\n     dict.load(ctx.features_3_weight, \"features.3.weight\", 192, 64, 5, 5);\n     dict.load(ctx.features_3_bias, \"features.3.bias\", 192);\n     dict.load(ctx.features_6_weight, \"features.6.weight\", 384, 192, 3, 3);\n     dict.load(ctx.features_6_bias, \"features.6.bias\", 384);\n     dict.load(ctx.features_8_weight, \"features.8.weight\", 256, 384, 3, 3);\n     dict.load(ctx.features_8_bias, \"features.8.bias\", 256);\n     dict.load(ctx.features_10_weight, \"features.10.weight\", 256, 256, 3, 3);\n     dict.load(ctx.features_10_bias, \"features.10.bias\", 256);\n     dict.load(ctx.classifier_1_weight, \"classifier.1.weight\", 4096, 9216);\n     dict.load(ctx.classifier_1_bias, \"classifier.1.bias\", 4096);\n     dict.load(ctx.classifier_4_weight, \"classifier.4.weight\", 4096, 4096);\n     dict.load(ctx.classifier_4_bias, \"classifier.4.bias\", 4096);\n     dict.load(ctx.classifier_6_weight, \"classifier.6.weight\", 1000, 4096);\n     dict.load(ctx.classifier_6_bias, \"classifier.6.bias\", 1000);\n     return ctx;\n    }\n\n\n    t4::tensor2f AlexNetForward(const AlexNet\u0026 ctx, t4::tensor4f x0)\n    {\n     t4::tensor4f x17 = t4::Conv2d\u003c11, 11, 4, 4, 2, 2, 1, 1\u003e(x0, ctx.features_0_weight, ctx.features_0_bias); //features.0\n     t4::release(x0);\n     t4::tensor4f x18 = t4::ReluInplace(x17); //features.1\n     t4::release(x17);\n     t4::tensor4f x19 = t4::MaxPool2d\u003c3, 3, 2, 2, 0, 0\u003e(x18); //features.2\n     t4::release(x18);\n     t4::tensor4f x20 = t4::Conv2d\u003c5, 5, 1, 1, 2, 2, 1, 1\u003e(x19, ctx.features_3_weight, ctx.features_3_bias); //features.3\n     t4::release(x19);\n     t4::tensor4f x21 = t4::ReluInplace(x20); 
//features.4\n     t4::release(x20);\n     t4::tensor4f x22 = t4::MaxPool2d\u003c3, 3, 2, 2, 0, 0\u003e(x21); //features.5\n     t4::release(x21);\n     t4::tensor4f x23 = t4::Conv2d\u003c3, 3, 1, 1, 1, 1, 1, 1\u003e(x22, ctx.features_6_weight, ctx.features_6_bias); //features.6\n     t4::release(x22);\n     t4::tensor4f x24 = t4::ReluInplace(x23); //features.7\n     t4::release(x23);\n     t4::tensor4f x25 = t4::Conv2d\u003c3, 3, 1, 1, 1, 1, 1, 1\u003e(x24, ctx.features_8_weight, ctx.features_8_bias); //features.8\n     t4::release(x24);\n     t4::tensor4f x26 = t4::ReluInplace(x25); //features.9\n     t4::release(x25);\n     t4::tensor4f x27 = t4::Conv2d\u003c3, 3, 1, 1, 1, 1, 1, 1\u003e(x26, ctx.features_10_weight, ctx.features_10_bias); //features.10\n     t4::release(x26);\n     t4::tensor4f x28 = t4::ReluInplace(x27); //features.11\n     t4::release(x27);\n     t4::tensor4f x29 = t4::MaxPool2d\u003c3, 3, 2, 2, 0, 0\u003e(x28); //features.12\n     t4::release(x28);\n     t4::tensor2f x30 = t4::Flatten\u003c1\u003e(x29);\n     t4::release(x29);\n     t4::tensor2f x31 = t4::Dropout(x30, 0.5f); //classifier.0\n     t4::release(x30);\n     t4::tensor2f x33 = t4::Linear(x31, ctx.classifier_1_weight, ctx.classifier_1_bias); //classifier.1\n     t4::release(x31);\n     t4::tensor2f x34 = t4::ReluInplace(x33); //classifier.2\n     t4::release(x33);\n     t4::tensor2f x35 = t4::Dropout(x34, 0.5f); //classifier.3\n     t4::release(x34);\n     t4::tensor2f x37 = t4::Linear(x35, ctx.classifier_4_weight, ctx.classifier_4_bias); //classifier.4\n     t4::release(x35);\n     t4::tensor2f x38 = t4::ReluInplace(x37); //classifier.5\n     t4::release(x37);\n     t4::tensor2f x39 = t4::Linear(x38, ctx.classifier_6_weight, ctx.classifier_6_bias); //classifier.6\n     t4::release(x38);\n     return x39;\n    }\n\nThe converter also produces a binary file with the weights.\n\nHow does it run compared to PyTorch?\n------------------------------------\n\nFor AlexNet and the following test image:\n\n.. 
figure:: https://raw.githubusercontent.com/podgorskiy/tensor4/master/examples/common/alexnet224x224_input.png\n   :alt: hello-world\n\nPredictions made by tensor4:\n\n.. code:: \n\n  68.935448%: speedboat\n  23.621313%: amphibian, amphibious vehicle\n  2.844828%: container ship, containership, container vessel\n  0.931512%: fireboat\n  0.624658%: lifeboat\n  0.594834%: sandbar, sand bar\n  0.526897%: submarine, pigboat, sub, U-boat\n  0.292151%: canoe\n  0.263978%: paddle, boat paddle\n  0.263804%: trimaran\n\nPyTorch output:\n\n.. code:: \n\n  68.935245% speedboat\n  23.621449% amphibian, amphibious vehicle\n  2.844823% container ship, containership, container vessel\n  0.931520% fireboat\n  0.624658% lifeboat\n  0.594838% sandbar, sand bar\n  0.526899% submarine, pigboat, sub, U-boat\n  0.292150% canoe\n  0.263979% paddle, boat paddle\n  0.263808% trimaran\n\nThe difference is due to floating-point rounding.\n\n+--------------+-------------------+\n|              | Inference time:   |\n+==============+===================+\n| PyTorch CPU  | 41.5ms            |\n+--------------+-------------------+\n| tensor4      | 82.0ms            |\n+--------------+-------------------+\n| tensor4 + MKL| 32.4ms            |\n+--------------+-------------------+\n\n\ntensor4 has a naive GEMM implementation; however, you can enable the one from MKL (cblas_sgemm) instead.\n\nThe *tensor4 + MKL* row in the table above corresponds to the case where MKL's GEMM is used instead of the naive implementation.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpodgorskiy%2Ftensor4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpodgorskiy%2Ftensor4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpodgorskiy%2Ftensor4/lists"}