{"id":44052549,"url":"https://github.com/pulp-platform/pulp-transformer","last_synced_at":"2026-02-07T23:36:49.918Z","repository":{"id":189013873,"uuid":"459931925","full_name":"pulp-platform/pulp-transformer","owner":"pulp-platform","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-02T09:14:19.000Z","size":62,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-02T10:55:14.768Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pulp-platform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-02-16T09:19:29.000Z","updated_at":"2024-02-18T09:26:20.000Z","dependencies_parsed_at":"2024-04-02T10:29:52.510Z","dependency_job_id":"c5d79203-eecd-4073-9748-1a218b8c5d6a","html_url":"https://github.com/pulp-platform/pulp-transformer","commit_stats":null,"previous_names":["pulp-platform/pulp-transformer"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/pulp-platform/pulp-transformer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pulp-platform%2Fpulp-transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pulp-platform%2Fpulp-transformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pulp-platform%2Fpulp-transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pulp-platform%2Fpulp-transformer/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/pulp-platform","download_url":"https://codeload.github.com/pulp-platform/pulp-transformer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pulp-platform%2Fpulp-transformer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29212755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T23:36:15.537Z","status":"ssl_error","status_checked_at":"2026-02-07T23:36:12.879Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-07T23:36:49.313Z","updated_at":"2026-02-07T23:36:49.912Z","avatar_url":"https://github.com/pulp-platform.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optimizing the Deployment of Tiny Transformers on Low-Power MCUs\n*by Victor J.B. Jung,*\n*Alessio Burrello,*\n*Moritz Scherer,*\n*Francesco Conti,*\n*Luca Benini*\n\n[[`paper (arxiv)`](https://arxiv.org/abs/2404.02945)]\n\nThis repository contains code used in *Optimizing the Deployment of Tiny Transformers on Low-Power MCUs*.\n\n## Abstract\n\nTransformer networks are rapidly becoming SotA in many fields, such as NLP and CV. Similarly to CNNs, there is a strong push for deploying Transformer models at the extreme edge, ultimately fitting the tiny power budget and memory footprint of MCUs. 
However, the early approaches in this direction are mostly ad-hoc, platform, and model-specific. This work aims to enable and optimize the flexible, multi-platform deployment of encoder Tiny Transformers on commercial MCUs. We propose a complete framework to perform end-to-end deployment of Transformer models onto single and multi-core MCUs. Our framework provides an optimized library of kernels to maximize data reuse and avoid unnecessary data marshaling operations into the crucial attention block. A novel MHSA inference schedule, named FWSA, is introduced, fusing the linear projection weights offline to further reduce the number of operations and parameters. Furthermore, to mitigate the memory peak reached by the computation of the attention map, we present a DFT scheme for MHSA tailored for cache-less MCU devices that allows splitting the computation of the attention map into successive steps, never materializing the whole matrix in memory. We evaluate our framework on three different MCU classes exploiting ARM and RISC-V ISA, namely the STM32H7 (ARM Cortex M7), the STM32L4 (ARM Cortex M4), and GAP9 (RV32IMC-XpulpV2). We reach an average of 4.79x and 2.0x lower latency compared to SotA libraries CMSIS-NN (ARM) and PULP-NN (RISC-V), respectively. Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19x, while the fused-weight attention can reduce the runtime by 1.53x, and number of parameters by 25%. Leveraging the optimizations proposed in this work, we run end-to-end inference of three SotA Tiny Transformers for three applications characterized by different input dimensions and network hyperparameters. We report significant improvements across the networks: for instance, when executing a transformer block for the task of radar-based hand-gesture recognition on GAP9, we achieve a latency of 0.14ms and energy consumption of 4.92 micro-joules, 2.32x lower than the SotA PULP-NN library on the same platform. 
\n\n## Kernel test harness\n\nStart by creating a fresh Python environment with `Python 3.10` and install the required packages with:\n```\npip install -r ./Test/testEnv.txt\n```\nYou will also need to install the latest version of the [GAP SDK](https://github.com/GreenWaves-Technologies/gap_sdk).\n\nTo run the test harness, simply run the `kernelTest.sh` script.\n\nYou can configure which tests to run and select their hyperparameters by modifying the `testConfig.yaml` file.\n\nFor instance, this is the config file to run the Q linear projection kernel with the parameters of the EEGFormer:\n```\ncores: 8\nseed: 42\n\n### MHSA Parameters ###\nS:\n  - 81\nE: \n  - 32\nP:\n  - 32\nH: \n  - 8\n\ntestToRun:\n  - projQK\n\n# Projection QK\nprojQK:\n  kernelName: linearQK_4x2_H\n  appFolder: ./Application/GAP9LinProjQK\n  inputGen: generateInputsQKV\n  templateGen: generateTemplateQKV\n  goldenKernel: linearProjectionQK\n  platform: gvsoc\n```\n\nIf you want to run more than one test at a time, you can simply add more tests to the `testToRun` list in the config file.\n\n## Citation\n\nIf you use our work or find it valuable, please cite us with:\n\n```\n@misc{jung2024optimizingdeploymenttinytransformers,\n      title={Optimizing the Deployment of Tiny Transformers on Low-Power MCUs}, \n      author={Victor J. B. Jung and Alessio Burrello and Moritz Scherer and Francesco Conti and Luca Benini},\n      year={2024},\n      eprint={2404.02945},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https://arxiv.org/abs/2404.02945}, \n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpulp-platform%2Fpulp-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpulp-platform%2Fpulp-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpulp-platform%2Fpulp-transformer/lists"}