{"id":15015758,"url":"https://github.com/futurecomputing4ai/hrrformer","last_synced_at":"2025-04-12T09:31:16.612Z","repository":{"id":171200912,"uuid":"638621257","full_name":"FutureComputing4AI/Hrrformer","owner":"FutureComputing4AI","description":"Hrrformer: A Neuro-symbolic Self-attention Model (ICML23)","archived":false,"fork":false,"pushed_at":"2023-06-07T00:20:14.000Z","size":129,"stargazers_count":47,"open_issues_count":1,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-07T12:02:42.330Z","etag":null,"topics":["ember","holographic-reduced-representations","hrr","hrrformer","icml","icml-2023","long-range-arena","lra","malware","neuro-symbolic","self-attention","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2305.19534","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FutureComputing4AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-09T18:36:01.000Z","updated_at":"2024-10-14T22:36:05.000Z","dependencies_parsed_at":"2024-03-23T16:44:30.307Z","dependency_job_id":null,"html_url":"https://github.com/FutureComputing4AI/Hrrformer","commit_stats":null,"previous_names":["neuromorphiccomputationresearchprogram/hrrformer","futurecomputing4ai/hrrformer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FutureComputing4AI%2FHrrformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FutureComputing4AI%2FHrrformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/FutureComputing4AI%2FHrrformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FutureComputing4AI%2FHrrformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FutureComputing4AI","download_url":"https://codeload.github.com/FutureComputing4AI/Hrrformer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223510342,"owners_count":17157306,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ember","holographic-reduced-representations","hrr","hrrformer","icml","icml-2023","long-range-arena","lra","malware","neuro-symbolic","self-attention","transformer"],"created_at":"2024-09-24T19:47:53.356Z","updated_at":"2024-11-07T12:03:40.950Z","avatar_url":"https://github.com/FutureComputing4AI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch2 align=\"center\"\u003eHrrformer⚡\u003c/h2\u003e\n\n\u003cp align=\"justify\"\u003e\nHrrformer is a neuro-symbolic self-attention model with linear 𝒪(T) time and space complexity. It is 23× faster and consumes 24× less memory than the Transformer, and it achieves SOTA performance even for sequence lengths T≥100,000. 
It can learn with a single layer and converges 10× faster on the LRA benchmark.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection/assets/37298971/ef076eaa-bace-49e6-902f-31a9518f80d7\" width=\"1000\"\u003e\n\u003c/p\u003e\n\n## Requirements\n![requirements](https://img.shields.io/badge/Python-3.9.12-3480eb.svg?longCache=true\u0026style=flat\u0026logo=python)\n\n\u003cp align=\"justify\"\u003e\nThe code is written in \u003ca href=https://github.com/google/jax\u003eJAX\u003c/a\u003e, a high-performance numerical computing library developed by Google. JAX uses just-in-time (JIT) compilation and hardware acceleration to speed up numerical operations. JIT compilation compiles code at runtime, just before it is executed, which allows the compiler to optimize it. The compilation is performed by the Accelerated Linear Algebra (XLA) compiler, which optimizes the numerical operations. Flax and Optax, higher-level libraries built on top of JAX, are also used. \n\u003c/p\u003e\n\n```properties\npip install --upgrade https://storage.googleapis.com/jax-releases/cuda11/jaxlib-0.3.15+cuda11.cudnn82-cp39-none-manylinux2014_x86_64.whl\npip install flax==0.6.0\npip install optax==0.1.2\n```\n\nJAX excels at optimization and hardware acceleration, but it does not have a built-in data loader, so we rely on the TensorFlow and PyTorch data loaders. Install the CPU versions of both. 
\n\n```properties\npip install tensorflow-cpu==2.8.0\npip install tensorflow-datasets==4.5.2\npip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu\n```\n\nFinally, install the library that implements the vector symbolic architecture called \u003ca href=https://github.com/MahmudulAlam/Holographic-Reduced-Representations\u003eHolographic Reduced Representations (HRR)\u003c/a\u003e, the key concept used to develop Hrrformer. \n\n```properties\npip install hrr --upgrade\n```\n\n## Dataset\n\u003cp align=\"justify\"\u003e\nExperiments are performed on the \u003ca href=https://github.com/google-research/long-range-arena\u003eLong Range Arena (LRA)\u003c/a\u003e and EMBER malware classification benchmarks. To get the LRA benchmark, first download the following file and extract it to the working directory. The Image and Text datasets come with the TensorFlow Datasets library and are downloaded automatically when the code is run. \n\u003c/p\u003e\n\n```properties\nwget https://storage.googleapis.com/long-range-arena/lra_release.gz\ntar xvf lra_release.gz\n```\n\n## Getting Started\n\u003cp align=\"justify\"\u003e\nThe tasks are separated into different folders. Each folder contains a data loader file named \u003cb\u003edataset.py\u003c/b\u003e along with a standalone \u003cb\u003ehrrformer_mgpu.py\u003c/b\u003e file that runs the Hrrformer model in a multi-GPU setting.\n\u003c/p\u003e\n\n`embed.py` contains data classes for `learned` and `fixed` positional embeddings. `utils.py` has assorted utility functions needed to load/save models, write history, split/merge tensors, etc.\n\n## Results\n### LRA benchmark\n\u003cp align=\"justify\"\u003e\nAcross all tasks, we use the same number of parameters as specified in the LRA benchmark, or fewer. Both single- and multi-layer Hrrformer are trained for a total of 20 epochs, 10× fewer than previous works. 
The accuracy results on the LRA benchmark are presented in the following table.\n\u003c/p\u003e\n\n| **Model** | **ListOps** | **Text** | **Retrieval** | **Image** | **Path** | **Avg** | **Epochs** |\n|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\nTransformer | 36.37 | 64.27 | 57.46 | 42.44 | 71.40 | 54.39 | 200\nLocal Attention | 15.82 | 52.98 | 53.39 | 41.46 | 66.63 | 46.06 | 200\nLinear Transformer | 16.13 | 65.90 | 53.09 | 42.34 | 75.30 | 50.55 | 200\nReformer | 37.27 | 56.10 | 53.40 | 38.07 | 68.50 | 50.67 | 200\nSparse Transformer | 17.07 | 63.58 | 59.59 | 44.24 | 71.71 | 51.24 | 200\nSinkhorn Transformer | 33.67 | 61.20 | 53.83 | 41.23 | 67.45 | 51.29 | 200\nLinformer | 35.70 | 53.94 | 52.27 | 38.56 | 76.34 | 51.36 | 200\nPerformer | 18.01 | 65.40 | 53.82 | 42.77 | 77.05 | 51.41 | 200\nSynthesizer | 36.99 | 61.68 | 54.67 | 41.61 | 69.45 | 52.88 | 200\nLongformer | 35.63 | 62.85 | 56.89 | 42.22 | 69.71 | 53.46 | 200\nBigBird | 36.05 | 64.02 | 59.29 | 40.83 | 74.87 | 55.01 | 200\nF-Net | 35.33 | 65.11 | 59.61 | 38.67 | 77.78 | 54.42 | 200\nNystromformer | 37.15 | 65.52 | **79.56** | 41.58 | 70.94 | 58.95 | 200\nLuna-256 | 37.98 | 65.78 | 79.56 | 47.86 | **78.55** | **61.95** | 200\nH-Transformer-1D | **49.53** | **78.69** | 63.99 | 46.05 | 68.78 | 61.41 | 200\n**Hrrformer Single Layer** | 38.79 | 66.50 | 75.40 | 48.47 | 70.71 | 59.97 | **20**\n**Hrrformer Multi Layer** | 39.98 | 65.38 | 76.15 | **50.45** | 72.17 | 60.83 | **20**\n\n### Speed and Memory Usage\n\u003cp align=\"justify\"\u003e\nThe following figure compares the self-attention models in terms of LRA score, speed (training examples per second), and memory footprint (size of the circle). The LRA score is the mean accuracy across all tasks in the LRA benchmark. The single- and multi-layer Hrrformer are 28× and 10× faster, respectively, than Luna-256, which achieves the highest accuracy on the LRA benchmark. 
Hrrformer also consumes the least memory: the single- and multi-layer Hrrformer use 79.15% and 70.66% less memory than Luna-256, respectively.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection/assets/37298971/dc2a24e0-1f9b-430d-8dfd-7692723889d8\" width=\"800\"\u003e\n\u003c/p\u003e\n\n### Learning 2D Structure from 1D\n\u003cp align=\"justify\"\u003e\nThe ability to learn with a single layer benefits both throughput and memory use. This result is surprising, and visualizing the weight vector W confirms that a single layer is sufficient to learn the 2D structure.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection/assets/37298971/4872de28-8c42-4180-944a-6a2bb9a3a892\" width=\"800\"\u003e\n\u003c/p\u003e\n\n### EMBER\n\u003cp align=\"justify\"\u003e\nThe following figure shows the classification accuracy and execution time of different self-attention models for increasing sequence lengths on the EMBER malware classification dataset. 
As the sequence length increases, Hrrformer outperforms the other models, achieving the highest accuracy of \u003cb\u003e91.03%\u003c/b\u003e at a maximum sequence length of \u003cb\u003e16,384\u003c/b\u003e.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection/assets/37298971/182d1a62-44ab-4926-83c3-859676d38d9f\" width=\"950\"\u003e\n\u003c/p\u003e\n\n### Citations\n[![Paper](https://img.shields.io/badge/ICML-2023-1495f7.svg?longCache=true\u0026style=flat)](https://icml.cc/Conferences/2023)\n[![Paper](https://img.shields.io/badge/paper-ArXiv-ff0a0a.svg?longCache=true\u0026style=flat)](https://arxiv.org/abs/2305.19534)\n\nFor more information about the proposed method and experiments, please read the [paper](https://arxiv.org/abs/2305.19534). If you use this work or find it useful, please cite the paper as:\n\n```bibtex\n@article{alam2023recasting,\n  title={Recasting Self-Attention with Holographic Reduced Representations},\n  author={Alam, Mohammad Mahmudul and Raff, Edward and Biderman, Stella and Oates, Tim and Holt, James},\n  journal={arXiv preprint arXiv:2305.19534},\n  year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuturecomputing4ai%2Fhrrformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffuturecomputing4ai%2Fhrrformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffuturecomputing4ai%2Fhrrformer/lists"}