{"id":18422025,"url":"https://github.com/spcl/apfp","last_synced_at":"2025-04-07T14:32:17.995Z","repository":{"id":44869598,"uuid":"412362203","full_name":"spcl/apfp","owner":"spcl","description":"FPGA acceleration of arbitrary precision floating point computations.","archived":false,"fork":false,"pushed_at":"2022-05-17T03:27:53.000Z","size":311,"stargazers_count":38,"open_issues_count":3,"forks_count":7,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-22T19:45:44.502Z","etag":null,"topics":["arbitrary-precision","bignum","fpga","gmp","high-level-synthesis","high-performance-computing","hls","hpc","mpfr","multiple-precision","vitis","vivado-hls","xilinx"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spcl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-01T06:58:05.000Z","updated_at":"2024-11-21T10:48:39.000Z","dependencies_parsed_at":"2022-09-12T07:51:48.435Z","dependency_job_id":null,"html_url":"https://github.com/spcl/apfp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fapfp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fapfp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fapfp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spcl%2Fapfp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spcl","download_url":"https://codeload.github.com/spcl/apfp/tar.gz/refs/heads/main","hos
t":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247670072,"owners_count":20976497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arbitrary-precision","bignum","fpga","gmp","high-level-synthesis","high-performance-computing","hls","hpc","mpfr","multiple-precision","vitis","vivado-hls","xilinx"],"created_at":"2024-11-06T04:27:47.116Z","updated_at":"2025-04-07T14:32:17.679Z","avatar_url":"https://github.com/spcl.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Fast Arbitrary Precision Floating Point on FPGA\n\nA detailed description of the approach implemented in this repository can be\nfound in our [FCCM'22\npaper](https://spcl.inf.ethz.ch/Publications/.pdf/apfp.pdf) [1].\n\n## Introduction\n\nThis repository implements an arbitrary precision floating point multiplier and\nadder using Vitis HLS targeting XRT-enabled Xilinx FPGAs, exposing them through\na matrix multiplication primitive that allows running them at full throughput\nwithout becoming memory bound. The design is _fully pipelined_, yielding a MAC\nthroughput equivalent to the frequency times the number of compute units\ninstantiated.\n\nInstantiations of the design on an Alveo U250 accelerator were shown to yield\n2.0 GMAC/s of 512-bit matrix-matrix multiplication; an order of magnitude\nhigher than a 36-core dual-socket Xeon node, corresponding to 375× CPU cores\nworth of throughput [1].\n\n## Configuration\n\nThe hardware design is configured using CMake. 
The target Xilinx XRT-enabled\nplatform must be specified with the `APFP_PLATFORM` parameter. The most\nimportant configuration parameters include:\n- The width used for the floating point representation is fixed at compile time\n  using the `APFP_BITS` CMake parameter, out of which 63 bits will be used for\n  the exponent, 1 bit will be used for the sign, and the remaining bits will be\n  used for the mantissa. The value is currently expected to be a multiple of 512\n  for the sake of being aligned to the memory interface width.\n- To scale the design beyond a single pipelined multiplier, the\n  `APFP_COMPUTE_UNITS` parameter can be used to replicate the full kernel. Each\n  instantiation will run a fully independent matrix multiplication unit. These\n  can be used to collaborate on a single matrix multiplication operation (see\n  `host/TestMatrixMultiplication.cpp` for an example).\n- The floating point multiplier uses Karatsuba decomposition to reduce the\n  overall resource usage of the design. The decomposition bottoms out at\n  `APFP_MULT_BASE_BITS`, after which it falls back on naive multiplication using\n  DSPs as generated by the HLS tool. Similarly, `APFP_ADD_BASE_BITS`\n  configures the number of bits to dispatch to the HLS tool's addition\n  implementation, manually pipelining the addition into multiple stages above\n  this threshold.\n- To avoid being memory bound, the matrix multiplication implementation is\n  tiled using the approach described in our [FPGA'20\n  paper](https://spcl.inf.ethz.ch/Publications/.pdf/gemm-fpga.pdf) [2]. The\n  tile sizes are exposed through the `APFP_TILE_SIZE_N` and `APFP_TILE_SIZE_M`\n  parameters. The highest arithmetic intensity is achieved when these two\n  quantities are equal and maximized, but relatively small tile sizes are\n  sufficient to overcome the memory bottleneck (e.g., 32x32). 
Higher tile sizes\n  increase arithmetic intensity at the cost of BRAM usage, and potential\n  overhead when the input matrix is not a multiple of the tile size.\n- `APFP_FREQUENCY` can be used to change the maximum frequency targeted by the\n  design. If unspecified, the default of the target platform will be used.\n\nFor more details on how to configure the project to achieve high throughput,\nsee our paper [1].\n\n## Configuration and compilation\n\nPlease make sure you clone the repository with `git clone --recursive` or run\n`git submodule update --init` after cloning to check out dependencies.\n\nThe minimum commands necessary to configure and build the code are:\n\n```bash\nmkdir build\ncd build\ncmake ..  # Default parameters\nmake      # Builds software components\nmake hw   # Builds hardware accelerator\n```\n\nHowever, the accelerator should always be configured to match the target system\nusing the parameters described in the previous section and in our paper [1].\nThe CMake configuration flow uses\n[hlslib](https://github.com/definelicht/hlslib) [3] to locate the Xilinx tools\nand expose hardware build targets.\n\nThe project depends on Vitis, GMP, and MPFR to successfully configure.\n\n## Running the code\n\nWe provide an example host code that runs the matrix multiplication accelerator\non a randomized input in `host/TestMatrixMultiplication.cpp`. See the executable\nfor usage. An example invocation could be:\n\n```bash\n./TestMatrixMultiplicationHardware hw 256 256 256\n```\n\n## Installation\n\nTo install the project, including both the software interface components and the\nhardware accelerator itself (built with `make hw`), simply run `make install`.\nThe location to install the project in is configured with the\n`CMAKE_INSTALL_PREFIX` parameter.\n\n## References\n\n[1] Johannes de Fine Licht, Christopher A. 
Pattison, Alexandros Nikolaos\nZiogas, David Simmons-Duffin, Torsten Hoefler, _\"Fast Arbitrary Precision\nFloating Point on FPGA\"_, in Proceedings of the 2022 IEEE 30th Annual\nInternational Symposium on Field-Programmable Custom Computing Machines\n(FCCM'22). [🔗](https://spcl.inf.ethz.ch/Publications/.pdf/apfp.pdf)\n\n[2] Johannes de Fine Licht, Grzegorz Kwasniewski, and Torsten Hoefler,\n_\"Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level\nSynthesis\"_, in Proceedings of the 28th ACM/SIGDA International Symposium on\nField-Programmable Gate Arrays (FPGA'20).\n[🔗](https://spcl.inf.ethz.ch/Publications/.pdf/gemm-fpga.pdf)\n\n[3] Johannes de Fine Licht and Torsten Hoefler, _\"hlslib: Software Engineering\nfor Hardware Design\"_, presented at the Fifth International Workshop on\nHeterogeneous High-performance Reconfigurable Computing (H2RC'19).\n[🔗](https://spcl.inf.ethz.ch/Publications/.pdf/hlslib.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fapfp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspcl%2Fapfp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspcl%2Fapfp/lists"}