{"id":21161982,"url":"https://github.com/princetonuniversity/maple","last_synced_at":"2025-07-09T14:31:59.982Z","repository":{"id":39713565,"uuid":"497019451","full_name":"PrincetonUniversity/maple","owner":"PrincetonUniversity","description":"MAPLE's hardware-software co-design allows programs to perform long-latency memory accesses asynchronously from the core, avoiding pipeline stalls, and enabling greater memory parallelism (MLP). ","archived":false,"fork":false,"pushed_at":"2024-02-22T16:31:02.000Z","size":5389,"stargazers_count":15,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-03-15T07:59:29.091Z","etag":null,"topics":["memory-accelerators","memory-access","memory-latency","programmable-block","rtl","specialized-hardware","verilog"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PrincetonUniversity.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-05-27T14:04:27.000Z","updated_at":"2024-01-25T19:43:27.000Z","dependencies_parsed_at":"2022-09-20T08:33:21.494Z","dependency_job_id":null,"html_url":"https://github.com/PrincetonUniversity/maple","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PrincetonUniversity%2Fmaple","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PrincetonUniversity%2Fmaple/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PrincetonUniversity%2Fmaple/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/PrincetonUniversity%2Fmaple/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PrincetonUniversity","download_url":"https://codeload.github.com/PrincetonUniversity/maple/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225562506,"owners_count":17488677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["memory-accelerators","memory-access","memory-latency","programmable-block","rtl","specialized-hardware","verilog"],"created_at":"2024-11-20T13:19:51.663Z","updated_at":"2024-11-20T13:19:52.312Z","avatar_url":"https://github.com/PrincetonUniversity.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MAPLE (Memory Access Parallel Load Engine)\n\u003cimg align=\"right\" width=\"375\" height=\"400\" src=\"https://user-images.githubusercontent.com/55038083/175697160-4008adf9-8ddc-4374-9eb6-13f0d375c581.png\"\u003e\n\nThis is the repository for the RTL and API of the [research paper \"Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs\"](https://dl.acm.org/doi/abs/10.1145/3470496.3527400\n), to appear at the 49th International Symposium on Computer Architecture \n\nThe correct citation for this work is:\n\n```\n@inproceedings{maple,\n  author = {Orenes-Vera, Marcelo and Manocha, Aninda and Balkind, Jonathan and Gao, Fei and Arag\\'{o}n, Juan L. 
and Wentzlaff, David and Martonosi, Margaret},\n  title = {Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs},\n  year = {2022},\n  isbn = {9781450386104},\n  publisher = {Association for Computing Machinery},\n  address = {New York, NY, USA},\n  url = {https://doi.org/10.1145/3470496.3527400},\n  doi = {10.1145/3470496.3527400},\n  booktitle = {Proceedings of the 49th Annual International Symposium on Computer Architecture},\n  pages = {817–830},\n  numpages = {14},\n  series = {ISCA '22}\n}\n```\n\n## Overview\n\nIn this repository you can find:\n\nAn **outline of the RTL files** can be found at *rtl/Flist.dcp*.\nDCP stands for 'decoupling from processor', as it is an RTL block that can be interacted with through the MAPLE API.\n\nThe **MAPLE API** is located at *api/dcp_maple.h*, whereas *api/dcp_shared_memory.h* implements the API of the decoupling functions using shared memory, as a baseline for comparing how well the specialized MAPLE hardware mitigates memory latency.\n\nThe *tests* folder contains the benchmarks in subfolders and four other programs to test MAPLE features. Note that MAPLE can also perform basic DMA, as shown in dma.c.\n\n## Installation\n\n    git clone git@github.com:PrincetonUniversity/openpiton.git;\n    cd openpiton;\n    git checkout openpiton-maple;\n    git clone git@github.com:PrincetonUniversity/maple.git;\n    source piton/ariane_setup.sh;\n    source piton/ariane_build_tools.sh;\n\n### Building RTL\nWe are now going to build a basic prototype of 4 tiles (2 Arianes and 2 MAPLE tile in between).\nCurrently, one in every two tiles is a MAPLE tile. This can be configured.\n\n    cd build;\n    sims -sys=manycore -ariane -decoupling -vcs_build -x_tiles=3 -y_tiles=1 -config_rtl=MINIMAL_MONITORING;\n\n### Running basic test\n    cd $PITON_ROOT/maple;\n\nRuns test #0. 
(Four basic tests are provided within this run_test.sh script)\n\n    ./run_test.sh 0;\n\n### Troubleshooting\nIf the build process fails due to a Python problem, make sure that your python command is defined and points to python2\n\n    which python\n\nIf it's not defined or points to python3, replace python with python2 in the following files by running these commands from the openpiton folder (not the build folder)\n    \n    sed -i 's/python/python2/' piton/design/chip/tile/ariane/bootrom/Makefile;\n    sed -i 's/python/python2/' piton/design/chip/tile/ariane/openpiton/bootrom/linux/Makefile;\n\n\n\n## Running programs\n\n### Running feature tests\n\nTo run the basic tests we can use the script **run_test.sh \u003ctest_id\u003e**\n    \nwhere ***test_id*** is the zero-based index into the 4 test types \n\ntests=(\"dma\" \"dcp_uni\" \"contiguous_allocation\" \"custom_acc\")\n\nFor example, to run the dma test we do\n\n    ./run_test.sh 0\n\n### Running benchmarks\n\nTo run the benchmarks we can use the script **run_bench.sh \u003ctype\u003e \u003cname\u003e \u003ctiles\u003e \u003caccess_threads\u003e \u003cexecute_threads\u003e \u003cdataset_size\u003e \u003cmode\u003e**\n\nwhere:\n\n ***type*** is either *dcpn* (dcp normal), *dcpl* (dcp LIMA prefetching) or *doall* (for traditional homogeneous parallelism)\n\n ***name*** is either *spmv* (sparse matrix vector mul), *spmm* (sparse matrix matrix mul), *bfs* (breadth-first search) or *ewsd* (element-wise sparse dense multiplication, aka SDHP)\n\n ***tiles*** is the number of tiles, counting MAPLE and Ariane core tiles.\n\n ***access_threads*** is the number of cores that act as suppliers in decoupling, or the total number of core threads in prefetching or doall\n\n***execute_threads*** (only relevant for decoupling, aka dcpn) is the number of cores that act as consumers in decoupling.\n\n***dataset_size*** (1:Tiny, 2:Small, 3:Big)\n\n***mode*** (1:LIMA Loading 
from DRAM; 2:LIMA Loading from LLC; 3:LIMA Prefetching into LLC; 4:Software Prefetching)\n\nFor example, the next command runs *prefetching with LIMA* for SPMV, using 3 tiles (2 Arianes and 1 MAPLE tile), 2 processing threads (execute threads are not relevant), the small dataset, and LIMA loading from DRAM.\n\n    ./run_bench.sh \"dcpl\" spmv 3 2 0 2 1\n\nThe following example does the same for *doall*\n\n    ./run_bench.sh \"doall\" spmv 3 2 0 2 1\n\nAnd the following example does the same for *decoupling*\n\n    ./run_bench.sh \"dcpn\" spmv 3 1 1 2 1\n\nThe print output and the simulation trace are placed in the *build/res* folder\n\n\n## Videos of FPGA demos\nDecoupling with four tiles\n- https://youtu.be/elkQcMFSvoo\n\nDecoupling and prefetching on top of Linux\n- https://youtu.be/YRbsjqzlTOM\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprincetonuniversity%2Fmaple","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprincetonuniversity%2Fmaple","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprincetonuniversity%2Fmaple/lists"}