{"id":31815714,"url":"https://github.com/xilinx/reconic","last_synced_at":"2026-02-18T15:01:19.957Z","repository":{"id":187276992,"uuid":"676557566","full_name":"Xilinx/RecoNIC","owner":"Xilinx","description":"RecoNIC is a software/hardware shell used to enable network-attached processing within an RDMA-featured SmartNIC for scale-out computing.","archived":false,"fork":false,"pushed_at":"2025-03-20T01:48:20.000Z","size":1333,"stargazers_count":144,"open_issues_count":8,"forks_count":35,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-09-24T13:05:41.134Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"SystemVerilog","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Xilinx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-09T13:26:42.000Z","updated_at":"2025-09-05T02:27:05.000Z","dependencies_parsed_at":"2023-11-07T09:26:03.778Z","dependency_job_id":"dcbd8b0e-efcd-42e8-8e0b-e2e6f99473e2","html_url":"https://github.com/Xilinx/RecoNIC","commit_stats":null,"previous_names":["xilinx/reconic"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Xilinx/RecoNIC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xilinx%2FRecoNIC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xilinx%2FRecoNIC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xilinx%2FRecoNIC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xilinx%2FRecoNIC/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Xilinx","download_url":"https://codeload.github.com/Xilinx/RecoNIC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Xilinx%2FRecoNIC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006753,"owners_count":26084178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-11T09:23:22.957Z","updated_at":"2025-10-11T09:23:24.087Z","avatar_url":"https://github.com/Xilinx.png","language":"SystemVerilog","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RecoNIC - \u003cins\u003eR\u003c/ins\u003eDMA-\u003cins\u003ee\u003c/ins\u003enabled \u003cins\u003eC\u003c/ins\u003eompute \u003cins\u003eO\u003c/ins\u003effloading on Smart\u003cins\u003eNIC\u003c/ins\u003e\n\nTo meet the explosive growth of data and scale-out workloads/applications, today's data centers comprise sea of network connected hosts, each with multi-core CPUs and accelerators in the form of ASICs, FPGAs, and/or GPUs. The interaction between these hosts takes place through network interface cards (NICs) operating at speeds of 40Gbps or higher. Such data center architecture provides an ideal environment for distributed applications. 
When computation tasks are offloaded to accelerators, data received from remote application components through the network is first stored in the host’s memory and then copied to the memory of accelerators via the PCIe bus. Once computation tasks are completed, results destined for remote peers are often copied back to the host’s memory. This CPU-centric solution introduces multiple data copies, leading to a notable decrease in overall performance and increased latency.\n\nTo address these challenges, we propose RecoNIC, an RDMA-enabled SmartNIC platform with compute acceleration, designed to minimize the overhead associated with data copies and to bring data as close to computation as feasible. The platform consists of a hardware shell and software stacks. The hardware shell of RecoNIC encompasses basic NIC functionalities, an RDMA engine, and two programmable compute logic modules for lookaside and streaming computations, respectively. Developers have the flexibility to design their accelerators using RTL, HLS or Vitis Networking P4 within RecoNIC's programmable compute logic modules. This allows network data to be processed without the multiple copies characteristic of traditional CPU-centric solutions. The logic executed within these programmable modules can access both RecoNIC and host memory in remote peers via RDMA.\n\nFor more information, please refer to the [RecoNIC primer](https://arxiv.org/abs/2312.06207).\n\n## RecoNIC System Overview\n\n\u003cimg src=\"doc/image/RecoNIC.png\"\u003e\n\nThe above figure shows the hardware shell architecture and software stacks of RecoNIC. 
The hardware shell consists of a basic NIC module (including a MAC subsystem and DMA subsystem - QDMA), a packet classification module, an RDMA engine, two programmable compute logic modules (Lookaside Compute and Streaming Compute), along with supplementary modules such as system/memory crossbars and an arbiter.\n\nThe RDMA engine is responsible for processing RDMA traffic, allowing payload data from the network to be stored in either the host's memory or the RecoNIC device's memory. User defined accelerators implemented in the Streaming Compute and Lookaside Compute modules can directly process data, including network-received data, within the device memory.\n\nThe software encompasses the network stack, consisting of RDMA APIs and network driver to handle non-RDMA traffic (such as TCP/IP, UDP/IP, and ARP). Additionally, the memory driver facilitates seamless memory transfers between the host and RecoNIC memory. Finally, the control driver serves to configure and control various components in the hardware shell.\n\n## System Requirement\n\n* Two servers, each one has an AMD-Xilinx Alveo U250 FPGA board\n* The two AMD-Xilinx Alveo U250 boards can be connected via a 100Gbps cable or through a 100Gbps switch\n* Experiments are tested on machines with Ubuntu 20.04 and linux kernel version 5.4.0-125-generic.\n\n## Preliminary Installation\n\n* Vivado 2021.2\n* vitis_net_p4 \u003cbr/\u003e\nHow to enable vitis_net_p4: (1) before Vivado installation, we need to '$ export VitisNetP4_Option_VISIBLE=true'; (2) When running Vivado installer, you should be able to see the option for Vitis Networking P4. Make sure you select the vitis_net_p4 option.\n* ERNIC license \u003cbr/\u003e\nERNIC license is required in this project. You can either purchase or apply for it through [AMD University Program](https://www.xilinx.com/support/university.html). 
For further details, please visit [AMD ERNIC](https://www.xilinx.com/products/intellectual-property/ef-di-ernic.html) website.\n* Questa simulator 2021.3 (if available)\n* python \u003e= 3.8\n* [Xilinx Board Store](https://github.com/Xilinx/XilinxBoardStore)\n  ```\n  $ git clone https://github.com/Xilinx/XilinxBoardStore\n  $ export BOARD_REPO=/your/path/to/XilinxBoardStore\n  ```\n* netplan : We are using netplan to configure static IPs for RecoNIC\n* Doxygen\n\n## Hardware Generation and Programming\n\nRecoNIC leverages [OpenNIC](https://github.com/Xilinx/open-nic) as its basic NIC shell. To build RecoNIC, we need to first obtain the Open-NIC shell and apply patches to set up the RecoNIC shell.\n\n* Obtain the modified OpenNIC shell with the RDMA engine\n```\n$ git submodule update --init base_nics/open-nic-shell\n$ cp -r base_nics/open-nic-shell/board_files/Xilinx/au250 $BOARD_REPO/boards/Xilinx/\n```\n* Integrate RecoNIC into the modified OpenNIC shell and generate bitstream\n```\n$ cd ./scripts\n$ ./gen_base_nic.sh\n$ make build_nic\n```\nIf you encounter the error below, please specify your python version when generating bitstream by \"*make PYTHON_EXE=python3.8 build_nic*\"\n```\n...\n  File \"../scripts/build_tcl.py\", line 82\n    logging.info(f'verilog: {f}')\n\nSyntaxError: invalid syntax\n```\n\n* Program FPGA\n\n\u0026emsp;\u0026emsp;**Using Vivado GUI for FPGA programming**\n\nThe system project and its bitstream will be generated under ./smartnics/open-nic-shell/build/au250/open_nic_shell folder. 
To setup the demo, please download the bitstream to the two FPGA boards according to [AMD Vivado User Guide UG908](https://docs.xilinx.com/r/2022.1-English/ug908-vivado-programming-debugging/Programming-the-Device).\n\nAfter downloading the bitstream on the FPGA board, you can check whether the board is up by\n```\n$ lspci | grep Xilinx\nd8:00.0 Memory controller: Xilinx Corporation Device 903f\n```\nThe PCIe BDF (Bus, Device, Function) number and device ID might be different depending on your system.\n\n\u0026emsp;\u0026emsp;**Using scripts for FPGA programming**\n\n*[program_fpga.sh](scripts/program_fpga.sh)* is a bash script used to program FPGA either with *.bit or *.mcs file. In order to use the script, you have to first get the PCIe BDF number and FPGA target device ID/name. You can obtain the FPGA target device name from \"Open New Target\" under \"Open Hardware Manager\" of \"PROGRAM AND DEBUG\" in Vivado GUI. Or you can use this command\n```\n$ echo 'open_hw_manager; connect_hw_server -allow_non_jtag; puts [get_hw_targets]' \u003e temp.tcl \u0026\u0026 vivado -nolog -nojournal -mode batch -source temp.tcl | grep 'xilinx_tcf/Xilinx/' \u0026\u0026 rm temp.tcl\nlocalhost:3121/xilinx_tcf/Xilinx/12345678A01BC\n```\nIn this case, the FPGA target device ID/name is \"12345678A01BC\".\n\nIf your jtag cable for programming is connected to the other remote host, then you need to provide IP address or hostname of that remote machine as well. To get the target device ID/name in this case, you can add *connect_hw_server -url $remote_host:3121 -allow_non_jtag;* in the above command, where $remote_host is IP address or hostname of your remote machine. 
\n\nThe below commands show how to use *program_fpga.sh* to download bitstream on an FPGA board.\n```\n$ cd scripts\n$ ./program_fpga.sh\nUsage:\n  ./program_fpga.sh -b pcie_bdf_num -t target_name [option]\n  Options and arguments:\n  -b, --bdf          PCIe BDF (Bus, Device, Function) number\n  -t, --target_id    FPGA target device name or ID\n  -p, --prog_file    FPGA programming file in \"bit\" or \"mcs\" format\n  -r, --remote_host  Remote hostname or IP address used to program FPGA board\n\nInfo: This script should be executed locally on a host server with the target FPGA board.\nInfo: For mcs programming, user has to provide /your/path/to/your_file.mcs.\nInfo: Target ID or target name can be obtained from \"Open New Target\" under \"Open Hardware\n      Manager\" of \"PROGRAM AND DEBUG\" in Vivado GUI\n\n$ ./program_fpga.sh -b d8:00.0 -t target_name -r remote_hostname\n```\n\n### How to generate patches for the hardware shell\n\nWe leverage patches to include new changes in the hardware shell. If you want to modify the hardware shell such as adding board support and new features, please refer to this [document](./doc/how_to_gen_a_patch.md) for the instructions.\n\n## Driver Installation\n\n* Install the modified onic driver\n```\n$ git submodule update --init drivers/onic-driver\n$ cd ./scripts\n$ ./gen_nic_driver.sh\n$ cd ../drivers/onic-driver\n$ make\n$ sudo insmod onic.ko\n```\n\n* Get MAC address and ethernet interface name assigned by the driver\n```\n$ dmesg | grep \"Set MAC address to\"\nonic 0000:d8:00.0 onic216s0f0 (uninitialized): Set MAC address to 0:a:35:29:33:0\n```\n\nIn this example, the new MAC address is [0x00, 0x0a, 0x35, 0x29, 0x33, 0x00], while the ethernet interface name assigned is 'onic216s0f0'. It is possible that the ethernet interface, 'onic216s0f0', might be renamed by the operating system. 
You can check with the following commands.\n```\n$ dmesg | grep \"renamed from\"\n[  146.932392] onic 0000:d8:00.0 ens8: renamed from onic216s0f0\n```\nIn this case, the ethernet interface is renamed as \"ens8\" from \"onic216s0f0\".\n\n* Set IP addresses for the two peers\n\nWe can set the IP addresses either via a netplan configuration file or *ifconfig*. Assuming the ethernet interface name is \"onic216s0f0\".\n\n\u0026emsp;\u0026emsp;**1. Using *netplan* to set IPs**\n\nYou need to create a configuration file, onic216s0f0.yaml, at \"/etc/netplan/\" and copy the below code snippet in this file.\n\n\u0026emsp;\u0026emsp;\u0026emsp;**Peer 1**\n```\nnetwork:\n  version: 2\n  renderer: networkd\n  ethernets:\n    onic216s0f0:\n      dhcp4: no\n      dhcp6: no\n      addresses: [192.100.51.1/16]\n```\n\u0026emsp;\u0026emsp;\u0026emsp;**Peer 2**\n```\nnetwork:\n  version: 2\n  renderer: networkd\n  ethernets:\n    onic216s0f0:\n      dhcp4: no\n      dhcp6: no\n      addresses: [192.100.52.1/16]\n```\n\nOnce done, you need to enable the configuration by \"sudo netplan apply\" or simply do warm reboot for your system.\n\n\u0026emsp;\u0026emsp;**2. 
Using *ifconfig* to set IPs**\n\n\u0026emsp;\u0026emsp;\u0026emsp;**Peer 1**\n```\n$ sudo ifconfig onic216s0f0 192.100.51.1 netmask 255.255.0.0 broadcast 192.100.255.255\n```\n\u0026emsp;\u0026emsp;\u0026emsp;**Peer 2**\n```\n$ sudo ifconfig onic216s0f0 192.100.52.1 netmask 255.255.0.0 broadcast 192.100.255.255\n```\n\n* Test network connectivity\n\n\u0026emsp;\u0026emsp;**Peer 1**\n```\n$ ping 192.100.52.1\nPING 192.100.52.1 (192.100.52.1) 56(84) bytes of data.\n64 bytes from 192.100.52.1: icmp_seq=1 ttl=64 time=0.188 ms\n64 bytes from 192.100.52.1: icmp_seq=2 ttl=64 time=0.194 ms\n64 bytes from 192.100.52.1: icmp_seq=3 ttl=64 time=0.222 ms\n```\n\n\u0026emsp;\u0026emsp;**Peer 2**\n```\n$ ping 192.100.51.1\nPING 192.100.51.1 (192.100.51.1) 56(84) bytes of data.\n64 bytes from 192.100.51.1: icmp_seq=1 ttl=64 time=0.248 ms\n64 bytes from 192.100.51.1: icmp_seq=2 ttl=64 time=0.174 ms\n64 bytes from 192.100.51.1: icmp_seq=3 ttl=64 time=0.201 ms\n```\n\nIf everything works fine, it should return similar output from your terminals. After verifying, you can stop *ping*. The system is now up.\n\n## RecoNIC user-space library\nRecoNIC user-space library (green boxes shown in the above RecoNIC system overview figure) contains all necessary APIs for RDMA, memory and control operations. To obtain the document for source codes, you can simply run with\n```\n$ cd ./lib\n$ doxygen\n```\nThe source code documents will be generated at ./lib/html.\n\nBefore we run test cases and applications, we need to build the libreconic library.\n```\n$ make\n$ export LD_LIBRARY_PATH=/your/path/to/RecoNIC/lib:$LD_LIBRARY_PATH\n```\nThe generated static, *libreconic.a*, and shared library, *libreconic.so*, are located at ./lib folder. 
We are now ready to run the test cases and applications.\n\n## RDMA Test Cases\nThe *rdma_test* folder contains RDMA read, write and send/receive test cases using libreconic.\n\nBuild the RDMA read, write and send/recv programs.\n```\n$ cd examples/rdma_test\n$ make\n```\n\n### RDMA Read\nRDMA Read operation: The client node issues an RDMA read request to the server node first. The server node then replies with the RDMA read response packet.\n```\n$ ./read -h\n  usage: ./read [OPTIONS]\n\n    -d (--device) character device name (defaults to /dev/reconic-mm)\n    -p (--pcie_resource) PCIe resource\n    -r (--src_ip) Source IP address\n    -i (--dst_ip) Destination IP address\n    -u (--udp_sport) UDP source port\n    -t (--tcp_sport) TCP source port\n    -q (--dst_qp) Destination QP number\n    -z (--payload_size) Payload size in bytes\n    -l (--qp_location) QP/mem-registered buffers' location: [host_mem | dev_mem]\n    -s (--server) Server node\n    -c (--client) Client node\n    -g (--debug) Debug mode\n    -h (--help) print usage help and exit \n```\n\n#### On the client node (192.100.51.1)\nRun the program\n```\nsudo ./read -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2\u003e\u00261 | tee client_debug.log\n```\n\n#### On the server node (192.100.52.1)\nRun the program\n```\nsudo ./read -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -s -u 22222 -t 11111 --dst_qp 2 -g 2\u003e\u00261 | tee server_debug.log\n```\n\nIf the program exits with an error saying libreconic.so is not found, you can try \"sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./read\" instead of \"sudo ./read\".\n\nThe above example allocates the QP (SQ, CQ and RQ) in the host memory. 
If you want the QP to be allocated in the device memory, you can simply replace \"-l host_mem\" with \"-l dev_mem\" on both the client and server nodes.\n\n### RDMA Write\nRDMA Write operation: The client node issues an RDMA write request to the server node directly. Usage of the RDMA write program is the same as that of the RDMA read program above.\n\n#### On the client node (192.100.51.1)\nRun the program\n```\nsudo ./write -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 -t 11111 --dst_qp 2 -g 2\u003e\u00261 | tee client_debug.log\n```\n\n#### On the server node (192.100.52.1)\nRun the program\n```\nsudo ./write -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -s -u 22222 -t 11111 --dst_qp 2 -g 2\u003e\u00261 | tee server_debug.log\n```\n\nIf the program exits with an error saying libreconic.so is not found, you can try \"sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./write\" instead of \"sudo ./write\".\n\nThe above example allocates the QP (SQ, CQ and RQ) in the host memory. You can allocate QPs in device memory as well by using \"-l dev_mem\" on both the client and server nodes.\n\n### RDMA Send/Receive\nRDMA send/recv operation: The server node posts an RDMA receive request, waiting for an RDMA send request to its allocated receive queue. The client node then issues an RDMA send request to the server node. 
Usage of the RDMA send/receive program is the same as that of the RDMA read program above.\n\n#### On the receiver node (192.100.51.1)\nRun the program in the receiver mode\n```\nsudo ./send_recv -r 192.100.51.1 -i 192.100.52.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 128 -l host_mem -d /dev/reconic-mm -c -u 22222 --dst_qp 2 -g 2\u003e\u00261 | tee client_debug.log\n```\n\n#### On the sender node (192.100.52.1)\nRun the program in the sender mode\n```\nsudo ./send_recv -r 192.100.52.1 -i 192.100.51.1 -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -z 16384 -l dev_mem -d /dev/reconic-mm -s -u 22222 --dst_qp 2 -g 2\u003e\u00261 | tee server_debug.log\n```\n\nIf the program exits with an error saying libreconic.so is not found, you can try \"sudo env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./send_recv\" instead of \"sudo ./send_recv\".\n\nThe above example allocates the QP (SQ, CQ and RQ) in the host memory. You can allocate QPs in device memory as well by using \"-l dev_mem\" on both receiver and sender nodes.\n\n## Applications\n\n### Built-in example - network systolic-array matrix multiplication\n\nIn the current implementation, we have [matrix multiplication](examples/network_systolic_mm) as an example to demonstrate how to use RecoNIC. In this example, arrays A and B are stored in the host memory of the remote peer, and the computation is done in the local peer. \n\n**Execution flow**: The local host first issues two RDMA read requests to the remote peer to fetch arrays A and B and stores them in the device memory. Once it detects that the two arrays are ready, the local host issues a compute control command to the Compute Logic to start computation. 
Once the computation is finished, the host reads the result back to the host memory for verification.\n\nThe [hardware implementation](shell/compute/lookside) of the MM computation is a systolic-array version and written in HLS C from the [Vitis_Accel_Examples](https://github.com/Xilinx/Vitis_Accel_Examples/blob/master/cpp_kernels/systolic_array/src/mmult.cpp).\n\nData (Array A and B) is stored in a server node (Peer 1), while computation is executed in a client node (Peer 2).\n\nBefore we run the example, we need to configure hugepages in both servers.\n```\n# 1. Edit /etc/sysctl.conf file and configure number of hugepages by setting 'vm.nr_hugepages'. Each \n#    hugepage will have 2MB size\n$ vm.nr_hugepages = 2048\n# 2. Refresh the kernel parameters\n$ sudo sysctl -p\n```\nCompilation\n```\n$ cd examples/network_systolic_mm\n$ make\n$ ./network_systolic_mm -h\nusage: ./network_systolic_mm [OPTIONS]\n\n  -d (--device) character device name (defaults to /dev/reconic-mm)\n  -p (--pcie_resource) PCIe resource\n  -r (--src_ip) Source IP address\n  -i (--dst_ip) Destination IP address\n  -u (--udp_sport) UDP source port\n  -t (--tcp_sport) TCP source port\n  -q (--dst_qp) Destination QP number\n  -s (--server) Server node\n  -c (--client) Client node\n  -h (--help) print usage help and exit\n\n```\n\n**Peer 1** - Server (192.100.51.1)\n```\n$ sudo ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -r 192.100.51.1 -i 192.100.52.1 -u 22222 -t 11111 --dst_qp 2 -s 2\u003e\u00261 | tee server_debug.log\n```\n\n**Peer 2** - Client (192.100.52.1)\n```\n$ cd software/network_systolic_mm\n$ make\n$ sudo ./network_systolic_mm -d /dev/reconic-mm -p /sys/bus/pci/devices/0000\\:d8\\:00.0/resource2 -r 192.100.52.1 -i 192.100.51.1 -u 22222 -t 11111 --dst_qp 2 -c 2\u003e\u00261 | tee client_debug.log\n```\n\n## Performance Evaluation\n\n### DMA testing\n\nThe [dma_test](examples/dma_test) folder is used to test data copy functionality between 
host and device's memory. It supports both read and write from/to the NIC's memory. In this example, the host acts as a master.\n\n```\n$ cd examples/dma_test\n$ make\n$ ./dma_test -help\nusage: ./dma_test [OPTIONS]\n\n  -d (--device) device\n  -a (--address) the start address on the AXI bus\n  -s (--size) size of a single transfer in bytes, default 32,\n  -o (--offset) page offset of transfer\n  -c (--count) number of transfers, default 1\n  -f (--data infile) filename to read the data from (ignored for read scenario)\n  -w (--data outfile) filename to write the data of the transfers\n  -h (--help) print usage help and exit\n  -v (--verbose) verbose output\n  -r (--read) use read scenario (write scenario without this flag)\n```\n\n* dma_test write\n```\n$ ./dma_test -d /dev/reconic-mm -s 65536000 -c 200\n```\n\n* dma_test read\n```\n$ ./dma_test -d /dev/reconic-mm -s 65536000 -c 200 -r\n```\n\n* PCIe bandwidth measurement for data copy\n\nBefore measuring the bandwidth, we need to determine which CPU core is bound to the specific PCIe slot used for RecoNIC. To do so, we need to find the NUMA node bound to the PCIe slot. It's fine if you measure bandwidth without setting the CPU affinity. 
This might end up with lower performance if the system schedules other NUMA node that's not bound to the corresponding PCIe slot.\n\n```\n$ lspci | grep Xilinx\nd8:00.0 Memory controller: Xilinx Corporation Device 903f\n$ sudo lspci -vv -s d8:00.0 | grep 'NUMA node'\n        NUMA node: 1\n$ cat /sys/devices/system/node/node1/cpulist\n1,3,5,7,9,11,13,15\n```\n\nNow, we are ready to test the bandwidth\n```\n$ taskset -c 1,3,5,7 ./measure_dma.sh /dev/reconic-mm 4 write 65536000\n\nNumber of dma_test (write) threads: 4\nCalculate total write bandwidth achieved:\n-- The total write bandwidth is: 13.065046 GB/sec\n\n$ taskset -c 1,3,5,7 ./measure_dma.sh /dev/reconic-mm 4 read 65536000\n\nNumber of dma_test (read) threads: 4\nCalculate total read bandwidth achieved:\n-- The total read bandwidth is: 12.998869 GB/sec\n\n```\n\n## Hardware Simulation\n\nThe simulation framework supports self-testing and regression test. Stimulus, control metadata and golden data are generated from a python script, *packet_gen.py*. User can specify their own json file to generate a new set of testing under *./sim/testcases* folder. The testbenches will automatically read those generated files and construct packets in AXI-streaming format and other control-related signals. The simulation framework can support xsim and questasim.\n\nBefore running the simulation, you have to export \"VIVADO_DIR\" and the simulation library directory, \"COMPILED_LIB_DIR\" (Questasim only), into your environment. If you do not know how to compile a simulation library for Vivado, please follow the instructions from [this link](https://support.xilinx.com/s/article/64083?language=en_US).\n```\n$ export VIVADO_DIR=/your/vivado/installation/path/Vivado/2021.2\n$ export COMPILED_LIB_DIR=/your/vivado/compiled_lib_dir/for/questasim\n```\n\n1. 
Generate Vivado IPs\n\n```\n$ cd ./sim/scripts\n$ vivado -mode batch -source gen_vivado_ip.tcl\n```\nIf the output shows a \"board_part definition\" error, please provide the board_repo path in the command line\n```\n$ vivado -mode batch -source gen_vivado_ip.tcl -tclargs -board_repo $BOARD_REPO\n```\n\n2. Generate stimulus/control/golden data and start simulation\u003cbr/\u003e\nThe main script is run_testcase.py located at *./sim*. Its usage is shown below.\n```\n# install required python package\n$ pip install scapy\n$ pip install numpy\n$ python run_testcase.py -h\nINFO:run_testcase:Usage:\nINFO:run_testcase:  python run_testcase.py [options] regression,\nINFO:run_testcase:  python run_testcase.py [options] -tc \"testcase1 testcase2 ... testcasek\"\nINFO:run_testcase:Options:\nINFO:run_testcase:  -debug     : Debug mode\nINFO:run_testcase:  -questasim : Use Questa Sim as the simulator. Default is Vivado XSIM\nINFO:run_testcase:  -roce      : Generate configuration files for RDMA simulation\nINFO:run_testcase:  -no_pktgen : Run testcases without re-generating packets\nINFO:run_testcase:  -no_sim    : Only run analysis on the previous simulation results\nINFO:run_testcase:  -gui       : Use gui mode with the simulator\n```\nHere is an example showing how to use the script to simulate the 'read_2rdma' testcase under the ./sim/testcases/read_2rdma folder\n```\n$ cd ../sim\n# start simulation with xsim\n$ python run_testcase.py -roce -tc read_2rdma -gui\n# start simulation with questasim\n$ python run_testcase.py -roce -tc read_2rdma -questasim -gui\n```\n\nUsers can specify their own configuration file to construct a new testcase. The configuration file is in JSON format. 
Here is an example for generating configuration files for RDMA read operations\n```\n{\n  \"top_module\"            : \"rn_tb_2rdma_top\",\n  \"pkt_type\"              : \"rocev2\",\n  \"pkt_op\"                : \"read\",\n  \"non_roce_traffic\"      : \"no\",\n  \"noise_roce_en\"         : \"no\",\n  \"payload_size\"          : 64,\n  \"src_baseaddr_location\" :\"dev_mem\",\n  \"src_baseaddr\"          : 2048,\n  \"dst_baseaddr\"          : 1024,\n  \"num_data_buffer\"       : 4,\n  \"mr_buf_size\"           : 32768,\n  \"data_buffer_size\"      : 4096,\n  \"num_qp\"                : 4,\n  \"udp_sport\"             : 17185,\n  \"destination_qpid\"      : 2,\n  \"sq_depth\"              : 4,\n  \"rq_depth\"              : 4,\n  \"mtu_size\"              : 4096,\n  \"rq_buffer_size\"        : 2048,\n  \"partition_key\"         : 4660,\n  \"r_key\"                 : 22,\n  \"sq_psn\"                : 10\n}\n```\n\"src_baseaddr_location\" is used to specify the source buffer location either at host memory (\"sys_mem\") or device memory (\"dev_mem\").\n\nThe simulation source code is located at [sim/src](sim/src).\n\n## Citation\n\nIf you use RecoNIC in your research and projects, please cite\n```\n@misc{zhong2023primer,\n      title={A Primer on RecoNIC: RDMA-enabled Compute Offloading on SmartNIC}, \n      author={Guanwen Zhong and Aditya Kolekar and Burin Amornpaisannon and Inho Choi and Haris Javaid and Mario Baldi},\n      year={2023},\n      eprint={2312.06207},\n      archivePrefix={arXiv},\n      primaryClass={cs.DC}\n}\n```\n\nIf you find this project helpful, please consider giving it a star! 
Your support is greatly appreciated.⭐\n\n-----\n\n\u003cp align=\"center\"\u003eCopyright\u0026copy; 2021-2023 Advanced Micro Devices, Inc.\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxilinx%2Freconic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxilinx%2Freconic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxilinx%2Freconic/lists"}