{"id":13408552,"url":"https://github.com/darklife/darkriscv","last_synced_at":"2026-02-10T16:31:51.770Z","repository":{"id":39574008,"uuid":"145329615","full_name":"darklife/darkriscv","owner":"darklife","description":"opensouce RISC-V cpu core implemented in Verilog from scratch in one night!","archived":false,"fork":false,"pushed_at":"2025-07-16T23:38:41.000Z","size":4187,"stargazers_count":2386,"open_issues_count":7,"forks_count":314,"subscribers_count":94,"default_branch":"master","last_synced_at":"2025-08-14T07:42:33.634Z","etag":null,"topics":["core","fpga","risc-v","riscv","verilog"],"latest_commit_sha":null,"homepage":"","language":"Verilog","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/darklife.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-08-19T18:55:50.000Z","updated_at":"2025-08-12T09:39:00.000Z","dependencies_parsed_at":"2023-09-24T00:01:39.067Z","dependency_job_id":"18bce41f-1a94-4973-a082-cf30196f0487","html_url":"https://github.com/darklife/darkriscv","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/darklife/darkriscv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darklife%2Fdarkriscv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darklife%2Fdarkriscv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darklife%2Fdarkriscv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darklife%2Fdarkriscv/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/darklife","download_url":"https://codeload.github.com/darklife/darkriscv/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darklife%2Fdarkriscv/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29307904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-10T16:09:25.305Z","status":"ssl_error","status_checked_at":"2026-02-10T16:08:52.170Z","response_time":65,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["core","fpga","risc-v","riscv","verilog"],"created_at":"2024-07-30T20:00:53.595Z","updated_at":"2026-02-10T16:31:51.751Z","avatar_url":"https://github.com/darklife.png","language":"Verilog","funding_links":[],"categories":["Projects and IPs","Verilog","Open Source Implementations","CPU RISC-V","Applications","Tools"],"sub_categories":["Information Technology","Cores","网络服务_其他","Mesh networks"],"readme":"# DarkRISCV\n[![Build Status][WorkflowBadgeLinux]][WorkflowUrlLinux]\n\nOpensource RISC-V implemented from scratch in one night!\n\n![darkriscv](https://github.com/darklife/darkriscv/blob/master/doc/boot.png)\n\n## Quick Start!\n\nCase you already have the Icarus Verilog installed, just clone the code and type make!\n\n    git clone git@github.com:darklife/darkriscv.git\n    cd darkriscv\n    make\n\nAnd it will run the 
DarkRISCV with the default firmware, which will print\nlots of fun messages from the core itself, dump some pipeline information\nand generate a VCD file!\n\nThe VCD file can be checked with GTKWave:\n\n    gtkwave sim/darksocv.vcd\n\nSo, you can add the signals from each module and explore the waveforms! :)\n\n## Table of Contents\n\n- [DarkRISCV](#darkriscv)\n\t- [Table of Contents](#table-of-contents)\n\t- [Introduction](#introduction)\n\t- [History](#history)\n\t- [Project Background](#project-background)\n\t- [Directory Description](#directory-description)\n\t\t- [\"src\" Directory](#src-directory)\n\t\t- [\"sim\" Directory](#sim-directory)\n\t\t- [\"rtl\" Directory](#rtl-directory)\n\t\t- [\"board\" Directory](#board-directory)\n\t- [Implementation Notes*](#implementation-notes)\n\t- [Development Tools](#development-tools)\n\t- [Development Boards](#development-boards)\n\t- [FuseSoC support](#fusesoc-support)\n\t- [Creating a RISCV from scratch](#creating-a-riscv-from-scratch)\n\t- [Academic Papers and Applications](#academic-papers-and-applications)\n\t- [Performance Comparisons](#performance-comparisons)\n\t- [Acknowledgments](#acknowledgments)\n\t- [References](#references)\n\n## Introduction\n\nDeveloped in a magic night of 19 Aug, 2018 between 2am and 8am, the\n*DarkRISCV* softcore started as a proof of concept for the opensource\nRISC-V instruction set.  
\n\nAlthough the code is small and crude when compared with other RISC-V\nimplementations, the *DarkRISCV* has lots of impressive features:\n\n- implements the UCB RISC-V RV32E and RV32I user space instruction set\n- optional CSRs for interrupts and debug\n- works up to 250MHz in an ultrascale ku040 (400MHz w/ overclock!)\n- up to 100MHz in a cheap spartan-6, fits in small spartan-3E such as XC3S100E!\n- can sustain 1 clock per instruction most of the time (typically 70% of the time)\n- flexible harvard architecture (easy to integrate a cache controller, bus bridges, etc)\n- works fine in a real xilinx (spartan-3, spartan-6, spartan-7, artix-7, kintex-7 and kintex ultrascale)\n- works fine with some real Altera and Lattice FPGAs too!\n- works fine with gcc 9.0.0 or above for RISC-V (no patches required!)\n- uses between 850-1500LUTs (core only with LUT6 technology, depending on enabled features and optimizations)\n- optional RV32E support (smaller and faster, works better with LUT4 FPGAs)\n- optional 16x16-bit MAC instruction (for digital signal processing) \n- optional coarse-grained multi-threading (MT)\n- DSP-like pipeline: no interlock/stall/forward between pipeline stages!\n- optional interrupt handled on machine level\n- optional breakpoints handled on supervisor level\n- optional instruction and data caches\n- optional harvard to von neumann bridge (DarkBridge)\n- optional SDRAM controller (from kianRiscV project)\n- optional support for big-endian\n- BSD license: can be used anywhere with no restrictions!\n\nSome extra features are planned for the future, under development or tested by some customers:\n\n- ethernet controller (GbE)\n- multi-processing (SMP)\n- network on chip (NoC)\n- rv64i support (not so easy as it appears...)\n- dynamic bus sizing and big-endian support\n- user/supervisor modes\n- misaligned memory access\n- bridge for 8/16/32-bit buses \n\nAnd many other features!\n\nThe following picture shows the DarkRISCV core block diagram:\n\n![darkriscv 
core](https://github.com/darklife/darkriscv/blob/master/doc/darkriscv.png)\n\nThe caches are added just to make it easy to understand, but they are typically external, on the DarkSoCV or DarkBridge. It is easy to see that there is a huge optimization in the instruction path, so it has, in fact, 3 stages: PF (pre-fetch), IF (instruction-fetch) and ID (instruction decode). In the EX (execute), there is a single stage, which explains why DarkRISCV does not need forward and does not stall on execution. Also, differently from PF/IF/ID, the EX has four ALUs: one complete ALU for reg/reg and reg/imm operations, one dedicated ALU for branch tests, one dedicated ALU for PC update and one dedicated ALU for memory address calculation, all of them working in parallel. Finally, there is the register bank, which is a clocked single-path on write but combinational and multi-path on read, so it is possible to feed the ALUs without forward or stall.\n\nOf course, the DarkRISCV needs external blocks around it in order to work, so the following picture shows the DarkSoCV in the mixed Harvard and von Neumann mode, when the core is working around Harvard architecture parallel caches for instruction and data but the rest of SoC is working around a von Neumann architecture, with sequential instructions and data in the same bus, so it is possible to share the main memories (BRAM and SDRAM):\n\n![darkriscv SoC](https://github.com/darklife/darkriscv/blob/master/doc/darksocv.png)\n\nThanks to BSD license, the project is fully open, so feel free to make suggestions and good hacking! o/\n\n## History\n\nThe initial concept was based on my other early 16-bit RISC processors and\ncomposed of a simplified two stage pipeline, where an instruction is fetched\nfrom an instruction memory in the first clock and then the instruction is\ndecoded/executed in the second clock.  
The pipeline is overlapped without\ninterlocks, in a way that the *DarkRISCV* can reach the performance of one\nclock per instruction most of the time, except for a taken branch, where one\nclock is lost in the pipeline flush.  Of course, in order to perform read\noperations in blockrams in a single clock, a single-phase clock with\ncombinational memory OR a two-phase clock with blockram memory is required,\nin a way that no wait states are required in those cases.\n\nAs a result, the code was very compact, with around three hundred lines of\nobfuscated but beautiful Verilog code.  After lots of exciting sleepless\nnights of work and the help of lots of colleagues, the *DarkRISCV* reached a\nvery good quality result, in a way that the code compiled by the standard\nGCC for RV32I worked fine.\n\nAfter two years of development, a three stage pipeline working\nwith a single clock phase was also available, resulting in a better\ndistribution between the decode and execute stages.  In this case the\ninstruction is fetched in the first clock from a blockram, decoded in the\nsecond clock and executed in the third clock.\n\nSince the load instruction cannot load the data from a blockram in a\nsingle clock, the external logic inserts one extra clock in IO operations. \nAlso, there are two extra clocks in order to flush the pipeline in the case\nof taken branches.  
The impact of the pipeline flush depends on the compiler\noptimizations, but according to the latest measurements, the 3-stage\npipeline version can reach an instruction per clock (IPC) of 0.7, smaller\nthan the measured IPC of 0.8 in the case of the 2-stage pipeline version.\n\nAnyway, with the 3-stage pipeline and some other expensive optimizations,\nthe *DarkRISCV* can reach up to 100MHz in a low-cost Spartan-6, which results in\nmore performance when compared with the 2-stage pipeline version (typically\n50MHz).\n\nIn order to celebrate six years of the project, some effort was done in order to\norganize the SoC in a better way, breaking it in separate modules and \nintroducing new bus concepts in order to support large systems.  As a result, \nDarkRISCV continues to support very well the small and high performance DSP-like \nHarvard architecture systems, as well as large and computer-like von Neumann \narchitecture systems.\n\n## Project Background\n\nThe main motivation for the *DarkRISCV* was to create a migration path for some\nprojects around the 680x0/Coldfire family.\n\nAlthough there are lots of 680x0 cores available, they are designed around\ndifferent concepts and requirements, in a way that I found not many options\nregarding my requirements (more than 50MIPS with around 1000LUTs).  The best\noption at this moment, the TG68, requires at least 2400LUTs (by removing the\nMUL/DIV instructions), and works up to 40MHz in a Spartan-6.  In addition,\nthe TG68 core requires at least 2 clocks per instruction, which means a peak\nperformance of 20MIPS.  Since the 680x0 instruction set is too complex, this\nresult is really not bad at all and, at this moment, probably the best\nopensource option to replace the 68000.\n\nAnyway, it does not match my requirements regarding space and\nperformance.  As part of the investigation, I tested other cores, but I\nfound not many options as good as the TG68 and I even started to design a\nrisclized-68000 core, in order to try to find a solution.  
\n\nUnfortunately, due to compiler requirements (standard GCC), I found not many\nways to reduce the space and increase the performance, in a way that I\nstarted to investigate non-680x0 cores.  \n\nAfter lots of tests with different cores, I found the *picorv32* core and\nall the ecosystem around the RISC-V.  The *picorv32* is a very nice\nproject and can peak up to 150MHz in a low-cost Spartan-6.  Although most\ninstructions require 3 or 4 clocks per instruction, the *picorv32*\nresembles the 68020 in some ways, but running at 150MHz and providing a peak\nperformance of 50MIPS, which is very impressive.\n\nAlthough the *picorv32* is a very good option to directly replace the 680x0\nfamily, it is not powerful enough to replace some Coldfire processors (more\nthan 75MIPS).  \n\nSince I had some good experience with experimental 16-bit RISC cores for\nDSP-like applications, I started to code the *DarkRISCV* only to check the\nlevel of complexity and compare with my risclized-68000.  To my surprise,\nin the first night I mapped almost all instructions of the rv32i\nspecification and the *DarkRISCV* started to execute the first instructions\ncorrectly at 75MHz and with one clock per instruction, which not only\nresembles a fast and nice 68040, but also can beat some Coldfires!  wow!  :)\n\nAfter the success of the first night of work, I started to work in order to\nfix small details in the hardware and software implementation.\n\n## Directory Description\n\nAlthough the *DarkRISCV* is only a small processor core, a small eco-system \nis required in order to test the core, including RISCV compatible software,\nsupport for simulations and support for peripherals, in a way that the \nprocessor core produces observable results. 
Each element is stored with \nsimilar elements in directories, in a way that the top level has the\nfollowing organization:\n\n- [README.md](README.md): the top level README file (points to this document)\n- [LICENSE](LICENSE): unlimited freedom! o/\n- [Makefile](Makefile): the show starts here!\n- [src](src): the source code for the test firmware (boot.c, main.c etc in C language)\n- [rtl](rtl): the source code for the *DarkRISCV* core and the support logic (Verilog)\n- [sim](sim): the source code for the simulation to test the rtl files (currently via icarus)\n- [boards](boards): support and examples for different boards (currently via Xilinx ISE)\n- [tmp](tmp): empty, but the ISE will create lots of files here\n\n\nSetup Instructions:\n\nStep 1: Clone the DarkRISCV repo to your local machine using the command below.\ngit clone https://github.com/darklife/darkriscv.git\n\nPre Setup Guide for MacOS:\n\nThe document encompasses all the dependencies and steps to install those\ndependencies to successfully utilize the Darkriscv ecosystem on MacOS.\n\nEssentially, the ecosystem cannot be utilized in MacOS because of one of the\ndependencies, the Xilinx ISE 14.7 Design Suite, which currently does not support\nMacOS.\n\nIn order to overcome this issue, we need to install Linux/Windows on MacOS\nby using one of the two methods below:\n\na) WineSkin, which is a kind of Windows emulator that runs the Windows\napplication natively but intercepts and emulates the Windows calls to map\ndirectly in the macOS.  \n\nb) VirtualBox (or VMware, Parallels, etc) in order to run a complete Windows\nOS or Linux, which appears to be far better than the WineSkin option.\n\nI used the second method and installed VMware Fusion to install Linux Mint. \nPlease find below the links I used to obtain download files.\n\nDependencies:\n\n1.  Icarus Verilog\na.  Bison\nb.  GNU\nc.  G++\nd.  FLEX\n\n2.  Xilinx 14.7 ISE\n\n\nIcarus Verilog Setup:\n\nThe steps have been condensed for linux operating system.  
Complete steps\nfor all other OS platforms are available on\nhttps://iverilog.fandom.com/wiki/Installation_Guide.\n\nStep 1: Download the Icarus Verilog tar file from\nftp://ftp.icarus.com/pub/eda/verilog/ .  Always install the latest version. \nVerilog-10.3 is the latest version as of now.\n\nStep 2: Extract the tar file using ‘% tar -zxvf verilog-version.tar.gz’.\n\nStep 3: Go to the Verilog folder using ‘cd verilog-version’.  Here it is cd\nverilog-10.3.\n\nStep 4: Check if you have the following libraries installed: Flex, Bison,\ng++ and gcc.  If not use ‘sudo apt-get install flex bison g++ gcc’ in\nterminal to install.  Restart the system once for the changes to take effect.\n\nStep 5: Run the below commands in directory verilog-10.3\n1.  ./configure\n2.  make\n3.  sudo make install\n\nStep 6: Use ‘sudo apt-get install verilog’ to install Verilog.\n\nOptional Step: sudo apt-get install gtkwave\n\nXilinx Setup:\n\nFollow the below video on youtube for complete installation.\n\nhttps://www.youtube.com/watch?v=meO-b6Ib17Y\n\nNote: Make sure you have libncurses libraries installed in linux. \n\nIf not use the below commands:\n\n1.  For 64 bit architecture\na.  sudo apt-get install libncurses5 libncursesw-dev\n2.  For 32 bit architecture\na.  sudo apt-get install libncurses5:i386\n\nOnce all pre-requisites are installed, go to root directory and run the\nbelow commands:\n\ncd darkriscv\nmake (use sudo if required)\n\n\nThe top level *Makefile* is responsible for building everything, but it must \nbe edited first, in a way that the user at least must select the compiler \npath and the target board.\n\nBy default, the top level *Makefile* uses:\n\n\tCROSS = riscv32-embedded-elf\n\tCCPATH = /usr/local/share/gcc-$(CROSS)/bin/\n\tICARUS = /usr/local/bin/iverilog\n\tBOARD  = avnet_microboard_lx9\n\t\nJust update the configuration according to your system configuration, type\n*make* and hope everything is in the correct location!  
You probably will\nneed fix some paths and set some others in the PATH environment variable,\nbut it will eventually work.\n\nAnd, when everything is correctly configured, the result will be something\nlike this:\n\n```$ \n# make\nmake -C src all             CROSS=riscv32-embedded-elf CCPATH=/usr/local/share/gcc-riscv32-embedded-elf/bin/ ARCH=rv32e HARVARD=1\nmake[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/src'\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-gcc -Wall -I./include -Os -march=rv32e -mabi=ilp32e -D__RISCV__ -DBUILD=\"\\\"Sat, 30 May 2020 00:55:20 -0300\\\"\" -DARCH=\"\\\"rv32e\\\"\" -S boot.c -o boot.s\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-as -march=rv32e -c boot.s -o boot.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-gcc -Wall -I./include -Os -march=rv32e -mabi=ilp32e -D__RISCV__ -DBUILD=\"\\\"Sat, 30 May 2020 00:55:20 -0300\\\"\" -DARCH=\"\\\"rv32e\\\"\" -S stdio.c -o stdio.s\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-as -march=rv32e -c stdio.s -o stdio.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-gcc -Wall -I./include -Os -march=rv32e -mabi=ilp32e -D__RISCV__ -DBUILD=\"\\\"Sat, 30 May 2020 00:55:21 -0300\\\"\" -DARCH=\"\\\"rv32e\\\"\" -S main.c -o main.s\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-as -march=rv32e -c main.s -o main.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-gcc -Wall -I./include -Os -march=rv32e -mabi=ilp32e -D__RISCV__ -DBUILD=\"\\\"Sat, 30 May 2020 00:55:21 -0300\\\"\" -DARCH=\"\\\"rv32e\\\"\" -S io.c -o io.s\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-as -march=rv32e -c io.s -o io.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-gcc -Wall -I./include -Os -march=rv32e -mabi=ilp32e -D__RISCV__ -DBUILD=\"\\\"Sat, 30 May 2020 00:55:21 -0300\\\"\" -DARCH=\"\\\"rv32e\\\"\" -S banner.c 
-o banner.s\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-as -march=rv32e -c banner.s -o banner.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-cpp -P  -DHARVARD=1 darksocv.ld.src darksocv.ld\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-ld -Tdarksocv.ld -Map=darksocv.map -m elf32lriscv  boot.o stdio.o main.o io.o banner.o -o darksocv.o\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-ld: warning: section `.data' type changed to PROGBITS\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-objdump -d darksocv.o \u003e darksocv.lst\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-objcopy -O binary  darksocv.o darksocv.text --only-section .text* \nhexdump -ve '1/4 \"%08x\\n\"' darksocv.text \u003e darksocv.rom.mem\n#xxd -p -c 4 -g 4 darksocv.o \u003e darksocv.rom.mem\nrm darksocv.text\nwc -l darksocv.rom.mem\n1016 darksocv.rom.mem\necho rom ok.\nrom ok.\n/usr/local/share/gcc-riscv32-embedded-elf/bin//riscv32-embedded-elf-objcopy -O binary  darksocv.o darksocv.data --only-section .*data*\nhexdump -ve '1/4 \"%08x\\n\"' darksocv.data \u003e darksocv.ram.mem\n#xxd -p -c 4 -g 4 darksocv.o \u003e darksocv.ram.mem\nrm darksocv.data\nwc -l darksocv.ram.mem\n317 darksocv.ram.mem\necho ram ok.\nram ok.\necho sources ok.\nsources ok.\nmake[1]: Leaving directory `/home/marcelo/Documents/Verilog/darkriscv/v38/src'\nmake -C sim all             ICARUS=/usr/local/bin/iverilog HARVARD=1\nmake[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/sim'\n/usr/local/bin/iverilog -I ../rtl -o darksocv darksimv.v ../rtl/darksocv.v ../rtl/darkuart.v ../rtl/darkriscv.v\n./darksocv\nWARNING: ../rtl/darksocv.v:280: $readmemh(../src/darksocv.rom.mem): Not enough words in the file for the requested range [0:1023].\nWARNING: ../rtl/darksocv.v:281: $readmemh(../src/darksocv.ram.mem): Not enough words in the file for the requested range [0:1023].\nVCD info: 
dumpfile darksocv.vcd opened for output.\nreset (startup)\n\n              vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n                  vvvvvvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvv  \nrrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvv    \nrr                vvvvvvvvvvvvvvvvvvvvvv      \nrr            vvvvvvvvvvvvvvvvvvvvvvvv      rr\nrrrr      vvvvvvvvvvvvvvvvvvvvvvvvvv      rrrr\nrrrrrr      vvvvvvvvvvvvvvvvvvvvvv      rrrrrr\nrrrrrrrr      vvvvvvvvvvvvvvvvvv      rrrrrrrr\nrrrrrrrrrr      vvvvvvvvvvvvvv      rrrrrrrrrr\nrrrrrrrrrrrr      vvvvvvvvvv      rrrrrrrrrrrr\nrrrrrrrrrrrrrr      vvvvvv      rrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrr      vv      rrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrr          rrrrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrrrr      rrrrrrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrrrrrr  rrrrrrrrrrrrrrrrrrrrrr\n\n       INSTRUCTION SETS WANT TO BE FREE\n\nboot0: text@0 data@4096 stack@8192\nboard: simulation only (id=0)\nbuild: darkriscv fw build Sat, 30 May 2020 00:55:21 -0300\ncore0: darkriscv@100.0MHz with rv32e+MT+MAC\nuart0: 115200 bps (div=868)\ntimr0: periodic timer=1000000Hz (io.timer=99)\n\nWelcome to DarkRISCV!\n\u003e no UART input, finishing simulation...\necho simulation ok.\nsimulation ok.\nmake[1]: Leaving directory `/home/marcelo/Documents/Verilog/darkriscv/v38/sim'\nmake -C boards all          BOARD=piswords_rs485_lx9 HARVARD=1\nmake[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/boards'\ncd ../tmp \u0026\u0026 xst -intstyle ise -ifn ../boards/piswords_rs485_lx9/darksocv.xst -ofn ../tmp/darksocv.syr\nReading design: ../boards/piswords_rs485_lx9/darksocv.prj\n\n*** lots of weird FPGA related messages here *** \n\ncd ../tmp \u0026\u0026 bitgen -intstyle ise -f 
../boards/avnet_microboard_lx9/darksocv.ut ../tmp/darksocv.ncd\necho done.\ndone.\n```\n\nWhich means that the software compiled and linked correctly, the simulation \nworked correctly and the FPGA build produced an image that can be loaded in \nyour FPGA board with a *make install* (in case you have an FPGA board and, of\ncourse, you have a JTAG support script in the board directory).\n\nIn case the FPGA is correctly programmed and the UART is attached to a terminal\nemulator, the FPGA will be configured with the DarkRISCV, which will run the\ntest software and produce the following result:\n\n```\n              vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n                  vvvvvvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrrrr    vvvvvvvvvvvvvvvvvvvvvvvv\nrrrrrrrrrrrrrrrr      vvvvvvvvvvvvvvvvvvvvvv  \nrrrrrrrrrrrrr       vvvvvvvvvvvvvvvvvvvvvv    \nrr                vvvvvvvvvvvvvvvvvvvvvv      \nrr            vvvvvvvvvvvvvvvvvvvvvvvv      rr\nrrrr      vvvvvvvvvvvvvvvvvvvvvvvvvv      rrrr\nrrrrrr      vvvvvvvvvvvvvvvvvvvvvv      rrrrrr\nrrrrrrrr      vvvvvvvvvvvvvvvvvv      rrrrrrrr\nrrrrrrrrrr      vvvvvvvvvvvvvv      rrrrrrrrrr\nrrrrrrrrrrrr      vvvvvvvvvv      rrrrrrrrrrrr\nrrrrrrrrrrrrrr      vvvvvv      rrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrr      vv      rrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrr          rrrrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrrrr      rrrrrrrrrrrrrrrrrrrr\nrrrrrrrrrrrrrrrrrrrrrr  rrrrrrrrrrrrrrrrrrrrrr\n\n       INSTRUCTION SETS WANT TO BE FREE\n\nboot0: text@0 data@4096 stack@8192\nboard: piswords rs485 lx9 (id=6)\nbuild: darkriscv fw build Fri, 29 May 2020 23:56:39 -0300\ncore0: darkriscv@100.0MHz with rv32e+MT+MAC\nuart0: 115200 bps (div=868)\ntimr0: periodic timer=1000000Hz (io.timer=99)\n\nWelcome to DarkRISCV!\n\u003e \n```\n\nThe beautiful ASCII RISCV logo was produced by Andrew 
Waterman! [6]\n\nAs long as the build works, it is possible to start making changes, but my\nrecommendation when working with soft processors is to *not work* in the\nhardware and software *at the same time*!  This means that it is better to freeze\nthe hardware and work only with the software *or* freeze the software and\nwork only with the hardware.  It is perfectly possible to make your research in\nboth, but not at the same time, otherwise you may find the *DarkRISCV* in a\nnon-working state after software and hardware changes and you will not be\nsure where the problem is.\n\n### \"src\" Directory\n\nThe *src* directory contains the source code for the test firmware, which \nincludes the boot code, the main process and auxiliary libraries. The code is\ncompiled via *gcc* in a way that some auxiliary files are produced, \nfor example:\n\n- boot.c: the original C code for the boot process\n- boot.s: the assembler version of the C code, generated automatically by the gcc\n- boot.o: the compiled version of the C code, generated automatically by the gcc\n\nWhen all .o files are produced, the result is linked in a *darksocv.o* ELF \nfile, which is used to produce the *darksocv.bin* file, which is converted to \nhexadecimal and separated in ROM and RAM files (which are loaded by the Verilog\ncode in the blockRAMs). 
The linker also produces a *darksocv.lst* with a \ncomplete list of the code generated and the *darksocv.map*, which shows the\nmap of all functions and variables in the produced code.\n\nThe firmware concept is very simple:\n\n- boot.c contains the boot code\n- main.c contains the main application code (shell)\n- banner.c contains the riscv banner\n- stdio.c contains a small version of stdio\n- io.c contains the IO interfaces\n\nExtra code can be easily added in the compilation by editing the *src/Makefile*.\n\nFor example, in order to add a lempel-ziv code *lz.c*, it is necessary to make the\nMakefile know that we need the *lz.s* and *lz.o*:\n\n\tOBJS = boot.o stdio.o main.o io.o banner.o lz.o\n\tASMS = boot.s stdio.s main.s io.s banner.s lz.s\n\tSRCS = boot.c stdio.c main.c io.c banner.c lz.c\n\nAnd add a \"lz\" command in the *main.c*, in a way that it is possible to call \nthe function via the prompt. Alternatively, it is possible to entirely replace\nthe provided firmware and use your own firmware.\n\n### \"sim\" Directory\n\nThe simulation, on the other hand, will show some waveforms and makes it possible to\ncheck the *DarkRISCV* operation when running the example code.  \n\nThe main simulation tool for *DarkRISCV* is the iSIM from Xilinx ISE 14.7,\nbut the Icarus simulator is also supported via the Makefile in the *sim*\ndirectory (the changes regarding Icarus are active when the symbol\n__ICARUS__ is detected). 
I also included a workaround for ModelSim, as \npointed out by our friend HYF (the changes regarding ModelSim are active when the \nsymbol MODEL_TECH is detected).\n\nThe simulation runs the same firmware as in the real FPGA, but in order to\nimprove the simulation performance, the UART code is not simulated, since\nthe 115200 bps requires lots of dead simulation time.\n\n### \"rtl\" Directory\n\nThe RTL directory contains the *DarkRISCV* core and some auxiliary files,\nsuch as the DarkSoCV (a small system-on-chip with ROM, RAM and IO),\nthe DarkUART (a small UART for debug) and the configuration file, where it is\npossible to enable and disable some features that are described in the\nImplementation Notes section.\n\nFor more detail, check the README.md file in the [rtl](https://github.com/darklife/darkriscv/tree/master/rtl) directory.\n\n### \"board\" Directory\n\nThe current supported boards are:\n\n- id==0   simulation only\n- id==1   avnet_microboard_lx9\n- id==2   xilinx_ac701_a200\n- id==3   qmtech_sdram_lx16\n- id==4   qmtech_spartan7_s15\n- id==5   lattice_brevia2_lxp2\n- id==6   piswords_rs485_lx9\n- id==7   digilent_spartan3_s200\n- id==8   aliexpress_hpc40gbe_k420\n- id==9   qmtech_artix7_a35\n- id==10  aliexpress_hpc40gbe_ku040\n- id==11  papilio_duo_logicstart\n- id==12  qmtech_kintex7_k325\n- id==13  scarab_minispartan6-plus_lx9\n- id==14  colorlighti9_ecp5-45f\n- id==15  colorlighti5_ecp5-25f\n- id==16  ulx3s_ecp5-85f\n\nThe organization is self-explanatory, w/ the vendor, board and FPGA model\nin the name of the directory. Each *board* directory contains the project \nfiles to be opened in the Xilinx ISE 14.x, as well as Makefiles to build the\nFPGA image regarding that board model. 
Although a *ucf* file is provided in \norder to generate a complete build with a UART and some LEDs, the FPGA is \nNOT fully wired in any particular configuration and you must add the \npins that you will use in your FPGA board.\n\nAnyway, although not wired, the build always gives you a good estimation \nabout the FPGA utilization and about the timing (because the UART output \nensures that the complete processor must be synthesized).\n\nSince there are many supported boards, there is no way to test all boards\nevery time, which means that sometimes the changes regarding one board may\naffect other boards in a wrong way.\n\n## Implementation Notes*\n\n[*This section is kept for reference, but the description may not match\nexactly with the current code]\n\nSince my target is the ultra-low-cost Xilinx Spartan-6 family of FPGAs, the\nproject is currently based on the Xilinx ISE 14.7 for Linux, which is the\nlatest available ISE version.  However, there is no explicit reference for\nXilinx elements and all logic is inferred directly from Verilog, which means\nthat the project is easily portable to other FPGA families and easily\nportable to other environments, as can be observed in the case of Lattice\nXP2 support.  Anyway, keep in mind that certain Verilog structures may not\nwork well in some FPGAs.\n\nIn the last update I included a way to test the firmware in the x86 host,\nwhich helps a lot, since it is possible to interact with the firmware and fix\nquickly some obvious bugs. Of course, the x86 code does not run the boot.c\ncode, since it makes no sense (?) to run the RISCV boot code in the x86.\n\nAnyway, as main recommendation when working with softcores, try to never work in\nthe hardware and in the software at the same time!  Start with the minimum\nsoftware configuration possible and freeze the software.  
When implementing\nnew software updates, use the minimum hardware configuration possible and\nfreeze the hardware.\n\nThe RV32I specification itself is really impressive and easy to implement\n(see [1], page 16).  Of course, there are some drawbacks, such as the funny\nlittle-endian bus (opposed to the network oriented big-endian bus found in\nthe 680x0 family), but after some empirical tests it is easy to make it work.\n\nThe funny information here is that, after lots of research regarding adding\nsupport for big-endian in the *DarkRISCV*, I found no way to make the GCC\ngenerate the code and data correctly.\n\nAnother drawback in the specification is the lack of delayed branches.\nAlthough I understand that they are bad from the conceptual point of view,\nthey are a good trick in order to extract more performance. As reference, the\nlack of delayed branches or branch predictor in the *DarkRISCV* may reduce\nthe performance by between 20 and 30%, in a way that the real measured\nperformance may be between 1.25 and 1.66 clocks per instruction.\n\nAlthough the branch prediction is not complex to implement, I found the\nexperimental multi-threading support far more interesting, since it enables\nusing the idle time in the branches to swap the processor thread.  Anyway, I\nwill try to debug the branch prediction code in order to improve the\nsingle-thread performance.\n\nThe core supports 2 or 3-stage pipelines and, although the main logic is\nalmost the same, there are huge differences in how they work. Just for\nreference, the following section reflects the historic evolution of the\ncore and may not reflect the current core code.\n\nThe original 2-stage pipeline design has a small problem concerning\nthe ROM and RAM timing, in a way that, in order to pre-fetch and execute the\ninstruction in two clocks and keep the pre-fetch continuously working at the\nrate of 1 instruction per clock (and the same in the execution), the ROM and\nRAM must respond before the next clock.  
This means that the memories must\nbe combinational or, at least, use a 2-phase clock.\n\nThe first solution for the 2-stage pipeline version with a 2-phase clock is\nthe default solution and makes the *DarkRISCV* work as a pseudo 4-stage\npipeline:\n\n- 1/2 stage for instruction pre-fetch (rom)\n- 1/2 stage for static instruction decode (core)\n- 1/2 stage for address generation, register read and data read/write (ram) \n- 1/2 stage for data write (register write)\n\nFrom the processor point of view, there are only 2 stages and from the\nmemory point of view, there are also 2 stages. But they are in different\nclock phases. In normal conditions, this is not recommended because it decreases the\nperformance by a 2x factor, but in the case of *DarkRISCV* the performance\nis always limited by the combinational logic regarding the instruction\nexecution.\n\nThe second solution with a 2-stage pipeline is to use combinational logic in\norder to provide the needed results before the next clock edge, in a way\nthat it is possible to use a single phase clock.  This solution is composed of\ninstruction and data caches, in a way that when the operand is stored in a\nsmall LUT-based combinational cache, the processor can perform the memory\noperation with no extra wait states.  However, when the operand is not\nstored in the cache, extra wait-states are inserted in order to fetch the\noperand from a blockram or external memory.  According to some preliminary\ntests, the instruction cache w/ 64 direct mapped instructions can reach a\nhit ratio of 91%.  The data cache, although its performance is not so good (with\na hit ratio of only 68%), will be a requirement in order to access external\nmemory and reduce the impact of slow SDRAMs and FLASHes.\n\nNeither the use of the cache nor the 2-phase clock performs well, from the\npoint of view of combinational timing.  
For this reason, a 3-stage pipeline version \nis provided, in order to use a single clock phase with blockrams.\n\nThe concept in this case is to separate the pre-fetch and decode, in a way that\nthe pre-fetch can be done entirely in the blockram side for the instruction\nbus. The decode, in a different stage, provides extra performance and the \nexecute stage works with one clock almost all the time, except when the load\ninstruction is executed. In this case, the external memory logic inserts one\nwait-state. The write operation, however, is executed in a single clock.\n\nThe solution with wait-states can be used in the 2-stage pipeline version,\nbut decreases the performance too much. If it is possible to run all versions\nwith the same clock, the theoretical performance in clocks per instruction\n(CPI), number of clocks to flush the pipeline in the taken branch (FLUSH) and\nmemory wait-states (WSMEM) will be:\n\n- 2-stage pipe w/ 2-phase clock: CPI=1, FLUSH=1, WSMEM=0: real CPI=~1.25\n- 3-stage pipe w/ 1-phase clock: CPI=1, FLUSH=2, WSMEM=1: real CPI=~1.66\n- 2-stage pipe w/ 1-phase clock: CPI=2, FLUSH=1, WSMEM=1: real CPI=~2.00\n\nEmpirically, the impact of the FLUSH in the 2-stage pipeline is around 20%\nand in the 3-stage pipeline is 30%. The real impact depends on the code\nitself, of course... In the case of the impact of the wait-states in the\nmemory access regarding the load instruction, the impact ranges between 5\nand 10%, again, depending on the code.\n\nHowever, the clock in the case of the 3-stage pipeline is far better than in the\n2-stage pipeline, especially because of the better distribution of the logic\nbetween the decode and execute stages.\n\nCurrently, the most expensive path in the Spartan-6 is the address bus\nfor the data side of the core (connected to RAM and peripherals). 
The\nproblem is that the following actions must be done in a\nsingle clock:\n\n- generate the DADDR[31:0] = REG[SPTR][31:0]+EXTSIG(IMM[11:0])\n- generate the BE[3:0] according to the operand size and DADDR[1:0]\n\nIn the case of a read operation, the DATAI path also includes a small mux\nin order to separate RAM and peripheral buses, as well as to separate the\ndifferent peripherals, which means that the path grows as the\nnumber of peripherals and the complexity increase.\n\nOf course, the best performance setup uses a 3-stage pipeline and a\nsingle-clock phase (posedge) in the entire logic, in a way that the 2-stage\npipeline and dual-clock phase will be kept only for reference.  \n\nThe only disadvantage of the 3-stage pipeline is one extra wait-state in the\nload operation and the longer pipeline flush of two clocks in the taken\nbranches.\n\nJust for reference, I registered some details regarding the performance\nmeasurements:\n\nThe current firmware example running in the 3-stage pipeline version clocked at\n100MHz reaches a verified performance of 62 MIPS.  The theoretical 100MIPS\nperformance is not reached: 5% is lost due to the extra wait-state in the load\ninstruction and 32% due to pipeline flushes after taken branches.  The\n2-stage pipeline version, on the other hand, runs at a verified performance\nof 79MIPS with the same clock.  The only loss is 20%, due to pipeline\nflushes after a taken branch.\n\nOf course, the impact of the pipeline flush also depends on the software,\nand the software is currently optimized for size. When compiled\nwith the -O2 instead of -Os, the performance increases to 68MIPS in the\n3-stage pipeline and the loss changes to 6% for load and 25% for the\npipeline flush. 
The -O3 option resulted in 67MIPS and the best result was\nthe -O1 option, which produced 70MIPS in the 3-stage version and 85MIPS in\nthe 2-stage version.\n\nTherefore, if performance is a requirement, the src/Makefile must be\nchanged in order to use the -O1 optimization instead of the -Os default. \n\nAnd although the 2-stage version is 15% faster than the 3-stage version, the\n3-stage version can reach better clocks and, therefore, will provide\nbetter performance.\n\nRegarding the pipeline flush, it is required after a taken branch, since\nthe RISCV does not support delayed branches.  The solution for this problem\nis to implement a branch cache (branch predictor), in a way that the core\npopulates a cache with the last branches and can predict the future\nbranches.  In some initial tests, the branch prediction with a 4-entry\ncache appears to reach a hit ratio of 60%.\n\nAnother possibility is to use the flush time for other tasks, for example handling\ninterrupts.  Since interrupt handling and, in a general way, threading\nrequire flushing the current pipelines in order to change context,\nmatching the interrupt/threading with the pipeline flush makes some sense!\n\nWith the option __THREADING__ it is possible to test this feature. \n\nThe implementation is in very early stages of development and does not\ncorrectly handle the initial SP and PC.  Anyway, it works and enables the\nmain() code to stop in a gets() while the interrupt handling changes the OPORT\nat a rate of more than 1 million interrupts per second without affecting the\nexecution and with little impact on the performance!  :)\n\nThe interrupt support can be expanded to a more complete threading support,\nbut requires some tricks in the hardware and in the software, in order to \npopulate the different threads with the correct SP and PC.\n\nThe interrupt handling uses a concept around threading and, with some extra\neffort, it is probably possible to support 4, 8 or even 16 threads.  
The\ndrawback in this case is that the register bank increases in size, which\nexplains why the rv32e is an interesting option for threading: with half the\nnumber of registers it is possible to store two more threads in the core.\n\nCurrently, the time to switch the context in the *darkriscv* is two clocks in\nthe 3-stage pipeline, which matches the pipeline flush itself. At 100MHz,\nthe maximum empirical number of context switches per second is around 2.94\nmillion.\n\nAbout the new MAC instruction, it is implemented in a very preliminary way\nwith the opcode 7'b0001011 (custom-0 opcode).  I am checking the possibility \nof using the p.mac instruction, but at this time the instruction is hand encoded \nin the mac() function available in the stdio.c (i.e.  the darkriscv libc).  \nThe details about including new instructions and making them work with GCC can be \nfound in the reference [5].\n\nThe preliminary tests showed, as expected, that the performance decreases\nto 90MHz and although it was possible to run at 100MHz with a non-zero timing\nscore and reach a peak performance of 100MMAC/s, the small 32-bit\naccumulator saturates too fast and requires extra tricks in order to avoid\noverflows.\n\nThe mul operation uses two 16-bit integers and the result is added to a\nseparate 32-bit register, which works as an accumulator.  Since the operation\nis always signed and the sign always uses the MSB, this means that the\n15x15 mul produces a 30 bit result which is added to a 31-bit value, which\nmeans that the overflow is reached after only two MAC operations.\n\nIn order to avoid overflows, it is possible to shift the input operands.  
For\nexample, in the case of G711 w/ u-law encoding, the effective resolution is\n14 bits (13 bits for integer and 1 bit for sign), which means that a 13x13\nbit mul will be used and a 26-bit result produced to be added to a 31-bit\ninteger, enough to run 32xMAC operations before overflow (in this case, when\nthe ACC reaches a negative value):\n\n    # awk 'BEGIN { ACC=2**31-1; A=2**13-1; B=-A; for(i=0;ACC\u003e=0;i++) print i,A,B,A*B,ACC+=A*B }'\n    0 8191 -8191 -67092481 2080391166\n    1 8191 -8191 -67092481 2013298685\n    2 8191 -8191 -67092481 1946206204\n    ...\n    30 8191 -8191 -67092481 67616736\n    31 8191 -8191 -67092481 524255\n    32 8191 -8191 -67092481 -66568226\n\nIs this theory correct? I am not sure, but it looks good! :)\n\nAs a complement, I included in the stdio.c the support for the GCC functions\nregarding the native *, / and % (mul, div and mod) operations with 32-bit\nsigned and unsigned integers, which means true 32x32 bit operations\nproducing 32-bit results.  The code was derived from an old 68000-related\nproject (as most of the code in the stdio.c) and, although it is not so fast, I\nguess it is working. As soon as the MAC instruction is better defined in the\nsyntax and features, I think it is possible to optimize the mul/div/mod in order\nto try to use it and increase the performance.\n\nHere are some additional performance results (synthesis only, 3-stage \nversion) for other Xilinx devices available in the ISE for speed grade 2:\n\n- Spartan-6:\t100MHz (measured 70MIPS w/ gcc -O1)\n- Artix-7: \t178MHz\n- Kintex-7: \t225MHz\n\nFor speed grade 3:\n\n- Spartan-6:\t117MHz\n- Artix-7: \t202MHz\n- Kintex-7:\t266MHz\n\nThe Kintex-7 can reach, theoretically, 186MIPS w/ gcc -O1.\n\nThis performance is reached w/o the MAC and THREADING activated.  
Thanks to\nthe RV32E option, the synthesis for the Spartan-3E is now possible,\nresulting in 95% of LUT occupation in the case of the low-cost 100E model\nand 70MHz clock (synthesis only and speed grade 5):\n\n- Spartan-3E:   70MHz\n\nFor the 2-stage version and speed grade 2, we have less impact from the\npipeline flush (20%), no impact in the load and some impact in the clock due\nto the use of a 2-phase clock:\n\n- Spartan-6:    56MHz (measured 47MIPS w/ -O1)\n\nAbout the compiler performance, from boot until the prompt, tested w/ the\n3-stage pipeline core at 100MHz and no interrupts, rom and ram measured in\n32-bit words:\n\n- gcc w/ -O3: t=289us rom=876 ram=211\n- gcc w/ -O2: t=291us rom=799 ram=211\n- gcc w/ -O1: t=324us rom=660 ram=211\n- gcc w/ -O0: t=569us rom=886 ram=211\n- gcc w/ -Os: t=398us rom=555 ram=211\n\nDue to reduced ROM space in the FPGA, the -Os is the default option.\n\nOn the other hand, regarding the support for Vivado, it is possible to convert\nthe Artix-7 (Xilinx AC701 available in the ise/boards directory) project to\nVivado and make some interesting tests.  The only problem in the conversion\nis that the UCF file is not converted, which means that a new XDC file with\nthe pin description must be created.\n\nThe Vivado is very slow compared to ISE and needs *lots of time* to\nsynthesize and give minimal feedback about the performance...  but after\nsome weeks waiting, and lots of empirical calculations, I got some numbers\nfor speed grade 2 devices:\n\n- Artix-7: \t147MHz\n- Spartan-7:\t146MHz\n\nAnd one number for speed grade 3 devices:\n\n- Kintex-7:\t221MHz\n\nAlthough Vivado is far slower and shows pessimistic numbers for the same FPGAs\nwhen compared with ISE, I guess Vivado is more realistic and, at least, it\nsupports the new Spartan-7, which shows very good numbers (almost the same\nas the Artix-7!).\n\nThose values are only for reference.  
The real values depend on some options\nin the core, such as the number of pipeline stages, how the memories are\nconnected, etc.  Basically, the best clock is reached by the 3-stage\npipeline version (up to 100MHz in a Spartan-6), but it requires at least 1\nwait state in the load instruction and 2 extra clocks in the taken branches\nin order to flush the pipeline.  The 2-stage pipeline requires no extra wait\nstates and only 1 extra clock in the taken branches, but runs at a lower\nclock (56MHz).\n\nWell, my conclusion after some years of research is that the branch\nprediction solves lots of problems regarding the performance, but introduces\nlots of other problems, so the best solution may be to not implement it but to try\nhand optimizations when possible, such as unrolling loops.\n\nAnother possible enhancement tested was the DBNZ instruction, well known \non Z80 and 68000, basically a loop instruction which decrements a counter,\ntests for zero and branches, repeating a loop while the counter is not zero...\n\nIn the case of RISC-V, the impact of a DBNZ instruction is very small, basically\nreplacing a SUBI+BNE, but with no effect on the real problem, which is\nthe pipeline flush on branches. So, a special variant, DBNZD w/ delayed branch, \nwas tested in the DarkRISCV, in a way that it was possible to run 2\nextra instructions after the DBNZ on the 3-stage pipeline version (the \nDBNZ was not included in the 2-stage pipeline version).\n\nAlthough the code w/ DBNZD was 3 clocks faster than SUBI+BNE, hand \noptimized schemes, such as code unrolling, may reach similar results and\nthe DBNZD was not included in DarkRISCV.\n\n## Development Tools\n\nAbout the gcc compiler, I am working with the experimental gcc 9.0.0 for\nRISC-V.  No patches or updates are required for the *DarkRISCV* other than\nthe -march=rv32i.  
Although the fence*, e* and csr* instructions are not\nimplemented, the gcc appears not to use those instructions and they are\nnot available in the core.\n\nAlthough it is possible to use the compiler set available in the official RISC-V\nsite, our colleagues from the *lowRISC* project pointed out a more clever way to\nbuild the toolchain:\n\nhttps://www.lowrisc.org/blog/2017/09/building-upstream-risc-v-gccbinutilsnewlib-the-quick-and-dirty-way/\n\nBasically:\n\n\tgit clone --depth=1 git://gcc.gnu.org/git/gcc.git gcc\n\tgit clone --depth=1 git://sourceware.org/git/binutils-gdb.git\n\tgit clone --depth=1 git://sourceware.org/git/newlib-cygwin.git\n\tmkdir combined\n\tcd combined\n\tln -s ../newlib-cygwin/* .\n\tln -sf ../binutils-gdb/* .\n\tln -sf ../gcc/* .\n\tmkdir build\n\tcd build\n\t../configure --target=riscv32-unknown-elf --enable-languages=c --disable-shared --disable-threads --disable-multilib --disable-gdb --disable-libssp --with-newlib --with-arch=rv32ima --with-abi=ilp32 --prefix=/usr/local/share/gcc-riscv32-unknown-elf\n\tmake -j4\n\tmake\n\tmake install\n\texport PATH=$PATH:/usr/local/share/gcc-riscv32-unknown-elf/bin/\n\triscv32-unknown-elf-gcc -v\n\nand everything will magically work! 
(:\n\nIn case you have no success building the compiler, have no interest in changing\nthe firmware or are just curious about the darkriscv running in an FPGA, the\nproject includes the compiled ROM and RAM, in a way that it is possible to examine\nall derived objects, sources and correlated files generated by the compiler\nwithout the need to compile anything.\n\nFinally, since the *DarkRISCV* is not yet fully tested, it is sometimes a\nvery good idea to compare the code execution with another stable reference!\n\nIn this case, I am working with the project *picorv32*:\n\nhttps://github.com/cliffordwolf/picorv32\n\nWhen I have some time, I will try to create a better organized support in\norder to easily test both the *DarkRISCV* and *picorv32* in the same cache,\nmemory and IO sub-systems, in order to make it possible to select the core\naccording to the desired features, for example, use the *DarkRISCV* for more\nperformance or *picorv32* for more features.\n\nAbout the software, the most complex issue is to make the memory design match\nthe linker layout.  Of course, it is a gcc issue and it is not even a\nproblem; in fact, it is the way that the software guys work when linking the\ncode and data!\n\nIn the most simplified version, directly connected to blockRAMs, the\n*DarkRISCV* is a pure Harvard architecture processor and will require the\nseparation between the instruction and data blocks!\n\nWhen the cache controller is activated, the cache controller provides\nseparate memories for instruction and data, but provides an interface for a\nmore conventional von Neumann memory architecture.\n\nIn both cases, a properly designed linker script (darksocv.ld) probably solves \nthe problem! 
\n\nThe current memory map in the linker script is the following:\n\n- 0x00000000: 4KB ROM \n- 0x00001000: 4KB RAM\n\nAlso, the linker maps the IO in the following positions:\n\n- 0x80000000: UART status\n- 0x80000004: UART xmit/recv buffer\n- 0x80000008: LED buffer\n\nThe RAM memory contains the .data area, the .bss area (after the .data \nand initialized with zero), the .rodata and the stack area at the end of RAM.\n\nAlthough the RISCV is defined as little-endian, it appears to be easy to change\nthe configuration in the GCC.  In this case, it is supposed that all\nvariables are stored in the big-endian format.  Of course, the change\nrequires a similar change in the core itself, which is not so complex, since\nit affects only the load and store instructions.  In the future, I will\ntry to test a big-endian version of GCC and darkriscv, in order to evaluate\npossible performance enhancements in the case of network oriented\napplications! :)\n\nFinally, the last update regarding the software included a new option to\nbuild a x86 version in order to help the development by testing exactly the\nsame firmware in the x86.\n\nIn a preliminary way, it is possible to build the gcc for RV32E with the following configuration:\n\n    git clone --depth=1 git://gcc.gnu.org/git/gcc.git gcc\n    git clone --depth=1 git://sourceware.org/git/binutils-gdb.git\n    git clone --depth=1 git://sourceware.org/git/newlib-cygwin.git\n    mkdir combined\n    cd combined\n    ln -s ../newlib-cygwin/* .\n    ln -sf ../binutils-gdb/* .\n    ln -sf ../gcc/* .\n    mkdir build\n    cd build\n    ../configure --target=riscv32-embedded-elf --enable-languages=c --disable-shared --disable-threads --disable-multilib --disable-gdb --disable-libssp --with-newlib  --with-arch=rv32e --with-abi=ilp32e --prefix=/usr/local/share/gcc-riscv32-embedded-elf\n    make -j4\n    make\n    make install\n    export PATH=$PATH:/usr/local/share/gcc-riscv32-embedded-elf/bin/\n    riscv32-embedded-elf-gcc -v\n\nCurrently, I 
found no easy way to make the GCC build big-endian code for\nRISCV. Instead, the easy way is to make the endian switch directly in the IO\ndevice or in the memory region.\n\nSince it is not so easy to build the GCC in some machines, I left in a public\nshare the source and the pre-compiled binary set of GCC tools for RV32E:\n\nhttps://drive.google.com/drive/folders/1GYkqDg5JBVeocUIG2ljguNUNX0TZ-ic6?usp=sharing\n\nAs far as I remember it was compiled in a Slackware Linux or something like that;\nanyway, it worked fine in the Windows 10 w/ WSL and in other linux-like\nenvironments.\n\nAs an update, the more modern GCC 12+ was tested w/ DarkRISCV without any problem!\n\n## Development Boards\n\nCurrently, the following boards are supported:\n\n- Avnet Microboard LX9: equipped with a Xilinx Spartan-6 LX9 running at 100MHz\n- Xilinx AC701 A200: equipped with a Xilinx Artix-7 A200 running at 90MHz\n- QMTech SDRAM LX16: equipped with a Xilinx Spartan-6 LX16 running at 100MHz\n- QMTech NORAM S15: equipped with a Xilinx Spartan-7 S15 running at 100MHz\n- Lattice Brevia2 XP2: equipped with a Lattice XP2-6 running at 50MHz\n- Piswords RS485 LX9: equipped with a Xilinx Spartan-6 LX9 running at 100MHz\n- Digilent S3 Starter Board: equipped with a Xilinx Spartan-3 S200 running at 50MHz\n\nThe speeds are related to available clocks in the boards and different\nclocks may be generated by programming a clock generator. 
The Spartan-6 is\nfound in most boards and the core runs fine at ~100MHz, regardless of the\nfrequency of the main oscillator (typically 50MHz).\n\nAll Xilinx based boards typically support a 115200 bps UART for console,\nsome LEDs for debug and on-chip 4KB ROM and 4KB RAM (as well as the RESET\nbutton to restart the core and the DEBUG signals for an oscilloscope).\n\nIn the case of QMTECH boards, which include neither the JTAG nor the\nUART/USB port, an external USB/UART converter and a low-cost JTAG adapter\ncan solve the problem easily!\n\nThe Lattice Brevia is clocked by the on-board 50MHz oscillator, with the \nUART operating at 115200bps and the LED and DEBUG ports wired to the on-\nboard LEDs.\n\nThe Digilent Spartan-3 Starter Board is a very useful board\nto work as reference for LUT4 technology, in a way that it is possible to improve\nthe support in the future for alternative low-cost LUT4 FPGAs.\n\nIn the software side, a small shell is available with some basic commands:\n\n- clear: clear display\n- dump \u003cval\u003e: dumps an area of the RAM\n- led \u003cval\u003e: change the LED register (which turns on/off the LEDs)\n- timer \u003cval\u003e: change the timer prescaler, which affects the interrupt rate\n- oport \u003cval\u003e: change the OPORT register (which changes the DEBUG lines)\n- iport: print the IPORT register\n\nThe purpose of the shell is to provide some basic test features which can\nprovide a go/no-go status about the current hardware status.\n\nUseful memory areas: \n\n- 4096: the start of RAM (data)\n- 4608: the start of RAM (data)\n- 5120: empty area\n- 5632: empty area\n- 6144: empty area\n- 6656: empty area\n- 7168: empty area\n- 7680: the end of RAM (stack)\n\nSince the *DarkRISCV* uses separate instruction and data buses, it is not\npossible to dump the ROM area.  
However, this limitation is not present even when\nthe option __HARVARD__ is activated, since the core is constructed in a\nway that the ROM bus is connected to one bus from a dual-ported memory and\nthe RAM bus is connected to a different bus from the same dual-ported\nmemory. From the *DarkRISCV* point of view, they are fully separated and\nindependent buses, but in reality they are in the same memory area, which\nmakes it possible for the data bus to change the area where the code is stored. With\nthis feature, it will be possible in the future to create loadable codes from\nthe FLASH memory! :)\n\n## FuseSoC support\n\nLast xmas (2022) our colleague Lucas Teske added support for FuseSoC in the darkriscv... I am not very aware of how it works and how to use it, but it is supposed to handle the build tools automatically! It is the same tool used by SERV and I used it to add some kilocore records in the past... in order to make it work in the darkriscv, please try:\n\n- fusesoc run --target=qmtech_artix7_a35 darklife:darkriscv:darksocv\n\nAt this moment, not all boards are really supported yet. Supported boards are:\n\n- Colorlight i9\n- Colorlight i5\n- Lattice iCE40 Devkit\n- QMtech Artix 7 (Vivado)\n\n## Yosys support\n\nOur colleague Hirosh Dabui (from KianRiscV project) added support for \nLattice FPGAs via Yosys, in a way that it is possible to use makefiles to build \nand program the FPGA directly from Linux!\n\n## Creating a RISCV from scratch\n\nI found that some people are very reticent about the possibility of \ndesigning a RISC-V processor in one night. Of course, it is not so easy \nas it appears and, in fact, it requires a lot of experience, planning and \nluck. 
Also, the fact that the processor correctly runs a few instructions \nand puts some garbage in the serial port does not really mean that the \ndesign is perfect; instead you will need lots and lots of debug time \nin order to fix all hidden problems.\n\nAs reference, I released some time ago the code from the 16-bit \"microrisc\" \ncore that I designed before DarkRISCV:\n\n- https://github.com/darklife/udarkrisc\n\nAlthough far simpler, it is very close to some original DarkRISCV \nconcepts and was the case for other cores from the same era, targeting small \nXilinx Spartan-3 FPGAs. Oh, since there are lots of \"micro\"-something,\nI renamed it as micro-DarkRISC -- not RISCV -- so it is now part of the Dark\nfamily! :)\n\nThere are also other good projects that can be used as reference: \n\nI found a set of online videos from my friend (Lucas Teske) that shows the \ndesign of a RISC-V processor from scratch (playlist with 9 videos):\n\n- https://www.youtube.com/playlist?list=PLEP_M2UAh9q52a-w3ZUEChEoG_ROeMa88\n\nAlternatively, there are the original videos on Twitch:\n\n- https://www.twitch.tv/videos/840983740 Register bank (4h50)\n- https://www.twitch.tv/videos/845651672 Program counter and ALU (3h49)\n- https://www.twitch.tv/videos/846763347 ALU tests, CPU top level (3h47) \n- https://www.twitch.tv/videos/848921415 Computer problems and microcode planning (08h19)\n- https://www.twitch.tv/videos/850859857 instruction decode and execute - part 1/3 (08h56)\n- https://www.twitch.tv/videos/852082786 instruction decode and execute - part 2/3 (10h56)\n- https://www.twitch.tv/videos/858055433 instruction decode and execute - part 3/3 - SoC simulation (10h24)\n- TBD tests in the Lattice FPGA\n- TBD tests in the Lattice FPGA w/ LCD display\n\nUnfortunately the video set is currently in Portuguese only and there are a lot\nof parallel discussions about technology, including the fix of Teske's\nnotebook online!  
I hope in the future it will be possible to edit the video set\nand, maybe, create English subtitles.\n\nAbout the processor itself, it is a microcode oriented concept with a\nclassic von Neumann architecture, designed to support more easily different\nISAs.  It is really very different than the traditional RISC cores that we\nfound around!  Also, it includes a very good eco-system around opensource\ntools, such as Icarus, Yosys and gtkWave!\n\nAlthough not finished yet (95% done!), I think it is very illustrative about the RISC-V design:\n\n- rv32e instruction set: very reduced (37) and very orthogonal bit patterns (6) \n- rv32e register set: 16x32-bit register bank and a 32-bit program counter\n- rv32e ALU with basic operations for reg/imm and reg/reg instructions\n- rv32e instruction decode: very simple to understand, very direct to implement\n- rv32e software support: the GCC support provides an easy way to generate code and test it!\n\nTeske's proposal is not to design the fastest RISC-V core ever (we already\nhave lots of faster cores with CPI ~ 1, such as the darkriscv, vexriscv,\netc), but to create a clean, reliable and comprehensible RISC-V core.\n\nYou can check the code in the following repository:\n\n- https://github.com/racerxdl/riskow\n\nAnother good reference is the KianRiscV from my friend Hirosh:\n\n- https://github.com/splinedrive/kianRiscV\n\n## Academic Papers and Applications\n\nIn a funny way, the DarkRISCV appears in some academic papers, sometimes in a comparative way, sometimes as a laboratory mouse.\n\n- Design and Implementation of a 256-Bit RISC-V-Based Dynamically Scheduled Very Long Instruction Word on FPGA -- Here we found an interesting comparison between the DarkRISCV versus a huge 8-way VLIW core, as well as the Kronos RISCV, the PicoRV32 and the NEORV32. Nice results for DarkRISCV: 2nd place w/ IPC 0.71 and 1st place with only 1500LUTs. 
https://ieeexplore.ieee.org/iel7/6287639/8948470/09200617.pdf\n\n- ReCon: From the Bitstream to Piracy Detection -- Interesting paper about IP piracy detection, basically how to detect an IP inside a bitstream; they used the PicoRV32, OpenRISC and DarkRISCV as IPs to be detected. https://homes.luddy.indiana.edu/lukefahr/papers/skipper_paine20.pdf\n\n- A Low-Cost Fault-Tolerant RISC-V Processor for Space Systems -- Here we found an interesting comparison between low-cost RISCV cores, but in this case the DarkRISCV performs very badly against the PicoRV32, mRISCV, Ibex and a radiation hardened RISCV core. Not sure about the tools and the target, since I have no access to the paper, just some pictures. https://www.semanticscholar.org/paper/A-Low-Cost-Fault-Tolerant-RISC-V-Processor-for-Santos-Luza/b8cd0b62ac914678f1999df09a4b77b857178d33 \n\n- Fault Classification and Vulnerability Analysis of Microprocessors -- Not much information, since the paper will be released only in 2022, but the abstract is very interesting, basically they will inject lots of faults in the PicoRV32 and DarkRISCV in order to see what happens. https://repository.tudelft.nl/islandora/object/uuid:4c85a1ba-2721-4563-bb13-31d506d9c906?collection=education\n\nRegarding real world applications, standard embedded C code typically runs very well with DarkRISCV. Some examples of applications currently in use:\n\n- microcontroller programmers (JTAG and other complex protocols)\n- data compression/decompression (LZ streams)\n- cryptography (RSA and SHA256, requires hardware accelerators)\n- digital signal processing (requires mac instruction)\n\nIn the case of RSA, the simple inclusion of a pipelined 32x32-bit multiplier mapped via IO (i.e. not the M extension, instead an IO mapped register accessible via load/store instructions) increased the RSA performance by a factor of 20x when compared to the bare RV32E instruction set. 
In the case of SHA256, a complete SHA256 accelerator in hardware and mapped via IO can transform seconds of processing into a few milliseconds. In both cases, there is no need for complex instruction integration in the core and no need to wait -- the core and the accelerator can work in parallel.\n\n## Performance Comparisons\n\nI tried to prepare a fair performance comparison between the DarkRISCV and other cores, but it is not as easy as it appears! The first problem is to locate HDL versions of each core, since the tools require Verilog or VHDL files in order to build something. The second problem is to decide what to compile: there are lots of combinations of top level blocks, with different peripherals and concepts. In this case, I included only the core. The third problem regards the core configuration: it is not easy or clear how to configure them for minimum area or maximum speed, so I just used the default configuration for all cores. In short, I solved the problem in a dummy way (which reflects the reality 95% of the time).\n\nOnce I have separate builds for each core, there is the final problem: how to analyse the results and rank the cores. My option was to calculate how many MIPS are possible in a fixed FPGA, in this case the Kintex-7 K420 with 260600 6-input LUTs. Different cores require different amounts of LUTs, so just divide the total number by the required amount and we have the theoretical number of cores per FPGA (peak cores/FPGA). Also, different cores have different theoretical peak IPC (guessed numbers) and, according to the synthesis tool in the default setup, will run at different maximum frequencies. Since we know the number of instructions per clock and the maximum number of clocks per second (the maximum frequency), we have the maximum number of instructions per second (peak MIPS) per core. 
Since we also know the maximum number of cores, we can calculate the maximum peak MIPS per FPGA.\n\nThe following list is far from complete, but it is my suggestion for comparing different cores:\n\n\tCore\t\tLUT\tFF\tDSP\tBRAM\tIPC\tMFreq\tNcores\tMIPS/k7\tREPO\n\tDarkRISCV\t1177\t246\t0\t0\t1\t150\t221\t33232\thttps://github.com/darklife/darkriscv\n\tVexRISCV\t1993\t1345\t4\t5\t1\t214\t130\t28013\thttps://github.com/m-labs/VexRiscv-verilog\n\tPicoRV32\t1291\t568\t0\t1\t1/4\t309\t201\t15630\thttps://github.com/cliffordwolf/picorv32\n\tSERV\t\t217\t174\t0\t0\t1/32\t367\t1200\t13788\thttps://github.com/olofk/serv\n\tRPU\t\t2943\t1103\t12\t1\t1\t111\t88\t9905\thttps://github.com/Domipheus/RPU\n\tUE RISC-V\t3676\t2289\t4\t0\t1\t124\t70\t8811\thttps://github.com/ultraembedded/riscv\n\tUE BiRISC-V\t15667\t6324\t4\t0\t2\t87\t16\t2917\thttps://github.com/ultraembedded/biriscv\n\nAre those results fair enough? From my point of view, as long as the conditions are the same, yes. Are those results true enough? From my point of view, maybe. At least two cases do not match. The first is the DarkRISCV, which reaches up to 240MHz in the same FPGA when the top level is in the build. I am not sure why the core-only build results in a smaller maximum frequency, but the point is to try to find a way to compare different cores, so it is okay. The second case is the SERV, which shows a maximum frequency of 367MHz and up to 1200 cores/FPGA. In fact, I tested up to 1000 SERV cores in that FPGA and it appears to be possible to fit between 1100 and 1200 cores as a result of optimizations across the multi-core hierarchy. The bad news is that the maximum clock barely reached 128MHz per core. Again, the point is to try to find a way to compare different cores, so it is okay.\n\nFor sure those numbers will be very useful for future designers and the message is clear: keep it simple! Less area per core means more cores in the same area and, in some cases, means better performance. 
In most cases, less area also means better clock performance, but in this case the better clock is useful only when the IPC is around 1, which is not so easy to keep. Just as an example, the DarkRISCV can truly reach an IPC ~ 1, but the code must be optimized by hand and the top level must be changed so that there is no latency regarding the BRAM. In a more general way, with less effort, the compiler and a more standard top level can keep the IPC around 0.7, which is good enough for most applications. So, keep your expectations very, very low! :)\n\nIn order to show how fast the DarkRISCV is, some benchmarks are available in the src directory:\n\n- coremark: it is possible to measure both the absolute coremark rate and the relative coremark/MHz value. In the case of the relative coremark rate, the 3-stage pipeline peaks at 0.9 coremark/MHz, while the 2-stage peaks at 1.1 coremark/MHz, which is almost the same as an old 486SX core, measured for an RV32E (no mul/div, which would increase the result by 2x!). In the case of the absolute coremark rate, the best result would be on the KU040 running at 400MHz with the 3-stage pipeline, resulting in 360 loops/s. While the 2-stage is more efficient, it would run at half the clock, resulting in 220 loops/s. When the MAC instruction is enabled, the DarkRISCV peaks at 1.02 coremark/MHz, which shows that the coremark is highly dependent on the 32x32-bit mul.\n- dhrystone: it is possible to measure the dhrystone value in order to get both the absolute dhrystone/s value (aka DMIPS) and the relative value (DMIPS/MHz), which is currently 66 DMIPS at 100MHz and 0.66 DMIPS/MHz for 3-stage and 50 DMIPS at 66MHz and 0.75 DMIPS/MHz for 2-stage. When the MAC instruction is enabled, there is no change in the results.\n- primes: it is possible to compare the DarkRISCV performance with other RISC-V cores and other architectures, thanks to a huge list included in the prime source code. 
In the case of DarkRISCV in the basic RV32E setup, the performance is pretty similar to the VexRISCV (RV32IM w/ 5-stage pipeline) and the DarkRISCV can outperform the VexRISCV when the MAC instruction is enabled, so that the 32x32-bit mul is performed by 3x16x16-bit MAC instructions.\n- MB/s: we typically talk a lot about the bus bandwidth on the DarkRISCV and, since it uses a Harvard architecture, the instruction and data buses can work in parallel, resulting in up to 400MB/s on each bus when clocked at 100MHz. Of course, while the instruction bus really keeps that bandwidth full time, the data bus is conditioned by the load/store flow, so the more realistic values are up to 200MB/s on load and 400MB/s on store, when running on advanced Xilinx FPGAs (series 6+). When running on less advanced FPGAs, both load and store may be limited to 200MB/s, because of the need for read-modify-write cycles... As a reference, a good memcpy128() optimized to move 16 bytes per loop may take 18 clocks and peaks at only 88MB/s on advanced FPGAs (https://godbolt.org/z/75a531zn6), partly because we need to load and store data (so we have half of the bandwidth), partly because the BRAM needs 2 clocks on read.\n- synth on Xilinx FPGAs: another good benchmark is the synth on Xilinx FPGAs, which typically results in ~1000LUTs/core and ~240MHz on Kintex-7... 
by scaling the relative benchmarks above, it is possible to calculate the coremarks or dhrystones on a specific FPGA and, dividing the total LUT count by the core LUT count, it is possible to calculate how many cores fit on that FPGA and scale up the benchmark results, in order to calculate the total performance per chip!\n\n## Acknowledgments\n\nSpecial thanks to my old colleagues from the Verilog/VHDL/IT area:\n\n- Paulo Matias (jedi master and verilog/bluespec/riscv guru)\n- Paulo Bernardi (co-worker and verilog guru)\n- Evandro Hauenstein (co-worker and git guru)\n- Lucas Mendes (technology guru)\n- Marcelo Toledo (technology guru)\n- Fabiano Silos (technology guru)\n\nAlso, special thanks to the \"friends of darkriscv\" that found the project on\nthe internet and contributed in any way to make it better:\n\n- Guilherme Barile (technology guru and first guy to post anything about the darkriscv! [2]).\n- Alasdair Allan (technology guru, posted an article about the darkriscv [3]) \n- Gareth Halfacree (technology guru, posted an article about the DarkRISCV [4])\n- Ivan Vasilev (ported DarkRISCV for Lattice Brevia XP2!)\n- timdudu from github (fix in the LDATA and found a bug in the BCC instruction)\n- hyf6661669 from github (lots of contributions, including the fixes regarding the AUIPC and S{B,W,L} instructions, ModelSIM simulation, the memory byte select used by store/load instructions and much more!)\n- zmeiresearch from github (support for Lattice XP2 Brevia board)\n- Hirosh Dubai (motivation and lots of talks about RISCV!)\n- All other colleagues from github that contributed with fixes, corrections and suggestions.\n\nFinally, thanks to all people who directly and indirectly contributed to\nthis project, including the company I work for and all colleagues that\ntested the *DarkRISCV*.\n\n## References\n\n\t[1] https://www.amazon.com/RISC-V-Reader-Open-Architecture-Atlas/dp/099924910X\n\t[2] https://news.ycombinator.com/item?id=17852876\n\t[3] 
https://blog.hackster.io/the-rise-of-the-dark-risc-v-ddb49764f392\n\t[4] https://abopen.com/news/darkriscv-an-overnight-bsd-licensed-risc-v-implementation/\n\t[5] http://quasilyte.dev/blog/post/riscv32-custom-instruction-and-its-simulation/\n\t[6] https://github.com/riscv/riscv-pk/blob/master/bbl/riscv_logo.txt\n\n[WorkflowBadgeLinux]: https://github.com/darklife/darkriscv/workflows/Linux/badge.svg\n[WorkflowUrlLinux]: https://github.com/darklife/darkriscv/actions/workflows/Linux.yml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarklife%2Fdarkriscv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarklife%2Fdarkriscv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarklife%2Fdarkriscv/lists"}