OpenSource GPU, in Verilog, loosely based on RISC-V ISA
- Host: GitHub
 - URL: https://github.com/hughperkins/verigpu
 - Owner: hughperkins
 - License: MIT
 - Created: 2022-03-01T10:49:57.000Z
 - Default Branch: main
 - Last Pushed: 2024-11-22T04:42:11.000Z
 - Last Synced: 2025-02-06T02:47:06.557Z
 - Topics: asic-design, gpu, gpu-acceleration, hardware-designs, machine-learning, risc-v, risc-v-assembly, verification, verilog
 - Language: SystemVerilog
 - Homepage:
 - Size: 6.76 MB
 - Stars: 900
 - Watchers: 31
 - Forks: 102
 - Open Issues: 11
Metadata Files:
- Readme: README.md
 - Funding: .github/FUNDING.yml
 - License: LICENSE
 
 
README
# OpenSource GPU
Build an opensource GPU, targeting ASIC tape-out, for [machine learning](https://en.wikipedia.org/wiki/Machine_learning) ("ML"). Hopefully, it can be made to work with the [PyTorch](https://pytorch.org) deep learning framework.
# Vision
Create an opensource GPU for machine learning.
I don't intend to tape this out myself, but I do intend to do what I can to verify that tape-out would work: that timings close, and so on.
The intent is to implement a [HIP](https://github.com/ROCm-Developer-Tools/HIP) API that is compatible with the [PyTorch](https://pytorch.org) machine learning framework. Providing other APIs, such as [SYCL](https://www.khronos.org/sycl/) or [NVIDIA® CUDA™](https://developer.nvidia.com/cuda-toolkit), is also an option.
The internal GPU core ISA is loosely compliant with the [RISC-V](https://riscv.org/technical/specifications/) ISA. Where RISC-V conflicts with the needs of a GPU setting, we break with RISC-V.
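As a concrete illustration of the base RV32I encoding that the cores loosely follow, here is a minimal sketch (not code from this repository, which decodes in SystemVerilog) of pulling the fields out of an I-type instruction such as `ADDI`:

```cpp
#include <cstdint>

// Fields of a RISC-V I-type instruction, per the base RV32I spec:
// [31:20] imm, [19:15] rs1, [14:12] funct3, [11:7] rd, [6:0] opcode.
struct IType {
    int32_t imm;
    uint8_t rs1, funct3, rd, opcode;
};

IType decode_itype(uint32_t insn) {
    IType d;
    d.opcode = insn & 0x7F;
    d.rd     = (insn >> 7)  & 0x1F;
    d.funct3 = (insn >> 12) & 0x07;
    d.rs1    = (insn >> 15) & 0x1F;
    // arithmetic right shift sign-extends the 12-bit immediate
    d.imm    = static_cast<int32_t>(insn) >> 20;
    return d;
}
```

For example, `addi x1, x0, 5` assembles to `0x00500093`, which decodes to opcode `0x13`, `rd` = 1, `rs1` = 0, `imm` = 5.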
The cores are intended to stay tightly focused on ML. For example, [brain floating point](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) ("BF16") is used throughout, to keep core die area, and therefore per-core cost, low. Similarly, the intent is to implement only the few floating-point operations critical to ML, such as `exp`, `log`, `tanh`, and `sqrt`.
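As background on why BF16 keeps die area low: bfloat16 is simply the top 16 bits of an IEEE-754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps the full float32 exponent range while halving storage and shrinking the multiplier. A host-side conversion sketch (not repository code), using round-to-nearest-even:

```cpp
#include <cstdint>
#include <cstring>

// float32 -> bfloat16: keep the top 16 bits, rounding to nearest even.
uint16_t bf16_from_float(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t rounding = 0x7FFF + ((bits >> 16) & 1);  // round-to-nearest-even bias
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// bfloat16 -> float32: widen by appending 16 zero mantissa bits.
float float_from_bf16(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Values whose mantissa fits in 7 bits (e.g. 1.5) round-trip exactly; everything else loses only low-order mantissa bits.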
# Architecture
Diagrams (images omitted here):
- Big Picture
- GPU Die Architecture
- Single Core
- Single-source compilation and runtime
# Simulation
## Single-source C++
Single-source C++:
- [examples/cpp_single_source/sum_ints.cpp](/examples/cpp_single_source/sum_ints.cpp)

Compile the GPU and runtime:
- CMakeLists.txt: [src/gpu_runtime/CMakeLists.txt](/src/gpu_runtime/CMakeLists.txt)
- GPU runtime: [src/gpu_runtime/gpu_runtime.cpp](/src/gpu_runtime/gpu_runtime.cpp)
- GPU controller: [src/gpu_controller.sv](/src/gpu_controller.sv)
- Single GPU RISC-V core: [src/core.sv](/src/core.sv)

Compile the single-source C++, and run:
- [examples/cpp_single_source/run.sh sum_ints](/examples/cpp_single_source/run.sh)
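To make the single-source idea concrete, here is a hypothetical host-side sketch: the same C++ "kernel" that would be compiled for the GPU cores can be emulated on the host by looping over a virtual thread index. All names here (`sum_ints_kernel`, `launch_emulated`) are illustrative, not the repository's actual API:

```cpp
#include <vector>

// Toy "kernel": one virtual thread reduces the input array.
// On a real device this function body would be compiled to GPU core code.
void sum_ints_kernel(unsigned tid, const int* in, int* out, int n) {
    if (tid == 0) {  // single thread does the whole reduction in this toy example
        int total = 0;
        for (int i = 0; i < n; ++i) total += in[i];
        *out = total;
    }
}

// Host-side "launch": run every virtual thread sequentially.
template <typename Kernel, typename... Args>
void launch_emulated(unsigned nthreads, Kernel k, Args... args) {
    for (unsigned tid = 0; tid < nthreads; ++tid) k(tid, args...);
}
```

Usage: `launch_emulated(8, sum_ints_kernel, data, &result, n);` sums `n` ints into `result`, regardless of how many virtual threads are launched.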

# Planning
What direction are we thinking of going in? What works already? See:
- [docs/planning.md](docs/planning.md)
# Tech details
Our assembly language implementation and progress. Design of GPU memory, registers, and so on. See:
- [docs/tech_details.md](docs/tech_details.md)
# Verification
If we want to tape out, we need solid verification. Read more at:
- [docs/verification.md](docs/verification.md)
# Metrics
We want the GPU to run quickly and to use minimal die area. Read about how we measure timings and area at:
- [docs/metrics.md](docs/metrics.md)