OpenSource GPU, in Verilog, loosely based on RISC-V ISA
- Host: GitHub
 - URL: https://github.com/hughperkins/verigpu
 - Owner: hughperkins
 - License: MIT
 - Created: 2022-03-01T10:49:57.000Z
 - Default Branch: main
 - Last Pushed: 2024-11-22T04:42:11.000Z
 - Last Synced: 2025-02-06T02:47:06.557Z
 - Topics: asic-design, gpu, gpu-acceleration, hardware-designs, machine-learning, risc-v, risc-v-assembly, verification, verilog
 - Language: SystemVerilog
 - Homepage:
 - Size: 6.76 MB
 - Stars: 900
 - Watchers: 31
 - Forks: 102
 - Open Issues: 11
Metadata Files:
- Readme: README.md
 - Funding: .github/FUNDING.yml
 - License: LICENSE
 
 
README
# OpenSource GPU
Build an opensource GPU, targeting ASIC tape-out, for [machine learning](https://en.wikipedia.org/wiki/Machine_learning) ("ML"). Hopefully, it can be made to work with the [PyTorch](https://pytorch.org) deep learning framework.
# Vision
Create an opensource GPU for machine learning.
I don't intend to tape this out myself, but I do intend to do what I can to verify that tape-out would work: that timings close, and so on.
The intent is to implement a [HIP](https://github.com/ROCm-Developer-Tools/HIP) API that is compatible with the [PyTorch](https://pytorch.org) machine learning framework. Providing other APIs, such as [SYCL](https://www.khronos.org/sycl/) or [NVIDIA® CUDA™](https://developer.nvidia.com/cuda-toolkit), is also an option.
The internal GPU core ISA is loosely compliant with the [RISC-V](https://riscv.org/technical/specifications/) ISA. Where RISC-V conflicts with the needs of a GPU setting, we break with RISC-V.
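As a concrete illustration of the base RV32I encoding that the cores loosely follow, here is a minimal sketch (not code from this repository, which decodes in SystemVerilog) of pulling the fields out of an I-type instruction such as `ADDI`:

```cpp
#include <cstdint>

// Fields of a RISC-V I-type instruction, per the base RV32I spec:
// [31:20] imm, [19:15] rs1, [14:12] funct3, [11:7] rd, [6:0] opcode.
struct IType {
    int32_t imm;
    uint8_t rs1, funct3, rd, opcode;
};

IType decode_itype(uint32_t insn) {
    IType d;
    d.opcode = insn & 0x7F;
    d.rd     = (insn >> 7)  & 0x1F;
    d.funct3 = (insn >> 12) & 0x07;
    d.rs1    = (insn >> 15) & 0x1F;
    // arithmetic right shift sign-extends the 12-bit immediate
    d.imm    = static_cast<int32_t>(insn) >> 20;
    return d;
}
```

For example, `addi x1, x0, 5` assembles to `0x00500093`, which decodes to opcode `0x13`, `rd` = 1, `rs1` = 0, `imm` = 5.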
The cores are intended to stay tightly focused on ML. For example, [brain floating point](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) ("BF16") is used throughout, to keep core die area, and therefore per-core cost, low. Similarly, the intent is to implement only the few floating-point operations critical to ML, such as `exp`, `log`, `tanh`, and `sqrt`.
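As background on why BF16 keeps die area low: bfloat16 is simply the top 16 bits of an IEEE-754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps the full float32 exponent range while halving storage and shrinking the multiplier. A host-side conversion sketch (not repository code), using round-to-nearest-even:

```cpp
#include <cstdint>
#include <cstring>

// float32 -> bfloat16: keep the top 16 bits, rounding to nearest even.
uint16_t bf16_from_float(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    uint32_t rounding = 0x7FFF + ((bits >> 16) & 1);  // round-to-nearest-even bias
    return static_cast<uint16_t>((bits + rounding) >> 16);
}

// bfloat16 -> float32: widen by appending 16 zero mantissa bits.
float float_from_bf16(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Values whose mantissa fits in 7 bits (e.g. 1.5) round-trip exactly; everything else loses only low-order mantissa bits.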
# Architecture
Diagrams (images omitted here):
- Big Picture
- GPU Die Architecture
- Single Core
- Single-source compilation and runtime
# Simulation
## Single-source C++
Single-source C++:
- [examples/cpp_single_source/sum_ints.cpp](/examples/cpp_single_source/sum_ints.cpp)

Compile the GPU and runtime:
- CMakeLists.txt: [src/gpu_runtime/CMakeLists.txt](/src/gpu_runtime/CMakeLists.txt)
- GPU runtime: [src/gpu_runtime/gpu_runtime.cpp](/src/gpu_runtime/gpu_runtime.cpp)
- GPU controller: [src/gpu_controller.sv](/src/gpu_controller.sv)
- Single GPU RISC-V core: [src/core.sv](/src/core.sv)

Compile the single-source C++, and run:
- [examples/cpp_single_source/run.sh sum_ints](/examples/cpp_single_source/run.sh)
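To make the single-source idea concrete, here is a hypothetical host-side sketch: the same C++ "kernel" that would be compiled for the GPU cores can be emulated on the host by looping over a virtual thread index. All names here (`sum_ints_kernel`, `launch_emulated`) are illustrative, not the repository's actual API:

```cpp
#include <vector>

// Toy "kernel": one virtual thread reduces the input array.
// On a real device this function body would be compiled to GPU core code.
void sum_ints_kernel(unsigned tid, const int* in, int* out, int n) {
    if (tid == 0) {  // single thread does the whole reduction in this toy example
        int total = 0;
        for (int i = 0; i < n; ++i) total += in[i];
        *out = total;
    }
}

// Host-side "launch": run every virtual thread sequentially.
template <typename Kernel, typename... Args>
void launch_emulated(unsigned nthreads, Kernel k, Args... args) {
    for (unsigned tid = 0; tid < nthreads; ++tid) k(tid, args...);
}
```

Usage: `launch_emulated(8, sum_ints_kernel, data, &result, n);` sums `n` ints into `result`, regardless of how many virtual threads are launched.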

# Planning
What direction are we thinking of going in? What works already? See:
- [docs/planning.md](docs/planning.md)
# Tech details
Our assembly language implementation and progress. Design of GPU memory, registers, and so on. See:
- [docs/tech_details.md](docs/tech_details.md)
# Verification
If we want to tape out, we need solid verification. Read more at:
- [docs/verification.md](docs/verification.md)
# Metrics
We want the GPU to run quickly and to use minimal die area. Read about how we measure timings and area at:
- [docs/metrics.md](docs/metrics.md)