{"id":44539391,"url":"https://github.com/avikde/tiny-xpu","last_synced_at":"2026-02-13T18:55:46.459Z","repository":{"id":337694140,"uuid":"1154636405","full_name":"avikde/tiny-xpu","owner":"avikde","description":"Modular systolic array with software interface","archived":false,"fork":false,"pushed_at":"2026-02-10T22:33:24.000Z","size":7,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-10T22:54:46.247Z","etag":null,"topics":["npu","systemverilog","systolic-array","testbench","tpu"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/avikde.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-10T16:02:31.000Z","updated_at":"2026-02-10T19:57:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/avikde/tiny-xpu","commit_stats":null,"previous_names":["avikde/tiny-xpu"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/avikde/tiny-xpu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avikde%2Ftiny-xpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avikde%2Ftiny-xpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avikde%2Ftiny-xpu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avikde%2Ftiny-xpu/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/avikde","download_url":"https://codeload.github.com/avikde/tiny-xpu/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avikde%2Ftiny-xpu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29414286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-13T06:24:03.484Z","status":"ssl_error","status_checked_at":"2026-02-13T06:23:12.830Z","response_time":78,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["npu","systemverilog","systolic-array","testbench","tpu"],"created_at":"2026-02-13T18:55:42.722Z","updated_at":"2026-02-13T18:55:46.446Z","avatar_url":"https://github.com/avikde.png","language":"Python","readme":"# tiny-xpu\n\n## Project goal\n\nWhile there are other projects building up small (~2x2) TPU-inspired designs (see related projects below), this project has a salient combination of goals:\n\n- Modular SystemVerilog setup to support non-rectangular systolic architectures\n- Easy software interface via ONNX EP and maybe others\n- Support for FPGA deployment\n\n## Setup, build, and test\n\nSet up in WSL or other Linux: \n\n- `sudo apt install iverilog` -- Icarus Verilog for simulation\n- Install the [Surfer waveform 
viewer](https://marketplace.visualstudio.com/items?itemName=surfer-project.surfer) VSCode extension for viewing `.vcd` waveform files\n- `sudo apt install yosys` -- Yosys for synthesis (or [build from source](https://github.com/YosysHQ/yosys) for the latest version)\n- `pip install cocotb` -- Python tool for more powerful testing capabilities\n\nBuild:\n\n```shell\nmkdir -p build \u0026\u0026 cd build\ncmake ..\nmake -j\n```\n\nTest:\n\n```shell\ncd build \u0026\u0026 ctest --verbose\n```\n\nTests produce waveform files (`*.fst`) in `test/sim_build/`. Open them in VSCode with the Surfer extension to inspect signals.\n\n## Architecture\n\n### PE (`pe.sv`)\n\nProcessing Element (PE) for systolic array, named as in Kung (1982)\n\n- Performs multiply-accumulate: `acc += weight * data_in`\n- Passes data through to neighboring PEs via `data_out`\n- The PE does `int8 × int8 → int32`, then `int32 + int32 → int32`\n- `int8×int8→int32` is the standard choice (used by [Google's TPUs](https://cloud.google.com/blog/products/compute/accurate-quantized-training-aqt-for-tpu-v5e), [Arm NEON `sdot`](https://developer.arm.com/architectures/instruction-sets/intrinsics/vdot_s32), etc.)\n\nIn a systolic array, there are two distinct phases:\n\n1. Weight loading phase (`weight_ld=1, en=0`): Before computation begins, you load each PE with its weight from the weight matrix. In a 2x2 systolic array doing `C = A × B`, each PE gets one element of B. This happens once per matrix multiply (or once per tile, for larger matrices).\n2. Compute phase (`weight_ld=0, en=1`): The weights stay \"stationary\" (this is the weight-stationary dataflow). Input activations stream through via data_in/data_out, and partial sums accumulate via acc_in/acc_out. 
The weights don't change during this phase.\n\nSo the typical sequence is:\n\n- Load weights for all PEs (a few cycles with `weight_ld=1`)\n- Stream many inputs through with weights held fixed (`en=1, weight_ld=0`)\n- When you need new weights (next layer, next tile), load again\n\nThis is why it's called \"weight-stationary\": the weights are loaded once, while data flows through repeatedly.\n\n## Related projects\n\nSeveral \"tiny TPU\"-type projects exist, owing to the current popularity of TPUs and LLMs.\n\n- [tiny-tpu-v2/tiny-tpu](https://github.com/tiny-tpu-v2/tiny-tpu/tree/main)\n- [Alanma23/tinytinyTPU](https://github.com/Alanma23/tinytinyTPU)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favikde%2Ftiny-xpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Favikde%2Ftiny-xpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favikde%2Ftiny-xpu/lists"}