https://github.com/wpmed92/f16-to-f32-gpu
Decompress f16 data to f32 using WebGPU
- Host: GitHub
- URL: https://github.com/wpmed92/f16-to-f32-gpu
- Owner: wpmed92
- License: MIT
- Created: 2023-12-08T10:18:41.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-10T10:16:20.000Z (about 1 year ago)
- Last Synced: 2024-12-08T02:43:31.489Z (15 days ago)
- Topics: gpu, javascript, parallel-computing, webgpu
- Language: JavaScript
- Homepage:
- Size: 85.9 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# f16-to-f32-gpu
[![Unit Tests](https://github.com/wpmed92/f16-to-f32-gpu/actions/workflows/test.yml/badge.svg)](https://github.com/wpmed92/f16-to-f32-gpu/actions/workflows/test.yml) [![npm version](https://img.shields.io/npm/v/f16-to-f32-gpu.svg?style=flat)](https://www.npmjs.com/package/f16-to-f32-gpu)
This project's goal is to allow fast decompression of large half-float arrays to `Float32Array` in JavaScript. Since half-precision floating-point arrays are not natively supported in JavaScript, decoding float16 data quickly is challenging. CPU-based solutions are slow, so we use WebGPU to process the values in parallel on the GPU.

## Usage
The following input types are supported: `Uint8Array`, `Uint16Array`, `Uint32Array`.
Passing any other type will result in an `ErrorReason.UNSUPPORTED_TYPE` error.
The input data has to be 2-byte aligned, otherwise an `ErrorReason.UNALIGNED_INPUT` error will be raised.
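For example, an odd-length `Uint8Array` cannot be split into whole 16-bit halves, so it is rejected. A minimal sketch, assuming the reason is surfaced via the thrown error's `cause` as in the usage example below:

```JavaScript
import { f16tof32GPU } from "f16-to-f32-gpu";

// Hedged illustration: three bytes cannot form whole 16-bit halves.
try {
  await f16tof32GPU(new Uint8Array([0x00, 0xC0, 0xFF]));
} catch (error) {
  console.log(error.cause); // expected: ErrorReason.UNALIGNED_INPUT
}
```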
Internally, we view the input data as a `Uint32Array`, so we perform 4-byte alignment when needed. By doing this, we can decode two half-float values in a single kernel invocation (the lower and upper 16 bits of the input `u32` value are each decoded to an `f32`).

```JavaScript
import { f16tof32GPU } from "f16-to-f32-gpu";

try {
  const f32FromUint8 = await f16tof32GPU(new Uint8Array([0x00, 0xC0]));
  const f32FromUint16 = await f16tof32GPU(new Uint16Array([0xC000]));
  const f32FromUint32 = await f16tof32GPU(new Uint32Array([0xC0000000]));
  console.log(f32FromUint8[0]);  // -2
  console.log(f32FromUint16[0]); // -2
  console.log(f32FromUint32[0]); // -2
} catch (error) {
  console.log(`Error: ${error.cause}, ${error.message}`);
}
```
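To make the bit-level decode concrete, here is a hedged CPU reference in plain JavaScript. It illustrates the IEEE 754 half-float layout the kernel works with; it is not the library's actual WGSL code, and the lower-half-first output order is an assumption for illustration:

```JavaScript
// Decode one IEEE 754 half-float (given as a 16-bit integer) to a JS number.
function decodeF16(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f;
  const mantissa = bits & 0x3ff;
  if (exponent === 0) {
    // Subnormal (or zero): no implicit leading 1, fixed scale 2^-24.
    return sign * mantissa * 2 ** -24;
  }
  if (exponent === 0x1f) {
    return mantissa ? NaN : sign * Infinity;
  }
  // Normal: implicit leading 1, exponent bias 15.
  return sign * (1 + mantissa / 1024) * 2 ** (exponent - 15);
}

// One u32 carries two halves, which the kernel splits the same way.
// Output order (lower half first) is assumed here, not taken from the library.
function decodeU32Pair(u32) {
  return [decodeF16(u32 & 0xffff), decodeF16(u32 >>> 16)];
}

console.log(decodeU32Pair(0x0000c000)); // [-2, 0]
```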
## Used by
- tinygrad [Stable Diffusion WebGPU port](https://github.com/tinygrad/tinygrad/tree/master/examples/webgpu/stable_diffusion): try it [here](https://softwiredtech.github.io/stable-diffusion-webgpu/). Since `f16` support is limited in browsers, the compute in the Stable Diffusion WebGPU port runs in `f32`. However, the model's `f32` weights exceed 4 GB, which is too much data to download and then cache in the browser. To optimize for download speed, the demo fetches the weights in `f16`, and the resulting >2 GB of data are decompressed client-side using `f16tof32GPU`. The decompressed `f32` buffer is then used by the inference WebGPU kernels.
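A minimal sketch of that flow, with a hypothetical `WEIGHTS_URL`; it is not the demo's real loading code:

```JavaScript
import { f16tof32GPU } from "f16-to-f32-gpu";

// Hypothetical URL, for illustration only.
const WEIGHTS_URL = "https://example.com/model-weights.f16.bin";

async function loadWeights(url) {
  const response = await fetch(url);
  const f16Bytes = new Uint8Array(await response.arrayBuffer());
  // GPU-side decompression: every 4 input bytes yield two f32 values.
  return await f16tof32GPU(f16Bytes); // Float32Array
}

const f32Weights = await loadWeights(WEIGHTS_URL);
```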
## License
MIT