Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/roryclear/transformer
- Host: GitHub
- URL: https://github.com/roryclear/transformer
- Owner: roryclear
- Created: 2024-02-07T11:10:19.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-12T19:30:53.000Z (9 months ago)
- Last Synced: 2024-04-14T04:23:43.111Z (9 months ago)
- Language: Python
- Size: 383 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Transformer
## Running:
### Online Notebook (CUDA)
https://www.kaggle.com/code/roryclear/cuda-gpt2-demo

### Locally
git clone --depth 1 https://github.com/roryclear/transformer.git
pip install -r requirements.txt
pip install pycuda (CUDA only)
pip install pyobjc-framework-Metal (Apple Silicon only)
python demo.py --p="your prompt"
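Depending on your hardware, only one of the optional GPU packages above is needed. The sketch below shows one way such backend detection could work at runtime; the `pick_backend` helper and its fallback order are assumptions for illustration, not this repository's actual `demo.py` logic.

```python
# Hypothetical helper: pick a compute backend based on which optional
# GPU packages are importable. The real demo.py may select differently.
import importlib.util

def pick_backend():
    # Prefer CUDA via pycuda, then Apple's Metal bindings
    # (pyobjc-framework-Metal), then OpenCL via pyopencl.
    if importlib.util.find_spec("pycuda") is not None:
        return "cuda"
    if importlib.util.find_spec("Metal") is not None:
        return "metal"
    if importlib.util.find_spec("pyopencl") is not None:
        return "opencl"
    raise RuntimeError("No supported GPU backend found")

if __name__ == "__main__":
    print(f"Using backend: {pick_backend()}")
```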
## CUDA, OpenCL and Metal GPT-2 inference
- [ ] Fastest Inference
- [X] Support for all GPT-2 Models
- [X] Multiple Compute Languages
- [X] Remove all NumPy usage (for calculations)
- [ ] Support any transformer
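For context, GPT-2 inference is an autoregressive loop: encode the prompt, run the model, append the next token, and repeat. The sketch below illustrates that loop using huggingface/transformers (one of the baselines in the tables below); it is not this repository's implementation, which avoids NumPy for calculations and dispatches hand-written CUDA/OpenCL/Metal kernels instead.

```python
# Illustrative GPT-2 greedy-decoding loop using huggingface/transformers
# (a comparison baseline below), not this repo's kernel-based code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def generate(prompt, n_tokens=100):
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(n_tokens):
        with torch.no_grad():
            logits = model(ids).logits           # (1, seq_len, vocab)
        next_id = logits[0, -1].argmax()         # greedy: most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0])

print(generate("your prompt"))
```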
## Performance:
### CUDA
|T4 x2 (Kaggle Cloud) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |74 t/s |39 t/s |24 t/s |
| huggingface/transformers |30 t/s |12 t/s |6.1 t/s |
|**roryclear/transformer** |**75 t/s** |**31 t/s** |**15 t/s** |

|P100 (Kaggle Cloud) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |57 t/s |31 t/s |20 t/s |
| huggingface/transformers |31 t/s |12 t/s |6.0 t/s |
|**roryclear/transformer** |**59 t/s** |**21 t/s** |**10 t/s** |

### Metal
|Apple M2 | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |30 t/s |22 t/s |15 t/s |
| huggingface/transformers |53 t/s |17 t/s |8 t/s |
| **roryclear/transformer** |**33 t/s** |**16 t/s** |**10 t/s** |

### OpenCL
|Intel Integrated Graphics (2020 XPS13) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |16 t/s |5.8 t/s |2.1 t/s |
| huggingface/transformers |34 t/s |15 t/s |7.7 t/s |
|**roryclear/transformer** |**11 t/s** |**5.1 t/s** |**3.1 t/s**|

*Generating 100 tokens from a 13-token prompt. I don't own any Nvidia hardware, so I can't measure CUDA speeds properly.
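The tables report throughput in tokens per second (t/s). A simple way to reproduce such a number, assuming a `generate(prompt, n_tokens)` function like the sketch above, is to time the generation of 100 tokens and divide; the exact benchmark harness behind these tables is not shown in the README, so this is only an illustration.

```python
# Rough tokens-per-second measurement, assuming a generate() function
# like the sketch above. Not the harness used to produce the tables.
import time

def tokens_per_second(generate, prompt, n_tokens=100):
    start = time.perf_counter()
    generate(prompt, n_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Example: generate 100 tokens from a 13-token prompt, as in the footnote.
# print(f"{tokens_per_second(generate, 'your 13 token prompt here'):.1f} t/s")
```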