Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/roryclear/transformer
- Host: GitHub
- URL: https://github.com/roryclear/transformer
- Owner: roryclear
- Created: 2024-02-07T11:10:19.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-12T19:30:53.000Z (9 months ago)
- Last Synced: 2024-04-14T04:23:43.111Z (9 months ago)
- Language: Python
- Size: 383 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Transformer
## Running:
### Online Notebook (CUDA)
https://www.kaggle.com/code/roryclear/cuda-gpt2-demo

### Locally
git clone --depth 1 https://github.com/roryclear/transformer.git
pip install -r requirements.txt
pip install pycuda (CUDA only)
pip install pyobjc-framework-Metal (Apple Silicon only)
python demo.py --p="your prompt"
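Depending on your hardware, only one of the optional GPU packages above is needed. The sketch below shows one way such backend detection could work at runtime; the `pick_backend` helper and its fallback order are assumptions for illustration, not this repository's actual `demo.py` logic.

```python
# Hypothetical helper: pick a compute backend based on which optional
# GPU packages are importable. The real demo.py may select differently.
import importlib.util

def pick_backend():
    # Prefer CUDA via pycuda, then Apple's Metal bindings
    # (pyobjc-framework-Metal), then OpenCL via pyopencl.
    if importlib.util.find_spec("pycuda") is not None:
        return "cuda"
    if importlib.util.find_spec("Metal") is not None:
        return "metal"
    if importlib.util.find_spec("pyopencl") is not None:
        return "opencl"
    raise RuntimeError("No supported GPU backend found")

if __name__ == "__main__":
    print(f"Using backend: {pick_backend()}")
```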
## CUDA, OpenCL and Metal GPT-2 inference
- [ ] Fastest Inference
- [X] Support for all GPT-2 Models
- [X] Multiple Compute Languages
- [X] Remove all NumPy usage (for calculations)
- [ ] Support any transformer
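For context, GPT-2 inference is an autoregressive loop: encode the prompt, run the model, append the next token, and repeat. The sketch below illustrates that loop using huggingface/transformers (one of the baselines in the tables below); it is not this repository's implementation, which avoids NumPy for calculations and dispatches hand-written CUDA/OpenCL/Metal kernels instead.

```python
# Illustrative GPT-2 greedy-decoding loop using huggingface/transformers
# (a comparison baseline below), not this repo's kernel-based code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def generate(prompt, n_tokens=100):
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(n_tokens):
        with torch.no_grad():
            logits = model(ids).logits           # (1, seq_len, vocab)
        next_id = logits[0, -1].argmax()         # greedy: most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0])

print(generate("your prompt"))
```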
## Performance:
### CUDA
|T4 x2 (Kaggle Cloud) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |74 t/s |39 t/s |24 t/s |
| huggingface/transformers |30 t/s |12 t/s |6.1 t/s |
|**roryclear/transformer** |**75 t/s** |**31 t/s** |**15 t/s** |

|P100 (Kaggle Cloud) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |57 t/s |31 t/s |20 t/s |
| huggingface/transformers |31 t/s |12 t/s |6.0 t/s |
|**roryclear/transformer** |**59 t/s** |**21 t/s** |**10 t/s** |

### Metal
|Apple M2 | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |30 t/s |22 t/s |15 t/s |
| huggingface/transformers |53 t/s |17 t/s |8 t/s |
| **roryclear/transformer** |**33 t/s** |**16 t/s** |**10 t/s** |

### OpenCL
|Intel Integrated Graphics (2020 XPS13) | GPT2 |GPT2-Medium |GPT2-Large |
| ----------- | ----------- |------ |---- |
| tinygrad |16 t/s |5.8 t/s |2.1 t/s |
| huggingface/transformers |34 t/s |15 t/s |7.7 t/s |
|**roryclear/transformer** |**11 t/s** |**5.1 t/s** |**3.1 t/s**|

*Generating 100 tokens from a 13-token prompt. I don't own any Nvidia hardware, so I can't measure CUDA speeds properly.
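The tables report throughput in tokens per second (t/s). A simple way to reproduce such a number, assuming a `generate(prompt, n_tokens)` function like the sketch above, is to time the generation of 100 tokens and divide; the exact benchmark harness behind these tables is not shown in the README, so this is only an illustration.

```python
# Rough tokens-per-second measurement, assuming a generate() function
# like the sketch above. Not the harness used to produce the tables.
import time

def tokens_per_second(generate, prompt, n_tokens=100):
    start = time.perf_counter()
    generate(prompt, n_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Example: generate 100 tokens from a 13-token prompt, as in the footnote.
# print(f"{tokens_per_second(generate, 'your 13 token prompt here'):.1f} t/s")
```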