Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Model loading requires 12 GB of free RAM.
- Host: GitHub
- URL: https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
- Owner: arrmansa
- License: apache-2.0
- Created: 2021-06-22T22:18:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-23T06:41:25.000Z (about 3 years ago)
- Last Synced: 2024-08-03T18:22:07.934Z (6 months ago)
- Topics: gpt, gpt-neo, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 48.8 KB
- Stars: 114
- Watchers: 5
- Forks: 12
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
- stars - arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram - A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Model loading requires 12 GB of free RAM. (Jupyter Notebook)
README
# Basic-UI-for-GPT-J-6B-with-low-vram
A repository to run GPT-J-6B on low-VRAM systems by using RAM, VRAM, and pinned memory together.

## There seem to be some issues with the weights in the Drive link

There is some performance loss, most likely because of a poor 16-bit conversion.
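For context on the approach: transformer blocks that do not fit in VRAM stay in host RAM, and keeping part of them in page-locked (pinned) memory lets them be copied to the GPU asynchronously. A minimal illustrative sketch of that idea in PyTorch (not the repository's code; the tensor shape is arbitrary):

```python
import torch

# Hypothetical weight block kept in host RAM because it does not fit in VRAM.
# pin_memory() page-locks it so host-to-device copies can use async DMA.
cpu_block = torch.randn(4096, 4096, dtype=torch.float16).pin_memory()

# Stream the block into VRAM just before it is needed; non_blocking=True only
# takes effect when the source tensor is pinned.
gpu_block = cpu_block.to("cuda", non_blocking=True)
```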
## How to run
Install the patched transformers fork: `pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3`
Use the link https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing to download the model, which has been saved as described in https://github.com/arrmansa/saving-and-loading-large-models-pytorch.
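A hedged sketch of the setup after installing the fork and downloading the checkpoint (the checkpoint path `j6b_ckpt`, the tokenizer choice, and the plain-variable block split are illustrative assumptions; the notebook in this repository is the authoritative version):

```python
from transformers import GPTNeoForCausalLM, AutoTokenizer

# Load the checkpoint downloaded from the Drive link above (path assumed).
model = GPTNeoForCausalLM.from_pretrained("j6b_ckpt").half().eval()

# GPT-J-6B uses the GPT-2 BPE vocabulary, so the stock GPT-2 tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Block split used in the timing runs below: how many of the 28 transformer
# blocks live in host RAM, and how many of those sit in shared/pinned memory.
ram_blocks = 23
max_shared_ram_blocks = 18
```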
## Timing (2000 token context)

### 1

#### System

16 GB DDR4 RAM, GTX 1070 8 GB GPU.
23 blocks on RAM (`ram_blocks = 23`), of which 18 are on shared/pinned memory (`max_shared_ram_blocks = 18`).

#### Timing

A single run of `model(inputs)` takes 6.5 seconds.
35 seconds to generate 25 tokens at a 2000-token context (1.4 seconds/token).
### 2

#### System

16 GB DDR4 RAM, GTX 1060 6 GB GPU.
26 blocks on RAM (`ram_blocks = 26`), of which 18 are on shared/pinned memory (`max_shared_ram_blocks = 18`).

#### Timing

40 seconds to generate 25 tokens at a 2000-token context (1.6 seconds/token).
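For reference, figures like the ones above can be reproduced with a simple timing check (an assumed sketch reusing `model` and `tokenizer` from the loading example above; `torch.cuda.synchronize()` matters because CUDA kernel launches are asynchronous, so unsynchronized timings undercount):

```python
import time

import torch

prompt = "A long block of context text. " * 300  # placeholder ~2000-token context
# Match the model's input device; per-block placement follows the notebook's setup.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

torch.cuda.synchronize()
start = time.time()
output = model.generate(input_ids, max_length=input_ids.shape[1] + 25, do_sample=True)
torch.cuda.synchronize()

elapsed = time.time() - start
print(f"{elapsed:.1f} s for 25 tokens ({elapsed / 25:.2f} s/token)")
```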