
https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram

A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Loading the model takes 12 GB of free RAM.

gpt gpt-neo transformers


# Basic-UI-for-GPT-J-6B-with-low-vram
A repository to run GPT-J-6B on low-VRAM systems by splitting the model across RAM, VRAM and pinned memory.
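As a rough check on why the model has to be split, the sketch below estimates GPT-J-6B's fp16 weight size from its architecture. The dimensions (28 blocks, hidden size 4096, feed-forward size 16384, vocabulary 50400, bias-free attention projections, an output layer with bias) are taken from the public GPT-J-6B configuration rather than from this repository, and activations and the attention cache are ignored.

```python
# Rough fp16 weight-size estimate for GPT-J-6B.
# Dimensions follow the public GPT-J-6B config (assumed here, not read from this repo).
N_LAYER = 28      # transformer blocks
D_MODEL = 4096    # hidden size
D_FF = 16384      # feed-forward inner size (4 * D_MODEL)
VOCAB = 50400     # padded vocabulary size
BYTES_FP16 = 2    # bytes per parameter in 16-bit precision


def block_params(d_model: int = D_MODEL, d_ff: int = D_FF) -> int:
    """Parameters in one GPT-J block."""
    attn = 4 * d_model * d_model                              # q, k, v, out projections (no bias)
    mlp = d_model * d_ff + d_ff + d_ff * d_model + d_model    # fc_in and fc_out, with biases
    layer_norm = 2 * d_model                                  # single ln_1 (weight + bias)
    return attn + mlp + layer_norm


def model_params() -> int:
    """Parameters in the whole model: embeddings, blocks, final norm, output layer."""
    embed = VOCAB * D_MODEL
    final_norm = 2 * D_MODEL
    lm_head = D_MODEL * VOCAB + VOCAB                         # untied output layer with bias
    return embed + N_LAYER * block_params() + final_norm + lm_head


if __name__ == "__main__":
    gb = 1e9
    print(f"one block: {block_params() * BYTES_FP16 / gb:.2f} GB")
    print(f"all weights: {model_params() * BYTES_FP16 / gb:.2f} GB")
```

Running it prints 0.40 GB per block and 12.10 GB in total (about 6.05 billion parameters at 2 bytes each), which is consistent with the 12 GB of free RAM needed to load the model.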

## Note: the weights in the Drive link appear to have some issues. There is some performance loss, most likely caused by a poor 16-bit conversion.

## How to run
Install the required `transformers` fork with `pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3`.

Download the model from https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing. It was saved as described in https://github.com/arrmansa/saving-and-loading-large-models-pytorch.

## Timing (2000-token context)
### Setup 1
#### System

16 GB DDR4 RAM, GTX 1070 8 GB GPU.

23 blocks are kept in RAM (`ram_blocks = 23`), 18 of them in shared/pinned memory (`max_shared_ram_blocks = 18`).
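One way to picture this setting is as a per-block placement plan. The sketch below is hypothetical: `plan_block_placement` is not a function from this repository, the 28-block count comes from GPT-J-6B's public configuration, and the choice to offload the first blocks (with every non-RAM block on the GPU) is an assumption that may not match the repository's actual ordering.

```python
# Hypothetical sketch: where each GPT-J-6B block would live for a given setting.
# Assumes the first `ram_blocks` blocks are offloaded and the rest stay in VRAM;
# the repository's actual ordering may differ.
from collections import Counter

N_LAYER = 28  # GPT-J-6B transformer blocks (public config)


def plan_block_placement(ram_blocks: int, max_shared_ram_blocks: int,
                         n_layer: int = N_LAYER) -> list:
    """Return one label per block: "pinned", "pageable" or "vram"."""
    if not 0 <= ram_blocks <= n_layer:
        raise ValueError("ram_blocks must be between 0 and n_layer")
    pinned = max(0, min(max_shared_ram_blocks, ram_blocks))  # pinned blocks are a subset of RAM blocks
    placement = []
    for i in range(n_layer):
        if i < pinned:
            placement.append("pinned")    # page-locked host RAM (faster host-to-GPU copies)
        elif i < ram_blocks:
            placement.append("pageable")  # ordinary host RAM
        else:
            placement.append("vram")      # resident on the GPU
    return placement


if __name__ == "__main__":
    plan = plan_block_placement(ram_blocks=23, max_shared_ram_blocks=18)
    print(Counter(plan))
```

With `ram_blocks=23` and `max_shared_ram_blocks=18` this gives 18 pinned, 5 pageable and 5 VRAM blocks.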

#### Timing

A single forward pass, `model(inputs)`, takes 6.5 seconds.

Generating 25 tokens with a 2000-token context takes 35 seconds (1.4 seconds/token).

### Setup 2
#### System

16 GB DDR4 RAM, GTX 1060 6 GB GPU.

26 blocks are kept in RAM (`ram_blocks = 26`), 18 of them in shared/pinned memory (`max_shared_ram_blocks = 18`).

#### Timing

Generating 25 tokens with a 2000-token context takes 40 seconds (1.6 seconds/token).
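The per-token rates for both setups follow directly from the totals. The sketch below recomputes them; the VRAM block count assumes GPT-J-6B's 28 blocks (public configuration) and that every block not kept in RAM sits on the GPU, which is an assumption rather than something these timings state.

```python
# Recompute per-token generation cost for the two timed setups.
# Totals, token counts and ram_blocks are copied from the README;
# N_LAYER = 28 is GPT-J-6B's block count (public config).
N_LAYER = 28

SETUPS = {
    "GTX 1070 8 GB": {"seconds": 35.0, "tokens": 25, "ram_blocks": 23},
    "GTX 1060 6 GB": {"seconds": 40.0, "tokens": 25, "ram_blocks": 26},
}


def seconds_per_token(seconds: float, tokens: int) -> float:
    """Average wall-clock time per generated token."""
    return seconds / tokens


if __name__ == "__main__":
    for name, s in SETUPS.items():
        vram_blocks = N_LAYER - s["ram_blocks"]  # assumes every non-RAM block sits in VRAM
        rate = seconds_per_token(s["seconds"], s["tokens"])
        print(f"{name}: {vram_blocks} blocks in VRAM, {rate:.1f} s/token")
```

The two runs also use different GPUs, so the 0.2 s/token gap between them cannot be attributed to the extra offloaded blocks alone.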