Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Model loading requires 12 GB of free RAM.
- Host: GitHub
- URL: https://github.com/arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram
- Owner: arrmansa
- License: apache-2.0
- Created: 2021-06-22T22:18:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-12-23T06:41:25.000Z (about 3 years ago)
- Last Synced: 2024-08-03T18:22:07.934Z (6 months ago)
- Topics: gpt, gpt-neo, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 48.8 KB
- Stars: 114
- Watchers: 5
- Forks: 12
- Open Issues: 4
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
- stars - arrmansa/Basic-UI-for-GPT-J-6B-with-low-vram - A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Model loading requires 12 GB of free RAM. (Jupyter Notebook)
README
# Basic-UI-for-GPT-J-6B-with-low-vram
A repository to run GPT-J-6B on low-VRAM systems by using RAM, VRAM, and pinned memory together.

## There seem to be some issues with the weights in the Drive link

There is some performance loss, most likely because of a poor 16-bit conversion.
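For context on the approach: transformer blocks that do not fit in VRAM stay in host RAM, and keeping part of them in page-locked (pinned) memory lets them be copied to the GPU asynchronously. A minimal illustrative sketch of that idea in PyTorch (not the repository's code; the tensor shape is arbitrary):

```python
import torch

# Hypothetical weight block kept in host RAM because it does not fit in VRAM.
# pin_memory() page-locks it so host-to-device copies can use async DMA.
cpu_block = torch.randn(4096, 4096, dtype=torch.float16).pin_memory()

# Stream the block into VRAM just before it is needed; non_blocking=True only
# takes effect when the source tensor is pinned.
gpu_block = cpu_block.to("cuda", non_blocking=True)
```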
## How to run
Install the patched transformers fork: `pip install git+https://github.com/finetuneanon/transformers@gpt-neo-localattention3`
Use the link https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing to download the model, which has been saved as described in https://github.com/arrmansa/saving-and-loading-large-models-pytorch.
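A hedged sketch of the setup after installing the fork and downloading the checkpoint (the checkpoint path `j6b_ckpt`, the tokenizer choice, and the plain-variable block split are illustrative assumptions; the notebook in this repository is the authoritative version):

```python
from transformers import GPTNeoForCausalLM, AutoTokenizer

# Load the checkpoint downloaded from the Drive link above (path assumed).
model = GPTNeoForCausalLM.from_pretrained("j6b_ckpt").half().eval()

# GPT-J-6B uses the GPT-2 BPE vocabulary, so the stock GPT-2 tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Block split used in the timing runs below: how many of the 28 transformer
# blocks live in host RAM, and how many of those sit in shared/pinned memory.
ram_blocks = 23
max_shared_ram_blocks = 18
```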
## Timing (2000 token context)

### 1

#### System

16 GB DDR4 RAM, GTX 1070 8 GB GPU.
23 blocks on RAM (`ram_blocks = 23`), of which 18 are on shared/pinned memory (`max_shared_ram_blocks = 18`).

#### Timing

A single run of `model(inputs)` takes 6.5 seconds.
35 seconds to generate 25 tokens at a 2000-token context (1.4 seconds/token).
### 2

#### System

16 GB DDR4 RAM, GTX 1060 6 GB GPU.
26 blocks on RAM (`ram_blocks = 26`), of which 18 are on shared/pinned memory (`max_shared_ram_blocks = 18`).

#### Timing

40 seconds to generate 25 tokens at a 2000-token context (1.6 seconds/token).
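For reference, figures like the ones above can be reproduced with a simple timing check (an assumed sketch reusing `model` and `tokenizer` from the loading example above; `torch.cuda.synchronize()` matters because CUDA kernel launches are asynchronous, so unsynchronized timings undercount):

```python
import time

import torch

prompt = "A long block of context text. " * 300  # placeholder ~2000-token context
# Match the model's input device; per-block placement follows the notebook's setup.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

torch.cuda.synchronize()
start = time.time()
output = model.generate(input_ids, max_length=input_ids.shape[1] + 25, do_sample=True)
torch.cuda.synchronize()

elapsed = time.time() - start
print(f"{elapsed:.1f} s for 25 tokens ({elapsed / 25:.2f} s/token)")
```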