https://github.com/soniccodes/oracle-head-gpt
Probe for predicting future hidden states on GPT-2 vibes...
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/soniccodes/oracle-head-gpt
- Owner: SonicCodes
- Created: 2024-10-27T07:41:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-28T13:31:51.000Z (over 1 year ago)
- Last Synced: 2025-02-24T22:52:16.039Z (over 1 year ago)
- Language: Python
- Size: 14.6 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# oracle-head-gpt
I've always had a feeling that the hidden states in LLMs are not just next-token predictors, since nothing constrains them from also containing future trajectories. This information is destroyed/redacted by the discretization or sampling procedure, but it's kinda still there...
To prove this, I trained an oracle head that takes a detached hidden state and learns residuals that transform it into the hidden states of tokens N+1, N+2, and N+3. Furthermore, I froze the lm_head to make sure the oracle is not cheating. The loss is computed normally, with a different shifting mask per position...
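
A minimal PyTorch sketch of the setup described above, not the repo's actual code. `OracleHead`, `oracle_loss`, the per-offset MLP layout, and the toy sizes are all assumptions made for illustration, and random tensors stand in for GPT-2's hidden states and `lm_head`. It also assumes the state at position t already targets token t+1, so the k-th predicted future state is scored against token t+1+k.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OracleHead(nn.Module):
    """One small residual MLP per future offset (hypothetical layout)."""

    def __init__(self, d_model, n_future=3):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_model),
                nn.GELU(),
                nn.Linear(d_model, d_model),
            )
            for _ in range(n_future)
        ])

    def forward(self, hidden):
        # Detach so no gradient flows back into the base model.
        h = hidden.detach()
        # Each block predicts a residual that is added to the input state.
        return [h + block(h) for block in self.blocks]


def oracle_loss(preds, lm_head, input_ids):
    """Cross-entropy per offset, each with its own shift of the targets."""
    losses = []
    for k, pred in enumerate(preds, start=1):
        shift = k + 1  # assumption: the unmodified state at t targets token t+1
        logits = lm_head(pred[:, :-shift])  # (B, T - shift, V)
        targets = input_ids[:, shift:]      # (B, T - shift)
        losses.append(
            F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        )
    return losses


# Toy stand-ins for the GPT-2 hidden states, token ids, and lm_head.
torch.manual_seed(0)
B, T, D, V = 2, 16, 32, 100
hidden = torch.randn(B, T, D, requires_grad=True)
input_ids = torch.randint(0, V, (B, T))
lm_head = nn.Linear(D, V, bias=False).requires_grad_(False)  # frozen

oracle = OracleHead(D, n_future=3)
losses = oracle_loss(oracle(hidden), lm_head, input_ids)
torch.stack(losses).sum().backward()
```

After `backward()`, only the oracle's parameters have gradients: the detach cuts off the base model, and the frozen `lm_head` means the oracle can only lower its loss by producing states that the original output projection already decodes well.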
I started with OpenAI's GPT-2 113M checkpoint, used a FineWeb-1B subsample with a 512-token context length, and ran the training on an A100 with a batch size of 16. Looking at the results, the loss behaves as expected: it gets worse with distance, going from N to N+1, N+2, and N+3.
>> I have two versions of this, a sighted and a blind oracle:
>> 1) Sighted oracle: uses the previous hidden states to predict the residuals that transform it to the next tokens.

>> 2) Blind oracle: receives noise to predict the residuals that transform it to the next tokens.

The reason for the blind test is to make sure the oracle is not just pattern matching what is statistically likely to come after, since in the blind version it has no context about what it is generating these positional residuals for. Technically, though, we should want it to have that context, as the embedding space is not as structured as typical contrastive ones...
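
The difference between the two variants can be sketched as follows. This is one reading of the description, not the repo's actual code: it assumes the blind oracle's residual network is fed Gaussian noise while its output is still added to the real (detached) hidden state. `predict_future_state` and the toy sizes are hypothetical.

```python
import torch
import torch.nn as nn


def predict_future_state(block, hidden, blind):
    """Return hidden + residual; only the residual network's input differs."""
    h = hidden.detach()
    # Blind: the network sees noise of the same shape, so it has no context.
    # Sighted: the network sees the actual hidden state.
    x = torch.randn_like(h) if blind else h
    return h + block(x)


torch.manual_seed(0)
block = nn.Sequential(nn.Linear(32, 32), nn.GELU(), nn.Linear(32, 32))
hidden = torch.randn(2, 16, 32)  # stand-in for GPT-2 hidden states

sighted_out = predict_future_state(block, hidden, blind=False)
blind_out = predict_future_state(block, hidden, blind=True)
```

Under this reading, the blind variant's residual carries no information about the current context.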
(I plan to work on this soon. This is a rushed reproduction of previous work I did but accidentally deleted.)