{"id":25691040,"url":"https://github.com/soniccodes/oracle-head-gpt","last_synced_at":"2025-07-11T16:15:24.207Z","repository":{"id":259675852,"uuid":"879174409","full_name":"SonicCodes/oracle-head-gpt","owner":"SonicCodes","description":"probe for predicting future hiddenstates on gpt-2 vibes....","archived":false,"fork":false,"pushed_at":"2024-10-28T13:31:51.000Z","size":15,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-24T22:52:16.039Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SonicCodes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-27T07:41:05.000Z","updated_at":"2025-01-06T04:13:28.000Z","dependencies_parsed_at":"2024-10-27T09:35:58.580Z","dependency_job_id":null,"html_url":"https://github.com/SonicCodes/oracle-head-gpt","commit_stats":null,"previous_names":["soniccodes/oracle-head-gpt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SonicCodes/oracle-head-gpt","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SonicCodes%2Foracle-head-gpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SonicCodes%2Foracle-head-gpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SonicCodes%2Foracle-head-gpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SonicCodes%2Foracle-head-gpt/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SonicCodes","download_url":"https://codeload.github.com/SonicCodes/oracle-head-gpt/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SonicCodes%2Foracle-head-gpt/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264851266,"owners_count":23673234,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-24T22:52:20.154Z","updated_at":"2025-07-11T16:15:24.190Z","avatar_url":"https://github.com/SonicCodes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# oracle-head-gpt\n\nI've long had a feeling that an LLM's hidden states are not just next-token predictors: nothing constrains them from also encoding future trajectories. That information is destroyed (or redacted) by the discretization/sampling step, but it's kinda still there...\n\nTo test this, I trained an oracle head that takes a detached hidden state and learns residuals that transform it into the hidden states of tokens N+1, N+2 and N+3. I also froze the lm_head to make sure the oracle can't cheat. The loss is computed normally, with a different shifting mask for each future position...\n\nI started from OpenAI's GPT-2 small (124M) checkpoint, used a FineWeb-1B subsample with a 512-token context length, and trained on an A100 with a batch size of 16. The results look as expected: the loss gets worse the further ahead the prediction goes (N, N+1, N+2, N+3).\n\n\u003e\u003e I have two versions of this: a sighted and a blind oracle.\n\u003e\u003e 1) Sighted oracle: uses the previous hidden states to predict the residuals that transform them into the next tokens' hidden states.\n![image](https://github.com/user-attachments/assets/a9f17153-4665-4249-be9e-165a905b8860)\n\n\u003e\u003e 2) Blind oracle: receives noise instead, and predicts from it the residuals that transform the hidden state into the next tokens' hidden states.\n![image](https://github.com/user-attachments/assets/eae623c3-7801-4cd0-a5e8-289bf84c0133)\n\nThe blind test is there to make sure the oracle isn't just pattern-matching on what is statistically likely to come next: the blind version has no context about the hidden state it is generating these positional residuals for. Technically, though, we should want it to have that context, since this embedding space is not as structured as typical contrastive ones...\n\n(I plan to work on this more soon; this is a rushed reproduction of previous work I did but accidentally deleted.)\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoniccodes%2Foracle-head-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoniccodes%2Foracle-head-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoniccodes%2Foracle-head-gpt/lists"}
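A minimal NumPy sketch of the oracle-head setup described in the README of this record, to make the residual and shifted-mask ideas concrete. Everything here is an assumption, not the repository's code: the names `OracleHead` and `oracle_losses`, a single linear map plus bias per future offset (the real head may well be deeper), and scoring each guessed state with cross-entropy through the frozen `lm_head` against the token k + 1 steps ahead. Offset k = 0 is the ordinary next-token loss (N); k = 1..3 are the oracle's N+1..N+3 guesses. In a real run `hidden` would come from GPT-2 and `lm_head` would be its frozen output projection; here both are plain arrays.

```python
import numpy as np


def softmax_cross_entropy(logits, targets):
    """Per-row cross-entropy. logits: (n, V) floats, targets: (n,) ints."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]


class OracleHead:
    """Hypothetical oracle head: one linear residual map (W_k, b_k) per future offset k."""

    def __init__(self, d_model, n_future=3, blind=False, seed=0):
        self.rng = np.random.default_rng(seed)
        self.blind = blind
        self.W = self.rng.normal(0.0, 0.02, size=(n_future, d_model, d_model))
        self.b = np.zeros((n_future, d_model))

    def __call__(self, hidden):
        # hidden: (T, d_model), assumed already detached from the base model
        # (in PyTorch this would be hidden.detach()).
        # Sighted oracle: residuals are computed from the hidden states themselves.
        # Blind oracle: residuals are computed from fresh Gaussian noise of the same
        # shape, so its input says nothing about the state it is transforming.
        source = self.rng.normal(size=hidden.shape) if self.blind else hidden
        residuals = np.einsum("td,kde->kte", source, self.W) + self.b[:, None, :]
        # Guessed hidden states for tokens N+1 .. N+K: shape (K, T, d_model).
        return hidden[None, :, :] + residuals


def oracle_losses(hidden, tokens, lm_head, oracle):
    """Mean cross-entropy per offset k: k = 0 is the ordinary next-token loss (N),
    k >= 1 scores the oracle's guessed state for N+k through the frozen lm_head."""
    T = len(tokens)
    n_future = oracle.W.shape[0]
    assert T > n_future + 1, "sequence too short for the largest offset"
    states = [hidden] + list(oracle(hidden))
    losses = []
    for k, h in enumerate(states):
        # The guessed state for position t + k should predict token t + k + 1.
        targets = np.roll(tokens, -(k + 1))  # targets[t] = tokens[(t + k + 1) % T]
        mask = np.arange(T) < T - (k + 1)    # shifting mask: drop wrapped-around targets
        ce = softmax_cross_entropy(h @ lm_head, targets)  # lm_head is never modified
        losses.append(float((ce * mask).sum() / mask.sum()))
    return losses
```

Only the oracle's `W` and `b` would be trained (in PyTorch, for example, with the base model and `lm_head` set to `requires_grad_(False)`); the README reports that the resulting loss grows with k.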