{"id":30141278,"url":"https://github.com/edwinhere/surgery","last_synced_at":"2025-10-26T04:18:34.084Z","repository":{"id":289996244,"uuid":"973058478","full_name":"edwinhere/surgery","owner":"edwinhere","description":"My LLM surgeries. Ablation studies.","archived":false,"fork":false,"pushed_at":"2025-04-26T07:08:54.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-26T08:19:01.028Z","etag":null,"topics":["ablation","llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/edwinhere.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-26T06:57:20.000Z","updated_at":"2025-04-26T07:10:52.000Z","dependencies_parsed_at":"2025-04-26T08:19:05.667Z","dependency_job_id":"a2660c31-1141-463b-98b1-9d043eb96a6c","html_url":"https://github.com/edwinhere/surgery","commit_stats":null,"previous_names":["edwinhere/surgery"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/edwinhere/surgery","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edwinhere%2Fsurgery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edwinhere%2Fsurgery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edwinhere%2Fsurgery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edwinhere%2Fsurgery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/edwinhere","download_url":"https://codeload.github.com/edwinhere/surgery/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edwinhere%2Fsurgery/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269832167,"owners_count":24482301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ablation","llm"],"created_at":"2025-08-11T04:36:17.117Z","updated_at":"2025-10-26T04:18:29.053Z","avatar_url":"https://github.com/edwinhere.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TL;DR:\n\n`W_k` and `W_v` span a surprisingly low-dimensional space when they are stacked vertically and fit together with a single PCA. Even when the retained principal components explain only 30% of the original variance, the LLM still performs reasonably well.\n\nNo wonder the DeepSeek team added a latent matrix `W_kv` from which `W_k` and `W_v` are generated. 
`K` and `V` are largely a nothing burger.\n\nThis PCA trick is also a quick way to find architectural improvements for LLMs without fine-tuning or blind exploration.\n\n# Project Surgery\n\nThis project explores the effects of dimensionality reduction on TinyLlama by replacing the `W_q`, `W_k`, and `W_v` attention matrices with versions reconstructed from their PCA components.\n\n## Trivial Result\n\nIn the `pca` branch of this repository, the `W_q`, `W_k`, and `W_v` matrices of TinyLlama are each replaced with a matrix reconstructed from its own PCA components, keeping enough components to explain a given fraction of the original variance:\n\n- 99% of original variance ![image](https://github.com/user-attachments/assets/2b0b07a8-a949-46e4-ae37-620cbffce119)\n- 97.5% of original variance ![image](https://github.com/user-attachments/assets/000ec881-07ca-49b5-9cef-661abbb132e5)\n- 95% of original variance ![image](https://github.com/user-attachments/assets/d5fea694-6efb-4e4d-9eef-855a96e56052)\n\nAs the variance threshold decreases, the model's performance progressively degrades, illustrating the trade-off between compression and capability.\n\n## Interesting Result\n\nIn the `shared_kv_pca` branch of this repository:\n\n1. The code groups parameters by layer name (e.g., `model.layers.0.self_attn`).\n2. 
For each layer:\n   - `q_proj` is processed separately with its own PCA.\n   - `k_proj` and `v_proj` are processed together with a shared PCA:\n     - The two matrices are stacked vertically using `np.vstack`.\n     - A single PCA is fit on the combined matrix.\n     - The same PCA components are then used to transform and reconstruct each matrix separately.\n\nThis ensures that the `k_proj` and `v_proj` matrices share the same principal components, which may capture patterns common to both, while each matrix is still projected onto and reconstructed from that shared basis on its own.\n\nResults at different variance thresholds:\n\n- 99% of original variance ![image](https://github.com/user-attachments/assets/99f485bf-b3d0-4903-969f-6c0dd6f61628)\n- 90% of original variance ![image](https://github.com/user-attachments/assets/01807b82-12dd-4ada-9c95-5fda7a538f06)\n- 80% of original variance ![image](https://github.com/user-attachments/assets/196542bf-6de6-49ab-880d-f0946fa66e90)\n- 70% of original variance ![image](https://github.com/user-attachments/assets/d43d9c8d-a260-4905-8d78-0be3933f645f)\n- 60% of original variance ![image](https://github.com/user-attachments/assets/0c1e907a-1fe1-4afd-999d-727c53197dd2)\n- 50% of original variance ![image](https://github.com/user-attachments/assets/c1c28515-47cb-4265-90ef-d9f8f54cbf42)\n- 40% of original variance ![image](https://github.com/user-attachments/assets/ba58fe9e-c5b0-4200-b35d-7b1dc5b85dc9)\n- 30% of original variance ![image](https://github.com/user-attachments/assets/ad56c5a6-629d-45af-8d8e-09fd11986db9)\n- 20% of original variance ![image](https://github.com/user-attachments/assets/8906e8d8-d106-4e83-a2fe-c929d30c9dba)\n- 10% of original variance ![image](https://github.com/user-attachments/assets/a9b59bd0-c9b7-49d6-ac01-0ebe75ec51ac)\n\nNotice that when the `k_proj` and `v_proj` matrices are stacked with `np.vstack` and reconstructed from a shared PCA, the LLM is much more resilient to lowering the variance threshold than in the Trivial Result, which uses separate PCAs for 
`k_proj` and `v_proj`.\n\n## Conclusion \u0026 Further Direction\n\nNote that this method requires no training, fine-tuning, or backpropagation, and it runs quickly on a CPU. Perhaps this is how teams competing to build foundation models quickly arrive at the insight that `W_k` and `W_v` share structure and can be generated from a common latent matrix. It also seems that K and V contain far more redundancy than I expected. A possible next step is to reconstruct these matrices from PCA components shared across layers. Comments are welcome.\n\n## Setup\n\nMake sure you have Python and [uv](https://github.com/astral-sh/uv) installed.\n\n## Usage\n\n### Run with default settings\n\n```bash\nmake run\n```\n\nThis runs the script with the default variance threshold (90%).\n\n### Run with custom variance threshold\n\n```bash\nmake run VARIANCE_THRESHOLD=0.95\n```\n\nYou can specify any variance threshold between 0 and 1.\n\n### Development mode\n\nTo automatically rerun the script when Python files change:\n\n```bash\nmake watch\n```\n\nYou can also specify a custom variance threshold in watch mode:\n\n```bash\nmake watch VARIANCE_THRESHOLD=0.99\n```\n\n### Installation only\n\nTo create the virtual environment and install dependencies without running the script:\n\n```bash\nmake install\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedwinhere%2Fsurgery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fedwinhere%2Fsurgery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedwinhere%2Fsurgery/lists"}