https://github.com/mydarapy/smollm-experiments-with-grouped-query-attention
(Unofficial) Building Hugging Face SmolLM, a blazingly fast and small language model, with a PyTorch implementation of grouped query attention (GQA)
- Host: GitHub
- URL: https://github.com/mydarapy/smollm-experiments-with-grouped-query-attention
- Owner: MyDarapy
- Created: 2024-09-19T07:34:16.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2025-01-11T08:58:27.000Z (3 months ago)
- Last Synced: 2025-02-27T01:53:21.832Z (about 2 months ago)
- Topics: attention, grouped-query-attention, huggingface, huggingface-smol-lm, llm, ml-efficiency, smol, smol-lm, transformer
- Language: Python
- Homepage:
- Size: 439 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Improving Sub-billion Scale LLM Design Experiments
Some of the techniques used in the LLM pretraining design include:
- Embedding Sharing
- Grouped Query Attention (GQA) (see the PyTorch sketch after this list)
- SwiGLU Activations for the Multi-Layer Perceptron (MLP) (a second sketch follows the GQA example)
- Immediate block-wise weight sharing
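
Grouped query attention is the repository's headline technique: several query heads share a single key/value head, shrinking the KV projection and cache. Below is a minimal PyTorch sketch of such a layer, assuming the common repeat-the-KV-heads formulation; the module name, head counts, and dimensions are illustrative and not taken from the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Illustrative GQA layer: n_heads query heads share n_kv_heads key/value heads."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0, "query heads must split evenly into KV groups"
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        # Fewer key/value heads than query heads is the core of GQA.
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so that every group of query heads attends to its shared KV head.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, self.n_heads * self.head_dim)
        return self.o_proj(out)

# Illustrative usage: 9 query heads sharing 3 KV heads on a 576-dim model.
attn = GroupedQueryAttention(d_model=576, n_heads=9, n_kv_heads=3)
y = attn(torch.randn(2, 16, 576))
print(y.shape)  # torch.Size([2, 16, 576])
```

With 3 KV heads instead of 9, the key/value projections and the KV cache are a third of the size of full multi-head attention, which is the efficiency argument for GQA in small models.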
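The MLP in such designs typically replaces the plain feed-forward block with a SwiGLU gate. A minimal sketch follows, assuming the common gate/up/down projection layout; layer names and sizes are illustrative, not taken from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Illustrative SwiGLU feed-forward block: SiLU-gated projection, then down-projection."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(x @ W_gate) elementwise-multiplied by (x @ W_up), then projected back down.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Illustrative usage with a hypothetical hidden width.
mlp = SwiGLUMLP(d_model=576, d_hidden=1536)
print(mlp(torch.randn(2, 16, 576)).shape)  # torch.Size([2, 16, 576])
```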