https://github.com/ridgerchu/matmulfreellm
Implementation for MatMul-free LM.
- Host: GitHub
- URL: https://github.com/ridgerchu/matmulfreellm
- Owner: ridgerchu
- License: apache-2.0
- Created: 2024-04-23T20:44:31.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2024-11-05T23:38:38.000Z (2 months ago)
- Last Synced: 2025-01-02T01:05:06.522Z (13 days ago)
- Topics: large-language-model, linear-transformer, llm
- Language: Python
- Homepage:
- Size: 1.52 MB
- Stars: 2,942
- Watchers: 44
- Forks: 187
- Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - ridgerchu/matmulfreellm - Implementation for MatMul-free LM. (Python)
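- StarryDivineSky - ridgerchu/matmulfreellm - MatMul-Free LM is a language model architecture that requires no matrix multiplication (MatMul) operations. This repository provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library. We evaluate how the scaling law fits the 370M, 1.3B, and 2.7B parameter models for both Transformer++ and our model. For a fair comparison, each operation is treated identically, although our model uses more efficient ternary weights in some layers. Interestingly, the scaling projection for our model shows a steeper descent than that of Transformer++, suggesting that our architecture uses additional compute more efficiently to improve performance. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)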
README
MatMul-Free LM
If you like our project, please give us a star ⭐ on GitHub for the latest updates.
This repo is adapted from flash-linear-attention.
[![hf_model](https://img.shields.io/badge/🤗-Models-blue.svg)](https://huggingface.co/collections/ridger/matmulfree-lm-665f4d2b4e4648756e0dd13c) [![arXiv](https://img.shields.io/badge/Arxiv-2406.02528-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2406.02528)
# Introduction
MatMul-Free LM is a language model architecture that eliminates the need for Matrix Multiplication (MatMul) operations. This repository provides an implementation of MatMul-Free LM that is compatible with the 🤗 Transformers library.
# Scaling Law
We evaluate how well the scaling law fits the 370M, 1.3B, and 2.7B parameter models for both Transformer++ and our model. For a fair comparison, each operation is treated identically, though our model uses more efficient ternary weights in some layers. Interestingly, the scaling projection for our model exhibits a steeper descent than that of Transformer++, suggesting that our architecture uses additional compute more efficiently to improve performance.
# Installation
The following requirements should be satisfied:
- [PyTorch](https://pytorch.org/) >= 2.0
- [Triton](https://github.com/openai/triton) >= 2.2
- [einops](https://einops.rocks/)
Then install the package from GitHub:
```sh
pip install -U git+https://github.com/ridgerchu/matmulfreellm
```
# Usage
## Pre-trained Model Zoo
| Model size | Layers | Hidden dimension | Trained tokens |
|:----------------|:------------:|:----------------:|:------------------:|
| [370M](https://huggingface.co/ridger/MMfreeLM-370M) | 24 | 1024 | 15B |
| [1.3B](https://huggingface.co/ridger/MMfreeLM-1.3B) | 24 | 2048 | 100B |
| [2.7B](https://huggingface.co/ridger/MMfreeLM-2.7B) | 32 | 2560 | 100B |
## Model
We provide model implementations that are compatible with the 🤗 Transformers library.
The following example initializes a model from the default config in `mmfreelm` using Hugging Face `AutoModel`:
```py
>>> from mmfreelm.models import HGRNBitConfig
>>> from transformers import AutoModel
>>> config = HGRNBitConfig()
>>> AutoModel.from_config(config)
HGRNBitModel(
  (embeddings): Embedding(32000, 2048)
  (layers): ModuleList(
    (0): HGRNBitBlock(
      (attn_norm): RMSNorm(2048, eps=1e-06)
      (attn): HGRNBitAttention(
        (i_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (f_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (g_norm): FusedRMSNormSwishGate()
        (o_proj): FusedBitLinear(
          in_features=2048, out_features=2048, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
      )
      (mlp_norm): RMSNorm(2048, eps=1e-06)
      (mlp): HGRNBitMLP(
        (gate_proj): FusedBitLinear(
          in_features=2048, out_features=11264, bias=False
          (norm): RMSNorm(2048, eps=1e-08)
        )
        (down_proj): FusedBitLinear(
          in_features=5632, out_features=2048, bias=False
          (norm): RMSNorm(5632, eps=1e-08)
        )
        (act_fn): SiLU()
      )
    )
  )
>>>
```
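The scaling-law section notes that the model uses ternary weights in some layers. Assuming the `FusedBitLinear` projections in the printout above are those layers, the following plain-Python sketch illustrates why ternary weights remove multiplications from a linear layer: once every weight is -1, 0, or +1, each output is a sum of added and subtracted inputs. The helper names `ternarize` and `ternary_linear` and the mean-absolute-value rounding rule are assumptions made for illustration; they are not the repository's fused Triton kernels.

```python
def ternarize(weights):
    """Round each weight to -1, 0, or +1 after dividing by the mean absolute value.

    Hypothetical helper: the rounding rule is an assumption for illustration.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) or 1.0  # fall back to 1.0 for an all-zero matrix
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_linear(x, ternary):
    """Compute W @ x for a ternary W using only additions and subtractions."""
    out = []
    for row in ternary:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing, so the input is skipped
        out.append(acc)
    return out


weights = [[0.9, -0.05, -1.2], [0.02, 0.7, 0.4]]
x = [1.0, 2.0, 3.0]
ternary, scale = ternarize(weights)
y = ternary_linear(x, ternary)
print(ternary)  # [[1, 0, -1], [0, 1, 1]]
print(y)        # [-2.0, 5.0]
```

In this sketch the accumulated result would still need one rescaling by `scale` per output to approximate the full-precision product; the inner loop itself contains no multiplications.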
## Generation
Once a model has been pretrained, it can generate text through the 🤗 text generation APIs.
The following generation example is from `generate.py`:
```py
import os
# Silence the tokenizers fork warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm  # makes the MatMul-free model classes available to transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
# Set this to one of the open-sourced models, e.g. "ridger/MMfreeLM-370M" from the model zoo table
name = ''
tokenizer = AutoTokenizer.from_pretrained(name)
# Load the weights, move them to the GPU, and cast to half precision
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()
input_prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, "
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
# Nucleus sampling; max_length counts the prompt tokens as well as the generated ones
outputs = model.generate(input_ids, max_length=32, do_sample=True, top_p=0.4, temperature=0.6)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
# Citation
If you use this repo in your work, please cite our preprint:
```bib
@article{zhu2024scalable,
title={Scalable MatMul-free Language Modeling},
author={Zhu, Rui-Jie and Zhang, Yu and Sifferman, Ethan and Sheaves, Tyler and Wang, Yiqiao and Richmond, Dustin and Zhou, Peng and Eshraghian, Jason K},
journal={arXiv preprint arXiv:2406.02528},
year={2024}
}
```