{"id":13562862,"url":"https://github.com/microsoft/Samba","last_synced_at":"2025-04-03T19:31:39.159Z","repository":{"id":244118836,"uuid":"811092760","full_name":"microsoft/Samba","owner":"microsoft","description":"[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling","archived":false,"fork":false,"pushed_at":"2025-02-19T07:04:06.000Z","size":591,"stargazers_count":857,"open_issues_count":8,"forks_count":47,"subscribers_count":24,"default_branch":"main","last_synced_at":"2025-03-31T00:06:22.930Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2406.07522","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-05T23:18:07.000Z","updated_at":"2025-03-28T18:07:35.000Z","dependencies_parsed_at":"2024-06-13T01:39:01.199Z","dependency_job_id":"3ac0db86-f295-40b5-a482-8d1d543067c5","html_url":"https://github.com/microsoft/Samba","commit_stats":{"total_commits":14,"total_committers":5,"mean_commits":2.8,"dds":0.5714285714285714,"last_synced_commit":"6b7935bc7c2e25588a0cbc7bf21b536f13edf0b1"},"previous_names":["microsoft/samba"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSamba","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSamba/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2F
Samba/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FSamba/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/Samba/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247065275,"owners_count":20877745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T13:01:12.957Z","updated_at":"2025-04-03T19:31:39.142Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话","others"],"sub_categories":["大语言对话模型及数据"],"readme":"\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"assets/Samba-pic.webp\" width=\"300\"/\u003e\n\u003c/div\u003e\n\n\n\u003ch1 align=\"left\"\u003e Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling\u003c/h1\u003e\n\n[![arXiv](https://img.shields.io/badge/Paper-2406.07522-blue.svg?style=flat-square)](https://arxiv.org/abs/2406.07522)\n\n\nSamba is a simple yet powerful hybrid model with an **unlimited** context length. Its architecture is frustratingly simple: \n\nSamba = Mamba + MLP + Sliding Window Attention + MLP stacking at the layer level.\n\nOur largest model, `Samba-3.8B`, is trained on 3.2 trillion tokens from the Phi3 dataset, outperforming `Phi3-mini` on major benchmarks (e.g. MMLU, GSM8K and HumanEval) by a large margin. 
Samba can also achieve perfect **long-context** retrieval ability with minimal instruction tuning, while still maintaining its **linear complexity** with respect to sequence length. This ability leads to the impressive performance of `Samba-3.8B-instruct` on downstream tasks such as long-context summarization. \n\n\n## Performance :rocket:\n\u003cdiv align=\"left\"\u003e\n  \u003cimg src=\"assets/ppl.jpg\" width=\"300\"/\u003e\n  \u003cimg src=\"assets/gen_speed.jpg\" width=\"298\"/\u003e\n\u003c/div\u003e\n\n\n| Model                         | MMLU | GSM8K | HumanEval | GovReport | SQuALITY |\n|-------------------------------|------|-------|-----------|-----------|----------|\n| Phi-3-mini-4K-instruct   | 68.8 | 82.5  | 58.5      | 14.4      | **21.6**     |\n| Samba-3.8B-instruct (preview)       | **71.9** | **87.6** | **62.8**      | **18.9**      | 21.2     |\n\nWe report 5-shot accuracy for MMLU, 8-shot CoT accuracy for GSM8K, 0-shot pass@1 for HumanEval and ROUGE-L for both GovReport and SQuALITY.\n## Updates\n- [Jan. 22] Samba has been accepted to ICLR 2025!\n- [Dec. 8] Added the evaluation script and more baseline architectures.\n- [June 11] Released the codebase for training Samba-421M and Samba-1.3B on SlimPajama. \n\n\n## Code Overview\nOur training infrastructure on SlimPajama is a modified version of [TinyLlama](https://github.com/jzhang38/TinyLlama) and [LitGPT](https://github.com/Lightning-AI/litgpt). One can easily specify different architectural configurations by modifying the [`model_name`](pretrain.py#L30) and the [`config file`](lit_gpt/config.py), which includes many of the baseline architectures mentioned in the paper. Our RetNet and GLA implementations are from the awesome [Flash Linear Attention](https://github.com/sustcsonglin/flash-linear-attention) repository.\n\n\n## Pretraining Samba from scratch\nPlease follow the [`Dockerfile`](Dockerfile) to set up the environment. 
The data preparation mainly follows TinyLlama except that we only use the SlimPajama dataset.\n\n### Data Preparation\n\nDownload the SlimPajama dataset to your chosen directory.\n```bash\ncd /path/to/dataset\ngit lfs install\ngit clone https://huggingface.co/datasets/cerebras/SlimPajama-627B\n```\nThe SlimPajama dataset takes 893GB of disk space. Use the provided scripts to tokenize the datasets and divide them into chunks.\n```bash\npython scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split validation --percentage 1.0\npython scripts/prepare_slimpajama.py --source_path /path/to/SlimPajama --tokenizer_path data/llama  --destination_path data/slim --split train --percentage 1.0\n```\nYou are now ready to launch a job!\n\n### Training\nThe following script trains a default Samba-421M model on a single node of 8 GPUs with 20B tokens.\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=samba-421M --rdzv_backend=c10d  --rdzv_endpoint=${MASTER_ADDR}:${MASTER_PORT} pretrain.py --train_data_dir data/slim --val_data_dir data/slim \n```\nYou can modify [`model_name`](pretrain.py#L33) to \"Samba_1.3B\" and [`train_config`](pretrain.py#L34) to \"tsz512x4k_100B\" for training a Samba-1.3B model with 100B tokens. We assume that you have 8 nodes each with 8 GPUs, and you can modify the number of [`nodes`](pretrain.py#L43) for training on fewer GPUs.\n\n### Evaluation\n\nWe leverage [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for the evaluation of our pretrained models. 
We only support non-generation based tasks for now.\n```bash\npip install lm-eval\npython eval.py --model Samba \\\n          --model_args pretrained=path/to/ckpt.pth,config=\"Samba_1.3B\" \\\n          --tasks lambada_openai,arc_easy,winogrande,hellaswag,piqa --device cuda:0 --batch_size 1 --trust_remote_code \n```\n\n\n\n\n## Citation\n\nIf you find our work useful, please consider citing:\n\n```bibtex\n@article{ren2024samba,\n      title={Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling}, \n      author={Liliang Ren and Yang Liu and Yadong Lu and Yelong Shen and Chen Liang and Weizhu Chen},\n      journal = {arXiv preprint},\n      year={2024},\n      url={https://arxiv.org/abs/2406.07522}\n}\n```\n\n## Contact\n\nLiliang Ren (liliangren@microsoft.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FSamba","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2FSamba","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2FSamba/lists"}