{"id":15116120,"url":"https://github.com/facebookresearch/MobileLLM","last_synced_at":"2025-09-27T21:31:47.197Z","repository":{"id":247504355,"uuid":"826013014","full_name":"facebookresearch/MobileLLM","owner":"facebookresearch","description":"MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.","archived":false,"fork":false,"pushed_at":"2024-11-27T04:20:26.000Z","size":385,"stargazers_count":1222,"open_issues_count":11,"forks_count":67,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-01-17T11:01:29.450Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-09T00:19:08.000Z","updated_at":"2025-01-17T09:42:33.000Z","dependencies_parsed_at":"2024-07-09T05:09:07.960Z","dependency_job_id":"fb4ae3e1-df4d-4b50-a32c-4dfc936bba76","html_url":"https://github.com/facebookresearch/MobileLLM","commit_stats":null,"previous_names":["facebookresearch/mobilellm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FMobileLLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FMobileLLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FMobileLLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/facebookresearch%2FMobileLLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/MobileLLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234460504,"owners_count":18836837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-26T01:44:10.423Z","updated_at":"2025-09-27T21:31:47.192Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","funding_links":[],"categories":["Python","🧠 SOTA 2024-2025: Mobile LLMs \u0026 Multimodal","A01_文本生成_文本对话","Building"],"sub_categories":["🤖 On-Device Large Language Models","大语言对话模型及数据","LLM Models"],"readme":"# MobileLLM\n\nThis repository contains the training code of MobileLLM introduced in our work: \"[MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases](https://arxiv.org/abs/2402.14905)\", published in ICML 2024.\n\nIn this work, we comprehensively consider multiple design factors to obtain high-quality LLMs with fewer than a billion parameters. We integrated (1) SwiGLU activation function, (2) deep and thin architectures, (3) embedding sharing, (4) grouped-query attention to build MobileLLM. MobileLLM-125M/350M attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M SoTA models on zero-shot commonsense reasoning tasks. 
In our updated version, we further demonstrate that our design philosophy scales effectively to larger models, with SoTA results for MobileLLM-600M/1B/1.5B.\n\n\u003cdiv align=center\u003e\n\u003cimg width=50% src=\"./mobilellm.png\"/\u003e\n\u003c/div\u003e\n\n## News\n- Oct 30, 2024: 🚀 MobileLLM models are publicly available on [HuggingFace](https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95)\n\n## Citation\n\nIf you find our code useful for your research, please consider citing:\n    \n    @article{liu2024mobilellm,\n        title={MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases},\n        author={Liu, Zechun and Zhao, Changsheng and Iandola, Forrest and Lai, Chen and Tian, Yuandong and Fedorov, Igor and Xiong, Yunyang and Chang, Ernie and Shi, Yangyang and Krishnamoorthi, Raghuraman and others},\n        journal={arXiv preprint arXiv:2402.14905},\n        year={2024}\n    }\n    \n## Run\n\n### Step 1. Requirements:\n* python 3.9, pytorch \u003e= 2.0\n* pip install -r requirement.txt\n   \n### Step 2. Data preprocessing\nDivide a tokenized dataset, or tokenize your own dataset, and distribute it evenly across the total number of training nodes, where each node comprises 1x8 GPUs. Next, organize the data into the following structure: \n- basepath\n  - 1\n    - xxx.jsonl\n  - 2\n    - xxx.jsonl\n  - ...\n  - #nodes\n    - xxx.jsonl\n\nEach line of a jsonl file is a key-value pair of tokenized data {\"token_ids\": [1,2,3,4,...]}. \n\nOur training code is compatible with the data pre-processing method in https://github.com/LLM360/amber-data-prep.\n\n\n### Step 3. Training script\nThe script `pretrain.sh` is provided to initiate training on a 1x8 node setup using torchrun. This script can be modified to adjust the `--nnodes` parameter and other settings to suit different multi-node configurations, such as those using slurm or torchx. 
The learning rate in the script is set for a 1x8 node setup with a batch size of 32. If you increase the number of nodes or the batch size, increase the learning rate linearly in proportion.\n\nSteps to run:\n* In the `pretrain.sh` file, set `--train_data_local_path` to the pre-processed data from Step 2 and `--input_model_filename` to `./configs/{model_size}/`.\n* Run `bash pretrain.sh`\n\n### Evaluation on Wiki\nDownload the models and update the checkpoint path in `eval.sh`.\n* Run `bash eval.sh`\n\n## Training cost\nTraining MobileLLM on 1T tokens using 32 NVIDIA A100 80G GPUs takes approximately the following number of days:\n\n| 125M | 350M | 600M | 1B | 1.5B |\n| --- | --- | --- | --- | --- |\n| ~3 days | ~6 days | ~8 days | ~12 days | ~18 days |\n\n\n## Results on Zero-shot Common Sense Reasoning tasks\n\n### MobileLLM-125M\n\n| model | arc_easy | arc_challenge | boolq | piqa | siqa | hellaswag | obqa | winogrande | avg. |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| OPT-125M | 41.3 | 25.2 | 57.5 | 62.0 | 41.9 | 31.1 | 31.2 | 50.8 | 42.6 |\n| GPT-neo-125M | 40.7 | 24.8 | 61.3 | 62.5 | 41.9 | 29.7 | 31.6 | 50.7 | 42.9 |\n| Pythia-160M | 40.0 | 25.3 | 59.5 | 62.0 | 41.5 | 29.9 | 31.2 | 50.9 | 42.5 |\n| **MobileLLM-125M** | 43.9 | 27.1 | 60.2 | 65.3 | 42.4 | 38.9 | 39.5 | 53.1 | **46.3** |\n| **MobileLLM-LS-125M** | 45.8 | 28.7 | 60.4 | 65.7 | 42.9 | 39.5 | 41.1 | 52.1 | **47.0** |\n\n### MobileLLM-350M\n\n| model | arc_easy | arc_challenge | boolq | piqa | siqa | hellaswag | obqa | winogrande | avg. 
|\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| OPT-350M | 41.9 | 25.7 | 54.0 | 64.8 | 42.6 | 36.2 | 33.3 | 52.4 | 43.9 |\n| Pythia-410M | 47.1 | 30.3 | 55.3 | 67.2 | 43.1 | 40.1 | 36.2 | 53.4 | 46.6 |\n| **MobileLLM-350M** | 53.8 | 33.5 | 62.4 | 68.6 | 44.7 | 49.6 | 40.0 | 57.6 | **51.3** |\n| **MobileLLM-LS-350M** | 54.4 | 32.5 | 62.8 | 69.8 | 44.1 | 50.6 | 45.8 | 57.2 | **52.1** | \n\n### MobileLLM-600M\n\n| model | arc_easy | arc_challenge | boolq | piqa | siqa | hellaswag | obqa | winogrande | avg. |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Qwen1.5-500M | 54.7 | 32.1 | 46.9 | 68.9 | 46.0 |  48.8 | 37.7 | 55.0 | 48.8 | \n| BLOOM-560M | 43.7 | 27.5 | 53.7 | 65.1 | 42.5 | 36.5 | 32.6 | 52.2 | 44.2 | \n| MobiLlama-800M | 52.0 | 31.7 | 54.6 | 73.0 |  43.3 | 52.3 | 42.5 | 56.3 | 50.7 | \n| **MobileLLM-600M** | 58.1 |  35.8 |  61.0 |  72.3 | 44.9 | 55.9 |  47.9 |  58.6 | **54.3** |  \n\n### MobileLLM-1B\n\n| model | arc_easy | arc_challenge | boolq | piqa | siqa | hellaswag | obqa | winogrande | avg. |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| Pythia-1B | 49.9 | 30.4 | 58.7 | 69.2 | 43.3 | 47.4 | 38.6 | 52.2 | 48.7 | \n| MobiLlama-1B | 59.7 | 38.4 | 59.2 | 74.5 | 44.9 | 62.0 | 43.7 | 59.0 | 55.2 | \n| Falcon-1B | 59.5 | 38.4 | 63.9 | 74.6 |  44.6 | 62.9 |  45.6 | 60.9 | 56.3 | \n| BLOOM-1.1B | 47.6 | 27.3 | 58.6 | 67.0 | 42.4 | 42.2 | 36.6 | 53.8 | 46.9 | \n| TinyLlama-1.1B | 59.2 | 37.1 | 58.1 | 72.9 | 43.9 | 59.1 | 44.7 | 58.8 | 54.2 | \n| **MobileLLM-1B** | 63.0 |  39.0 |  66.7 |  74.4 | 45.0 |  61.4 | 46.8 | 62.3 | **57.3** |  \n\n### MobileLLM-1.5B\n\n| model | arc_easy | arc_challenge | boolq | piqa | siqa | hellaswag | obqa | winogrande | avg. 
|\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| GPT-neo-1.3B | 51.3 | 33.0 | 61.8 | 70.9 | 43.7 | 48.6 | 41.2 | 54.5 | 50.6 | \n| OPT-1.3B | 54.4 | 31.7 | 58.4 | 71.5 | 44.7 | 53.7 | 44.6 | 59.1 | 52.3 | \n| BLOOM-1.7B | 50.9 | 31.2 | 61.7 | 70.0 | 43.2 | 47.2 | 36.2 | 56.1 | 49.6 | \n| Qwen1.5-1.8B | 61.1 | 36.5 | 68.3 | 74.1 | 47.2 |  60.4 | 42.9 | 61.2 | 56.5 | \n| GPT-neo-2.7B | 55.8 | 34.3 | 62.4 | 72.9 | 43.6 | 55.6 | 40.0 | 57.9 | 52.8 | \n| OPT-2.7B | 56.6 | 34.6 | 61.8 | 74.5 | 45.6 | 60.2 | 48.2 | 59.6 | 55.1 | \n| Pythia-2.8B | 59.4 | 38.9 | 66.1 |  73.8 | 44.5 | 59.6 | 45.0 | 59.4 | 55.8 | \n| BLOOM-3B | 55.1 | 33.6 | 62.1 | 70.5 | 43.2 | 53.9 | 41.6 | 58.2 | 52.3 | \n| **MobileLLM-1.5B** | 67.5 |  40.9 |  65.7 | 74.8 |  46.4 | 64.5 | 50.5 | 64.7 | **59.4** | \n\n## Acknowledgement\n\nThis code is partially based on HuggingFace [Transformers](https://github.com/huggingface/transformers) repo under [Apache License](https://github.com/huggingface/transformers/blob/main/LICENSE).\n\n## Contact\n\nZechun Liu, Meta Inc (zechunliu at meta dot com)\n\nChangsheng Zhao, Meta Inc (cszhao at meta dot com)\n\n## Relevant Projects\n\nSpinQuant: LLM Quantization with Learned Rotations [[Paper](https://arxiv.org/pdf/2405.16406)] [[Code](https://github.com/facebookresearch/SpinQuant)]\n\nLLM-QAT: Data-Free Quantization Aware Training for Large Language Models [[Paper](https://arxiv.org/pdf/2305.17888)] [[Code](https://github.com/facebookresearch/LLM-QAT)]\n\n## License\n\nMobileLLM is FAIR NC licensed as of now.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FMobileLLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FMobileLLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FMobileLLM/lists"}