# Return of the Encoder: Efficient Small Language Models
Code and models for the paper [Return of the Encoder](https://arxiv.org/pdf/2501.16273)

## Overview
While large language models continue to grow in size, smaller models (≤1B parameters) require thoughtful architectural decisions.
Our work demonstrates that encoder-decoder models inherently outperform decoder-only architectures before any optimizations:

- The base encoder-decoder achieves a +2-4% performance improvement across tasks
- After knowledge distillation, the gains increase to +6-8%
- Significantly more efficient than decoder-only counterparts:
  - 📉 47% lower first-token latency
  - 🚀 4.7x higher throughput on edge devices
  - 💾 11-16% less memory usage
  - ⚡ 22% fewer FLOPs for sequence generation

We note that our work focuses on architectural comparisons rather than competing with recent SLM developments (e.g., SmolLM, MobileLLM). Our analysis isolates the fundamental advantages of encoder-decoder versus decoder-only designs in the sub-1B parameter regime, with particular emphasis on deployment efficiency.

![Architectural Comparison](IntroFigure.png)
*Architectural efficiency in SLMs. Left: comparison of architectures, where the encoder-decoder builds a fixed input representation and keeps a KV cache only for the output, while the decoder-only model requires growing KV caches for both input and output. Top right: inference time scaling with input length, showing the encoder-decoder's efficient fixed-representation approach versus the decoder-only model's steeper computational growth.
Bottom right: performance across tasks, showing the encoder-decoder's advantages at a fixed compute budget, further enhanced by KD.*

## Technical Highlights
- **Efficient Base Architecture**: a 2/3-1/3 encoder-decoder split consistently outperforms decoder-only
- **Enhanced Performance**: knowledge distillation from larger teachers while maintaining the architectural benefits
- **Hardware Efficiency**: superior efficiency across GPU (86ms), CPU (1591ms), and NPU (189ms) platforms

## Performance
Our 330M-parameter model outperforms decoder-only baselines (given the same training data & FLOPs):

| Task | Encoder-decoder | Decoder-only |
|------|-----------------|--------------|
| SQuAD 2.0 | 0.69/0.94 | 0.57/0.90 |
| IELTS | 0.32/0.46 | 0.31/0.40 |
| CodeXGLUE | 0.93/0.74 | 0.93/0.63 |
| XSum | 0.27/0.20 | 0.24/0.19 |

We also show that these gains persist as we scale the models up to 1B parameters.

## Usage
### Package Installation
Start by creating a conda environment with Python 3.10 and installing the necessary packages:
```bash
cd encoder-decoder-slm
conda create -n slm_env python=3.10 -y
conda activate slm_env
pip install --upgrade pip
pip install -e .
```

### Text2text Inference
We provide example inference code for a text2text encoder-decoder model trained for QA with context. Feel free to modify the `question` and `context` values in `src/mu/generate_text2text.py` if you want to try other examples.
```bash
cd encoder-decoder-slm
python -m mu.generate_text2text
```

### Text+image2text Inference
We provide example inference code for a text+image2text encoder-decoder model trained for VQA. Several images are included under `artifacts/images` for you to try.
You can modify the `image_file` and `question` values in `src/mu/generate_text+image2text.py` if you want to try other examples.
```bash
cd encoder-decoder-slm
python -m mu.generate_text+image2text
```

### Training
Run KD training using the following command:
```bash
cd encoder-decoder-slm
torchrun --nproc_per_node=${GPU_COUNT} -m mu.train_text2text_by_kd
```
Note that the KD training code references a `teacher.pt` (which should be placed at `artifacts/models/teacher.pt`): a Phi-3-mini with a LoRA adapter finetuned on SQuAD v2.0, available on Hugging Face.

⭐ Star this repository to get notified when we release the rest of the code and models!
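As a back-of-the-envelope illustration of the KV-cache argument in the overview figure, the sketch below compares cache growth for the two architectures. All dimensions (layer counts, hidden size, sequence lengths) are made-up assumptions for a ~330M-class model, not configs from this repository, and the fixed cross-attention cache over the encoder output is ignored since it is computed once and does not grow during decoding.

```python
# Rough KV-cache size comparison: decoder-only vs. encoder-decoder.
# All model dimensions are illustrative assumptions, not the paper's configs.

def kv_cache_bytes(num_layers: int, num_tokens: int, hidden_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes to cache keys and values (factor of 2) for num_tokens tokens,
    assuming fp16 (2 bytes per value) and one K/V pair per layer."""
    return 2 * num_layers * num_tokens * hidden_dim * bytes_per_value

# Hypothetical configs: the decoder-only model self-attends (and caches)
# over input + output with all 24 layers, while a 2/3-1/3 encoder-decoder
# split decodes with 8 layers and caches only the generated output tokens.
input_len, output_len, hidden = 2048, 128, 1024
decoder_only = kv_cache_bytes(24, input_len + output_len, hidden)
enc_dec = kv_cache_bytes(8, output_len, hidden)  # self-attention cache only

print(f"decoder-only KV cache:    {decoder_only / 2**20:.1f} MiB")
print(f"encoder-decoder KV cache: {enc_dec / 2**20:.1f} MiB")
```

Under these assumptions the decoder-only cache is roughly 50x larger, and it keeps growing with input length, which is the mechanism behind the first-token-latency and memory gaps reported above.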