{"id":29264333,"url":"https://github.com/niutrans/lamate","last_synced_at":"2025-07-26T17:37:32.326Z","repository":{"id":302033683,"uuid":"941326980","full_name":"NiuTrans/LaMaTE","owner":"NiuTrans","description":"Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation","archived":false,"fork":false,"pushed_at":"2025-06-30T07:11:05.000Z","size":63,"stargazers_count":22,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-30T08:26:51.413Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NiuTrans.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-02T02:50:48.000Z","updated_at":"2025-06-30T07:11:09.000Z","dependencies_parsed_at":"2025-06-30T08:37:04.022Z","dependency_job_id":null,"html_url":"https://github.com/NiuTrans/LaMaTE","commit_stats":null,"previous_names":["niutrans/lamate"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NiuTrans/LaMaTE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NiuTrans%2FLaMaTE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NiuTrans%2FLaMaTE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NiuTrans%2FLaMaTE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NiuTrans%2FLaMaTE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NiuTrans","downl
oad_url":"https://codeload.github.com/NiuTrans/LaMaTE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NiuTrans%2FLaMaTE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263540369,"owners_count":23477454,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-04T12:30:37.581Z","updated_at":"2025-07-04T12:31:50.074Z","avatar_url":"https://github.com/NiuTrans.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2503.06594\" alt=\"paper\"\u003e\u003cimg src=\"https://img.shields.io/badge/Paper-LaMaTE-blue?logo=arxiv\u0026logoColor=white\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/NiuTrans/LaMaTE\" alt=\"Model\"\u003e\u003cimg src=\"https://img.shields.io/badge/Model-LaMaTE-yellow?logo=huggingface\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/datasets/NiuTrans/ComMT\" alt=\"Dataset\"\u003e\u003cimg src=\"https://img.shields.io/badge/Dataset-ComMT-yellow?logo=huggingface\"/\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/NiuTrans\" alt=\"NiuTrans\"\u003e\u003cimg src=\"https://img.shields.io/badge/NiuTrans-blue\"/\u003e\u003c/a\u003e\n  \u003ca href=\"http://team.neu.edu.cn/NEUNLPLab/zh_CN/index.htm\" alt=\"NEUNLP\"\u003e\u003cimg src=\"https://img.shields.io/badge/NEUNLP-blue\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cdiv 
align=\"center\"\u003e\n\u003cp align=\"center\" dir=\"auto\"\u003e\n\n• 📄 [Introduction](#-introduction) \n• 🤗 [Model and Dataset](#-model-and-dataset)\n• 🚀 [A Quick Start](#-a-quick-start)\n\u003c/p\u003e\n\u003cp align=\"center\" dir=\"auto\"\u003e\n\n• 🔥 [Training](#-training) \n• ⚡ [Inference](#-inference) \n• 📊 [Evaluation](#-evaluation)\n\u003c/p\u003e\n\u003c/div\u003e\n\n# 📄 Introduction\nLaMaTE is a high-performance and efficient translation model that uses large language models (LLMs) as machine translation (MT) encoders, paired with lightweight decoders. \nThe model integrates an adapter to bridge LLM representations with the decoder and employs a two-stage training strategy to enhance performance and efficiency.\n\n**Key Features of LaMaTE**\n- Enhanced Efficiency: Offers 2.4× to 6.5× faster decoding speeds.\n- Reduced Memory Usage: Reduces KV cache memory consumption by 75%.\n- Competitive Performance: Exhibits robust performance across diverse translation tasks.\n\nComMT is a comprehensive dataset suite designed to support the development and evaluation of universal translation models. \nIt includes diverse translation-related tasks, providing a well-curated data resource for training and testing LLM-based machine translation systems.\n\n\n# 🤗 Model and Dataset\nWe have made the following resources available:\n\n| Resource         | Description                                         | Link                                                      |\n|------------------|-----------------------------------------------------|-----------------------------------------------------------|\n| LaMaTE    | The LaMaTE model, developed using Llama-3-8B\t  | [🤗NiuTrans/LaMaTE](https://huggingface.co/NiuTrans/LaMaTE) |\n| ComMT    | Dataset suite that includes 239k high-quality, diverse SFT samples\t  | [🤗NiuTrans/ComMT](https://huggingface.co/datasets/NiuTrans/ComMT) |\n\n\n# 🚀 A Quick Start\n**Note:** Our implementation is developed with transformers v4.39.2. 
\nWe recommend installing this version for best compatibility.\n\nTo run LaMaTE, load the model with ```from_pretrained()``` and translate with ```generate()```:\n\n```python\nfrom modeling_llama_seq2seq import LlamaCrossAttentionEncDec\nfrom transformers import AutoTokenizer, AutoConfig\n\n# Hub ID from the table above; a local checkpoint directory also works.\nmodel_name_or_path = \"NiuTrans/LaMaTE\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_name_or_path)\nconfig = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)\nmodel = LlamaCrossAttentionEncDec.from_pretrained(model_name_or_path, config=config)\n\n# Build the prompt as a plain string (no trailing comma, which would make it a tuple).\nprompt = \"Translate the following text from English into Chinese.\\nEnglish: The harder you work at it, the more progress you will make.\\nChinese: \"\ninput_ids = tokenizer(prompt, return_tensors=\"pt\")\n\n# Beam search with 5 beams and sampling disabled.\noutputs_tokenized = model.generate(\n    **input_ids,\n    num_beams=5,\n    do_sample=False\n)\noutputs = tokenizer.batch_decode(outputs_tokenized, skip_special_tokens=True)\nprint(outputs)\n```\n\nThe prompt for general/doc/domain translation tasks:\n```\n\"Translate the following text from {src_lang} into {tgt_lang}.\\n{src_lang}: {src}\\n{tgt_lang}: \"\n```\n\nFor terminology-constrained translation tasks:\n\n```\n\"Translate the following text from {src_lang} into {tgt_lang} using the provided terminology pairs, ensuring the specified terms are accurately translated as indicated.\\nTerminology pairs: {term_text}\\n{src_lang}: {src}\\n{tgt_lang}: \"\n```\n\nFor Automatic Post-Editing (APE) tasks:\n```\n\"Improve the following machine-generated translation from {src_lang} to {tgt_lang}. 
Correct errors and generate a more accurate translation.\\n{src_lang}: {src}\\n{tgt_lang}: {mt_text}\\n{tgt_lang}: \"\n```\n\n# 🔥 Training \nTraining consists of two stages: first, the adapter and decoder are trained on bilingual data; second, all model parameters are fine-tuned on ComMT translation data.\n\nPrepare your data directory as follows:\n\n```\nLaMaTE/\n├── data/\n│   ├── wmt23-sample10M/ # for stage1 training\n│   │   ├── zh-en/\n│   │   │   ├── train.zh-en.general_trans.json\n│   │   │   ├── valid.zh-en.general_trans.json\n│   │   │   ├── test.en-zh.general_trans.wmt23.json\n│   │   │   └── test.zh-en.general_trans.wmt23.json\n│   │   └── de-en/\n│   │       └── xxx\n│   │\n│   └── ComMT/ # for stage2 training\n│       ├── zh-en/\n│       │   ├── train.zh-en.ape.json\n│       │   ├── train.zh-en.doc_trans.json\n│       │   ├── train.zh-en.general_trans.json\n│       │   └── xxx  # other translation task data\n│       └── de-en/\n│           └── xxx\n```\n\nUse consistent file names: ```train/valid.${first_lang}-en.${task_type}.json```. \nTest set file names should clearly specify the translation direction. \n\nThe supported ```task_type``` values are:\n- general_trans\n- doc_trans\n- domain_medical, domain_law, domain_it, domain_literature, domain_colloquial\n- term_con_trans\n- ape\n- context_learning_trans\n\nEach line in a data file is one sample, labeled by its task_type key. 
\nFor more details, refer to [ComMT](https://huggingface.co/datasets/NiuTrans/ComMT).\n\nTo train:\n```\ncd scripts\nbash train_lamate_stage1.sh\nbash train_lamate_stage2.sh\n```\n\nFor training commands and configurations, see the shell scripts in the ```scripts``` directory.\n\n# ⚡ Inference \nAfter training, perform batch inference on the ComMT test set:\n\n```\nbash inference_lamate.sh\n```\nResults are saved in ```${model_dir}/decoder_result```.\n\n# 📊 Evaluation\nEvaluate using BLEU and COMET:\n\n```\nbash eval_commt.sh ${decoder_result_dir}\n```\nResults are stored in ```scripts/ComMT_result.xlsx```.\n\n# Reference\nFor more details, please refer to the LaMaTE [paper](https://arxiv.org/abs/2503.06594).\n\nEmail: luoyingfeng_neu@outlook.com\n```\n@misc{luoyf2025lamate,\n      title={Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation}, \n      author={Yingfeng Luo and Tong Zheng and Yongyu Mu and Bei Li and Qinghong Zhang and Yongqi Gao and Ziqiang Xu and Peinan Feng and Xiaoqian Liu and Tong Xiao and Jingbo Zhu},\n      year={2025},\n      eprint={2503.06594},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniutrans%2Flamate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniutrans%2Flamate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniutrans%2Flamate/lists"}