{"id":50510423,"url":"https://github.com/Tencent-Hunyuan/HY-MT","last_synced_at":"2026-06-19T14:00:37.357Z","repository":{"id":331054037,"uuid":"1123126254","full_name":"Tencent-Hunyuan/HY-MT","owner":"Tencent-Hunyuan","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-23T09:31:52.000Z","size":6164,"stargazers_count":540,"open_issues_count":17,"forks_count":50,"subscribers_count":12,"default_branch":"main","last_synced_at":"2026-03-24T06:58:49.758Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tencent-Hunyuan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-26T08:28:42.000Z","updated_at":"2026-03-24T04:39:19.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Tencent-Hunyuan/HY-MT","commit_stats":null,"previous_names":["tencent-hunyuan/hy-mt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Tencent-Hunyuan/HY-MT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent-Hunyuan%2FHY-MT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent-Hunyuan%2FHY-MT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent-Hunyuan%2FHY-MT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent-Hunyuan%2FHY-MT/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tencent-Hunyuan","download_url":"https://codeload.github.com/Tencent-Hunyuan/HY-MT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent-Hunyuan%2FHY-MT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34534278,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-02T20:00:26.252Z","updated_at":"2026-06-19T14:00:37.344Z","avatar_url":"https://github.com/Tencent-Hunyuan.png","language":"Python","funding_links":[],"categories":["🏭 Industrial / Production Model Reports"],"sub_categories":["🔁 Iterative Self-Bootstrapping"],"readme":"\n\u003cp align=\"left\"\u003e\n    \u003ca href=\"README_CN.md\"\u003e中文\u003c/a\u003e\u0026nbsp ｜ English\u003c/a\u003e\n\u003c/p\u003e\n\u003cbr\u003e\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"imgs/hunyuanlogo.png\" width=\"400\"/\u003e \u003cbr\u003e\n\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\n\n\n\u003cp align=\"center\"\u003e\n    🤗\u0026nbsp;\u003ca href=\"https://huggingface.co/collections/tencent/hy-mt15\"\u003e\u003cb\u003eHugging Face\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;|\u0026nbsp;\u0026nbsp;\n    \u003cimg 
src=\"https://avatars.githubusercontent.com/u/109945100?s=200\u0026v=4\" width=\"16\"/\u003e\u0026nbsp;\u003ca href=\"https://modelscope.cn/collections/Tencent-Hunyuan/HY-MT15\"\u003e\u003cb\u003eModelScope\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;|\u0026nbsp;\u0026nbsp;\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    🖥️\u0026nbsp;\u003ca href=\"https://hunyuan.tencent.com\" style=\"color: red;\"\u003e\u003cb\u003eOfficial Website\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;|\u0026nbsp;\u0026nbsp;\n    🕹️\u0026nbsp;\u003ca href=\"https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare\u0026modelId=hunyuan-mt-1.8b\"\u003e\u003cb\u003eDemo\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/Tencent-Hunyuan/HY-MT\"\u003e\u003cb\u003eGithub\u003c/b\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\n**NOTICE:**\n\nWe have released the Hy-MT2 series of translation models, offering improved performance and excellent instruction-following capabilities. The link to the new model collection is: https://huggingface.co/collections/tencent/hy-mt2\n\nWe are excited to announce our official partnership with WMT26. We welcome all participants to use our HY-MT model during the competition. Teams that use HY-MT and achieve notable results will be eligible for **cash prizes**. For more details, please contact us at hunyuan@tencent.com.\n\nTo help you get started with HY-MT training more quickly, we have provided a **Training Tutorial**. You can access it via the [link](#Train-with-LLaMA-Factory).\n\n## Model Introduction\n\nHunyuan Translation Model Version 1.5 includes a 1.8B translation model, HY-MT1.5-1.8B, and a 7B translation model, HY-MT1.5-7B. Both models focus on supporting mutual translation across 33 languages and incorporating 5 ethnic and dialect variations. 
Among them, HY-MT1.5-7B is an upgraded version of our WMT25 championship model, optimized for explanatory translation and mixed-language scenarios, with newly added support for terminology intervention, contextual translation, and formatted translation. Despite having less than one-third the parameters of HY-MT1.5-7B, HY-MT1.5-1.8B delivers translation performance comparable to its larger counterpart, achieving both high speed and high quality. After quantization, the 1.8B model can be deployed on edge devices and support real-time translation scenarios, making it widely applicable.\n\n## Key Features and Advantages\n\n- HY-MT1.5-1.8B achieves industry-leading performance among models of the same size, surpassing most commercial translation APIs.\n- HY-MT1.5-1.8B supports deployment on edge devices and real-time translation scenarios, offering broad applicability.\n- HY-MT1.5-7B, compared to its September open-source version, has been optimized for annotated and mixed-language scenarios.\n- Both models support terminology intervention, contextual translation, and formatted translation.\n\n## Related News\n* 2025.12.30, we open-sourced **HY-MT1.5-1.8B** and **HY-MT1.5-7B** on Hugging Face.\n* 2025.9.1, we open-sourced **Hunyuan-MT-7B** and **Hunyuan-MT-Chimera-7B** on Hugging Face.\n\u003cbr\u003e\n\n\n## Performance\n\n\u003cdiv align='center'\u003e\n\u003cimg src=\"imgs/overall_performance.png\" width = \"100%\" /\u003e\n\u003c/div\u003e\nYou can refer to our technical report for more experimental results and analysis.\n\n\u003ca href=./HY_MT1_5_Technical_Report.pdf\u003e\u003cb\u003eTechnical Report\u003c/b\u003e \u003c/a\u003e\n\n\u0026nbsp;\n\n## Model Links\n| Model Name  | Description | Download |\n| ----------- | ----------- | ----------- |\n| HY-MT1.5-1.8B  | Hunyuan 1.8B translation model |🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-1.8B)|\n| HY-MT1.5-1.8B-FP8 | Hunyuan 1.8B translation model, fp8 quant    | 🤗 
[Model](https://huggingface.co/tencent/HY-MT1.5-1.8B-FP8)|\n| HY-MT1.5-1.8B-GPTQ-Int4 | Hunyuan 1.8B translation model, int4 quant    | 🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-1.8B-GPTQ-Int4)|\n| HY-MT1.5-1.8B-GGUF | Hunyuan 1.8B translation model, llama.cpp    | 🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF)|\n| HY-MT1.5-7B | Hunyuan 7B translation model    | 🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-7B)|\n| HY-MT1.5-7B-FP8 | Hunyuan 7B translation model, fp8 quant     | 🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-7B-FP8)|\n| HY-MT1.5-7B-GGUF | Hunyuan 7B translation model, llama.cpp    | 🤗 [Model](https://huggingface.co/tencent/HY-MT1.5-7B-GGUF)|\n\n## Prompts\n\n*Note: The following `source_language` and `target_language` should both use the full names of the languages; use the full Chinese names for Chinese instruction and the full English names for English instruction.*\n\n\n### Prompt Template for ZH\u003c=\u003eXX Translation.\n---\n```\n将以下文本翻译为{target_language}，注意只需要输出翻译后的结果，不要额外解释：\n\n{source_text}\n```\n---\n\n### Prompt Template for XX\u003c=\u003eXX Translation, excluding ZH\u003c=\u003eXX.\n---\n```\nTranslate the following segment into {target_language}, without additional explanation.\n\n{source_text}\n```\n---\n\n### Prompt Template for terminology intervention.\n---\n```\n参考下面的翻译：\n{source_term} 翻译成 {target_term}\n\n将以下文本翻译为{target_language}，注意只需要输出翻译后的结果，不要额外解释：\n{source_text}\n```\n---\n\n### Prompt Template for contextual translation.\n---\n```\n{context}\n参考上面的信息，把下面的文本翻译成{target_language}，注意不需要翻译上文，也不要额外解释：\n{source_text}\n\n```\n---\n\n###  Prompt Template for formatted translation.\n---\n```\n将以下\u003csource\u003e\u003c/source\u003e之间的文本翻译为中文，注意只需要输出翻译后的结果，不要额外解释，原文中的\u003csn\u003e\u003c/sn\u003e标签表示标签内文本包含格式信息，需要在译文中相应的位置尽量保留该标签。输出格式为：\u003ctarget\u003estr\u003c/target\u003e\n\n\u003csource\u003e{src_text_with_format}\u003c/source\u003e\n```\n---\n\n\u0026nbsp;\n\n### Use with transformers\nFirst, 
please install transformers; we recommend v4.56.0.\n```SHELL\npip install transformers==4.56.0\n```\n\n*!!! If you want to load the fp8 model with transformers, you need to rename \"ignored_layers\" in config.json to \"ignore\" and upgrade compressed-tensors to version 0.11.0 (compressed-tensors-0.11.0).*\n\nThe following code snippet shows how to use the transformers library to load and apply the model.\n\nWe use tencent/HY-MT1.5-1.8B as an example.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_name_or_path = \"tencent/HY-MT1.5-1.8B\"\n\ntokenizer = AutoTokenizer.from_pretrained(model_name_or_path)\nmodel = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map=\"auto\")  # You may want to use bfloat16 and/or move to GPU here\nmessages = [\n    {\"role\": \"user\", \"content\": \"Translate the following segment into Chinese, without additional explanation.\\n\\nIt’s on the house.\"},\n]\ntokenized_chat = tokenizer.apply_chat_template(\n    messages,\n    tokenize=True,\n    add_generation_prompt=False,\n    return_tensors=\"pt\"\n)\n\noutputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)\noutput_text = tokenizer.decode(outputs[0])\nprint(output_text)\n```\n\nWe recommend using the following set of parameters for inference. Note that our model does not have a default system prompt.\n\n```json\n{\n  \"top_k\": 20,\n  \"top_p\": 0.6,\n  \"repetition_penalty\": 1.05,\n  \"temperature\": 0.7\n}\n```\n\n\u0026nbsp;\n\nSupported languages:\n| Languages         | Abbr.   
| Chinese Names   |\n|-------------------|---------|-----------------|\n| Chinese           | zh      | 中文            |\n| English           | en      | 英语            |\n| French            | fr      | 法语            |\n| Portuguese        | pt      | 葡萄牙语        |\n| Spanish           | es      | 西班牙语        |\n| Japanese          | ja      | 日语            |\n| Turkish           | tr      | 土耳其语        |\n| Russian           | ru      | 俄语            |\n| Arabic            | ar      | 阿拉伯语        |\n| Korean            | ko      | 韩语            |\n| Thai              | th      | 泰语            |\n| Italian           | it      | 意大利语        |\n| German            | de      | 德语            |\n| Vietnamese        | vi      | 越南语          |\n| Malay             | ms      | 马来语          |\n| Indonesian        | id      | 印尼语          |\n| Filipino          | tl      | 菲律宾语        |\n| Hindi             | hi      | 印地语          |\n| Traditional Chinese | zh-Hant| 繁体中文        |\n| Polish            | pl      | 波兰语          |\n| Czech             | cs      | 捷克语          |\n| Dutch             | nl      | 荷兰语          |\n| Khmer             | km      | 高棉语          |\n| Burmese           | my      | 缅甸语          |\n| Persian           | fa      | 波斯语          |\n| Gujarati          | gu      | 古吉拉特语      |\n| Urdu              | ur      | 乌尔都语        |\n| Telugu            | te      | 泰卢固语        |\n| Marathi           | mr      | 马拉地语        |\n| Hebrew            | he      | 希伯来语        |\n| Bengali           | bn      | 孟加拉语        |\n| Tamil             | ta      | 泰米尔语        |\n| Ukrainian         | uk      | 乌克兰语        |\n| Tibetan           | bo      | 藏语            |\n| Kazakh            | kk      | 哈萨克语        |\n| Mongolian         | mn      | 蒙古语          |\n| Uyghur            | ug      | 维吾尔语        |\n| Cantonese         | yue     | 粤语            |\n\n\n### Training Data Format\n\nIf you need to fine-tune our Instruct model, we recommend processing the data 
into the following format.\n\n```python\n\nmessages = [\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n    {\"role\": \"user\", \"content\": \"Why is seawater salty?\" },\n    {\"role\": \"assistant\", \"content\": \"Seawater is primarily saline due to dissolved salts and minerals. These substances come from the chemical materials in rocks and soil on the Earth's surface, which are carried into the ocean over time. When seawater evaporates, the water vapor leaves, but the salts and minerals remain, making the seawater saltier. Therefore, the salinity of seawater is determined by the amount of salts and minerals it contains.\"}\n]\n\nfrom transformers import AutoTokenizer\ntokenizer = AutoTokenizer.from_pretrained(\"your_tokenizer_path\", trust_remote_code=True)\ntrain_ids = tokenizer.apply_chat_template(messages)\n```\n\n\u0026nbsp;\n\n### Train with LLaMA-Factory\n\nIn the following chapter, we will introduce how to use `LLaMA-Factory` to fine-tune the `Hunyuan` model.\n\n#### Prerequisites\n\nVerify installation of the following dependencies:\n- **LLaMA-Factory**: Follow [official installation guide](https://github.com/hiyouga/LLaMA-Factory)\n- **DeepSpeed** (optional): Follow [official installation guide](https://github.com/deepspeedai/DeepSpeed#installation)\n- **Transformer Library**: Use the companion branch (Hunyuan-submitted code is pending review)\n    ```\n    pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca\n    ```\n\n#### Data preparation\n\nWe need to prepare a custom dataset:\n1. Organize your data in `json` format and place it in the `data` directory in `LLaMA-Factory`. 
The current implementation uses the `sharegpt` dataset format, which requires the following structure:\n```\n[\n  {\n    \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"System prompt (optional)\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"Human instruction\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Model response\"\n      }\n    ]\n  }\n]\n```\nRefer to the [Training Data Format](#training-data-format) section above for details.\n\n2. Define your dataset in the `data/dataset_info.json` file using the following format:\n```\n\"dataset_name\": {\n  \"file_name\": \"dataset.json\",\n  \"formatting\": \"sharegpt\",\n  \"columns\": {\n    \"messages\": \"messages\"\n  },\n  \"tags\": {\n    \"role_tag\": \"role\",\n    \"content_tag\": \"content\",\n    \"user_tag\": \"user\",\n    \"assistant_tag\": \"assistant\",\n    \"system_tag\": \"system\"\n  }\n}\n```\n\n#### Training execution\n\n1. Copy all files from the `llama_factory_support/example_configs` directory to the `examples/hunyuan` directory in `LLaMA-Factory`.\n2. Modify the model path and dataset name in the configuration file `hunyuan_full.yaml`. Adjust other configurations as needed:\n```\n### model\nmodel_name_or_path: [!!!add the model path here!!!]\n\n### dataset\ndataset: [!!!add the dataset name here!!!]\n```\n3. Execute training commands:\n    * Single-node training\n    Note: Set the environment variable DISABLE_VERSION_CHECK to 1 to avoid version conflicts.\n    ```\n    export DISABLE_VERSION_CHECK=1\n    llamafactory-cli train examples/hunyuan/hunyuan_full.yaml\n    ```\n    * Multi-node training\n    Execute the following command on each node. 
Configure NNODES, NODE_RANK, MASTER_ADDR, and MASTER_PORT according to your environment:\n    ```\n    export DISABLE_VERSION_CHECK=1\n    FORCE_TORCHRUN=1 NNODES=${NNODES} NODE_RANK=${NODE_RANK} MASTER_ADDR=${MASTER_ADDR} MASTER_PORT=${MASTER_PORT} \\\n    llamafactory-cli train examples/hunyuan/hunyuan_full.yaml\n    ```\n\n\u0026nbsp;\n\n\n## Quantization Compression\nWe used our own [AngelSlim](https://github.com/tencent/AngelSlim) compression tool to produce the FP8 and INT4 quantized models. `AngelSlim` is a toolset dedicated to creating a more user-friendly, comprehensive and efficient model compression solution.\n\n### FP8 Quantization\nWe use FP8 static quantization. FP8 quantization adopts an 8-bit floating-point format and uses a small amount of calibration data (no training required) to pre-determine the quantization scale. Model weights and activation values are then converted to FP8, which improves inference efficiency and lowers the deployment threshold. You can quantize the model yourself with AngelSlim, or directly download our pre-quantized open-source models from [AngelSlim](https://huggingface.co/AngelSlim).\n\n\n## Deployment\n\nFor deployment, you can use frameworks such as **TensorRT-LLM**, **vLLM**, or **SGLang** to serve the model and create an OpenAI-compatible API endpoint.\n\nDocker image: https://hub.docker.com/r/hunyuaninfer/hunyuan-7B/tags\n\n\n### TensorRT-LLM\n\n#### Docker Image\n\nWe provide a pre-built Docker image based on the latest version of TensorRT-LLM.\n\nWe use tencent/Hunyuan-7B-MT as an example.\n- To get started:\n\n```\ndocker pull docker.cnb.cool/tencent/hunyuan/hunyuan-7b:hunyuan-7b-trtllm\n```\n```\ndocker run --privileged --user root --name hunyuanLLM_infer --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all hunyuaninfer/hunyuan-7b:hunyuan-7b-trtllm\n```\n\n- Prepare the configuration file:\n\n```\ncat \u003e/path/to/extra-llm-api-config.yml \u003c\u003cEOF\nuse_cuda_graph: 
true\ncuda_graph_padding_enabled: true\ncuda_graph_batch_sizes:\n- 1\n- 2\n- 4\n- 8\n- 16\n- 32\nprint_iter_log: true\nEOF\n```\n\n\n- Start the API server:\n\n\n```\ntrtllm-serve \\\n  /path/to/HunYuan-7b \\\n  --host localhost \\\n  --port 8000 \\\n  --backend pytorch \\\n  --max_batch_size 32 \\\n  --max_num_tokens 16384 \\\n  --tp_size 2 \\\n  --kv_cache_free_gpu_memory_fraction 0.6 \\\n  --trust_remote_code \\\n  --extra_llm_api_options /path/to/extra-llm-api-config.yml\n```\n\n\n### vLLM\n\n#### Start\nPlease use vLLM version v0.10.0 or higher for inference.\n\nFirst, install transformers from the commit below. We will merge it into the main branch later.\n```SHELL\npip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca\n```\n\nWe use tencent/Hunyuan-7B-MT as an example.\n- Download the model files:\n  - Hugging Face: downloaded automatically by vLLM.\n  - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-7B-MT`\n\n- Model downloaded from Hugging Face:\n```shell\nexport MODEL_PATH=tencent/Hunyuan-7B-MT\n```\n\n- Model downloaded from ModelScope:\n```shell\nexport MODEL_PATH=/root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-7B-MT/\n```\n\n- Start the API server:\n\n```shell\npython3 -m vllm.entrypoints.openai.api_server \\\n    --host 0.0.0.0 \\\n    --port 8000 \\\n    --trust-remote-code \\\n    --model ${MODEL_PATH} \\\n    --tensor-parallel-size 1 \\\n    --dtype bfloat16 \\\n    --quantization experts_int8 \\\n    --served-model-name hunyuan \\\n    2\u003e\u00261 | tee log_server.txt\n```\n- After the service script starts successfully, run the request script:\n```shell\ncurl http://0.0.0.0:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{\n\"model\": \"hunyuan\",\n\"messages\": [\n    {\n        \"role\": \"system\",\n        \"content\": [{\"type\": \"text\", \"text\": \"You are a helpful assistant.\"}]\n    },\n    {\n        \"role\": \"user\",\n        \"content\": [{\"type\": \"text\", \"text\": 
\"请按面积大小对四大洋进行排序，并给出面积最小的洋是哪一个？直接输出结果。\"}]\n    }\n],\n\"max_tokens\": 2048,\n\"temperature\":0.7,\n\"top_p\": 0.6,\n\"top_k\": 20,\n\"repetition_penalty\": 1.05,\n\"stop_token_ids\": [127960]\n}'\n```\n#### Quantitative model deployment\nThis section describes the process of deploying a post-quantization model using vLLM.\n\nDefault server in BF16.\n\n##### Int8 quantitative model deployment\nDeploying the Int8-weight-only version of the HunYuan-7B model only requires setting the environment variables\n\nNext we start the Int8 service. Run:\n```shell\npython3 -m vllm.entrypoints.openai.api_server \\\n    --host 0.0.0.0 \\\n    --port 8000 \\\n    --trust-remote-code \\\n    --model ${MODEL_PATH} \\\n    --tensor-parallel-size 1 \\\n    --dtype bfloat16 \\\n    --served-model-name hunyuan \\\n    --quantization experts_int8 \\\n    2\u003e\u00261 | tee log_server.txt\n```\n\n\n##### Int4 quantitative model deployment\nDeploying the Int4-weight-only version of the HunYuan-7B model only requires setting the environment variables , using the GPTQ method\n```shell\nexport MODEL_PATH=PATH_TO_INT4_MODEL\n```\nNext we start the Int4 service. Run\n```shell\npython3 -m vllm.entrypoints.openai.api_server \\\n    --host 0.0.0.0 \\\n    --port 8000 \\\n    --trust-remote-code \\\n    --model ${MODEL_PATH} \\\n    --tensor-parallel-size 1 \\\n    --dtype bfloat16 \\\n    --served-model-name hunyuan \\\n    --quantization gptq_marlin \\\n    2\u003e\u00261 | tee log_server.txt\n```\n\n##### FP8 quantitative model deployment\nDeploying the W8A8C8 version of the HunYuan-7B model only requires setting the environment variables\n\n\nNext we start the FP8 service. 
Run:\n```shell\npython3 -m vllm.entrypoints.openai.api_server \\\n    --host 0.0.0.0 \\\n    --port 8000 \\\n    --trust-remote-code \\\n    --model ${MODEL_PATH} \\\n    --tensor-parallel-size 1 \\\n    --dtype bfloat16 \\\n    --served-model-name hunyuan \\\n    --kv-cache-dtype fp8 \\\n    2\u003e\u00261 | tee log_server.txt\n```\n\n\n\n\n### SGLang\n\n#### Docker Image\n\nWe also provide a pre-built Docker image based on the latest version of SGLang.\n\nWe use tencent/Hunyuan-7B-MT as an example.\n\nTo get started:\n\n- Pull the Docker image:\n\n```\ndocker pull lmsysorg/sglang:latest\n```\n\n- Start the API server:\n\n```\ndocker run --entrypoint=\"python3\" --gpus all \\\n    --shm-size 32g \\\n    -p 30000:30000 \\\n    --ulimit nproc=10000 \\\n    --privileged \\\n    --ipc=host \\\n     lmsysorg/sglang:latest \\\n    -m sglang.launch_server --model-path hunyuan/huanyuan_7B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000\n```\n\nCiting HY-MT1.5:\n\n```bibtex\n@misc{hy-mt1.5,\n      title={HY-MT1.5 Technical Report}, \n      author={Mao Zheng and Zheng Li and Tao Chen and Mingyang Song and Di Wang},\n      year={2025},\n      eprint={2512.24092},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2512.24092}, \n}\n```\n\n## Contact Us\n\nIf you would like to leave a message for our R\u0026D and product teams, you are welcome to contact our open-source team. You can also contact us via email (hunyuan_opensource@tencent.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencent-Hunyuan%2FHY-MT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTencent-Hunyuan%2FHY-MT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencent-Hunyuan%2FHY-MT/lists"}