{"id":13589770,"url":"https://github.com/deepseek-ai/DeepSeek-V2","last_synced_at":"2025-04-08T09:34:13.710Z","repository":{"id":238506998,"uuid":"790044051","full_name":"deepseek-ai/DeepSeek-V2","owner":"deepseek-ai","description":"DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model","archived":false,"fork":false,"pushed_at":"2024-09-25T10:23:55.000Z","size":2375,"stargazers_count":3509,"open_issues_count":64,"forks_count":148,"subscribers_count":29,"default_branch":"main","last_synced_at":"2024-10-23T04:40:33.579Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepseek-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-CODE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-22T06:55:47.000Z","updated_at":"2024-10-23T03:51:30.000Z","dependencies_parsed_at":"2024-06-07T14:52:29.807Z","dependency_job_id":"9e9fc1cf-c827-4dbf-8623-320064835bbd","html_url":"https://github.com/deepseek-ai/DeepSeek-V2","commit_stats":null,"previous_names":["deepseek-ai/deepseek-v2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepseek-ai%2FDeepSeek-V2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepseek-ai%2FDeepSeek-V2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepseek-ai%2FDeepSeek-V2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepseek-ai%2FDeepSeek-V2/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/deepseek-ai","download_url":"https://codeload.github.com/deepseek-ai/DeepSeek-V2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223314363,"owners_count":17125064,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:00:34.244Z","updated_at":"2024-11-06T09:31:35.303Z","avatar_url":"https://github.com/deepseek-ai.png","language":null,"funding_links":[],"categories":["⭐ Overview of Mainstream LLMs","A01_文本生成_文本对话","Models","📦 Legacy \u0026 Inactive Projects","Others","开源开放的基础大模型列表","Repos"],"sub_categories":["大语言对话模型及数据","Foundation Models"],"readme":"\u003c!-- markdownlint-disable first-line-h1 --\u003e\n\u003c!-- markdownlint-disable html --\u003e\n\u003c!-- markdownlint-disable no-duplicate-header --\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true\" width=\"60%\" alt=\"DeepSeek-V2\" /\u003e\n\u003c/div\u003e\n\u003chr\u003e\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n  \u003ca href=\"https://www.deepseek.com/\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Homepage\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://chat.deepseek.com/\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Chat\" 
src=\"https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V2-536af5?color=536af5\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/deepseek-ai\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Hugging Face\" src=\"https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n  \u003ca href=\"https://discord.gg/Tc7c45Zzu5\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Discord\" src=\"https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord\u0026logoColor=white\u0026color=7289da\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Wechat\" src=\"https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://twitter.com/deepseek_ai\" target=\"_blank\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Twitter Follow\" src=\"https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x\u0026logoColor=white\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n  \u003ca href=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Code License\" src=\"https://img.shields.io/badge/Code_License-MIT-f5de53?\u0026color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  
\u003ca href=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Model License\" src=\"https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?\u0026color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#3-model-downloads\"\u003eModel Download\u003c/a\u003e |\n  \u003ca href=\"#4-evaluation-results\"\u003eEvaluation Results\u003c/a\u003e |\n  \u003ca href=\"#5-model-architecture\"\u003eModel Architecture\u003c/a\u003e |\n  \u003ca href=\"#7-api-platform\"\u003eAPI Platform\u003c/a\u003e |\n  \u003ca href=\"#9-license\"\u003eLicense\u003c/a\u003e |\n  \u003ca href=\"#10-citation\"\u003eCitation\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2405.04434\"\u003e\u003cb\u003ePaper Link\u003c/b\u003e👁️\u003c/a\u003e\n\u003c/p\u003e\n\n# DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model\n\n## 1. Introduction\nToday, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times that of DeepSeek 67B. 
\n\n\u003cp align=\"center\"\u003e\n\u003cdiv style=\"display: flex; justify-content: center;\"\u003e\n    \u003cimg src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png?raw=true\" style=\"height:300px; width:auto; margin-right:10px\"\u003e\n    \u003cimg src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png?raw=true\" style=\"height:300px; width:auto; margin-left:10px\"\u003e\n\u003c/div\u003e\n\u003c/p\u003e\n\nWe pretrained DeepSeek-V2 on a diverse, high-quality corpus of 8.1 trillion tokens. Pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.\n\n## 2. News\n\n- 2024.05.16: We released DeepSeek-V2-Lite.\n- 2024.05.06: We released DeepSeek-V2.\n\n## 3. Model Downloads\n\n\u003cdiv align=\"center\"\u003e\n\n| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |\n| :------------: | :------------: | :------------: | :------------: | :------------: |\n| DeepSeek-V2-Lite | 16B | 2.4B | 32k   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)   |\n| DeepSeek-V2-Lite-Chat (SFT)   | 16B | 2.4B | 32k   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat)   |\n| DeepSeek-V2   | 236B | 21B |  128k   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2)   |\n| DeepSeek-V2-Chat (RL)   | 236B | 21B |  128k   | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)   |\n\n\u003c/div\u003e\n\nDue to constraints of the Hugging Face Transformers implementation, the open-source code currently runs slower on GPUs than our internal codebase. 
To facilitate the efficient execution of our model, we offer a dedicated vllm solution that optimizes performance for running our model effectively.\n\n## 4. Evaluation Results\n### Base Model\n#### Standard Benchmark (Models larger than 67B)\n\n\u003cdiv align=\"center\"\u003e\n\n| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** |\n|:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|\n| **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |\n| **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |\n| **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |\n| **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |\n| **HumanEval** | Code | 48.2\t| 53.1 | 45.1 | 48.8 |\n| **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |\n| **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |\n| **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |\n\n\u003c/div\u003e\n\n#### Standard Benchmark (Models smaller than 16B)\n\u003cdiv align=\"center\"\u003e\n\n| **Benchmark** | **Domain** | **DeepSeek 7B (Dense)** | **DeepSeekMoE 16B** | **DeepSeek-V2-Lite (MoE-16B)** |\n|:-------------:|:----------:|:--------------:|:-----------------:|:--------------------------:|\n| **Architecture**      | -    | MHA+Dense           | MHA+MoE              | MLA+MoE                       |\n| **MMLU**      | English    | 48.2           | 45.0              | 58.3                       |\n| **BBH**       | English    | 39.5           | 38.9              | 44.1                       |\n| **C-Eval**    | Chinese    | 45.0           | 40.6              | 60.3                       |\n| **CMMLU**     | Chinese    | 47.2           | 42.5              | 64.3                       |\n| **HumanEval** | Code       | 26.2           | 26.8              | 29.9                       |\n| **MBPP**      | Code       | 39.0           | 39.2              | 43.2                       |\n| **GSM8K**     | Math       | 17.4       
    | 18.8              | 41.1                       |\n| **Math**      | Math       | 3.3            | 4.3               | 17.1                       |\n\n\u003c/div\u003e\nFor more evaluation details, such as few-shot settings and prompts, please check our paper. \n\n#### Context Window\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"80%\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png?raw=true\"\u003e\n\u003c/p\u003e\n\nEvaluation results on the ``Needle In A Haystack`` (NIAH) tests.  DeepSeek-V2 performs well across all context window lengths up to **128K**. \n\n### Chat Model\n#### Standard Benchmark (Models larger than 67B)\n\u003cdiv align=\"center\"\u003e\n\n| Benchmark | Domain         | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |\n|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|\n| **MMLU**      | English        | 76.2             | 77.8          | 80.3                | 71.1        | 78.4                 | 77.8                 |\n| **BBH**       | English        | 65.9             | 78.4          | 80.1                | 71.7        | 81.3                 | 79.7                 |\n| **C-Eval**    | Chinese        | 82.2             | 60.0          | 67.9                | 65.2        | 80.9                 | 78.0                 |\n| **CMMLU**     | Chinese        | 82.9             | 61.0          | 70.7                | 67.8        | 82.4                 | 81.6                 |\n| **HumanEval** | Code           | 68.9             | 75.0          | 76.2                | 73.8        | 76.8                 | 81.1                 |\n| **MBPP**      | Code           | 52.2             | 64.4          | 69.8                | 61.4        | 70.4                 | 72.0                 |\n|   **LiveCodeBench  (0901-0401)**     | 
Code       | 18.8          | 25.0                | 30.5        | 18.3                 | 28.7                 | 32.5                 |\n| **GSM8K**     | Math           | 81.9             | 87.9          | 93.2                | 84.1        | 90.8                 | 92.2                 |\n| **Math**      | Math           | 40.6             | 49.8          | 48.5                | 32.6        | 52.7                 | 53.9                 |\n\n\u003c/div\u003e\n\n#### Standard Benchmark (Models smaller than 16B)\n\n\u003cdiv align=\"center\"\u003e\n\n| Benchmark | Domain         | DeepSeek 7B Chat (SFT) | DeepSeekMoE 16B Chat (SFT) | DeepSeek-V2-Lite 16B Chat (SFT) |\n|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|\n| **MMLU**      | English        | 49.7             | 47.2          | 55.7                |\n| **BBH**       | English        | 43.1             | 42.2          | 48.1                |\n| **C-Eval**    | Chinese        | 44.7             | 40.0          | 60.1                |\n| **CMMLU**     | Chinese        | 51.2             | 49.3          | 62.5                |\n| **HumanEval** | Code           | 45.1             | 45.7          | 57.3                |\n| **MBPP**      | Code           | 39.0             | 46.2          | 45.8                |\n| **GSM8K**     | Math           | 62.6             | 62.2          | 72.0                |\n| **Math**      | Math           | 14.7             | 15.2          | 27.9                |\n\n\u003c/div\u003e\n\n#### English Open Ended Generation Evaluation\nWe evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation. 
\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png?raw=true\" /\u003e\n\u003c/p\u003e\n\n#### Chinese Open Ended Generation Evaluation\n**Alignbench** (https://arxiv.org/abs/2311.18743)\n\u003cdiv align=\"center\"\u003e\n\n| **Model** | **Open/Closed Source** | **Overall** | **Chinese Reasoning** | **Chinese Language** |\n| :---: | :---: | :---: | :---: | :---: |\n| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |\n| DeepSeek-V2 Chat (RL) | Open | 7.91 | 7.45 | 8.36 |\n| erniebot-4.0-202404 (文心一言) | Closed | 7.89 | 7.61 | 8.17 |\n| DeepSeek-V2 Chat (SFT) | Open | 7.74 | 7.30 | 8.17 |\n| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |\n| erniebot-4.0-202312 (文心一言) | Closed | 7.36 | 6.84 | 7.88 |\n| moonshot-v1-32k-202404 (月之暗面) | Closed | 7.22 | 6.42 | 8.02 |\n| Qwen1.5-72B-Chat (通义千问) | Open | 7.19 | 6.45 | 7.93 |\n| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11 |\n| Yi-34B-Chat (零一万物) | Open | 6.12 | 4.86 | 7.38 |\n| gpt-3.5-turbo-0613 | Closed | 6.08 | 5.35 | 6.71 |\n| DeepSeek-V2-Lite 16B Chat | Open | 6.01 | 4.71 | 7.32 |\n\n\u003c/div\u003e\n\n#### Coding Benchmarks\nWe evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"50%\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## 5. Model Architecture\nDeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:\n- For attention, we design MLA (Multi-head Latent Attention), which uses low-rank key-value joint compression to eliminate the bottleneck of the inference-time key-value cache, thus supporting efficient inference. 
\n- For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs. \n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"90%\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png?raw=true\" /\u003e\n\u003c/p\u003e\n\n## 6. Chat Website\nYou can chat with DeepSeek-V2 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in)\n\n## 7. API Platform\nWe also provide an OpenAI-compatible API on the DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/). Sign up to receive millions of free tokens. You can also pay as you go at an unbeatable price.\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"40%\" src=\"https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## 8. How to run locally\n**Running DeepSeek-V2 inference in BF16 requires 8 GPUs with 80GB of memory each.**\n### Inference with Huggingface's Transformers\nYou can directly employ [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.\n\n#### Text Completion\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig\n\nmodel_name = \"deepseek-ai/DeepSeek-V2\"\ntokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\n# `max_memory` should be set based on your devices\nmax_memory = {i: \"75GB\" for i in range(8)}\n# `device_map` cannot be set to `auto`\nmodel = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map=\"sequential\", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation=\"eager\")\nmodel.generation_config = GenerationConfig.from_pretrained(model_name)\nmodel.generation_config.pad_token_id = model.generation_config.eos_token_id\n\ntext = \"An attention function can be described as mapping a query and a set of 
key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is\"\ninputs = tokenizer(text, return_tensors=\"pt\")\noutputs = model.generate(**inputs.to(model.device), max_new_tokens=100)\n\nresult = tokenizer.decode(outputs[0], skip_special_tokens=True)\nprint(result)\n```\n\n#### Chat Completion\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig\n\nmodel_name = \"deepseek-ai/DeepSeek-V2-Chat\"\ntokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\n# `max_memory` should be set based on your devices\nmax_memory = {i: \"75GB\" for i in range(8)}\n# `device_map` cannot be set to `auto`\nmodel = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map=\"sequential\", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation=\"eager\")\nmodel.generation_config = GenerationConfig.from_pretrained(model_name)\nmodel.generation_config.pad_token_id = model.generation_config.eos_token_id\n\nmessages = [\n    {\"role\": \"user\", \"content\": \"Write a piece of quicksort code in C++\"}\n]\ninput_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\")\noutputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)\n\nresult = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)\nprint(result)\n```\n\nThe complete chat template can be found in `tokenizer_config.json` in the Hugging Face model repository.\n\nAn example of the chat template is as follows:\n\n```text\n\u003c｜begin▁of▁sentence｜\u003eUser: {user_message_1}\n\nAssistant: {assistant_message_1}\u003c｜end▁of▁sentence｜\u003eUser: {user_message_2}\n\nAssistant:\n```\n\nYou can also add an optional system message:\n\n```text\n\u003c｜begin▁of▁sentence｜\u003e{system_message}\n\nUser: {user_message_1}\n\nAssistant: {assistant_message_1}\u003c｜end▁of▁sentence｜\u003eUser: 
{user_message_2}\n\nAssistant:\n```\n\n### Inference with SGLang (recommended)\n\n[SGLang](https://github.com/sgl-project/sglang) currently supports MLA optimizations, FP8 (W8A8), FP8 KV Cache, and Torch Compile, offering the best latency and throughput among open-source frameworks. Here are some example commands to launch an OpenAI API-compatible server:\n\n```bash\n# BF16, tensor parallelism = 8\npython3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Chat --tp 8 --trust-remote-code\n\n# BF16, w/ torch.compile (The compilation can take several minutes)\npython3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --enable-torch-compile\n\n# FP8, tensor parallelism = 8, FP8 KV cache\npython3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V2-Chat --tp 8 --trust-remote-code --quant fp8 --kv-cache-dtype fp8_e5m2\n```\n\nAfter launching the server, you can query it with the OpenAI API:\n\n```python\nimport openai\nclient = openai.Client(\n    base_url=\"http://127.0.0.1:30000/v1\", api_key=\"EMPTY\")\n\n# Chat completion\nresponse = client.chat.completions.create(\n    model=\"default\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful AI assistant\"},\n        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n    ],\n    temperature=0,\n    max_tokens=64,\n)\nprint(response)\n```\n\n### Inference with vLLM (recommended)\nTo utilize [vLLM](https://github.com/vllm-project/vllm) for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.\n\n```python\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\nmax_model_len, tp_size = 8192, 8\nmodel_name = \"deepseek-ai/DeepSeek-V2-Chat\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nllm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)\nsampling_params = 
SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])\n\nmessages_list = [\n    [{\"role\": \"user\", \"content\": \"Who are you?\"}],\n    [{\"role\": \"user\", \"content\": \"Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference.\"}],\n    [{\"role\": \"user\", \"content\": \"Write a piece of quicksort code in C++.\"}],\n]\n\nprompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]\n\noutputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)\n\ngenerated_text = [output.outputs[0].text for output in outputs]\nprint(generated_text)\n```\n\n### LangChain Support\nSince our API is OpenAI-compatible, you can easily use it with [LangChain](https://www.langchain.com/).\nHere is an example:\n\n```python\nfrom langchain_openai import ChatOpenAI\nllm = ChatOpenAI(\n    model='deepseek-chat',\n    openai_api_key='\u003cyour-deepseek-api-key\u003e',\n    openai_api_base='https://api.deepseek.com/v1',\n    temperature=0.85,\n    max_tokens=8000)\n```\n\n## 9. License\nThis code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V2 Base/Chat models is subject to [the Model License](LICENSE-MODEL). The DeepSeek-V2 series (including Base and Chat) supports commercial use.\n\n## 10. Citation\n```\n@misc{deepseekv2,\n      title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model}, \n      author={DeepSeek-AI},\n      year={2024},\n      eprint={2405.04434},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\n## 11. 
Contact\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepseek-ai%2FDeepSeek-V2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepseek-ai%2FDeepSeek-V2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepseek-ai%2FDeepSeek-V2/lists"}