{"id":13584204,"url":"https://github.com/SunLemuria/OpenGPTAndBeyond","last_synced_at":"2025-04-07T01:31:39.602Z","repository":{"id":159920702,"uuid":"621837511","full_name":"SunLemuria/OpenGPTAndBeyond","owner":"SunLemuria","description":"Open efforts to implement ChatGPT-like models and beyond.","archived":false,"fork":false,"pushed_at":"2024-07-23T09:17:34.000Z","size":247,"stargazers_count":107,"open_issues_count":0,"forks_count":15,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-29T00:03:09.363Z","etag":null,"topics":["alpaca","chatbot","chatglm","chatgpt","large-language-models","llm","nlp","openai","opensource"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SunLemuria.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-31T13:48:45.000Z","updated_at":"2025-01-07T11:35:21.000Z","dependencies_parsed_at":"2023-10-20T13:09:14.169Z","dependency_job_id":"b9c3ddfc-f85b-4916-824b-4912af9477ad","html_url":"https://github.com/SunLemuria/OpenGPTAndBeyond","commit_stats":null,"previous_names":["sunlemuria/opengptandbeyond"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SunLemuria%2FOpenGPTAndBeyond","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SunLemuria%2FOpenGPTAndBeyond/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SunLemuria%2FOpenGPTAndBeyond/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SunLemuria%2FOpenGPTAndBeyond/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SunLemuria","download_url":"https://codeload.github.com/SunLemuria/OpenGPTAndBeyond/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247577934,"owners_count":20961197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alpaca","chatbot","chatglm","chatgpt","large-language-models","llm","nlp","openai","opensource"],"created_at":"2024-08-01T15:04:04.991Z","updated_at":"2025-04-07T01:31:34.576Z","avatar_url":"https://github.com/SunLemuria.png","language":null,"funding_links":[],"categories":["Others","Building"],"sub_categories":["LLM Models"],"readme":"# ChatGPT：开源与超越\n\n\u003cp align=\"center\"\u003e 简体中文 | \u003ca href=\"README_EN.md\"\u003e English \u003c/a\u003e\u003c/p\u003e\n\n开源类ChatGPT模型的实现与超越之路\n\nLLaMA权重意外泄露、以及斯坦福小羊驼用以self-instruct方式从gpt-3 api构建的数据对LLaMA进行指令微调取得令人印象深刻的表现以来，开源社区对实现ChatGPT水平的大语言模型感到越来越有希望。\n\n这个repo就是记录这个复刻与超越的过程，为社区提供一个概览。\n\n包括：相关技术进展、基础模型、领域模型、训练、推理、技术、数据、多语言、多模态，等等\n\n\u003cdetails\u003e\n\n\u003csummary\u003e# 目录\u003c/summary\u003e\n\n- [Base Models](#base-models)\n- [Domain Models](#domain-models)\n- [General Domain Instruction Models](#general-domain-instruction-models)\n- [Model Merging](#model-merging)\n- [Alternatives To Transformer](#alternatives-to-transformer)\n- [Multi-Modal](#multi-modal)\n- [MoE](#moe)\n- [Data](#data)\n  - [Pretrain Data](#pretrain-data)\n  - [Instruction Data](#instruction-data)\n  - [Synthetic Data Generation](#synthetic-data-generation)\n- [Evaluation](#evaluation)\n  - [Benchmark](#enchmark)\n  - [LeaderBoard](#leaderboard)\n- [Framework/ToolKit/Platform](#frameworktoolkitplatform)\n- [Alignment](#alignment)\n- [Multi-Language](#multi-language)\n  - [vocabulary expansion](#vocabulary-expansion)\n- [Efficient Training/Fine-Tuning](#efficient-trainingfine-tuning)\n- [Low-Cost Inference](#low-cost-inference)\n  - [quantization](#quantization)\n  - [projects](#projects)\n  - [Prompt Compression](#prompt-compression)\n- [Prompting](#prompting)\n- [Safety](#safety)\n- [Truthfulness](#truthfulness)\n- [Exceeding Context Window](#exceeding-context-window)\n- [Knowledge Editing](#knowledge-editing)\n  - [Implementations](#implementations)\n- [External Knowledge](#external-knowledge)\n  - [AI搜索引擎](#ai搜索引擎)\n  - [Chat with Docs](#chat-with-docs)\n  - [内容解析](#内容解析)\n  - [Vector DataBase](#vector-database)\n- [External Tools](#external-tools)\n  - [Using Existing Tools](#using-existing-tools)\n  - [Make New Tools](#make-new-tools)\n- [Agent](#agent)\n- [LLMs as XXX](#llms-as-xxx)\n- [Similar Collections](#similar-collections)\n\n\u003c/details\u003e\n\n# Base Models\n\n| contributor                                | model/project                                                                                                               | license                                                                                                                                                                                                                                                                                           | language | main feature                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |\n| ------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Meta                                       | [LLaMA/LLaMA2](https://github.com/facebookresearch/llama)                                                                      |                                                                                                                                                                                                                                                                                                   | multi    | LLaMA-13B outperforms GPT-3(175B) and LLaMA-65B is competitive to PaLM-540M.\u003cbr /\u003eBase model for most follow-up works.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n| HuggingFace-BigScience                     | [BLOOM](https://huggingface.co/bigscience/bloom)                                                                               |                                                                                                                                                                                                                                                                                                   | multi    | an autoregressive Large Language Model (LLM) trained by HuggingFace BigScience.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| HuggingFace-BigScience                     | [BLOOMZ](https://huggingface.co/bigscience/bloomz)                                                                             |                                                                                                                                                                                                                                                                                                   | multi    | instruction-finetuned version of BLOOM \u0026 mT5 pretrained multilingual language models on crosslingual task mixture.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| EleutherAI                                 | [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b)                                                                            |                                                                                                                                                                                                                                                                                                   | en       | transformer model trained using Ben Wang's[Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax/).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| Meta                                       | [OPT](https://huggingface.co/facebook/opt-66b)                                                                                 |                                                                                                                                                                                                                                                                                                   | en       | Open Pre-trained Transformer Language Models, aim in developing this suite of OPT models is to enable reproducible\u003cbr /\u003e and responsible research at scale, and to bring more voices to the table in studying the impact of these LLMs.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| [Cerebras Systems](https://www.cerebras.net/) | [Cerebras-GPT](https://huggingface.co/cerebras/Cerebras-GPT-13B)                                                               |                                                                                                                                                                                                                                                                                                   | en       | Pretrained LLM, GPT-3 like, Commercially available, efficiently trained on the[Andromeda](https://www.cerebras.net/andromeda/) AI supercomputer,\u003cbr /\u003etrained in accordance with[Chinchilla scaling laws](https://arxiv.org/abs/2203.15556) (20 tokens per model parameter) which is compute-optimal.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |\n| EleutherAI                                 | [pythia](https://github.com/EleutherAI/pythia)                                                                                 |                                                                                                                                                                                                                                                                                                   | en       | combine interpretability analysis and scaling laws to understand how knowledge develops\u003cbr /\u003eand evolves during training in autoregressive transformers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| Stability-AI                               | [StableLM](https://github.com/Stability-AI/StableLM)                                                                           |                                                                                                                                                                                                                                                                                                   | en       | Stability AI Language Models                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |\n| FDU                                        | [MOSS](https://github.com/OpenLMLab/MOSS)                                                                                      |                                                                                                                                                                                                                                                                                                   | en/zh    | An open-source tool-augmented conversational language model from Fudan University.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |\n| ssymmetry \u0026 FDU                            | [BBT-2](https://bbt.ssymmetry.com/)                                                                                            |                                                                                                                                                                                                                                                                                                   | zh       | 12B open-source LM.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| @mlfoundations                             | [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)                                                                 |                                                                                                                                                                                                                                                                                                   | en       | An open-source framework for training large multimodal models.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |\n| EleutherAI                                 | [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)                                                                 |                                                                                                                                                                                                                                                                                                   | en       | Its architecture intentionally resembles that of GPT-3, and is almost identical to that of[GPT-J- 6B](https://huggingface.co/EleutherAI/gpt-j-6B).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| UCB                                        | [OpenLLaMA](https://github.com/openlm-research/open_llama)                                                                     | Apache-2.0                                                                                                                                                                                                                                                                                        | en       | An Open Reproduction of LLaMA.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |\n| MosaicML                                   | [MPT](https://github.com/mosaicml/llm-foundry)                                                                                 | Apache-2.0                                                                                                                                                                                                                                                                                        | en       | MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models.\u003cbr /\u003e Trained on 1T tokens of a MosaicML-curated dataset, MPT-7B is open-source,\u003cbr /\u003e commercially usable, and equivalent to LLaMa 7B on evaluation metrics.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| TogetherComputer                           | [RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)                             | Apache-2.0                                                                                                                                                                                                                                                                                        | en       | A 2.8B parameter pretrained language model, pretrained on[RedPajama-Data-1T](https://huggingface.co/models?dataset=dataset:togethercomputer/RedPajama-Data-1T),\u003cbr /\u003e together with an [Instruction-tuned Version](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1) and a [Chat Version](https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| Lightning-AI                               | [Lit-LLaMA](https://github.com/Lightning-AI/lit-llama)                                                                         | Apache-2.0                                                                                                                                                                                                                                                                                        | -        | Independent implementation of[LLaMA](https://github.com/facebookresearch/llama) that is fully open source under the **Apache 2.0 license.**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |\n| @conceptofmind                             | [PaLM](https://github.com/conceptofmind/PaLM)                                                                                  | MIT License                                                                                                                                                                                                                                                                                       | en       | An open-source implementation of Google PaLM models.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| [TII](https://www.tii.ae/)                    | [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b)                                                                           | [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-7b/blob/main/LICENSE.txt)                                                                                                                                                                                                              | en       | a 7B parameters causal decoder-only model built by[TII](https://www.tii.ae/) and trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |\n| [TII](https://www.tii.ae/)                    | [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b)                                                                         | [TII Falcon LLM License](https://huggingface.co/tiiuae/falcon-7b/blob/main/LICENSE.txt)                                                                                                                                                                                                              | multi    | a 40B parameters causal decoder-only model built by[TII](https://www.tii.ae/) and trained on 1,000B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| TigerResearch                              | [TigerBot](https://github.com/TigerResearch/TigerBot)                                                                          | Apache-2.0                                                                                                                                                                                                                                                                                        | en/zh    | a multi-language and multitask LLM.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| BAAI                                       | [Aquila](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila) / [Aquila2](https://github.com/FlagAI-Open/Aquila2) | [BAAI_Aquila_Model_License](https://github.com/FlagAI-Open/FlagAI/blob/master/BAAI_Aquila_Model_License.pdf)                                                                                                                                                                                         | en/zh    | The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of more efficient underlying\u003cbr /\u003e operator implementations and redesigning the tokenizer for Chinese-English bilingual support.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| OpenBMB                                    | [CPM-Bee](https://github.com/OpenBMB/CPM-Bee)                                                                                  | [通用模型许可协议-来源说明-宣传限制-商业授权](https://github.com/OpenBMB/General-Model-License/blob/main/%E9%80%9A%E7%94%A8%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE-%E6%9D%A5%E6%BA%90%E8%AF%B4%E6%98%8E-%E5%AE%A3%E4%BC%A0%E9%99%90%E5%88%B6-%E5%95%86%E4%B8%9A%E6%8E%88%E6%9D%83.md) | en/zh    | **CPM-Bee** is a fully open-source, commercially-usable Chinese-English bilingual base model with a capacity of ten billion parameters.\u003cbr /\u003eAnd has been pre-trained on an extensive corpus of trillion-scale tokens.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| Baichuan                                   | [baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)                                                                     | Apache-2.0                                                                                                                                                                                                                                                                                        | en/zh    | It has achieved the best performance among models of the same size on standard\u003cbr /\u003e Chinese and English authoritative benchmarks (C-EVAL, MMLU, etc).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n| Tencent                                    | [lyraChatGLM](https://huggingface.co/TMElyralab/lyraChatGLM)                                                                   | MIT License                                                                                                                                                                                                                                                                                       | en/zh    | To the best of our knowledge, it is the**first accelerated version of ChatGLM-6B**.\u003cbr /\u003eThe inference speed of lyraChatGLM has achieved **300x** acceleration upon the early original version.\u003cbr /\u003e We are still working hard to further improve the performance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| SalesForce                                 | [XGen](https://github.com/salesforce/xgen)                                                                                     | Apache-2.0                                                                                                                                                                                                                                                                                        | multi    | Salesforce open-source LLMs with 8k sequence length                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| Shanghai AI Lab                            | [InternLM](https://github.com/InternLM/InternLM)                                                                               | Apache-2.0                                                                                                                                                                                                                                                                                        | en/zh    | InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:\u003cbr /\u003eIt leverages trillions of high-quality tokens for training to establish a powerful knowledge base.\u003cbr /\u003eIt supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.\u003cbr /\u003eIt provides a versatile toolset for users to flexibly build their own workflows.                                                                                                                                                                                                                                                                                                                                                                                                                  |\n| xverse-ai                                  | [XVERSE](https://github.com/xverse-ai)                                                                                         | Apache-2.0                                                                                                                                                                                                                                                                                        | multi    | Multilingual LLMs developed by XVERSE Technology Inc.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n| Writer                                     | [palmyra](https://huggingface.co/Writer/palmyra-base)                                                                          | Apache-2.0                                                                                                                                                                                                                                                                                        | en       | extremely powerful while being extremely fast. This model excels at many nuanced tasks\u003cbr /\u003e such as sentiment classification and summarization.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |\n| Mistral AI                                 | [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1)                                                                    | Apache-2.0                                                                                                                                                                                                                                                                                        | en       | Mistral 7B is a 7.3B parameter model that:\u003cbr /\u003e1. Outperforms Llama 2 13B on all benchmarks\u003cbr /\u003e2. Outperforms Llama 1 34B on many benchmarks\u003cbr /\u003e3. Approaches CodeLlama 7B performance on code, while remaining good at English tasks\u003cbr /\u003e4. Uses Grouped-query attention (GQA) for faster inference\u003cbr /\u003e5. Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| SkyworkAI                                  | [Skywork](https://github.com/SkyworkAI/Skywork)                                                                                | -                                                                                                                                                                                                                                                                                                 | en/zh    | In major evaluation benchmarks, Skywork-13B is at the forefront of Chinese open source models and is the optimal level under the same parameter scale;\u003cbr /\u003e it can be used commercially without application; it has also open sourced a 600G (150 billion tokens) Chinese data set.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |\n| [01.AI](https://01.ai/)                       | [Yi](https://github.com/01-ai/Yi)                                                                                              | -                                                                                                                                                                                                                                                                                                 | en/zh    | The**Yi** series models are large language models trained from scratch by developers at [01.AI](https://01.ai/).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| IEIT Systems                               | [Yuan-2.0](https://github.com/IEIT-Yuan/Yuan-2.0)                                                                              | -                                                                                                                                                                                                                                                                                                 | en/zh    | In this work, the Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of local dependencies of natural language into Attention.\u003cbr /\u003e Based on LFA, we develop and release Yuan 2.0, a large language model with parameters ranging from 2.1 billion to 102.6 billion. A data filtering and generation method\u003cbr /\u003e is presented to build pretraining and fine-tuning dataset in high quality. A distributed training method with non-uniform pipeline parallel, data parallel, and optimizer parallel is proposed,\u003cbr /\u003e which greatly reduces the bandwidth requirements of intra-node communication, and achieves good performance in large-scale distributed training.\u003cbr /\u003e Yuan 2.0 models display impressive ability in code generation, math problem-solving, and chat compared with existing models.                                            |\n| Nanbeige                                   | [Nanbeige](https://github.com/Nanbeige/Nanbeige)                                                                               | Apache-2.0                                                                                                                                                                                                                                                                                        | en/zh    | Nanbeige-16B is a 16 billion parameter language model developed by Nanbeige LLM Lab. It uses 2.5T Tokens for pre-training. The training data includes a large amount of high-quality internet corpus, various books, code, etc. It has achieved good results on various authoritative evaluation data sets. This release includes the Base, Chat, Base-32k and Chat-32k.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| deepseek-ai                                | [deepseek-LLM](https://github.com/deepseek-ai/deepseek-LLM)                                                                    | MIT License                                                                                                                                                                                                                                                                                       | en/zh    | an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| LLM360                                     | [LLM360](https://github.com/LLM360)                                                                                            | -                                                                                                                                                                                                                                                                                                 | -        | Most open-source LLM releases include model weights and evaluation results. However, additional information is often needed to genuinely understand a model's behavior—and this information is not typically available to most researchers. Hence, we commit to releasing all of the intermediate checkpoints (up to 360!) collected during training, all of the training data (and its mapping to checkpoints), all collected metrics (e.g., loss, gradient norm, evaluation results), and all source code for preprocessing data and model training. These additional artifacts can help researchers and practitioners to have a deeper look into LLM’s construction process and conduct research such as analyzing model dynamics. We hope that LLM360 can help make advanced LLMs more transparent, foster research in smaller-scale labs, and improve reproducibility in AI research. |\n| FDU, etc.                                  | [CT-LLM](https://github.com/Chinese-Tiny-LLM/Chinese-Tiny-LLM)                                                                 | -                                                                                                                                                                                                                                                                                                 | zh/en    | focusing on the Chinese language. Starting from scratch, CT-LLM primarily uses Chinese data from a 1,200 billion token corpus, including 800 billion Chinese, 300 billion English, and 100 billion code tokens. By open-sourcing CT-LLM's training process, including data processing and the Massive Appropriate Pretraining Chinese Corpus (MAP-CC), and introducing the Chinese Hard Case Benchmark (CHC-Bench), we encourage further research and innovation, aiming for more inclusive and adaptable language models.                                                                                                                                                                                                                                                                                                                                                                 |\n| TigerLab                                   | [MAP-NEO](https://github.com/multimodal-art-projection/MAP-NEO)                                                                | -                                                                                                                                                                                                                                                                                                 | zh/en    | 第一个从数据处理到模型训练过程、模型权重全流程开源的大模型。                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |\n| DataCamp                                   | [DCLM](https://github.com/mlfoundations/dclm)                                                                                  | -                                                                                                                                                                                                                                                                                                 | -        | 提供了用于处理原始数据、标记化、数据打乱、模型训练以及性能评估的工具和指南。基础baseline 7B模型性能优异。                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |\n\n# Domain Models\n\n| contributor                        | model                                                                                                              | domain          | language | base model                                                                                                  | main feature                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------ | --------------- | -------- | ------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| UT Southwestern/\u003cbr /\u003eUIUC/OSU/HDU | [ChatDoctor](https://github.com/Kent0n-Li/ChatDoctor)                                                                 | medical         | en       | LLaMA                                                                                                        | Maybe the first domain-specific chat model tuned on LLaMA.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| Cambridge                          | [Visual Med-Alpaca](https://github.com/cambridgeltl/visual-med-alpaca)                                                | biomedical      | en       | LLaMA-7B                                                                                                     | a multi-modal foundation model designed specifically for the biomedical domain.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n| HIT                                | [BenTsao](https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese) / [ChatGLM-Med](https://github.com/SCIR-HI/Med-ChatGLM) | medical         | zh       | LLaMA/ChatGLM                                                                                                | fine-tuned with Chinese medical knowledge dataset, which is generated by using gpt3.5 api.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| ShanghaiTech, etc.                 | [DoctorGLM](https://github.com/xionghonglin/DoctorGLM)                                                                | medical         | en/zh    | ChatGLM-6B                                                                                                   | Chinese medical consultation model fine-tuned on ChatGLM-6B.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |\n| THU AIR                            | [BioMedGPT-1.6B](https://github.com/BioFM/OpenBioMed)                                                                 | biomedical      | en/zh    | -                                                                                                            | a pre-trained multi-modal molecular foundation model with 1.6B parameters that associates 2D molecular graphs with texts.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |\n| @LiuHC0428                         | [LawGPT_zh](https://github.com/LiuHC0428/LAW-GPT)                                                                     | legal           | zh       | ChatGLM-6B                                                                                                   | a general model in Chinese legal domain, trained on data generated via Reliable-Self-Instruction.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| SJTU                               | [MedicalGPT-zh](https://github.com/MediaBrain-SJTU/MedicalGPT-zh)                                                     | medical         | zh       | ChatGLM-6B                                                                                                   | a general model in Chinese medical domain, a diverse data generated via self-instruct.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |\n| SJTU                               | [PMC-LLaMA](https://github.com/chaoyi-wu/PMC-LLaMA)                                                                   | medical         | zh       | LLaMA                                                                                                        | Continue Training LLaMA on Medical Papers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| HuggingFace                        | [StarCoder](https://github.com/bigcode-project/starcoder)                                                             | code generation | en       | -                                                                                                            | a language model (LM) trained on source code and natural language text. Its training data incorporates more than\u003cbr /\u003e 80 different programming languages as well as text extracted from GitHub issues and commits and from notebooks.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| @CogStack                          | [NHS-LLM](https://github.com/CogStack/opengpt#nhs-llm)                                                                | medical         | en       | not clear                                                                                                    | A conversational model for healthcare trained using[OpenGPT](https://github.com/CogStack/opengpt).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |\n| @pengxiao-song                     | [LaWGPT](https://github.com/pengxiao-song/LaWGPT)                                                                     | legal           | zh       | LLaMA/ChatGLM                                                                                                | expand the vocab with Chinese legal terminologies, instruction fine-tuned on data generated using self-instruct.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| Duxiaoman                          | [XuanYuan](https://github.com/Duxiaoman-DI/XuanYuan)                                                                  | finance         | zh       | BLOOM-176B                                                                                                   | A Large Chinese Financial Chat Model with Hundreds of Billions Parameters.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n| CUHK                               | [HuatuoGPT](https://github.com/FreedomIntelligence/HuatuoGPT)                                                         | medical         | zh       | not clear                                                                                                    | HuatuoGPT, a large language model (LLM) trained on a vast Chinese medical corpus. Our objective with HuatuoGPT is\u003cbr /\u003e to construct a more professional ‘ChatGPT’ for medical consultation scenarios.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| PKU                                | [Lawyer LLaMA](https://github.com/AndrewZhe/lawyer-llama)                                                             | legal           | zh       | LLaMA                                                                                                        | continue pretraining on Chinese legal data, insturction tuned on legal exams and legal consulting qa pairs.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |\n| THU                                | [LexiLaw](https://github.com/CSHaitao/LexiLaw)                                                                        | legal           | zh       | ChatGLM-6B                                                                                                   | trained on a mixture of general data ([BELLE](https://github.com/LianjiaTech/BELLE) 1.5M) and legal data                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |\n| THU, etc.                          | [taoli](https://github.com/blcuicall/taoli)                                                                           | education       | zh       | LLaMA                                                                                                        | A large model for international Chinese education. It extends specific vocabulary on the base model,\u003cbr /\u003e and uses the domain's proprietary data set for instruction fine-tuning.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| NUS                                | [Goat](https://github.com/liutiedong/goat)                                                                            | arithmetic      | en       | LLaMA                                                                                                        | a fine-tuned LLaMA model that significantly outperforms GPT-4 on a range of arithmetic tasks.\u003cbr /\u003e Fine-tuned on a synthetically generated dataset, Goat achieves state-ofthe-art performance on BIG-bench arithmetic sub-task.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| CU/NYU                             | [FinGPT](https://github.com/AI4Finance-Foundation/FinGPT)                                                             | finance         | en       | -                                                                                                            | an end-to-end open-source framework for financial large language models (FinLLMs).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| microsoft                          | [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder)                                             | code generation | en       | StarCoder                                                                                                    | trained with**78k** evolved code instructions. surpasses  **Claude-Plus (+6.8)** , **Bard (+15.3)** and **InstructCodeT5+ (+22.3)** on the [HumanEval Benchmarks](https://github.com/openai/human-eval).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| UCAS                               | [Cornucopia](https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese)                                          | finance         | zh       | LLaMA                                                                                                        | finetune LLaMA on Chinese financial knowledge,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| PKU                                | [ChatLaw](https://github.com/PKU-YuanGroup/ChatLaw)                                                                   | legal           | zh       | [Ziya](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1) / [Anima](https://github.com/lyogavin/Anima) | Chinese legal domain model.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |\n| @michael-wzhu                      | [ChatMed](https://github.com/michael-wzhu/ChatMed)                                                                    | medical         | zh       | LLaMA                                                                                                        | Chinese medical LLM based on LLaMA-7B.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |\n| SCUT                               | [SoulChat](https://github.com/scutcyr/SoulChat)                                                                       | mental health   | zh       | ChatGLM-6B                                                                                                   | Chinese dialogue LLM in mental health domain, based on ChatGLM-6B.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| @shibing624                        | [MedicalGPT](https://github.com/shibing624/MedicalGPT)                                                                | medical         | zh       | ChatGLM-6B                                                                                                   | Training Your Own Medical GPT Model with ChatGPT Training Pipeline.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| BJTU                               | [TransGPT](https://github.com/DUOMO/TransGPT)                                                                         | transportation  | zh       | LLaMA-7B                                                                                                     | Chinese transportation model.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |\n| BAAI                               | [AquilaCode](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila/Aquila-code)                           | code generation | multi    | Aquila                                                                                                       | AquilaCode-multi is a multi-language model that supports high-accuracy code generation for various programming languages, including Python/C++/Java/Javascript/Go, etc.\u003cbr /\u003e It has achieved impressive results in HumanEval (Python) evaluation, with Pass@1, Pass@10, and Pass@100 scores of 26/45.7/71.6, respectively. In the HumanEval-X\u003cbr /\u003e multi-language code generation evaluation, it significantly outperforms other open-source models with similar parameters (as of July 19, 2023).\u003cbr /\u003eAquilaCode-py, on the other hand, is a single-language Python version of the model that focuses on Python code generation. \u003cbr /\u003eIt has also demonstrated excellent performance in HumanEval evaluation, with Pass@1, Pass@10, and Pass@100 scores of 28.8/50.6/76.9 (as of July 19, 2023). |\n| Meta                               | [CodeLLaMA](https://github.com/facebookresearch/codellama)                                                            | code generation | multi    | LLaMA-2                                                                                                      | a family of large language models for code based on[Llama 2](https://github.com/facebookresearch/llama) providing state-of-the-art performance among open models, infilling capabilities,\u003cbr /\u003e support for large input contexts, and zero-shot instruction following ability for programming tasks.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| UNSW, etc                          | [Darwin](https://github.com/MasterAI-EAM/Darwin)                                                                      | natural science | en       | LLaMA-7B                                                                                                     | the first open-source LLM for natural science, mainly in physics, chemistry and material science.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |\n| alibaba                            | [EcomGPT](https://github.com/Alibaba-NLP/EcomGPT)                                                                     | e-commerce      | en/zh    | BLOOMZ                                                                                                       | An Instruction-tuned Large Language Model for E-commerce.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |\n| TIGER-AI-Lab                       | [MAmmoTH](https://github.com/TIGER-AI-Lab/MAmmoTH)                                                                    | math            | en       | LLaMA2/CodeLLaMA                                                                                             | a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct,\u003cbr /\u003e a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets,\u003cbr /\u003e six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales,\u003cbr /\u003e and ensures extensive coverage of diverse mathematical fields.                                                                                                                                                                                                                                                |\n| SJTU                               | [abel](https://github.com/GAIR-NLP/abel)                                                                              | math            | en       | LLaMA2                                                                                                       | We propose**Parental Oversight*** , A ***Babysitting Strategy*** for Supervised Fine-tuning, `Parental Oversight` is not limited to any specific data processing method. Instead, it defines the data processing philosophy that should guide supervised fine-tuning in the era of Generative AI GAI).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| FDU                                | [DISC-LawLLM](https://github.com/FudanDISC/DISC-LawLLM)                                                               | legal           | zh       | Baichuan-13B                                                                                                 | FudanDISC has released DISC-LawLLM, a Chinese intelligent legal system driven by a large language model.\u003cbr /\u003e The system can provide various legal services for different user groups. In addition, DISC-Law-Eval is constructed to evaluate the large legal language model from both objective and subjective aspects.\u003cbr /\u003e The model has obvious advantages compared with the existing large legal models.\u003cbr /\u003eThe team also made available a high-quality Supervised fine-tuning (SFT) dataset of 300,000, DISC-Law-SFT.                                                                                                                                                                                                                                                                        |\n| HKU, etc                           | [ChatPsychiatrist](https://github.com/EmoCareAI/ChatPsychiatrist)                                                     | mental health   | en       | LLaMA-7B                                                                                                     | This repo open-sources the Instruct-tuned LLaMA-7B model that has been fine-tuned with counseling domian instruction data.\u003cbr /\u003e To construct our 8K size instruct-tuning dataset, we collected real-world counseling dialogue examples and employed GPT-4 as an extractor and filter.\u003cbr /\u003e In addition, we have introduced a comprehensive set of metrics, specifically tailored to the LLM+Counseling domain, by incorporating counseling domain evaluation criteria.\u003cbr /\u003e These metrics enable the assessment of performance in generating language content that involves multi-dimensional counseling skills.                                                                                                                                                                                   |\n| CAS                                | [StarWhisper](https://wisemodel.cn/models/LiYuYang/StarWhisper)                                                       | astronomical    | zh       | -                                                                                                            | StarWhisper, a large astronomical model, significantly improves the reasoning logic and integrity of the model through the fine-tuning of astrophysical corpus labeled by experts,\u003cbr /\u003e logical long text training, and direct preference optimization. In the CG-Eval jointly published by the Keguei AI Research Institute and LanguageX AI Lab, it reached the second place overall,\u003cbr /\u003e just below GPT-4, and its mathematical reasoning and astronomical capabilities are close to or exceed the GPT 3.5 Turbo.                                                                                                                                                                                                                                                                               |\n| ZhiPuAI                            | [FinGLM](https://github.com/MetaGLM/FinGLM)                                                                           | finance         | zh       | ChatGLM                                                                                                      | solutions of SMP2023-ELMFT(The Evaluation of Large Model of Finance Technology).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |\n| PKU, etc                           | [CodeShell](https://github.com/WisdomShell/codeshell)                                                                 | code generation | en/zh    | -                                                                                                            | CodeShell is a code large language model (LLM) developed jointly by the[Knowledge Computing Lab at Peking University](http://se.pku.edu.cn/kcl/) and the AI team of Sichuan Tianfu Bank. CodeShell has 7 billion parameters,\u003cbr /\u003e was trained on 500 billion tokens, and has a context window length of 8192. On authoritative code evaluation benchmarks (HumanEval and MBPP), CodeShell achieves the best performance for models of its scale.                                                                                                                                                                                                                                                                                                                                                        |\n| FDU                                | [DISC-FinLLM](https://github.com/FudanDISC/DISC-FinLLM)                                                               | finance         | zh       | Baichuan-13B-Chat                                                                                            | DISC-FinLLM is a large language model in the financial field. It is a multi-expert intelligent financial system composed of four modules for different financial scenarios: financial consulting,\u003cbr /\u003e financial text analysis, financial calculation, and financial knowledge retrieval and question answering.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |\n| Deepseek                           | [Deepseek Coder](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)                                      | code generation | en/zh    | -                                                                                                            | Deepseek Coder comprises a series of code language models trained on both 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens.\u003cbr /\u003eFor coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.                                                                                                                                                                                                                                                                                                                                                                                                                                              |\n| microsoft                          | [MathOctopus](https://github.com/microsoft/MathOctopus)                                                               | math            | multi    | LLaMA2                                                                                                       | This work pioneers exploring and building powerful Multilingual Math Reasoning (xMR) LLMs. To accomplish this, we make the following works:\u003cbr /\u003e1. **MGSM8KInstruct**, the first multilingual math reasoning instruction dataset, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks.\u003cbr /\u003e2. **MSVAMP**, an out-of-domain xMR test dataset, to conduct a more exhaustive and comprehensive evaluation of the model’s multilingual mathematical capabilities.\u003cbr /\u003e3. **MathOctopus**, our effective Multilingual Math Reasoning LLMs, training with different strategies, which notably outperform conventional open-source LLMs and ex","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSunLemuria%2FOpenGPTAndBeyond","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSunLemuria%2FOpenGPTAndBeyond","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSunLemuria%2FOpenGPTAndBeyond/lists"}