{"id":13407464,"url":"https://github.com/togethercomputer/OpenChatKit","last_synced_at":"2025-03-14T12:31:01.787Z","repository":{"id":136931000,"uuid":"608892940","full_name":"togethercomputer/OpenChatKit","owner":"togethercomputer","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-09T19:09:58.000Z","size":194,"stargazers_count":9002,"open_issues_count":83,"forks_count":1013,"subscribers_count":121,"default_branch":"main","last_synced_at":"2024-10-29T15:03:44.098Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/togethercomputer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-03T00:12:53.000Z","updated_at":"2024-10-25T23:44:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"0db49954-5e76-408d-9f02-730ea62cc694","html_url":"https://github.com/togethercomputer/OpenChatKit","commit_stats":{"total_commits":115,"total_committers":23,"mean_commits":5.0,"dds":0.8608695652173913,"last_synced_commit":"a7094aa583d4ac9ecbe700f0c5b11e6bb28cb454"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FOpenChatKit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FOpenChatKit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FOpenChatKit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/togethercomputer%2FOpenChatKit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/togethercomputer","download_url":"https://codeload.github.com/togethercomputer/OpenChatKit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243577759,"owners_count":20313697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:00:40.828Z","updated_at":"2025-03-14T12:31:01.421Z","avatar_url":"https://github.com/togethercomputer.png","language":"Python","readme":"# OpenChatKit\n\nOpenChatKit provides a powerful, open-source base to create both specialized and general-purpose models for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. OpenChatKit models were trained on the OIG-43M training dataset, which was a collaboration between [Together](https://www.together.xyz/), [LAION](https://laion.ai), and [Ontocord.ai](https://ontocord.ai). 
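
These chat models respond to a plain-text prompt that interleaves human and bot turns, with the conversation so far prepended to each new query. As a minimal sketch of how a chat harness can assemble such a prompt from conversation history (the `<human>:`/`<bot>:` tags below are illustrative defaults; check each model card for the exact format a given model expects):

```python
# Sketch: build a dialogue prompt from prior turns plus the new query.
# Tag strings are illustrative, not a guaranteed model format.
def build_prompt(history, user_input, human_tag="<human>:", bot_tag="<bot>:"):
    turns = [f"{human_tag} {u}\n{bot_tag} {b}" for u, b in history]
    # End with an open bot tag so the model continues as the assistant.
    turns.append(f"{human_tag} {user_input}\n{bot_tag}")
    return "\n".join(turns)

print(build_prompt([("Hello.", "Hello human.")], "Where is Zurich?"))
```
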
\n\nIn this repo, you'll find code for:\n- Training GPT-NeoXT-Chat-Base-20B, a 20B parameter chat model (see [docs/GPT-NeoXT-Chat-Base-20B.md](docs/GPT-NeoXT-Chat-Base-20B.md))\n- Fine-tuning Llama-2-7B-32K-beta, a 7B parameter long-context model\n- Training Pythia-Chat-Base-7B, a 7B parameter chat model\n- Testing inference using either of the chat models\n- Augmenting the model with additional context from a retrieval index\n\n# Contents\n\n- [Getting Started](#getting-started)\n  * [Requirements](#requirements)\n  * [Chatting with Pythia-Chat-Base-7B](#chatting-with-pythia-chat-base-7b)\n- [Fine-tuning Llama-2-7B-32K-beta](#fine-tuning-llama-2-7b-32k-beta)\n  * [Downloading and converting the base model](#downloading-and-converting-the-base-model)\n  * [Fine-tuning the model](#fine-tuning-the-model)\n  * [Converting trained weights to Hugging Face format](#converting-trained-weights-to-hugging-face-format)\n- [Reproducing Pythia-Chat-Base-7B](#reproducing-pythia-chat-base-7b)\n  * [Downloading training data and the base model](#downloading-training-data-and-the-base-model)\n  * [(Optional) 8bit Adam](#optional-8bit-adam)\n  * [Training the model](#training-the-model)\n  * [Converting weights to Hugging Face format](#converting-weights-to-hugging-face-format)\n  * [Testing the new model](#testing-the-new-model)\n- [Monitoring](#monitoring)\n  * [Loguru](#loguru)\n  * [Weights \u0026 Biases](#weights--biases)\n- [Experimental: Retrieval-Augmented Models](#experimental-retrieval-augmented-models)\n- [See Also](#see-also)\n- [License](#license)\n- [Citing OpenChatKit](#citing-openchatkit)\n- [Acknowledgements](#acknowledgements)\n\n# Getting Started\n\nIn this tutorial, you will download Pythia-Chat-Base-7B, an instruction-tuned language model, and run some inference requests against it using a command-line tool.\n\nPythia-Chat-Base-7B is a 7B-parameter fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. 
Pre-trained weights for this model are available on Hugging Face as [togethercomputer/Pythia-Chat-Base-7B](https://huggingface.co/togethercomputer/Pythia-Chat-Base-7B) under an Apache 2.0 license.\n\nMore details can be found on the model card for [Pythia-Chat-Base-7B](https://huggingface.co/togethercomputer/Pythia-Chat-Base-7B) on Hugging Face.\n\n## Requirements\n\nBefore you begin, you need to install PyTorch and other dependencies.\n\n1. Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html) from their website.\n\n2. Install [Git LFS](https://git-lfs.com/) from their website.\n\n3. Install the `git lfs` hooks.\n\n```shell\ngit lfs install\n```\n\n4. Install mamba in the `base` environment so it's available in all environments.\n\n```shell\nconda install mamba -n base -c conda-forge\n```\n\n5. Create an environment called OpenChatKit using the `environment.yml` file at the root of this repo.\n\n\u003e **Note**\n\u003e Use `mamba` to create the environment. It's **much** faster than using `conda`.\n\n```shell\nmamba env create -f environment.yml \n```\n\n6. Activate the new conda environment.\n\n```shell\nconda activate OpenChatKit\n```\n\n## Chatting with Pythia-Chat-Base-7B\n\nTo help you try the model, [`inference/bot.py`](inference/bot.py) is a simple command-line test harness that provides a shell interface enabling you to chat with the model. Simply enter text at the prompt and the model replies. The test harness also maintains conversation history to provide the model with context.\n\n\nStart the bot by calling `bot.py` from the root of the repo.\n\n```shell\npython inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B\n```\n\nLoading the model can take some time, but once it's loaded, you are greeted with a prompt. Say hello.\n\n```shell\n$ python inference/bot.py \nLoading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...\nWelcome to OpenChatKit shell.   
Type /help or /? to list commands.\n\n\u003e\u003e\u003e Hello.\nHello human.\n\n\u003e\u003e\u003e \n```\n\nEnter additional queries at the prompt, and the model replies. Under the covers, the shell is forming a prompt with all previous queries and passing it to the model to generate more text.\n\nThe shell also supports additional commands to inspect hyperparameters, the full prompt, and more. Commands are prefixed with a `/`.\n\n\u003e **Note**\n\u003e The `/quit` command exits the shell.\n\nPlease see [the inference README](inference/README.md) for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.\n\n# Fine-tuning Llama-2-7B-32K-beta\n\nThe Llama-2-7B-32K-beta model can be fine-tuned using various datasets. In this tutorial, we will use the multi-document natural questions dataset and the BookSum dataset.\n\n## Downloading and converting the base model\n\nTo download the Llama-2-7B-32K-beta model and prepare it for fine-tuning, run this command from the root of the repository.\n\n```shell\npython pretrained/Llama-2-7B-32K-beta/prepare.py\n```\n\nThe weights for this model will be in the `pretrained/Llama-2-7B-32K-beta/togethercomputer_Llama-2-7B-32K-beta` directory.\n\n\n## Fine-tuning the model\n\nThe `training/finetune_llama-2-7b-32k-mqa.sh` and `training/finetune_llama-2-7b-32k-booksum.sh` scripts configure and run the training loop.\n\n1. To fine-tune on the multi-document natural questions dataset, run:\n   ```shell\n   bash training/finetune_llama-2-7b-32k-mqa.sh\n   ```\n\n2. 
To fine-tune on the BookSum dataset, run:\n   ```shell\n   bash training/finetune_llama-2-7b-32k-booksum.sh\n   ```\n\nAs the training loop runs, checkpoints are saved to the `model_ckpts` directory at the root of the repo.\n\nPlease see [the training README](training/README.md) for more details about customizing the training run.\n\n## Converting trained weights to Hugging Face format\n\nBefore you can use this model to perform inference, it must be converted to the Hugging Face format. Run this command from the root of the repo to do so.\n\nFor example:\n```shell\nmkdir huggingface_models \\\n  \u0026\u0026 python tools/convert_to_hf_llama.py \\\n       --config-name togethercomputer/Llama-2-7B-32K-beta \\\n       --ckpt-path model_ckpts/llama-2-7b-32k-mqa/checkpoint_10 \\\n       --save-path huggingface_models/llama-2-7b-32k-mqa \\\n       --n-stages 4 \\\n       --n-layer-per-stage 8 \\\n       --fp16\n```\nwhere the `--fp16` flag will load and store models in fp16.\n\nMake sure to replace `model_ckpts/llama-2-7b-32k-mqa/checkpoint_10` with the latest checkpoint in the `model_ckpts/llama-2-7b-32k-mqa` or `model_ckpts/llama-2-7b-32k-booksum` directory.\n\n\n# Reproducing Pythia-Chat-Base-7B\n\nThis tutorial walks through reproducing the Pythia-Chat-Base-7B model by fine-tuning Eleuther AI's Pythia-6.9B-deduped model using the OIG dataset.\n\n## Downloading training data and the base model\n\nThe chat model was trained on the [OIG](https://huggingface.co/datasets/laion/OIG) dataset built by [LAION](https://laion.ai/), [Together](https://www.together.xyz/), and [Ontocord.ai](https://www.ontocord.ai/). To download the dataset from Hugging Face, run the command below from the root of the repo.\n\n```shell\npython data/OIG/prepare.py\n```\n\u003e **Note** \n\u003e You can help make this chat model better by contributing data! 
See the [OpenDataHub](https://github.com/togethercomputer/OpenDataHub) repo for more details.\n\nOnce the command completes, the data will be in the `data/OIG/files` directory.\n\nPythia-Chat-Base-7B is a fine-tuned variant of Pythia-6.9B-deduped from Eleuther AI. To download the model and prepare it for fine-tuning, run this command from the root of the repo.\n\n```shell\npython pretrained/Pythia-6.9B-deduped/prepare.py\n```\n\nThe weights for this model will be in the `pretrained/Pythia-6.9B-deduped/EleutherAI_pythia-6.9b-deduped` directory.\n\n## (Optional) 8bit Adam\n\nTo use 8bit-adam during training, install the `bitsandbytes` package.\n\n```shell\npip install bitsandbytes # optional, to use 8bit-adam\n```\n\n## Training the model\n\nThe `training/finetune_Pythia-Chat-Base-7B.sh` script configures and runs the training loop. After downloading the dataset and the base model, run:\n\n```shell\nbash training/finetune_Pythia-Chat-Base-7B.sh\n```\n\nAs the training loop runs, checkpoints are saved to the `model_ckpts` directory at the root of the repo.\n\nPlease see [the training README](training/README.md) for more details about customizing the training run.\n\n## Converting weights to Hugging Face format\n\nBefore you can use this model to perform inference, it must be converted to the Hugging Face format. 
Run this command from the root of the repo to do so.\n\n```shell\nmkdir huggingface_models \\\n  \u0026\u0026 python tools/convert_to_hf_gptneox.py \\\n       --config-name EleutherAI/pythia-6.9b-deduped \\\n       --ckpt-path model_ckpts/Pythia-Chat-Base-7B/checkpoint_100 \\\n       --save-path huggingface_models/Pythia-Chat-Base-7B \\\n       --n-stages 4 \\\n       --n-layer-per-stage 8 \\\n       --fp16\n```\nwhere the `--fp16` flag will load and store models in fp16.\n\nMake sure to replace `model_ckpts/Pythia-Chat-Base-7B/checkpoint_100` with the latest checkpoint in the `model_ckpts/Pythia-Chat-Base-7B` directory.\n\n## Testing the new model\n\nYou can use the OpenChatKit Shell test harness to chat with the new model. From the root of the repo, run\n\n```shell\npython inference/bot.py\n```\n\nBy default, the script will load the model named Pythia-Chat-Base-7B under the `huggingface_models` directory, but you can override that behavior by specifying `--model`.\n\n```shell\npython inference/bot.py --model ./huggingface_models/GPT-NeoXT-Chat-Base-20B\n```\n\nOnce the model has loaded, enter text at the prompt and the model will reply.\n\n```shell\n$ python inference/bot.py \nLoading /home/csris/src/github.com/togethercomputer/OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:1...\nWelcome to OpenChatKit shell.   Type /help or /? to list commands.\n\n\u003e\u003e\u003e Hello.\nHello human.\n\n\u003e\u003e\u003e \n```\n\nThe shell also supports additional commands to inspect hyperparameters, the full prompt, and more. 
Commands are prefixed with a `/`.\n\n\u003e **Note**\n\u003e The `/quit` command exits the shell.\n\nPlease see [the inference README](inference/README.md) for more details about arguments, running on multiple/specific GPUs, and running on consumer hardware.\n\n# Monitoring\n\nBy default, the training script simply prints the loss as training proceeds, but it can also output metrics to a file using [loguru](https://github.com/Delgan/loguru) or report them to Weights \u0026 Biases.\n\n## Loguru\n\nAdd the flag `--train-log-backend loguru` to your training script to log to `./logs/file_{time}.log`.\n\n## Weights \u0026 Biases\n\nTo use Weights \u0026 Biases, first log in with your Weights \u0026 Biases token.\n\n```shell\nwandb login\n```\n\nThen set `--train-log-backend wandb` in the training script to enable logging to Weights \u0026 Biases.\n\n# Experimental: Retrieval-Augmented Models\n\n\u003e **Warning**\n\u003e Retrieval support is experimental.\n\nThe code in `/retrieval` implements a Python package for querying a Faiss index of Wikipedia. The following steps explain how to use this index to augment queries in the test harness with context from the retriever.\n\n1. Download the Wikipedia index.\n\n```shell\npython data/wikipedia-3sentence-level-retrieval-index/prepare.py\n```\n\n2. Run the bot with the `--retrieval` flag.\n\n```shell\npython inference/bot.py --retrieval\n```\n\nAfter starting, the bot will load both the chat model and the retrieval index, which takes a long time. Once the model and the index are loaded, all queries will be augmented with extra context.\n\n\n```shell\n$ python inference/bot.py --retrieval\nLoading /OpenChatKit/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...\nLoading retrieval index...\nWelcome to OpenChatKit shell.   Type /help or /? 
to list commands.\n\n\u003e\u003e\u003e Where is Zurich?\nWhere is Zurich?\nZurich is located in Switzerland.\n\n\u003e\u003e\u003e\n```\n\n# See Also\n* [docs/GPT-NeoXT-Chat-Base-20B.md](docs/GPT-NeoXT-Chat-Base-20B.md). OpenChatKit also provides a larger, 20B parameter chat model, GPT-NeoXT-Chat-Base-20B, fine-tuned from Eleuther AI's GPT-NeoX-20B.\n\n# License\n\nAll code in this repository was developed by Together Computer except where otherwise noted.  Copyright (c) 2023, Together Computer.  All rights reserved. The code is licensed under the Apache 2.0 license.\n\n\n```\nCopyright 2023 Together Computer\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n```\n\nThis repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.\n\nFor full terms, see the LICENSE file. If you have any questions, comments, or concerns about licensing, please [contact us](https://www.together.xyz/contact).\n\n# Citing OpenChatKit\n\n```bibtex\n@software{openchatkit,\n  title = {{OpenChatKit: An Open Toolkit and Base Model for Dialogue-style Applications}},\n  author = {Together Computer},\n  url = {https://github.com/togethercomputer/OpenChatKit},\n  month = {3},\n  year = {2023},\n  version = {0.15},\n}\n```\n\n# Acknowledgements\n\nOur models are fine-tuned versions of large language models trained by [Eleuther AI](https://www.eleuther.ai). 
We evaluated our model on [HELM](https://crfm.stanford.edu/helm/latest/) provided by the [Center for Research on Foundation Models](https://crfm.stanford.edu). We also collaborated with both [CRFM](https://crfm.stanford.edu) and [HazyResearch](http://hazyresearch.stanford.edu) at Stanford to build this model.\n\nWe collaborated with [LAION](https://laion.ai/) and [Ontocord.ai](https://www.ontocord.ai/) to build the training data used to fine-tune this model.\n","funding_links":[],"categories":["Toolkits","[togethercomputer/OpenChatKit](https://github.com/togethercomputer/OpenChatKit)","Python","精选开源项目合集","Open LLM","LLM-List","Large Language Models (LLMs)","Venture Capitalists","A01_文本生成_文本对话","通用聊天机器人工具包","Repos","SDK, Libraries, Frameworks","twitter"],"sub_categories":["GPT开源平替机器人🔥🔥🔥","Instruction finetuned LLM","Instruction-finetuned-LLM","Contribute to our Repository","Competitors: AI ChatBot","大语言对话模型及数据","GPT开源平替机器人","Python library, sdk or frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2FOpenChatKit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftogethercomputer%2FOpenChatKit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2FOpenChatKit/lists"}