{"id":18885540,"url":"https://github.com/togethercomputer/llama-2-7b-32k-instruct","last_synced_at":"2025-10-24T07:41:34.870Z","repository":{"id":189183834,"uuid":"678566424","full_name":"togethercomputer/Llama-2-7B-32K-Instruct","owner":"togethercomputer","description":null,"archived":false,"fork":false,"pushed_at":"2023-08-18T15:59:11.000Z","size":1315,"stargazers_count":77,"open_issues_count":3,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-03-15T22:20:59.709Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/togethercomputer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-08-14T21:18:51.000Z","updated_at":"2024-03-04T15:51:47.000Z","dependencies_parsed_at":"2023-08-18T17:25:32.960Z","dependency_job_id":null,"html_url":"https://github.com/togethercomputer/Llama-2-7B-32K-Instruct","commit_stats":null,"previous_names":["togethercomputer/llama-2-7b-32k-instruct"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FLlama-2-7B-32K-Instruct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FLlama-2-7B-32K-Instruct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FLlama-2-7B-32K-Instruct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2FLlama-2-7B-32K-Instruct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/togethercomputer","download_url":"https://codeload.github.com/togethercomputer/Llama-2-7B-32K-Instruct/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223645352,"owners_count":17178913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T07:19:39.536Z","updated_at":"2025-10-24T07:41:29.839Z","avatar_url":"https://github.com/togethercomputer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Building Llama-2-7B-32K-Instruct Using Together API\n\nIn our [blog post](https://together.ai/blog/llama-2-7b-32k-instruct), we released the [Llama-2-7B-32K-Instruct](https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct) model finetuned using [Together API](https://together.ai/blog/api-announcement). \nIn this repo, we share the complete recipe. We encourage you to try out [Together API](https://together.ai/blog/api-announcement) and give us feedbacks! The fine-tuning process is carried out in four simple steps: `Distill`, `Train`, `Test` and `Deploy`.\n\n## (Step I) - Distill\n\nLlama-2-7B-32K-Instruct is fine-tuned over a combination of two data sources:\n\n1. **19K single- and multi-round conversations generated by human instructions and [Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) outputs**.\n   We collected the dataset following the distillation paradigm that is used by [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/), [WizardLM](https://arxiv.org/abs/2304.12244) and [Orca](https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/) — producing instructions by querying a powerful LLM (in this case, [Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)).\n   The complete dataset is also released [here](https://huggingface.co/datasets/togethercomputer/llama-instruct).\n   \n2. **Long-context Summarization and Long-context QA**.\n      We follow the recipe of [Llama-2-7B-32K](https://together.ai/blog/Llama-2-7B-32K), and train our model with the [BookSum dataset](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections) and [Multi-document Question Answering](https://arxiv.org/abs/2307.03172) (MQA).\n\nThe final data mixture used for model finetuning is: 19K instruction (50%) + BookSum (25%) + MQA (25%).\n\nTo gather the instruction data from [Llama-2-70B-Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), we first use the [Together API](https://together.ai/blog/api-announcement) to query the model. Given an instruction such as\n```\ninstruction = \"Create a table about national parks in the US\"\n```\nwe can query the inference API using:\n```\nres = requests.post(endpoint, json={\n    \"model\": \"togethercomputer/llama-2-70b-chat\",\n    \"max_tokens\": 1024,\n    \"prompt\": f\"[INST] {instruction} [/INST]\",\n    \"request_type\": \"language-model-inference\",\n    \"temperature\": 0.7,\n    \"top_p\": 0.7,\n    \"top_k\": 50,\n    \"repetition_penalty\": 1,\n    \"stop\": [\n        \"[INST]\"\n    ],\n    \"safety_model\": \"\",\n    \"repetitive_penalty\": 1\n}, headers={\n    \"Authorization\": \"Bearer \" + \u003c Your API KEY here\u003e,\n})\n```\nwhere `\"[INST] {instruction} [/INST]\"` is the prompt format of LLaMA-2-70B-chat. We then parse out the result with\n```\nresponse = res.json()[\"output\"][\"choices\"][0][\"text\"]\n```\nIn this case, it returns\n\n---\n\nSure, here's a table about national parks in the US:\n\n| National Park | Location | Established | Area (acres) | Notable Features |\n| --- | --- | --- | --- | --- |\n| Yellowstone | Wyoming, Montana, Idaho | 1872 | 2,219,790 | Geysers, hot springs, wildlife |\n| Grand Canyon | Arizona | 1919 | 1,218,375 | Colorado River, canyon, scenic views |\n| Yosemite | California | 1890 | 747,956 | Granite cliffs, waterfalls, giant sequoias |\n| Zion | Utah | 1919 | 146,597 | Canyons, sandstone cliffs, unique rock formations |\n| Great Smoky Mountains | North Carolina, Tennessee | 1926 | 522,426 | Mountains, forests, waterfalls, wildlife |\n| Rocky Mountain | Colorado | 1915 | 265,795 | Mountains, alpine lakes, glaciers, wildlife |\n| Acadia | Maine | 1916 | 33,255 | Coastline, mountains, forests, wildlife |\n| Olympic | Washington | 1938 | 922,650 | Rainforests, mountains, coastline, wildlife |\n| Grand Teton | Wyoming | 1929 | 310,044 | Mountains, glaciers, lakes, wildlife |\n| Great Basin | Nevada | 1986 | 5,000 | 5,000-year-old bristlecone pine trees, limestone caverns |\n| Arches | Utah | 1971 | 7,200 | Over 2,000 natural arches, sandstone formations |\n| Bryce Canyon | Utah | 1928 | 35,835 | Hoodoos, amphitheater-shaped park, scenic views |\n| Canyonlands | Utah | 1964 | 337,598 | Canyons, mesas, buttes, desert landscapes |\n| Death Valley | California, Nevada | 1994 | 3,373,063 | Badwater Basin, salt flats, sand dunes, unique geology |\n| Denali | Alaska | 1917 | 4,740,911 | Mount Denali, glaciers, wildlife, dog sledding |\n| Everglades | Florida | 1935 | 1,508,537 | Mangrove forests, sawgrass marshes, diverse wildlife |\n| Glacier | Montana | 1910 | 1,012,837 | Glaciers, alpine lakes, mountains, wildlife |\n| Glacier Bay | Alaska | 1925 | 3,223,373 | Fjords, glaciers, mountains, wildlife |\n\nNote: This table lists some of the most well-known national parks in the US, but there are many others that are also worth visiting. The area of each park is approximate and may vary slightly depending on the source.\n\n---\n\n\nTo build [Llama-2-7B-32K-Instruct](https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct), we collect instructions from 19K human inputs extracted from [ShareGPT-90K](https://huggingface.co/datasets/philschmid/sharegpt-raw) (only using human inputs, not ChatGPT outputs). The actual script handles multi-turn conversations and also supports restarting and caching via a SQLite3 database. You can find the full script [here](https://github.com/togethercomputer/Llama-2-7B-32K-Instruct/blob/main/scripts/distill.py), with merely 122 lines!\n\n\nThe output of this step is a jsonl file, each line corresponding to one conversation:\n```\n{\"text\": \"[INST] ... instruction ... [/INST] ... answer ... [INST] ... instruction ... [/INST] ...\"}\n{\"text\": \"[INST] ... instruction ... [/INST] ... answer ... [INST] ... instruction ... [/INST] ...\"}\n{\"text\": \"[INST] ... instruction ... [/INST] ... answer ... [INST] ... instruction ... [/INST] ...\"}\n```\n\nFinally, we perform a stratified sampling over three data sources with ratios: 19K instruction (50%) + BookSum (25%) + MQA (25%), and concatenate the dataset to a single `instructions.jsonl`.\n\n## (Step II) - Train\n\nThe second step is to fine-tune the [Llama-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K) model using the instruction data we just collected.\nFirst, upload the dataset using [Together API](https://together.ai/blog/api-announcement). Suppose the instruction data is stored in `instructions.jsonl`, with the following command\n```\n$ together files upload instructions.jsonl\n```\nit will respond with\n```\nUploading instructions.jsonl: 100%|████████| 99.1M/99.1M [00:29\u003c00:00, 3.53MB/s]\n{\n    \"filename\": \"instructions.jsonl\",\n    \"id\": \"file-cab9fb70-b6de-40de-a298-d06369b14ed8\",\n    \"object\": \"file\"\n}\n```\n\nwhich suggests that the dataset is now uploaded to Together cloud and is made available to fine-tuning jobs. We can then start a fine-tuning job using the file ID:\n\n```\n$ together finetune create --training-file file-cab9fb70-b6de-40de-a298-d06369b14ed8 --model togethercomputer/RedPajama-INCITE-7B-Base\n```\nThis basically means we are creating a fine-tuning job with training file `file-cab9fb70-b6de-40de-a298-d06369b14ed8` (which we just uploaded) over model `togethercomputer/RedPajama-INCITE-7B-Base`.\nThen the command line will respond\n```\n{\n    \"training_file\": \"file-cab9fb70-b6de-40de-a298-d06369b14ed8\",\n    \"model_output_name\": \"zhangcetogether/togethercomputer/RedPajama-INCITE-7B-Base\",\n    \"model_output_path\": \"s3://together-dev/finetune/640cdeb14bfebd1af934bfc5/zhangcetogether/togethercomputer/RedPajama-INCITE-7B-Base/ft-6bc80cf4-e991-4c77-9f47-ef02b8d1bfeb\",\n    \"Suffix\": \"\",\n    \"model\": \"togethercomputer/RedPajama-INCITE-7B-Base\",\n    \"n_epochs\": 4,\n    \"batch_size\": 32,\n    \"learning_rate\": 1e-05,\n    \"user_id\": \"640cdeb14bfebd1af934bfc5\",\n    \"created_at\": 1691431547,\n    \"updated_at\": 1691431547,\n    \"status\": \"pending\",\n    \"owner_address\": \"0xac3f8206287997c39a338f0ec31aa417225dbf0b\",\n    \"id\": \"ft-6bc80cf4-e991-4c77-9f47-ef02b8d1bfeb\",\n    \"job_id\": \"\",\n    \"token_count\": 0,\n    \"param_count\": 0,\n    \"total_price\": 0,\n    \"epochs_completed\": 0,\n    \"events\": [\n        {\n            \"object\": \"fine-tune-event\",\n            \"created_at\": 1691431547,\n            \"level\": \"\",\n            \"message\": \"Fine tune request created\",\n            \"type\": \"JOB_PENDING\",\n            \"param_count\": 0,\n            \"token_count\": 0,\n            \"checkpoint_path\": \"\",\n            \"model_path\": \"\"\n        }\n    ],\n    \"queue_depth\": 0,\n    \"wandb_project_name\": \"\"\n}\n```\nsuggesting the fine-tuning job is now submitted successfully and is now running.\nYou can track the progress of a fine-tuning job on the [Jobs Page](https://api.together.xyz/playground/finetuning) of the [Together API](https://together.ai/blog/api-announcement) platform. You see all of your logs and download checkpoints -- try it!\n\n## (Step III) - Test\n\nWhen a fine-tuning job finishes, your fine-tuned model will automatically show up in the [Models page](https://api.together.xyz/playground) on the platform. \n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"docs/assets/images/model_showup_demo.png\" width=\"500\"\u003e\n\u003c/p\u003e\n\n\nClick the `play` button to start an instance, and begin testing your model in the [Together Playgrounds](http://api.together.ai) like this:\n\n\u003cimg src=\"docs/assets/images/test_demo.png\" width=\"1000\"\u003e\n\n## (Step IV) - Deploy\n\nNow that you’ve tested the model in our Playgrounds, you can integrate the model into your end application! Query the model using the fine-tuning API. Simply click “\u003c\u003e” in the Playground to see examples of how to query it via the API. \n\n\u003cimg src=\"docs/assets/images/api_query_demo.png\" width=\"1000\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Fllama-2-7b-32k-instruct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftogethercomputer%2Fllama-2-7b-32k-instruct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Fllama-2-7b-32k-instruct/lists"}