{"id":13892134,"url":"https://github.com/pytorch/torchchat","last_synced_at":"2025-05-13T21:08:35.365Z","repository":{"id":249722991,"uuid":"776122617","full_name":"pytorch/torchchat","owner":"pytorch","description":"Run PyTorch LLMs locally on servers, desktop and mobile","archived":false,"fork":false,"pushed_at":"2025-05-06T23:06:58.000Z","size":9388,"stargazers_count":3579,"open_issues_count":93,"forks_count":250,"subscribers_count":33,"default_branch":"main","last_synced_at":"2025-05-06T23:29:57.528Z","etag":null,"topics":["llm","local","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-22T18:15:54.000Z","updated_at":"2025-05-06T14:21:05.000Z","dependencies_parsed_at":"2024-07-29T01:53:48.657Z","dependency_job_id":"8f6ce1e2-224f-4b16-a220-0cdc9f36597c","html_url":"https://github.com/pytorch/torchchat","commit_stats":{"total_commits":901,"total_committers":63,"mean_commits":"14.301587301587302","dds":0.7380688124306326,"last_synced_commit":"de2507b63ed8af7410a30ea1982d1a41b5ae4271"},"previous_names":["pytorch/torchchat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchchat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchchat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchchat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Ftorchchat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/torchchat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254028991,"owners_count":22002283,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","local","pytorch"],"created_at":"2024-08-06T17:00:47.101Z","updated_at":"2025-05-13T21:08:30.321Z","avatar_url":"https://github.com/pytorch.png","language":"Python","readme":"# Chat with LLMs Everywhere\n\ntorchchat is a small codebase showcasing the ability to run large language models (LLMs) seamlessly. With torchchat, you can run LLMs using Python, within your own (C/C++) application (desktop or server) and on iOS and Android.\n\n\u003e [!IMPORTANT]\n\u003e Update\n\u003e\n\u003e **February 3, 2025**: torchchat has support for [**DeepSeek R1 Distill: 8B**]( https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)!\n\u003e\n\u003e **September 25, 2024**: torchchat has multimodal support for **Llama3.2 11B**!\n\u003e\n\u003e To try it out, finish the [Installation](#Installation) section below, then hop\n\u003e over to our [multimodal guide](docs/multimodal.md) to learn more.\n\n\n## What can you do with torchchat?\n- [Run models via PyTorch / Python](#running-via-pytorch--python)\n  - [Chat](#chat)\n  - [Generate](#generate)\n  - [Run chat in the Browser](#browser)\n- [Run models on desktop/server without python](#desktopserver-execution)\n  - [Use AOT Inductor for faster execution](#aoti-aot-inductor)\n  - [Running in c++ using the runner](#run-using-our-c-runner)\n- [Run models on mobile](#mobile-execution)\n  - [Deploy and run on iOS](#deploy-and-run-on-ios)\n  - [Deploy and run on Android](#deploy-and-run-on-android)\n- [Evaluate a model](#eval)\n\n\n## Highlights\n\n- [[New!!] Multimodal Support for Llama 3.2 11B](docs/multimodal.md)\n- Command line interaction with popular LLMs such as Llama 3, Llama 2, Stories, Mistral and more\n- PyTorch-native execution with performance\n- Supports popular hardware and OS\n  - Linux (x86)\n  - Mac OS (M1/M2/M3)\n  - Android (Devices that support XNNPACK)\n  - iOS 17+ and 8+ Gb of RAM (iPhone 15 Pro+ or iPad with Apple Silicon)\n- Multiple data types including: float32, float16, bfloat16\n- Multiple quantization schemes\n- Multiple execution modes including: Python (Eager, Compile) or Native (AOT Inductor (AOTI), ExecuTorch)\n\n\n## Models\n\nThe following models are supported by torchchat and have associated\naliases.\n\n| Model | Mobile Friendly | Notes |\n|------------------|---|---------------------|\n|[meta-llama/Meta-Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-3b`.|\n|[meta-llama/Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)|✅|Best for `generate`. Alias to `llama3.2-3b-base`.|\n|[meta-llama/Llama-Guard-3-1B](https://huggingface.co/meta-llama/Llama-Guard-3-1B)|✅|Tuned for classification. Alias to `llama3-1b-guard`.|\n|[meta-llama/Meta-Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.2-1b`.|\n|[meta-llama/Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)|✅|Best for `generate`. Alias to `llama3.2-1b-base`.|\n|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)||Multimodal (Image + Text). Tuned for `chat`. Alias to `llama3.2-11B`.|\n|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)||Multimodal (Image + Text). Tuned for `generate`. Alias to `llama3.2-11B-base`.|\n|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3.1`.|\n|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)|✅|Best for `generate`. Alias to `llama3.1-base`.|\n|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|✅|Tuned for `chat`. Alias to `llama3`.|\n|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|✅|Best for `generate`. Alias to `llama3-base`.|\n|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|✅|Tuned for `chat`. Alias to `llama2`.|\n|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)||Tuned for `chat`. Alias to `llama2-13b-chat`.|\n|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)||Tuned for `chat`. Alias to `llama2-70b-chat`.|\n|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|✅|Best for `generate`. Alias to `llama2-base`.|\n|[meta-llama/CodeLlama-7b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Python-hf)|✅|Tuned for Python and `generate`. Alias to `codellama`.|\n|[meta-llama/CodeLlama-34b-Python-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Python-hf)|✅|Tuned for Python and `generate`. Alias to `codellama-34b`.|\n|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)|✅|Best for `generate`. Alias to `mistral-7b-v01-base`.|\n|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)|✅|Tuned for `chat`. Alias to `mistral-7b-v01-instruct`.|\n|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|✅|Tuned for `chat`. Alias to `mistral`.|\n|[tinyllamas/stories15M](https://huggingface.co/karpathy/tinyllamas/tree/main)|✅|Toy model for `generate`. Alias to `stories15M`.|\n|[tinyllamas/stories42M](https://huggingface.co/karpathy/tinyllamas/tree/main)|✅|Toy model for `generate`. Alias to `stories42M`.|\n|[tinyllamas/stories110M](https://huggingface.co/karpathy/tinyllamas/tree/main)|✅|Toy model for `generate`. Alias to `stories110M`.|\n|[openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b)|✅|Best for `generate`. Alias to `open-llama`.|\n| [ibm-granite/granite-3b-code-instruct-128k](https://huggingface.co/ibm-granite/granite-3b-code-instruct-128k) |✅| Alias to `granite-code` and `granite-code-3b`.|\n| [ibm-granite/granite-8b-code-instruct-128k](https://huggingface.co/ibm-granite/granite-8b-code-instruct-128k) |✅| Alias to `granite-code-8b`.|\n| [ibm-granite/granite-3.0-2b-instruct](https://huggingface.co/ibm-granite/granite-3.0-2b-instruct) |✅| Alias to `granite3-2b` and `granite3`.|\n| [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) |✅| Alias to `granite3-8b`.|\n| [ibm-granite/granite-3.1-2b-instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct) |✅| Alias to `granite3.1-2b` and `granite3.1`.|\n| [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) |✅| Alias to `granite3.1-8b`.|\n| [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |✅| Alias to `deepseek-r1:8b`.|\n\n\n## Installation\nThe following steps require that you have [Python 3.10](https://www.python.org/downloads/release/python-3100/) installed.\n\n\u003e [!TIP]\n\u003e torchchat uses the latest changes from various PyTorch projects so it's highly recommended that you use a venv (by using the commands below) or CONDA.\n\n[skip default]: begin\n```bash\ngit clone https://github.com/pytorch/torchchat.git\ncd torchchat\npython3 -m venv .venv\nsource .venv/bin/activate\n./install/install_requirements.sh\nmkdir exportedModels\n```\n[skip default]: end\n\n[shell default]: mkdir exportedModels; ./install/install_requirements.sh\n\n## Commands\n\nThe interfaces of torchchat are leveraged through **Python Commands** and **Native Runners**. While the Python Commands are enumerable in the --help menu, the latter are explored in their respective sections.\n\n```bash\npython3 torchchat.py --help\n```\n\n[skip default]: begin\n\n```bash\n# Output\nusage: torchchat [-h] {chat,browser,generate,export,eval,download,list,remove,where,server} ...\n\npositional arguments:\n  {chat,browser,generate,export,eval,download,list,remove,where,server}\n                        The specific command to run\n    chat                Chat interactively with a model via the CLI\n    generate            Generate responses from a model given a prompt\n    browser             Chat interactively with a model in a locally hosted browser\n    export              Export a model artifact to AOT Inductor or ExecuTorch\n    download            Download model artifacts\n    list                List all supported models\n    remove              Remove downloaded model artifacts\n    where               Return directory containing downloaded model artifacts\n    server              [WIP] Starts a locally hosted REST server for model interaction\n    eval                Evaluate a model via lm-eval\n\noptions:\n  -h, --help            show this help message and exit\n```\n\n[skip default]: end\n\n__Python Inference__ (chat, generate, browser, server)\n* These commands represent different flavors of performing model inference in a Python enviroment.\n* Models are constructed either from CLI args or from loading exported artifacts.\n\n__Exporting__ (export)\n* This command generates model artifacts that are consumed by Python Inference or Native Runners.\n* More information is provided in the [AOT Inductor](https://github.com/pytorch/torchchat?tab=readme-ov-file#aoti-aot-inductor) and [ExecuTorch](https://github.com/pytorch/torchchat?tab=readme-ov-file#export-for-mobile) sections.\n\n__Inventory Management__ (download, list, remove, where)\n* These commands are used to manage and download models.\n* More information is provided in the [Download Weights](https://github.com/pytorch/torchchat?tab=readme-ov-file#download-weights) section.\n\n__Evaluation__ (eval)\n* This command test model fidelity via EleutherAI's [lm_evaluation_harness](https://github.com/EleutherAI/lm-evaluation-harness).\n* More information is provided in the [Evaluation](https://github.com/pytorch/torchchat?tab=readme-ov-file#eval) section.\n\n## Download Weights\nMost models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account.\nCreate a Hugging Face user access token [as documented here](https://huggingface.co/docs/hub/en/security-tokens) with the `write` role.\n\nLog into Hugging Face:\n\n[prefix default]: HF_TOKEN=\"${SECRET_HF_TOKEN_PERIODIC}\"\n\n```\nhuggingface-cli login\n```\n\nTake a look at the available models:\n\n```bash\npython3 torchchat.py list\n```\n\nThen download one for testing (this README uses llama3.1)\n```\npython3 torchchat.py download llama3.1\n```\n\n\u003e [!NOTE]\n\u003e This command may prompt you to request access to Llama 3 via\n\u003e Hugging Face, if you do not already have access. Simply follow the\n\u003e prompts and re-run the command when access is granted.*\n\n\n\u003cdetails\u003e\n\u003csummary\u003eAdditional Model Inventory Management Commands\u003c/summary\u003e\n\n### Where\nThis subcommand shows the location of a particular model.\n```bash\npython3 torchchat.py where llama3.1\n```\nThis is useful in scripts when you do not want to hard-code paths\n\n\n### Remove\nThis subcommand removes the specified model\n```bash\npython3 torchchat.py remove llama3.1\n```\n\nMore information about these commands can be found by adding the `--help` option.\n\n\u003c/details\u003e\n\n\n## Running via PyTorch / Python\n\nThe simplest way to run a model in PyTorch is via [eager execution](https://pytorch.org/blog/optimizing-production-pytorch-performance-with-graph-transformations/).\nThis is the default execution mode for both PyTorch and torchchat. It performs inference\nwithout creating exporting artifacts or using a separate runner.\n\nThe model used for inference can also be configured and tailored to specific needs\n(compilation, quantization, etc.). See the [customization guide](docs/model_customization.md) for the options supported by torchchat.\n\n\u003e [!TIP]\n\u003e For more information about these commands, please refer to the `--help` menu.\n\n### Chat\nThis mode allows you to chat with an LLM in an interactive fashion.\n\n[skip default]: begin\n```bash\npython3 torchchat.py chat llama3.1\n```\n[skip default]: end\n\n### Generate\nThis mode generates text based on an input prompt.\n```bash\npython3 torchchat.py generate llama3.1 --prompt \"write me a story about a boy and his bear\"\n```\n\n\n### Server\nThis mode exposes a REST API for interacting with a model.\nThe server follows the [OpenAI API specification](https://platform.openai.com/docs/api-reference/chat) for chat completions.\n\nTo test out the REST API, **you'll need 2 terminals**: one to host the server, and one to send the request.\nIn one terminal, start the server\n\n[skip default]: begin\n\n```bash\npython3 torchchat.py server llama3.1\n```\n[skip default]: end\n\n\u003c!==\n[shell default]: python3 torchchat.py server llama3.1 \u0026 server_pid=$! ; sleep 90 # wait for server to be ready to accept requests\n--\u003e\n\nIn another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.\n\n\u003e [!NOTE]\n\u003e Since this feature is under active development, not every parameter is consumed. See api/api.py for details on\n\u003e which request parameters are implemented. If you encounter any issues, please comment on the [tracking Github issue](https://github.com/pytorch/torchchat/issues/973).\n\n\u003cdetails\u003e\n\u003csummary\u003eExample Query\u003c/summary\u003e\n\nSetting `stream` to \"true\" in the request emits a response in chunks. If `stream` is unset or not \"true\", then the client will await the full response from the server.\n\n**Example Input + Output**\n\n```\ncurl http://127.0.0.1:5000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"llama3.1\",\n    \"stream\": \"true\",\n    \"max_tokens\": 200,\n    \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"You are a helpful assistant.\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"Hello!\"\n      }\n    ]\n  }'\n```\n[skip default]: begin\n```\n{\"response\":\" I'm a software developer with a passion for building innovative and user-friendly applications. I have experience in developing web and mobile applications using various technologies such as Java, Python, and JavaScript. I'm always looking for new challenges and opportunities to learn and grow as a developer.\\n\\nIn my free time, I enjoy reading books on computer science and programming, as well as experimenting with new technologies and techniques. I'm also interested in machine learning and artificial intelligence, and I'm always looking for ways to apply these concepts to real-world problems.\\n\\nI'm excited to be a part of the developer community and to have the opportunity to share my knowledge and experience with others. I'm always happy to help with any questions or problems you may have, and I'm looking forward to learning from you as well.\\n\\nThank you for visiting my profile! I hope you find my information helpful and interesting. If you have any questions or would like to discuss any topics, please feel free to reach out to me. I\"}\n```\n\n[skip default]: end\n\n\u003c!--\n[shell default]: kill ${server_pid}\n--\u003e\n\n\u003c/details\u003e\n\n### Browser\nThis command opens a basic browser interface for local chat by querying a local server.\n\nFirst, follow the steps in the Server section above to start a local server. Then, in another terminal, launch the interface. Running the following will open a tab in your browser.\n\n[skip default]: begin\n\n```\nstreamlit run torchchat/usages/browser.py\n```\n\n[skip default]: end\n\nUse the \"Max Response Tokens\" slider to limit the maximum number of tokens generated by the model for each response. Click the \"Reset Chat\" button to remove the message history and start a fresh chat.\n\n\n## Desktop/Server Execution\n\n### AOTI (AOT Inductor)\n[AOTI](https://pytorch.org/blog/pytorch2-2/) compiles models before execution\nfor faster inference. The process creates a zipped PT2 file containing all the\nartifacts generated by AOTInductor, and a\n[.so](https://en.wikipedia.org/wiki/Shared_library) file with the runnable\ncontents that is then loaded for inference. This can be done with both Python\nand C++ enviroments.\n\nThe following example exports and executes the Llama3.1 8B Instruct\nmodel.  The first command compiles and performs the actual export.\n\n```bash\npython3 torchchat.py export llama3.1 --output-aoti-package-path exportedModels/llama3_1_artifacts.pt2\n```\n\n\u003e [!NOTE]\n\u003e If your machine has cuda add this flag for performance\n`--quantize torchchat/quant_config/cuda.json` when exporting.\n\nFor more details on quantization and what settings to use for your use\ncase visit our [customization guide](docs/model_customization.md).\n\n### Run in a Python Environment\n\nTo run in a python enviroment, use the generate subcommand like before, but include the pt2 file.\n\n```bash\npython3 torchchat.py generate llama3.1 --aoti-package-path exportedModels/llama3_1_artifacts.pt2 --prompt \"Hello my name is\"\n```\n\n\n### Run using our C++ Runner\n\nTo run in a C++ enviroment, we need to build the runner binary.\n```bash\ntorchchat/utils/scripts/build_native.sh aoti\n```\n\nThen run the compiled executable, with the pt2.\n```bash\ncmake-out/aoti_run exportedModels/llama3_1_artifacts.pt2 -z `python3 torchchat.py where llama3.1`/tokenizer.model -i \"Once upon a time\"\n```\n\n## Mobile Execution\n\n[ExecuTorch](https://github.com/pytorch/executorch) enables you to optimize your model for execution on a\nmobile or embedded device.\n\n### Set Up ExecuTorch\n\nBefore running any commands in torchchat that require ExecuTorch, you\nmust first install ExecuTorch.\n\nTo install ExecuTorch, run the following commands.  This will download the\nExecuTorch repo to ./et-build/src and install various ExecuTorch libraries to\n./et-build/install.\n\n\u003e [!IMPORTANT]\n\u003e The following commands should be run from the torchchat root directory.\n\n```\nexport TORCHCHAT_ROOT=${PWD}\n./torchchat/utils/scripts/install_et.sh\n```\n\n\n### Export for mobile\nSimilar to AOTI, to deploy onto device, we first export the PTE artifact, then we load the artifact for inference.\n\nThe following example uses the Llama3.1 8B Instruct model.\n```\n# Export\npython3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte\n```\n\n\u003e [!NOTE]\n\u003e We use `--quantize torchchat/quant_config/mobile.json` to quantize the\nllama3.1 model to reduce model size and improve performance for\non-device use cases.\n\nFor more details on quantization and what settings to use for your use\ncase visit our [customization guide](docs/model_customization.md).\n\n### Deploy and run on Desktop\n\nWhile ExecuTorch does not focus on desktop inference, it is capable\nof doing so. This is handy for testing out PTE\nmodels without sending them to a physical device.\n\nSpecifically, there are 2 ways of doing so: Pure Python and via a Runner\n\n\u003cdetails\u003e\n\u003csummary\u003eDeploying via Python\u003c/summary\u003e\n\n```\n# Execute\npython3 torchchat.py generate llama3.1 --pte-path llama3.1.pte --prompt \"Hello my name is\"\n```\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n\u003csummary\u003eDeploying via the c++ Runner\u003c/summary\u003e\n\nBuild the runner\n```bash\ntorchchat/utils/scripts/build_native.sh et\n```\n\nExecute using the runner\n```bash\ncmake-out/et_run llama3.1.pte -z `python3 torchchat.py where llama3.1`/tokenizer.model -i \"Once upon a time\"\n```\n\n\u003c/details\u003e\n\n\n[end default]: end\n\n### Deploy and run on iOS\n\nThe following assumes you've completed the steps for [Setting up ExecuTorch](#set-up-executorch).\n\n\u003cdetails\u003e\n\u003csummary\u003eDeploying with Xcode\u003c/summary\u003e\n\n#### Requirements\n- [Xcode](https://apps.apple.com/us/app/xcode/id497799835?mt=12/) 15.0 or later\n- [Cmake](https://cmake.org/download/) 3.19 or later\n  - Download and open the macOS `.dmg` installer and move the Cmake app to `/Applications` folder.\n  - Install Cmake command line tools: `sudo /Applications/CMake.app/Contents/bin/cmake-gui --install`\n- A development provisioning profile with the [`increased-memory-limit`](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_increased-memory-limit) entitlement.\n\n\n#### Steps\n\n1. Open the Xcode project:\n    ```bash\n    open et-build/src/executorch/examples/demo-apps/apple_ios/LLaMA/LLaMA.xcodeproj\n    ```\n    \n2. Click the Play button to launch the app in the Simulator.\n\n3. To run on a device, ensure you have it set up for development and a provisioning profile with the `increased-memory-limit` entitlement. Update the app's bundle identifier to match your provisioning profile with the required capability.\n\n4. After successfully launching the app, copy the exported ExecuTorch model (`.pte`) and tokenizer (`.model`) files to the iLLaMA folder. You can find the model file called `llama3.1.pte` in the current `torchchat` directory and the tokenizer file at `$(python3 torchchat.py where llama3.1)/tokenizer.model` path.\n\n    - **For the Simulator:** Drag and drop both files onto the Simulator window and save them in the `On My iPhone \u003e iLLaMA` folder.\n    - **For a device:** Open a separate Finder window, navigate to the Files tab, drag and drop both files into the iLLaMA folder, and wait for the copying to finish.\n\n5. Follow the app's UI guidelines to select the model and tokenizer files from the local filesystem and issue a prompt.\n\n*Click the image below to see it in action!*\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://pytorch.org/executorch/main/_static/img/llama_ios_app.mp4\"\u003e\n  \u003cimg src=\"https://pytorch.org/executorch/main/_static/img/llama_ios_app.png\" width=\"600\" alt=\"iOS app running a LlaMA model\"\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n### Deploy and run on Android\n\nThe following assumes you've completed the steps for [Setting up ExecuTorch](#set-up-executorch).\n\n\u003cdetails\u003e\n\u003csummary\u003eApproach 1 (Recommended): Android Studio\u003c/summary\u003e\n\n#### Requirements\n- Android Studio\n- [Java 17](https://developer.android.com/build/jdks)\n- [Android SDK 34](https://developer.android.com/about/versions/14/setup-sdk)\n- [adb](https://developer.android.com/tools/adb)\n\n\n#### Steps\n\n1. Download the AAR file, which contains the Java library and corresponding JNI library, to build and run the app.\n\n   - [executorch.aar](https://ossci-android.s3.amazonaws.com/executorch/release/executorch-241002/executorch.aar) ([sha256sums](https://ossci-android.s3.amazonaws.com/executorch/release/executorch-241002/executorch.aar.sha256sums))\n\n2. Move the downloaded AAR file to `torchchat/edge/android/torchchat/app/libs/`. You may need to create directory `torchchat/edge/android/torchchat/app/libs/` if it does not exist.\n\n3. Push the model and tokenizer file to your device. You can find the model file called `llama3.1.pte` in the current `torchchat` directory and the tokenizer file at `$(python3 torchchat.py where llama3.1)/tokenizer.model` path.\n    ```\n    adb shell mkdir -p /data/local/tmp/llama\n    adb push \u003cmodel.pte\u003e /data/local/tmp/llama\n    adb push \u003ctokenizer.model or tokenizer.bin\u003e /data/local/tmp/llama\n    ```\n\n4. Use Android Studio to open the torchchat app skeleton, located at `torchchat/edge/android/torchchat`.\n\n5. Click the Play button (^R) to launch it to emulator/device.\n\n    - We recommend using a device with at least 12GB RAM and 20GB storage.\n    - If using an emulated device, refer to [this post](https://stackoverflow.com/questions/45517553/cant-change-the-ram-size-in-avd-manager-android-studio) on how to set the RAM.\n\n6. Follow the app's UI guidelines to pick the model and tokenizer files from the local filesystem. Then issue a prompt.\n\n**Note:** The AAR file listed in Step 1 has the tiktoken and sentensepiece tokenizer. To tweak or use a custom tokenizer and runtime, modify the ExecuTorch code\nand use [this script](https://github.com/pytorch/executorch/blob/main/build/build_android_llm_demo.sh) to build the AAR library.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://pytorch.org/executorch/main/_static/img/chat.png\" width=\"600\" alt=\"Android app running a LlaMA model\"\u003e\n\u003c/p\u003e\n\n\n\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003eApproach 2: E2E Script\u003c/summary\u003e\n\nAlternatively, you can run `torchchat/utils/scripts/android_example.sh` which sets up Java, Android SDK Manager, Android SDK, Android emulator (if no physical device is found), builds the app, and launches it for you. It can be used if you don't have a GUI.\n\n```\nexport TORCHCHAT_ROOT=$(pwd)\nsh torchchat/utils/scripts/android_example.sh\n```\n\n\u003c/details\u003e\n\n## Eval\n\n**Note: This feature is still a work in progress and not all features are working**\n\nUses the lm_eval library to evaluate model accuracy on a variety of\ntasks. Defaults to wikitext and can be manually controlled using the\ntasks and limit args. See [Evaluation](torchchat/utils/docs/evaluation.md)\n\n**Examples**\n\nEager mode:\n```\npython3 torchchat.py eval llama3.1 --dtype fp32 --limit 5\n```\n\nTo test the perplexity for a lowered or quantized model, pass it in\nthe same way you would to generate:\n\n```\npython3 torchchat.py eval llama3.1 --pte-path llama3.1.pte --limit 5\n```\n\n## Design Principles\n\ntorchchat embodies PyTorch’s design philosophy [details](https://pytorch.org/docs/stable/community/design.html), especially \"usability over everything else\".\n\n### Native PyTorch\n\ntorchchat is a native-PyTorch library. While we provide integrations with the surrounding ecosystem (eg: Hugging Face models, etc), all of the core functionality is written in PyTorch.\n\n### Simplicity and Extensibility\n\ntorchchat is designed to be easy to understand, use and extend.\n\n- Composition over implementation inheritance - layers of inheritance for code re-use makes the code hard to read and extend\n- No training frameworks - explicitly outlining the training logic makes it easy to extend for custom use cases\n- Code duplication is preferred over unnecessary abstractions\n- Modular building blocks over monolithic components\n\n### Correctness\n\ntorchchat provides well-tested components with a high-bar on correctness.\nWe provide\n\n- Extensive unit-tests to ensure things operate as they should\n\n## Community Contributions\n\nWe really value our community and the contributions made by our wonderful users! \n\nIf you'd like to help out, connect with us and other community members by joining our [Discord](https://discord.gg/hm2Keduk3v). Once you've joined, you can:\n* Head to the `#torchchat-general` channel for general questions, discussion, and community support.\n* Hop in the `#torchchat-contributors` channel if you're interested in contributing directly to project development.\n\nAlso give our [CONTRIBUTING](CONTRIBUTING.md) guide a read.\n\nLooking forward to discussing with you about torchchat future!\n\n## Troubleshooting\n\nA section of commonly encountered setup errors/exceptions. If this section doesn't contain your situation, check the GitHub [issues](https://github.com/pytorch/torchchat/issues)\n\n### Model Access\n\n**Access to model is restricted and you are not in the authorized list**\n\nSome models require an additional step to access. Follow the\nlink provided in the error to get access.\n\n### Installing ExecuTorch\n\n**Failed Building Wheel**\n\nIf `./torchchat/utils/scripts/install_et.sh` fails with an error like `Building wheel for executorch (pyproject.toml) did not run successfully` It's possible that it's linking to an older version of pytorch installed some other way like via homebrew. You can break the link by uninstalling other versions such as `brew uninstall pytorch` Note: You may break something that depends on this, so be aware.\n\n**CERTIFICATE_VERIFY_FAILED**\n\nRun `pip install --upgrade certifi`.\n\n## Filing Issues\n\nIf you encounter bugs or difficulty using torchchat, please file an GitHub [issue](https://github.com/pytorch/torchchat/issues).\n\nPlease include the exact command you ran and the output of that command.\nAlso, run this script and include the output saved to `system_info.txt` so that we can better debug your issue.\n\n```\n(echo \"Operating System Information\"; uname -a; echo \"\"; cat /etc/os-release; echo \"\"; echo \"Python Version\"; python --version || python3 --version; echo \"\"; echo \"PIP Version\"; pip --version || pip3 --version; echo \"\"; echo \"Installed Packages\"; pip freeze || pip3 freeze; echo \"\"; echo \"PyTorch Version\"; python -c \"import torch; print(torch.__version__)\" || python3 -c \"import torch; print(torch.__version__)\"; echo \"\"; echo \"Collection Complete\") \u003e system_info.txt\n```\n\n## Disclaimer\nThe torchchat Repository Content is provided without any guarantees\nabout performance or compatibility. In particular, torchchat makes\navailable model architectures written in Python for PyTorch that may\nnot perform in the same manner or meet the same standards as the\noriginal versions of those models. When using the torchchat Repository\nContent, including any model architectures, you are solely responsible\nfor determining the appropriateness of using or redistributing the\ntorchchat Repository Content and assume any risks associated with your\nuse of the torchchat Repository Content or any models, outputs, or\nresults, both alone and in combination with any other\ntechnologies. Additionally, you may have other legal obligations that\ngovern your use of other content, such as the terms of service for\nthird-party models, weights, data, or other technologies, and you are\nsolely responsible for complying with all such obligations.\n\n\n## Acknowledgements\nThank you to the community for all the\nawesome libraries and tools you've built around local LLM inference.\n\n* Georgi Gerganov and his [GGML](https://github.com/ggerganov/ggml)\n  project shining a spotlight on community-based enablement and\n  inspiring so many other projects.\n\n* Andrej Karpathy and his\n  [llama2.c](https://github.com/karpathy/llama2.c) project.  So many\n  great (and simple!) ideas in llama2.c that we have directly adopted\n  (both ideas and code) from his repo.  You can never go wrong by\n  following Andrej's work.\n\n* Michael Gschwind, Bert Maher, Scott Wolchok, Bin Bao, Chen Yang,\n  Huamin Li and Mu-Chu Li who built the first version of nanogpt (`DSOGPT`)\n  with AOT Inductor proving that AOTI can be used to build efficient\n  LLMs, and DSOs are a viable distribution format for models.\n  [nanoGPT](https://github.com/karpathy/nanoGPT).\n\n* Bert Maher and his\n  [llama2.so](https://github.com/bertmaher/llama2.so), which built on\n  Andrej's llama2.c and on DSOGPT to close the loop on Llama models\n  with AOTInductor.\n\n* Christian Puhrsch, Horace He, Joe Isaacson and many more for their\n  many contributions in Accelerating GenAI models in the *\"Anything,\n  Fast!\"* pytorch.org blogs, and, in particular, Horace He for [GPT,\n  Fast!](https://github.com/pytorch-labs/gpt-fast), which we have\n  directly adopted (both ideas and code) from his repo.\n\n\n## License\n\ntorchchat is released under the [BSD 3 license](LICENSE). (Additional\ncode in this distribution is covered by the MIT and Apache Open Source\nlicenses.) However, you may have other legal obligations that govern\nyour use of content, such as the terms of service for third-party\nmodels.\n","funding_links":[],"categories":["Python","NLP","Repos","A01_文本生成_文本对话","LLM Serving / Inference","Infrastructure / Deployment of LLMs on Device"],"sub_categories":["大语言对话模型及数据","Deployment Frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorchchat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Ftorchchat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Ftorchchat/lists"}