{"id":21058256,"url":"https://github.com/kuvaus/llamagptj-chat","last_synced_at":"2026-03-09T17:17:31.965Z","repository":{"id":158240929,"uuid":"633930094","full_name":"kuvaus/LlamaGPTJ-chat","owner":"kuvaus","description":"Simple chat program for LLaMa, GPT-J, and MPT models.","archived":false,"fork":false,"pushed_at":"2023-08-02T21:17:03.000Z","size":1091,"stargazers_count":220,"open_issues_count":6,"forks_count":47,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-09T20:14:27.485Z","etag":null,"topics":["ai","cli","cpp","gpt","gpt4all","gptj","llama","mpt"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuvaus.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-28T15:55:31.000Z","updated_at":"2025-03-03T07:03:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"e61bd09c-db08-4650-9c6a-364b3e776011","html_url":"https://github.com/kuvaus/LlamaGPTJ-chat","commit_stats":{"total_commits":206,"total_committers":3,"mean_commits":68.66666666666667,"dds":"0.024271844660194164","last_synced_commit":"e022976f04609fd7b6d7da06dfa92abfcb549acc"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuvaus%2FLlamaGPTJ-chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuvaus%2FLlamaGPTJ-chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuvaus%2FLlamaGPTJ-chat/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuvaus%2FLlamaGPTJ-chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kuvaus","download_url":"https://codeload.github.com/kuvaus/LlamaGPTJ-chat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248103872,"owners_count":21048245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","cli","cpp","gpt","gpt4all","gptj","llama","mpt"],"created_at":"2024-11-19T17:07:16.373Z","updated_at":"2026-03-09T17:17:31.897Z","avatar_url":"https://github.com/kuvaus.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CMake](https://github.com/kuvaus/LlamaGPTJ-chat/actions/workflows/cmake.yml/badge.svg)](https://github.com/kuvaus/LlamaGPTJ-chat/actions/workflows/cmake.yml)\n# LlamaGPTJ-chat\nSimple command line chat program for [GPT-J](https://en.wikipedia.org/wiki/GPT-J), [LLaMA](https://en.wikipedia.org/wiki/LLaMA) and [MPT](https://www.mosaicml.com/blog/mpt-7b) models written in C++. 
Based on [llama.cpp](https://github.com/ggerganov/llama.cpp) and uses [gpt4all-backend](https://github.com/nomic-ai/gpt4all) for full compatibility.\n\n\u003cimg alt=\"LlamaGPTJ-chat demo\" src=\"https://user-images.githubusercontent.com/22169537/234323778-64365dc9-8bd9-4a48-b7de-ec0280a5fb4e.gif\" width=\"600\" /\u003e\n\n\u003e **Warning**\n\u003e Very early progress, might have bugs\n\n# Table of contents\n\u003c!-- TOC --\u003e\n* [Installation](#installation)\n* [Usage](#usage)\n* [GPT-J, LLaMA, and MPT models](#gpt-j-llama-and-mpt-models)\n* [Detailed command list](#detailed-command-list)\n* [Useful features](#useful-features)\n* [License](#license)\n\u003c!-- TOC --\u003e\n\n## Installation\nSince the program is written in C++, it should build and run on most Linux, macOS and Windows systems. The [Releases](https://github.com/kuvaus/LlamaGPTJ-chat/releases) link has ready-made binaries. AVX2 is faster and works on most newer computers. When you run the program, it checks and prints whether your computer has AVX2 support.\n\n### Download\n```sh\ngit clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat\ncd LlamaGPTJ-chat\n```\nYou also need to download a model file; see [supported models](#gpt-j-llama-and-mpt-models) for details and links.\n\n### Build\nOn most systems, you only need this to build:\n```sh\nmkdir build\ncd build\ncmake ..\ncmake --build . --parallel\n```\n\u003e **Note**\n\u003e \n\u003e If you have an old processor, you can turn AVX2 instructions OFF in the build step with the `-DAVX2=OFF` flag.\n\u003e \n\u003e If you have a new processor, you can turn AVX512 instructions ON in the build step with the `-DAVX512=ON` flag.\n\u003e \n\u003e On old macOS, set `-DBUILD_UNIVERSAL=OFF` to make the build x86 only instead of the universal Intel/ARM64 binary.\n\u003e On really old macOS, set `-DOLD_MACOS=ON`. 
This disables `/save` and `/load` but compiles on old Xcode.\n\u003e \n\u003e On Windows you can now use Visual Studio (MSVC) or MinGW. If you want a MinGW build instead, set `-G \"MinGW Makefiles\"`.\n\u003e\n\u003e On ARM64 Linux there are no ready-made binaries, but you can now build it from source.\n\n## Usage\n\nAfter compiling, the binary is located at:\n\n```sh\nbuild/bin/chat\n```\nBut you're free to move it anywhere. A simple command to get started with 4 threads:\n```sh\n./chat -m \"/path/to/modelfile/ggml-vicuna-13b-1.1-q4_2.bin\" -t 4\n```\nor\n```sh\n./chat -m \"/path/to/modelfile/ggml-gpt4all-j-v1.3-groovy.bin\" -t 4\n```\n\nHappy chatting!\n\n\n## GPT-J, LLaMA, and MPT models\nThe current backend supports the GPT-J, LLaMA and MPT models.\n\n### GPT-J model\nYou need to download a GPT-J model first. Here are direct links to models:\n\n\u003e- The default version is **v1.0**: [ggml-gpt4all-j.bin](https://gpt4all.io/models/ggml-gpt4all-j.bin)\n\u003e- At the time of writing the newest is **1.3-groovy**: [ggml-gpt4all-j-v1.3-groovy.bin](https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin)\n\nThey're around 3.8 GB each. The chat program keeps the model in RAM at runtime, so you need enough memory to run it. You can get more details on GPT-J models from [gpt4all.io](https://gpt4all.io/) or the [nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all) GitHub repository.\n\n### LLaMA model\nAlternatively, download a LLaMA model. The original weights are for research purposes and you can apply for access [here](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/). 
Below are direct links to derived models:\n\n\u003e- Vicuna 7b **v1.1**: [ggml-vicuna-7b-1.1-q4_2.bin](https://gpt4all.io/models/ggml-vicuna-7b-1.1-q4_2.bin)\n\u003e- Vicuna 13b **v1.1**: [ggml-vicuna-13b-1.1-q4_2.bin](https://gpt4all.io/models/ggml-vicuna-13b-1.1-q4_2.bin)\n\u003e- GPT-4-All **l13b-snoozy**: [ggml-gpt4all-l13b-snoozy.bin](https://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin)\n\nThe LLaMA models are quite large: the 7B parameter versions are around 4.2 GB each and the 13B parameter versions around 8.2 GB each. The chat program keeps the model in RAM at runtime, so you need enough memory to run it. You can get more details on LLaMA models from the [whitepaper](https://arxiv.org/abs/2302.13971) or the Meta AI [website](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/).\n\n### MPT model\nYou can also download and use an MPT model instead. Here are direct links to MPT-7B models:\n\u003e- MPT-7B base model pre-trained by MosaicML: [ggml-mpt-7b-base.bin](https://gpt4all.io/models/ggml-mpt-7b-base.bin)\n\u003e- MPT-7B instruct model trained by MosaicML: [ggml-mpt-7b-instruct.bin](https://gpt4all.io/models/ggml-mpt-7b-instruct.bin)\n\u003e- Non-commercial MPT-7B chat model trained by MosaicML: [ggml-mpt-7b-chat.bin](https://gpt4all.io/models/ggml-mpt-7b-chat.bin)\n\nThey're around 4.9 GB each. The chat program keeps the model in RAM at runtime, so you need enough memory to run it. 
You can get more details on MPT models from MosaicML [website](https://www.mosaicml.com/blog/mpt-7b) or [mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry) github.\n\n## Detailed command list\nYou can view the help and full parameter list with:\n`\n./chat -h\n`\n\n```sh\nusage: ./bin/chat [options]\n\nA simple chat program for GPT-J, LLaMA, and MPT models.\nYou can set specific initial prompt with the -p flag.\nRuns default in interactive and continuous mode.\nType '/reset' to reset the chat context.\nType '/save','/load' to save network state into a binary file.\nType '/save NAME','/load NAME' to rename saves. Default: --save_name NAME.\nType '/help' to show this help dialog.\nType 'quit', 'exit' or, 'Ctrl+C' to quit.\n\noptions:\n  -h, --help            show this help message and exit\n  -v, --version         show version and license information\n  --run-once            disable continuous mode\n  --no-interactive      disable interactive mode altogether (uses given prompt only)\n  --no-animation        disable chat animation\n  --no-saves            disable '/save','/load' functionality\n  -s SEED, --seed SEED  RNG seed for --random-prompt (default: -1)\n  -t N, --threads    N  number of threads to use during computation (default: 4)\n  -p PROMPT, --prompt PROMPT\n                        prompt to start generation with (default: empty)\n  --random-prompt       start with a randomized prompt.\n  -n N, --n_predict  N  number of tokens to predict (default: 200)\n  --top_k            N  top-k sampling (default: 40)\n  --top_p            N  top-p sampling (default: 0.9)\n  --temp             N  temperature (default: 0.9)\n  --n_ctx            N  number of tokens in context window (default: 0)\n  -b N, --batch_size N  batch size for prompt processing (default: 20)\n  --repeat_penalty   N  repeat_penalty (default: 1.1)\n  --repeat_last_n    N  last n tokens to penalize  (default: 64)\n  --context_erase    N  percent of context to erase  (default: 0.8)\n  
--b_token             optional beginning wrap token for response (default: empty)\n  --e_token             optional end wrap token for response (default: empty)\n  -j,   --load_json FNAME\n                        load options instead from json at FNAME (default: empty/no)\n  --load_template   FNAME\n                        load prompt template from a txt file at FNAME (default: empty/no)\n  --save_log        FNAME\n                        save chat log to a file at FNAME (default: empty/no)\n  --load_log        FNAME\n                        load chat log from a file at FNAME (default: empty/no)\n  --save_dir        DIR\n                        directory for saves (default: ./saves)\n  --save_name       NAME\n                        save/load model state binary at save_dir/NAME.bin (current: model_state)\n                        context is saved to save_dir/NAME.ctx (current: model_state)\n  -m FNAME, --model FNAME\n                        model path (current: ./models/ggml-vicuna-13b-1.1-q4_2.bin)\n```\n## Useful features\nHere are some handy features and details on how to achieve them using command line options.\n\n### Save/load chat log and read output from other apps\nBy default, the program prints the chat to standard output (stdout), so if you're embedding the program in your app, the app only needs to read stdout. You can also save the whole chat log to a text file with the `--save_log` option. There's an elementary way to remember your past conversation: load the saved chat log with the `--load_log` option when you start a new session.\n\n### Run the program once without user interaction\nIf you only need the program to run once without any user interaction, one way is to set the prompt with `-p \"prompt\"` and use the `--no-interactive` and `--no-animation` flags. 
The program will read the prompt, print the answer, and close.\n\n### Add AI personalities and characters\nIf you want a personality for your AI, you can change `prompt_template_sample.txt` and use `--load_template` to load the modified file. The only constant is that your input during chat will be put on the `%1` line. Instructions, prompt, response, and everything else can be replaced any way you want. Having different `personality_template.txt` files is an easy way to add different AI characters. With _some_ models, giving both the AI and the user names instead of `Prompt:` and `Response:` can make the conversation flow more naturally, as the AI tries to mimic a conversation between two people.\n\n### Ability to reset chat context\nYou can reset the chat at any time during chatting by typing `/reset` in the input field. This will clear the AI's memory of past conversations, logits, and tokens. You can then start the chat from a blank slate without having to reload the whole model again.\n\n### Load all parameters using JSON\nYou can also load parameters from a JSON file with the `--load_json \"/path/to/file.json\"` flag. Different models might perform better or worse with different input parameters, so JSON files are a handy way to store and load all the settings at once. The JSON file loader is deliberately simple to avoid any external dependencies, and as a result, the JSON file must follow a specific format. 
Here is a simple example:\n\n```javascript\n{\"top_p\": 1.0, \"top_k\": 50400, \"temp\": 0.9, \"n_batch\": 9}\n```\nThis is useful when you want to store different temperature and sampling settings.\n\nAnd a more detailed one:\n```javascript\n{\n\"top_p\": 1.0,\n\"top_k\": 50400,\n\"temp\": 0.9,\n\"n_batch\": 20,\n\"threads\": 12,\n\"prompt\": \"Once upon a time\",\n\"load_template\": \"/path/to/prompt_template_sample.txt\",\n\"model\": \"/path/to/ggml-gpt4all-j-v1.3-groovy.bin\",\n\"no-interactive\": \"true\"\n}\n```\nThis one loads the prompt from the JSON file, uses a specific template, and runs the program once in no-interactive mode, so the user does not have to provide any input.\n\n## License\n\nThis project is licensed under the MIT [License](https://github.com/kuvaus/LlamaGPTJ-chat/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuvaus%2Fllamagptj-chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuvaus%2Fllamagptj-chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuvaus%2Fllamagptj-chat/lists"}