{"id":14964526,"url":"https://github.com/ariya/query-llm","last_synced_at":"2025-06-20T06:33:55.178Z","repository":{"id":246055412,"uuid":"818490139","full_name":"ariya/query-llm","owner":"ariya","description":"Query LLM with Chain-of-Thought","archived":false,"fork":false,"pushed_at":"2025-05-23T04:46:42.000Z","size":99,"stargazers_count":13,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-09T03:35:41.585Z","etag":null,"topics":["cerebras","chain-of-thought","gemini","groq","llama","llm","lmstudio","localai","mistral","ollama","openai","openrouter"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ariya.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-22T01:41:36.000Z","updated_at":"2025-05-23T04:46:46.000Z","dependencies_parsed_at":"2024-07-27T20:06:14.854Z","dependency_job_id":"f7650f74-610a-4a6e-a4b9-2030d83cdb5a","html_url":"https://github.com/ariya/query-llm","commit_stats":{"total_commits":88,"total_committers":1,"mean_commits":88.0,"dds":0.0,"last_synced_commit":"289ce1806b574b8eebab310f989dcd30aaaccfd8"},"previous_names":["ariya/query-llm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ariya/query-llm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ariya%2Fquery-llm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ariya%2Fquery-llm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/ariya%2Fquery-llm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ariya%2Fquery-llm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ariya","download_url":"https://codeload.github.com/ariya/query-llm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ariya%2Fquery-llm/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259546412,"owners_count":22874560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cerebras","chain-of-thought","gemini","groq","llama","llm","lmstudio","localai","mistral","ollama","openai","openrouter"],"created_at":"2024-09-24T13:33:18.710Z","updated_at":"2025-06-20T06:33:50.143Z","avatar_url":"https://github.com/ariya.png","language":"JavaScript","readme":"# Query LLM\n\n**Query LLM** is a simple, zero-dependency CLI tool for querying large language models (LLMs). It works seamlessly with both cloud-based LLM services (e.g., [OpenAI GPT](https://platform.openai.com/docs), [Groq](https://groq.com), [OpenRouter](https://openrouter.ai)) and locally hosted LLMs (e.g. [llama.cpp](https://github.com/ggerganov/llama.cpp), [LM Studio](https://lmstudio.ai), [Ollama](https://ollama.com)). 
Internally, it guides the LLM to perform step-by-step reasoning using the [Chain of Thought method](https://www.promptingguide.ai/techniques/cot).\n\nTo run Query LLM, ensure that [Node.js](https://nodejs.org) (v18 or higher) or [Bun](https://bun.sh) is installed.\n\n```bash\n./query-llm.js\n```\n\nTo obtain quick responses, pipe a question directly:\n```bash\necho \"Top travel destinations in Indonesia?\" | ./query-llm.js\n```\n\nFor specific tasks:\n```bash\necho \"Translate 'thank you' into German\" | ./query-llm.js\n```\n\nFor simpler interactions with LLMs using zero-shot prompting, refer to the sister project, [ask-llm](https://github.com/ariya/ask-llm).\n\n## Using Local LLM Servers\n\nSupported local LLM servers include [llama.cpp](https://github.com/ggerganov/llama.cpp), [Jan](https://jan.ai), [Ollama](https://ollama.com), [Cortex](https://cortex.so), [LocalAI](https://localai.io), [LM Studio](https://lmstudio.ai), and [Msty](https://msty.app).\n\nTo utilize [llama.cpp](https://github.com/ggerganov/llama.cpp) locally with its inference engine, load a quantized model like [Llama-3.2 3B](https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF) or [Phi-3.5 Mini](https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF). Then set the `LLM_API_BASE_URL` environment variable:\n```bash\n/path/to/llama-server -m Llama-3.2-3B-Instruct-Q4_K_M.gguf\nexport LLM_API_BASE_URL=http://127.0.0.1:8080/v1\n```\n\nTo use [Jan](https://jan.ai) with its local API server, refer to [its documentation](https://jan.ai/docs/local-api). 
Load a model like [Llama-3.2 3B](https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF) or [Phi-3.5 Mini](https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF), and set the following environment variables:\n```bash\nexport LLM_API_BASE_URL=http://127.0.0.1:1337/v1\nexport LLM_CHAT_MODEL='llama3-8b-instruct'\n```\n\nTo use [Ollama](https://ollama.com) locally, load a model and configure these environment variables:\n```bash\nollama pull llama3.2\nexport LLM_API_BASE_URL=http://127.0.0.1:11434/v1\nexport LLM_CHAT_MODEL='llama3.2'\n```\n\nTo use [Cortex](https://cortex.so) local inference, pull a model (such as `llama3.2` or `phi-3.5`, among [many others](https://cortex.so/models/)), ensure that its API server is running, and then configure these environment variables:\n```bash\nexport LLM_API_BASE_URL=http://localhost:39281/v1\nexport LLM_CHAT_MODEL='llama3.2:3b-gguf-q4-km'\n```\n\nFor [LocalAI](https://localai.io), initiate its container and adjust the environment variable `LLM_API_BASE_URL`:\n```bash\ndocker run -ti -p 8080:8080 localai/localai llama-3.2-3b-instruct:q4_k_m\nexport LLM_API_BASE_URL=http://localhost:8080/v1\n```\n\nFor [LM Studio](https://lmstudio.ai), pick a model (e.g., Llama-3.2 3B). Next, go to the Developer tab, select the model to load, and click the Start Server button. Then, set the `LLM_API_BASE_URL` environment variable, noting that the server by default runs on port `1234`:\n```bash\nexport LLM_API_BASE_URL=http://127.0.0.1:1234/v1\n```\n\nFor [Msty](https://msty.app), choose a model (e.g., Llama-3.2 3B) and ensure the local AI is running. Go to the Settings menu, under Local AI, and note the Service Endpoint (which defaults to port `10002`). 
Then set the `LLM_API_BASE_URL` environment variable accordingly:\n```bash\nexport LLM_API_BASE_URL=http://127.0.0.1:10002/v1\n```\n\n## Using Managed LLM Services\n\nSupported LLM services include [AI21](https://studio.ai21.com), [Avian](https://avian.io), [Cerebras](https://cloud.cerebras.ai), [Deep Infra](https://deepinfra.com), [DeepSeek](https://platform.deepseek.com/), [Fireworks](https://fireworks.ai), [Gemini](https://ai.google.dev/gemini-api), [Groq](https://groq.com), [Hyperbolic](https://www.hyperbolic.xyz), [Lepton](https://lepton.ai), [Mistral](https://console.mistral.ai), [Nebius](https://studio.nebius.ai), [Novita](https://novita.ai), [OpenAI](https://platform.openai.com), [OpenRouter](https://openrouter.ai), and [Together](https://www.together.ai).\n\nFor configuration specifics, refer to the relevant section. The quality of answers can vary based on the model's performance.\n\n* [AI21](https://studio.ai21.com)\n```bash\nexport LLM_API_BASE_URL=https://api.ai21.com/studio/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=jamba-1.5-mini\n```\n\n* [Avian](https://avian.io)\n```bash\nexport LLM_API_BASE_URL=https://api.avian.io/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"Meta-Llama-3.1-8B-Instruct\"\n```\n\n* [Cerebras](https://cloud.cerebras.ai)\n```bash\nexport LLM_API_BASE_URL=https://api.cerebras.ai/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"llama3.1-8b\"\n```\n\n* [Deep Infra](https://deepinfra.com)\n```bash\nexport LLM_API_BASE_URL=https://api.deepinfra.com/v1/openai\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/Meta-Llama-3.1-8B-Instruct\"\n```\n\n* [DeepSeek](https://platform.deepseek.com)\n```bash\nexport LLM_API_BASE_URL=https://api.deepseek.com/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"deepseek-chat\"\n```\n\n* [Fireworks](https://fireworks.ai/)\n```bash\nexport LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1\nexport 
LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"accounts/fireworks/models/llama-v3p1-8b-instruct\"\n```\n\n* [Google Gemini](https://ai.google.dev/gemini-api)\n```bash\nexport LLM_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"gemini-1.5-flash\"\n```\n\n* [Groq](https://groq.com/)\n```bash\nexport LLM_API_BASE_URL=https://api.groq.com/openai/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"llama-3.1-8b-instant\"\n```\n\n* [Hyperbolic](https://www.hyperbolic.xyz)\n```bash\nexport LLM_API_BASE_URL=https://api.hyperbolic.xyz/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/Meta-Llama-3.1-8B-Instruct\"\n```\n\n* [Lepton](https://lepton.ai)\n```bash\nexport LLM_API_BASE_URL=https://llama3-1-8b.lepton.run/api/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"llama3-1-8b\"\n```\n\n* [Mistral](https://console.mistral.ai)\n```bash\nexport LLM_API_BASE_URL=https://api.mistral.ai/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"open-mistral-7b\"\n```\n\n* [Nebius](https://studio.nebius.ai)\n```bash\nexport LLM_API_BASE_URL=https://api.studio.nebius.ai/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/Meta-Llama-3.1-8B-Instruct\"\n```\n\n* [Novita](https://novita.ai)\n```bash\nexport LLM_API_BASE_URL=https://api.novita.ai/v3/openai\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/llama-3.1-8b-instruct\"\n```\n\n* [OpenAI](https://platform.openai.com)\n```bash\nexport LLM_API_BASE_URL=https://api.openai.com/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"gpt-4o-mini\"\n```\n\n* [OpenRouter](https://openrouter.ai/)\n```bash\nexport LLM_API_BASE_URL=https://openrouter.ai/api/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/llama-3.1-8b-instruct\"\n```\n\n* [Together](https://www.together.ai/)\n```bash\nexport 
LLM_API_BASE_URL=https://api.together.xyz/v1\nexport LLM_API_KEY=\"yourownapikey\"\nexport LLM_CHAT_MODEL=\"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo\"\n```\n\n## Evaluating Questions\n\nIf there is a text file containing pairs of `User` and `Assistant` messages, it can be evaluated with Query LLM:\n\n```\nUser: Which planet is the largest?\nAssistant: The largest planet is /Jupiter/.\n\nUser: and the smallest?\nAssistant: The smallest planet is /Mercury/.\n```\n\nAssuming the above content is in `qa.txt`, executing the following command will initiate a multi-turn conversation with the LLM, asking questions sequentially and verifying answers using regular expressions:\n```bash\n./query-llm.js qa.txt\n```\n\nFor additional examples, please refer to the `tests/` subdirectory.\n\nTwo environment variables can be used to modify the behavior:\n\n* `LLM_DEBUG_FAIL_EXIT`: When set, Query LLM will exit immediately upon encountering an incorrect answer, and subsequent questions in the file will not be processed.\n\n* `LLM_DEBUG_PIPELINE`: When set, and if the expected regular expression does not match the answer, the internal LLM pipeline will be printed to stdout.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fariya%2Fquery-llm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fariya%2Fquery-llm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fariya%2Fquery-llm/lists"}