{"id":14964700,"url":"https://github.com/picovoice/picollm","last_synced_at":"2025-10-23T02:42:46.011Z","repository":{"id":240096923,"uuid":"784440009","full_name":"Picovoice/picollm","owner":"Picovoice","description":"On-device LLM Inference Powered by X-Bit Quantization","archived":false,"fork":false,"pushed_at":"2025-05-13T22:05:30.000Z","size":101963,"stargazers_count":238,"open_issues_count":0,"forks_count":13,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-15T20:07:13.808Z","etag":null,"topics":["compression","efficient-inference","gemma","generative-ai","language-model","language-models","large-language-model","llama","llama2","llama3","llm","llm-inference","llms","mistral","mixtral","model-compression","natural-language-processing","quantization","self-hosted"],"latest_commit_sha":null,"homepage":"https://picovoice.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Picovoice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-04-09T21:23:22.000Z","updated_at":"2025-05-13T21:56:45.000Z","dependencies_parsed_at":"2024-09-24T13:58:01.008Z","dependency_job_id":"dbb25dc9-69d1-48f9-bbe3-7081ee411b24","html_url":"https://github.com/Picovoice/picollm","commit_stats":{"total_commits":29,"total_committers":8,"mean_commits":3.625,"dds":0.6206896551724138,"last_synced_commit":"f9e3f55e3504d0a7d8555f9b8d9fd77a76c4cc28"},"previous_names":["picovoice/picollm"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/Picovoice%2Fpicollm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fpicollm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fpicollm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fpicollm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Picovoice","download_url":"https://codeload.github.com/Picovoice/picollm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254414501,"owners_count":22067272,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","efficient-inference","gemma","generative-ai","language-model","language-models","large-language-model","llama","llama2","llama3","llm","llm-inference","llms","mistral","mixtral","model-compression","natural-language-processing","quantization","self-hosted"],"created_at":"2024-09-24T13:33:39.952Z","updated_at":"2025-10-23T02:42:45.937Z","avatar_url":"https://github.com/Picovoice.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# picoLLM Inference Engine\n\n[![GitHub release](https://img.shields.io/github/release/Picovoice/picollm.svg)](https://github.com/Picovoice/picollm/releases)\n[![GitHub](https://img.shields.io/github/license/Picovoice/picollm)](https://github.com/Picovoice/picollm/)\n\n[![Maven 
Central](https://img.shields.io/maven-central/v/ai.picovoice/picollm-android?label=maven-central%20%5Bandroid%5D)](https://repo1.maven.org/maven2/ai/picovoice/picollm-android/)\n[![npm](https://img.shields.io/npm/v/@picovoice/picollm-web?label=npm%20%5Bweb%5D)](https://www.npmjs.com/package/@picovoice/picollm-web)\n[![CocoaPods](https://img.shields.io/cocoapods/v/picoLLM-iOS)](https://cocoapods.org/pods/picoLLM-iOS)\u003c!-- markdown-link-check-disable-line --\u003e\n[![PyPI](https://img.shields.io/pypi/v/picollm)](https://pypi.org/project/picollm/)\n[![Nuget](https://img.shields.io/nuget/v/picollm)](https://www.nuget.org/packages/PicoLLM/)\n\nMade in Vancouver, Canada by [Picovoice](https://picovoice.ai)\n\n[![Twitter URL](https://img.shields.io/twitter/url?label=%40AiPicovoice\u0026style=social\u0026url=https%3A%2F%2Ftwitter.com%2FAiPicovoice)](https://twitter.com/AiPicovoice)\u003c!-- markdown-link-check-disable-line --\u003e\n[![YouTube Channel Views](https://img.shields.io/youtube/channel/views/UCAdi9sTCXLosG1XeqDwLx7w?label=YouTube\u0026style=social)](https://www.youtube.com/channel/UCAdi9sTCXLosG1XeqDwLx7w)\n\npicoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language\nmodels. 
picoLLM Inference Engine is:\n\n- Accurate; picoLLM Compression improves GPTQ by [significant margins](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/)\n- Private; LLM inference runs 100% locally.\n- Cross-Platform\n    - Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64, arm64)\n    - Raspberry Pi (5 and 4)\n    - Android and iOS\n    - Chrome, Safari, Edge, and Firefox\n- Runs on CPU and GPU\n- Free for open-weight models\n\n## Table of Contents\n\n- [picoLLM](#picollm-inference-engine)\n    - [Table of Contents](#table-of-contents)\n    - [Showcases](#showcases)\n      - [Raspberry Pi](#raspberry-pi)\n      - [Android](#android)\n      - [iOS](#ios)\n      - [Cross-Browser Local LLM](#cross-browser-local-llm)\n      - [Llama-3-70B-Instruct on GeForce RTX 4090](#llama-3-70b-instruct-on-geforce-rtx-4090)\n      - [Local LLM-Powered Voice Assistant on Raspberry Pi](#local-llm-powered-voice-assistant-on-raspberry-pi)\n      - [Local Llama-3-8B-Instruct Voice Assistant on CPU](#local-llama-3-8b-instruct-voice-assistant-on-cpu)\n    - [Accuracy](#accuracy)\n    - [Models](#models)\n    - [AccessKey](#accesskey)\n    - [Demos](#demos)\n        - [Python](#python-demos)\n        - [.NET](#net-demos)\n        - [Node.js](#nodejs-demos)\n        - [Android](#android-demos)\n        - [iOS](#ios-demos)\n        - [Web](#web-demos)\n        - [C](#c-demos)\n    - [SDKs](#sdks)\n        - [Python](#python-sdk)\n        - [.NET](#net-sdk)\n        - [Node.js](#nodejs-sdk)\n        - [Android](#android-sdk)\n        - [iOS](#ios-sdk)\n        - [Web](#web-sdk)\n        - [C](#c-sdk)\n    - [Releases](#releases)\n\n## Showcases\n\n### Raspberry Pi\n\n[![Local LLM on Raspberry Pi](https://img.youtube.com/vi/CeKPXZ_8hkI/0.jpg)](https://www.youtube.com/watch?v=CeKPXZ_8hkI)\n\n### Android\n\n[![How to Run a Local LLM on Android](https://img.youtube.com/vi/XeUMkue-5lI/0.jpg)](https://www.youtube.com/watch?v=XeUMkue-5lI)\n\n### iOS\n\n[![How to Run a 
Local LLM on iOS](https://img.youtube.com/vi/dNK5esdkI0Y/0.jpg)](https://www.youtube.com/watch?v=dNK5esdkI0Y)\n\n### Cross-Browser Local LLM\n\n[Live Demo — Works offline!](https://picovoice.ai/picollm/)\n\n### Llama-3-70B-Instruct on GeForce RTX 4090\n\n[![Llama-3-70B-Instruct on GeForce RTX 4090](https://img.youtube.com/vi/4mcVwbOOIqk/0.jpg)](https://www.youtube.com/watch?v=4mcVwbOOIqk)\n\n### Local LLM-Powered Voice Assistant on Raspberry Pi\n\n[![Local LLM-Powered Voice Assistant on Raspberry Pi](https://img.youtube.com/vi/GEndT3RGRvw/0.jpg)](https://www.youtube.com/watch?v=GEndT3RGRvw)\n\n### Local Llama-3-8B-Instruct Voice Assistant on CPU\n\n[![Local Llama-3-8B-Instruct Voice Assistant on CPU](https://img.youtube.com/vi/uV0GlXDFSPw/0.jpg)](https://www.youtube.com/watch?v=uV0GlXDFSPw)\n\n## Accuracy\n\npicoLLM Compression is a novel large language model (LLM) quantization algorithm developed within Picovoice. Given a task-specific cost function, picoLLM Compression automatically learns the optimal bit allocation strategy across and within LLM's weights. Existing techniques require a fixed bit allocation scheme, which is subpar.\n\nFor example, picoLLM Compression recovers MMLU score degradation of widely adopted GPTQ by 91%, 99%, and 100% at 2, 3,\nand 4-bit settings. The figure below depicts the MMLU comparison between picoLLM and GPTQ for Llama-3-8b [[1]](https://picovoice.ai/blog/picollm-towards-optimal-llm-quantization/).\n\n![picoLLM Compression vs GPTQ MMLU scores when applied to Llama-3-8B](./resources/mmlu-llama-3-8b.svg)\n\n## Models\n\npicoLLM Inference Engine supports the following open-weight models. 
The models are available for download from\n[Picovoice Console](https://console.picovoice.ai/).\n\n- Gemma\n    - `gemma-2b`\n    - `gemma-2b-it`\n    - `gemma-7b`\n    - `gemma-7b-it`\n- Llama-2\n    - `llama-2-7b`\n    - `llama-2-7b-chat`\n    - `llama-2-13b`\n    - `llama-2-13b-chat`\n    - `llama-2-70b`\n    - `llama-2-70b-chat`\n- Llama-3\n    - `llama-3-8b`\n    - `llama-3-8b-instruct`\n    - `llama-3-70b`\n    - `llama-3-70b-instruct`\n- Llama-3.2\n    - `llama3.2-1b-instruct`\n    - `llama3.2-3b-instruct`\n- Mistral\n    - `mistral-7b-v0.1`\n    - `mistral-7b-instruct-v0.1`\n    - `mistral-7b-instruct-v0.2`\n- Mixtral\n    - `mixtral-8x7b-v0.1`\n    - `mixtral-8x7b-instruct-v0.1`\n- Phi-2\n    - `phi2`\n- Phi-3\n    - `phi3`\n- Phi-3.5\n    - `phi3.5`\n\n## AccessKey\n\nAccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who\nuses Picovoice needs a valid AccessKey. You must keep your AccessKey secret. You need internet\nconnectivity to validate your AccessKey with Picovoice license servers, even though LLM inference runs 100%\noffline and is completely free for open-weight models. 
Everyone who signs up for\n[Picovoice Console](https://console.picovoice.ai/) receives a unique AccessKey.\n\n## Demos\n\n### Python Demos\n\nInstall the demo package:\n\n```console\npip3 install picollmdemo\n```\n\nRun the following in the terminal:\n\n```console\npicollm_demo_completion --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt ${PROMPT}\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\nFor more information about Python demos go to [demo/python](demo/python/README.md).\n\n### .NET Demos\n\nFrom [demo/dotnet/PicoLLMDemo](demo/dotnet/PicoLLMDemo) build and run the demo:\n\n```console\ndotnet build -c CompletionDemo.Release\ndotnet run -c CompletionDemo.Release -- --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt ${PROMPT}\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\nFor more information about .NET demos go to [demo/dotnet](demo/dotnet).\n\n### Node.js Demos\n\nInstall the demo package:\n\n```console\nyarn global add @picovoice/picollm-node-demo\n```\n\nRun the following in the terminal:\n\n```console\npicollm-completion-demo --access_key ${ACCESS_KEY} --model_path ${MODEL_PATH} --prompt ${PROMPT}\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\nFor more information about Node.js demos go to [Node.js demo](./demo/nodejs).\n\n### Android Demos\n\nUsing Android Studio, open the [Completion demo](demo/android/Completion) as an Android project, copy your AccessKey into MainActivity.java, and run the application.\n\nTo learn about how to use picoLLM in a chat application, try out the [Chat 
demo](demo/android/Chat).\n\nFor more information about Android demos go to [demo/android](demo/android/README.md).\n\n### iOS Demos\n\nTo run the completion demo, go to [demo/ios/Completion](demo/ios/Completion) and run:\n\n```console\npod install\n```\n\nReplace `let ACCESS_KEY = \"${YOUR_ACCESS_KEY_HERE}\"` in the file [ViewModel.swift](demo/ios/Completion/PicoLLMCompletionDemo/ViewModel.swift) with your AccessKey obtained from [Picovoice Console](https://console.picovoice.ai/).\n\nThen, using [Xcode](https://developer.apple.com/xcode/), open the generated `PicoLLMCompletionDemo.xcworkspace` and run the application.\n\nTo learn about how to use picoLLM in a chat application, try out the [Chat demo](demo/ios/Chat).\n\nFor more information about iOS demos go to [demo/ios](demo/ios/README.md).\n\n### Web Demos\n\nFrom [demo/web](demo/web) run the following in the terminal:\n\n```console\nyarn\nyarn start\n```\n\n(or)\n\n```console\nnpm install\nnpm run start\n```\n\nOpen `http://localhost:5000` in your browser to try the demo.\n\n### C Demos\n\nBuild the demo:\n\n```console\ncmake -S demo/c/ -B demo/c/build \u0026\u0026 cmake --build demo/c/build\n```\n\nRun the demo:\n\n```console\n./demo/c/build/picollm_demo_completion -a ${ACCESS_KEY} -l ${LIBRARY_PATH} -m ${MODEL_FILE_PATH} -p ${PROMPT}\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${LIBRARY_PATH}` with the path to the shared\nlibrary file located in the [lib](lib) directory, `${MODEL_FILE_PATH}` with the path to a model file downloaded from\nPicovoice Console, and `${PROMPT}` with a prompt string.\n\nFor more information about C demos go to [demo/c](demo/c/README.md).\n\n## SDKs\n\n### Python SDK\n\nInstall the Python SDK:\n\n```console\npip3 install picollm\n```\n\nCreate an instance of the engine and generate a prompt completion:\n\n```python\nimport picollm\n\npllm = picollm.create(\n    access_key='${ACCESS_KEY}',\n    model_path='${MODEL_PATH}')\n\nres = 
pllm.generate('${PROMPT}')\nprint(res.completion)\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string. Finally, when done, be sure to explicitly release\nthe resources using `pllm.release()`.\n\n### .NET SDK\n\nInstall the .NET SDK using NuGet or the dotnet CLI:\n\n```console\ndotnet add package PicoLLM\n```\n\nCreate an instance of the engine and generate a prompt completion:\n\n```csharp\nusing System;\nusing Pv;\n\nPicoLLM pllm = PicoLLM.Create(\"${ACCESS_KEY}\", \"${MODEL_PATH}\");\n\nPicoLLMCompletion res = pllm.Generate(\"${PROMPT}\");\nConsole.WriteLine(res.Completion);\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\n`PicoLLM` will have its resources freed by the garbage collector, but to have resources freed immediately after use,\nwrap it in a using statement or call `.Dispose()` directly:\n\n```csharp\nusing (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))\n{\n    // .. picoLLM usage here\n}\n```\n\n### Node.js SDK\n\nInstall the Node.js SDK:\n\n```console\nyarn add @picovoice/picollm-node\n```\n\nCreate an instance of the engine and generate a prompt completion:\n\n```javascript\nconst { PicoLLM } = require(\"@picovoice/picollm-node\");\nconst pllm = new PicoLLM('${ACCESS_KEY}', '${MODEL_PATH}');\n\nconst res = await pllm.generate('${PROMPT}');\nconsole.log(res.completion);\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string. 
Finally, when done, be sure to explicitly release\nthe resources using `pllm.release()`.\n\n### Android SDK\n\nCreate an instance of the inference engine and generate a prompt completion:\n\n```java\nimport ai.picovoice.picollm.*;\n\ntry {\n    PicoLLM picollm = new PicoLLM.Builder()\n        .setAccessKey(\"${ACCESS_KEY}\")\n        .setModelPath(\"${MODEL_PATH}\")\n        .build();\n    PicoLLMCompletion res = picollm.generate(\n        \"${PROMPT}\",\n        new PicoLLMGenerateParams.Builder().build());\n} catch (PicoLLMException e) {\n    // handle the error\n}\n```\n\nReplace `${ACCESS_KEY}` with your `AccessKey` from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string. Finally, when done, be sure to explicitly release\nthe resources using `picollm.delete()`.\n\n### iOS SDK\n\nCreate an instance of the engine and generate a prompt completion:\n\n```swift\nimport PicoLLM\n\nlet pllm = try PicoLLM(\n    accessKey: \"${ACCESS_KEY}\",\n    modelPath: \"${MODEL_PATH}\")\n\nlet res = pllm.generate(prompt: \"${PROMPT}\")\nprint(res.completion)\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\n### Web SDK\n\nInstall the web SDK using yarn:\n\n```console\nyarn add @picovoice/picollm-web\n```\n\nor using npm:\n\n```console\nnpm install --save @picovoice/picollm-web\n```\n\nCreate an instance of the engine using `PicoLLMWorker` and generate a prompt completion:\n\n```typescript\nimport { PicoLLMWorker } from \"@picovoice/picollm-web\";\n\nconst picoLLMModel = {\n  modelFile: '${MODEL_FILE}'\n};\n\nconst picoLLM = await PicoLLMWorker.create(\n  \"${ACCESS_KEY}\",\n  picoLLMModel\n);\n\nconst res = await picoLLM.generate(\"${PROMPT}\");\nconsole.log(res.completion);\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from [Picovoice Console](https://console.picovoice.ai/), 
`${MODEL_FILE}` with the model file as a `File`, a `Blob`, or a URL (path to the model file), and `${PROMPT}` with a prompt string. Finally, when done, release the resources using `picoLLM.release()`.\n\n### C SDK\n\nCreate an instance of the engine and generate a prompt completion:\n\n```c\npv_picollm_t *pllm = NULL;\npv_picollm_init(\n    \"${ACCESS_KEY}\",\n    \"${MODEL_PATH}\",\n    \"best\",\n    \u0026pllm);\n\npv_picollm_usage_t usage;\npv_picollm_endpoint_t endpoint;\nint32_t num_completion_tokens;\npv_picollm_completion_token_t *completion_tokens;\nchar *output;\npv_picollm_generate(\n    pllm,\n    \"${PROMPT}\",\n    -1,    // completion_token_limit\n    NULL,  // stop_phrases\n    0,     // num_stop_phrases\n    -1,    // seed\n    0.f,   // presence_penalty\n    0.f,   // frequency_penalty\n    0.f,   // temperature\n    1.f,   // top_p\n    0,     // num_top_choices\n    NULL,  // stream_callback\n    NULL,  // stream_callback_context\n    \u0026usage,\n    \u0026endpoint,\n    \u0026completion_tokens,\n    \u0026num_completion_tokens,\n    \u0026output);\nprintf(\"%s\\n\", output);\n```\n\nReplace `${ACCESS_KEY}` with yours obtained from Picovoice Console, `${MODEL_PATH}` with the path to a model file\ndownloaded from Picovoice Console, and `${PROMPT}` with a prompt string.\n\nFinally, when done, be sure to release the resources explicitly:\n\n```c\npv_picollm_delete(pllm);\n```\n\n## Releases\n\n### v1.3.0 - March 14th, 2025\n\n- Performance improvements\n- Significant performance improvements for iOS and Web\n\n### v1.2.0 - November 26th, 2024\n\n- Performance improvements\n- Added support for phi3.5\n\n### v1.1.0 - October 1st, 2024\n\n- Added `interrupt()` function for halting completion generation early\n- Performance improvements\n- Added support for phi3\n- Bug fixes\n\n### v1.0.0 - May 28th, 2024\n\n- Initial 
release\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fpicollm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpicovoice%2Fpicollm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fpicollm/lists"}