{"id":21423914,"url":"https://github.com/picovoice/serverless-picollm","last_synced_at":"2025-04-19T14:43:44.707Z","repository":{"id":241426066,"uuid":"804594648","full_name":"Picovoice/serverless-picollm","owner":"Picovoice","description":"LLM Inference on AWS Lambda","archived":false,"fork":false,"pushed_at":"2024-06-03T22:42:38.000Z","size":22840,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-29T08:51:06.577Z","etag":null,"topics":["aws-lambda","llm","llm-compression","llm-inference","serverless","serverless-inference"],"latest_commit_sha":null,"homepage":"https://picovoice.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Picovoice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-22T22:22:26.000Z","updated_at":"2024-10-05T10:19:24.000Z","dependencies_parsed_at":"2025-01-23T07:13:06.302Z","dependency_job_id":"c7af4e23-5785-42f8-a1eb-65239450e4cb","html_url":"https://github.com/Picovoice/serverless-picollm","commit_stats":null,"previous_names":["picovoice/serverless-picollm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fserverless-picollm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fserverless-picollm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Picovoice%2Fserverless-picollm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/Picovoice%2Fserverless-picollm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Picovoice","download_url":"https://codeload.github.com/Picovoice/serverless-picollm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249716738,"owners_count":21315068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","llm","llm-compression","llm-inference","serverless","serverless-inference"],"created_at":"2024-11-22T21:18:52.068Z","updated_at":"2025-04-19T14:43:44.691Z","avatar_url":"https://github.com/Picovoice.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Serverless picoLLM: LLMs Running in AWS Lambda!\n\nCode for the Serverless LLM article on picovoice.ai, which you can find here: [picoLLM on Lambda](https://picovoice.ai/blog/picollm-on-lambda/).\n\n![The Demo in Action](resources/serverless-picollm-small.gif)\n\n## Disclaimer\n\nTHIS DEMO EXCEEDS *AWS* FREE TIER USAGE.\nYOU **WILL** BE CHARGED BY *AWS* IF YOU DEPLOY THIS DEMO.\n\n## Prerequisites\n\nYou will need the following in order to deploy and run this demo:\n\n1. A [Picovoice Console](https://console.picovoice.ai/) account with a valid AccessKey.\n\n2. An [AWS](https://aws.amazon.com/) account.\n\n3. AWS SAM CLI installed and set up. Follow the [official guide](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html) completely.\n\n4. A valid [Docker](https://docs.docker.com/get-docker/) installation.\n\n## Setup\n\n1. 
Clone the [`serverless-picollm` repo](https://github.com/Picovoice/serverless-picollm):\n\n```console\ngit clone https://github.com/Picovoice/serverless-picollm.git\n```\n\n2. Download a `Phi2` based `.pllm` model from the `picoLLM` section of the [Picovoice Console](https://console.picovoice.ai/picollm).\n\n\u003e [!TIP]\n\u003e Other models will work as long as they are chat-enabled and fit within the AWS Lambda code size and memory limits.\n\u003e You will also need to update the `Dialog` object in [client.py](client.py) to the appropriate class.\n\u003e\n\u003e For example, if using `Llama3` with the `llama-3-8b-instruct-326` model, the line in [client.py](client.py) should be updated to: \n\u003e ```python\n\u003e dialog = picollm.Llama3ChatDialog(history=3)\n\u003e ```\n\n3. Place the downloaded `.pllm` model in the [`models/`](models/) directory.\n\n4. Replace `\"${YOUR_ACCESS_KEY_HERE}\"` inside the [`src/app.py`](src/app.py) file with your AccessKey obtained from [Picovoice Console](https://console.picovoice.ai/).\n\n## Deploy\n\n1. Use AWS SAM CLI to build the app:\n\n```console\nsam build\n```\n\n2. Use AWS SAM CLI to deploy the app, following the guided prompts:\n\n```console\nsam deploy --guided\n```\n\n3. At the end of the deployment AWS SAM CLI will print an outputs section. Make note of the `WebSocketURI`. 
It should look something like this:\n\n```\nCloudFormation outputs from deployed stack\n-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nOutputs\n-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nKey                 HandlerFunctionFunctionArn\nDescription         HandlerFunction function ARN\nValue               arn:aws:lambda:us-west-2:000000000000:function:picollm-lambda-HandlerFunction-ABC123DEF098\n\nKey                 WebSocketURI\nDescription         The WSS Protocol URI to connect to\nValue               wss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod\n-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n```\n\n```\nwss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod\n```\n\u003e [!NOTE]\n\u003e If you make any changes to the model, `Dockerfile` or `app.py` files, you will need to repeat all these deployment steps.\n\n## Chat!\n\n1. Run `client.py`, passing in the URL copied from the deployment step:\n\n```console\npython client.py -u \u003cWebSocket URL\u003e\n```\n\n2. Once connected the client will give you a prompt. Type in your chat message and `picoLLM` will stream back a response from the lambda!\n\n```\n\u003e What is the capital of France?\n\u003c The capital of France is Paris.\n\n\u003c [Completion finished @ `6.35` tps]\n```\n\n\u003e [!IMPORTANT]\n\u003e When you first send a message you may get the following response: `\u003c [Lambda is loading \u0026 caching picoLLM. 
Please wait...]`.\n\u003e This means `picoLLM` is loading the model as Lambda streams it from the Elastic Container Registry.\n\u003e Because of the nature and limitations of AWS Lambda, this process *may* take upwards of a few minutes.\n\u003e Subsequent messages and connections will not take as long to load, as Lambda will cache the layers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fserverless-picollm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpicovoice%2Fserverless-picollm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpicovoice%2Fserverless-picollm/lists"}