## DISCLAIMER:

> This is a personal project under heavy development. It could, and likely does, contain bugs, incomplete code,
> or other unintended issues.
> As such, the software is provided as-is, without warranty of any kind.
>
> This project and any expressed views, methodologies, etc., found within are the result of contributions by the
> maintainer and any contributors in their free time, and should not reflect upon any of their employers.

---

## What is WilmerAI?

Wilmer is an application that sits between your front end (or any other LLM-calling program) and the LLM APIs you're
sending your prompts to.

To connect to it, Wilmer exposes OpenAI- and Ollama-compatible API endpoints, and on the backend it can connect to LLM
APIs like OpenAI, KoboldCpp, and Ollama.

To visualize: you type a prompt into your front end, which is connected to Wilmer. The prompt gets sent to Wilmer
first, which runs it through a series of workflows. Each workflow may make calls to multiple LLMs, after which the
final response comes back to you.

From your perspective, it looks like a (likely long-running) one-shot call to an LLM. But in reality, it could involve
many LLMs, and even tools, performing complex work.

### What Does WilmerAI Stand For?

WilmerAI stands for **"What If Language Models Expertly Routed All Inference?"**

---

## Maintainer's Note — UPDATED 2025-08-17

> **IMPORTANT:**
> Until October 2025, WilmerAI will not accept any new Pull Requests that modify anything within the
> Middleware modules; some exceptions may apply.
> Updates to iSevenDays' new MCP tool-calling feature, or adding new
> custom users or prompt templates within the Public directory, are still welcome.
>
> **Roadmap to Complete Before New PRs Accepted**
> * ~~Reasoning LLM Support (think block removal, prepend text to prompt/system)~~ **(COMPLETE)**
> * ~~Refactor LlmApis~~ **(First Round COMPLETE)**
> * ~~Refactor FrontEnd Apis~~ **(First Round COMPLETE)**
> * ~~Refactor Workflows~~ **(First Round COMPLETE)**
> * ~~Vector Memory Initial Implementation~~ **(COMPLETE)**
> * Rewrite Readme and Expand Documentation *(In Progress)*
> * Full Redo of Most Example Users, Using New Prompting Strategies *(In Progress)*
> * Second Round Refactoring for Unit Tests
> * Full Unit Test Coverage of Primary Functions
>
> During this time, there are very likely to be new bugs introduced. I really don't have the ability
> to work on this project during the week at all, so it's a heads-down code-a-thon on weekends whenever
> I can. Please bear with me if I break stuff along the way over the next few weeks.
>
> To help reduce the pain of this, I've finally set up tags/releases, with major checkpoints from the past
> few months chosen so you can grab earlier, better-working versions.
>
> PS: Please bear with me if one of my documents says something dumb. When time is short, documentation usually
> suffers the most, so I'm relying heavily on LLMs right now. Normally I would do it by hand or at least
> proofread it better, so I apologize in advance. I'll clean that up soon-ish.
>
> — Socg

## The Power of Workflows

### Semi-Autonomous Workflows Allow You to Determine What Tools to Use, and When

The below shows Open WebUI connected to two instances of Wilmer.
The first instance just hits Mistral Small 3 24b directly, while the second makes a call to
the [Offline Wikipedia API](https://github.com/SomeOddCodeGuy/OfflineWikipediaTextApi) before calling the
same model.

![No-RAG vs RAG](Docs/Gifs/Search-Gif.gif)
*Click the image to play the gif if it doesn't start automatically*

### Iterative LLM Calls To Improve Performance

A zero-shot prompt to an LLM may not give great results, but follow-up questions will often improve them. If you
regularly perform [the same follow-up questions when doing tasks like software development](https://github.com/SomeOddCodeGuy/SomeOddCodeGuy/blob/main/Github_Images/My%20personal%20guide%20for%20developing%20software%20with%20AI%20assistance%20r_LocalLLaMA.png),
creating a workflow to automate those steps can produce great results.

### Distributed LLMs

With workflows, you can have as many LLMs working together in a single call as you have computers to support them.
For example, if you have old machines lying around that can run 3-8b models, you can put them to use as worker LLMs in
various nodes. The more LLM APIs you have available, whether on your own home hardware or via proprietary
APIs, the more powerful you can make your workflow network.
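As an illustration of the kind of worker endpoint this enables, a spare machine running Ollama might be described by an endpoint config like the one below. All names and values here are hypothetical, the file is trimmed down for brevity (real endpoint files contain more fields, covered later in this README), and the `apiTypeConfigFileName` value assumes an Ollama api-type file exists in your ApiTypes folder:

```json
{
  "modelNameForDisplayOnly": "Llama 3.1 8b worker on a spare laptop",
  "endpoint": "http://192.168.1.50:11434",
  "apiTypeConfigFileName": "Ollama",
  "maxContextTokenSize": 8192,
  "modelNameToSendToAPI": "llama3.1:8b"
}
```

Pointing different workflow nodes at configs like this is what lets one prompt fan out across multiple machines.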
A single prompt to Wilmer could reach out to 5+ computers,
including proprietary APIs, depending on how you build your workflow.

## Some (Not So Pretty) Pictures to Help People Visualize What It Can Do

#### Example of a Simple Assistant Workflow Using the Prompt Router

![Single Assistant Routing to Multiple LLMs](Docs/Examples/Images/Wilmer-Assistant-Workflow-Example.jpg)

#### Example of How Routing Might Be Used

![Prompt Routing Example](Docs/Examples/Images/Wilmer-Categorization-Workflow-Example.png)

#### Group Chat to Different LLMs

![Groupchat to Different LLMs](Docs/Examples/Images/Wilmer-Groupchat-Workflow-Example.png)

#### Example of a UX Workflow Where a User Asks for a Website

![Oversimplified Example Coding Workflow](Docs/Examples/Images/Wilmer-Simple-Coding-Workflow-Example.jpg)

## Key Features

- **Prompt Routing**: Prompts sent into Wilmer can be routed to any custom category, whether that is a domain (like
  coding, math, or medical) or a persona name (for group chats with a different LLM for each persona).

- **Custom Workflows**: Routing isn't required; you can also override the routing so that every prompt goes to a single
  workflow every time.

- **Single Prompts Responded To by Multiple LLMs in Tandem**: Every node in a workflow can hit a different LLM if you
  want, so a single prompt could be worked on by 10+ LLMs if that is what you wanted.
  This means one AI assistant can be powered by several workflows, and many LLMs, all working together to generate the
  best answer.

- **Support for the Offline Wikipedia API**: WilmerAI has a node that can make calls to the
  [OfflineWikipediaTextApi](https://github.com/SomeOddCodeGuy/OfflineWikipediaTextApi) to allow RAG setups that
  improve factual responses.

- **Continually Generated Chat Summaries to Simulate a "Memory"**: The Chat Summary node will generate "memories"
  by chunking your messages, summarizing them, and saving them to a file. It will then take those summarized
  chunks and generate an ongoing, constantly updating summary of the entire conversation. This allows conversations
  that far exceed the LLM's context to maintain some level of consistency.

- **Hotswap Models to Maximize VRAM Usage:** Leveraging Ollama's hotswapping, you can run complex workflows even on
  systems with smaller amounts of VRAM. For example, if a 24GB RTX 3090 can load a 14b model, then using endpoints
  pointed at Ollama, you can have workflows with as many 14b models as your computer has storage to hold; each
  node that uses a different model will cause Ollama to unload the previous model and load the new one.

- **Customizable Presets**: Presets are saved in JSON files that you can readily customize, and they are sent as-is to
  the API, so if a new sampler comes out that isn't included in Wilmer, you can just pop into the preset's JSON file
  and update it. Each LLM type that Wilmer hits gets its own preset folder.

- **Vision Multi-Modal Support via Ollama:** Experimental support for image processing when using
  Ollama as the front-end API and an Ollama backend API to send it to. Send multiple images in a single
  message, even if the LLM itself does not support that; Wilmer will iterate through them and query the LLM
  one at a time.
  The images can either be utilized as variables for prompts in other workflows, or can be added
  to the conversation as messages.

- **Mid-Workflow Conditional Workflows:** Similar to the main domain routing, you can kick off new workflows inside
  of other workflows, either directly or based on a condition. So you can ask the LLM "Would a Wikipedia article help
  here?", and if the answer is 'yes', kick off a Wikipedia workflow, or if 'no', kick off a workflow that just
  hits LLMs.

- **MCP Server Tool Integration Using MCPO:** New and experimental support for MCP
  server tool calling using MCPO, allowing tool use mid-workflow. Big thank you
  to [iSevenDays](https://github.com/iSevenDays)
  for the amazing work on this feature. More info can be found in the [ReadMe](Public/modules/README_MCP_TOOLS.md).

## Why Make WilmerAI?

Wilmer was kicked off in late 2023, during the Llama 2 era, to make maximum use of fine-tunes through routing.
The routers that existed at the time didn't handle semantic routing well: categorization was often based on a single
word and the last message only. But sometimes a single word isn't enough to describe a category, and the last
message may have too much inferred speech, or lack too much context, to categorize appropriately.

Almost immediately after Wilmer was started, it became apparent that routing alone wasn't enough: the fine-tunes were
okay, but nowhere near as smart as proprietary LLMs. However, when the LLMs were forced to iterate on the same task
over and over, the quality of their responses tended to improve (as long as the prompt was well written).
This meant that the optimal result wasn't routing to have a single LLM one-shot the response, but rather sending the
prompt to something more complex.

Instead of relying on unreliable autonomous agents, Wilmer became focused on semi-autonomous workflows, giving the
user granular control of the path the LLMs take and allowing maximum use of the user's own domain knowledge and
experience. This also meant that multiple LLMs could work together, orchestrated by the workflow itself,
to come up with a single solution.

Rather than routing to a single LLM, Wilmer routes to many via a whole workflow.

This has allowed Wilmer's categorization to be far more complex and customizable than most routers. Categorization is
handled by user-defined workflows, with as many nodes and LLMs involved as the user wants, to break down the
conversation and determine exactly what the user is asking for. This means the user can experiment with different
prompting styles to make the router get the best result. Additionally, the routes are more than just keywords;
they are full descriptions of what the route entails. Little is left to the LLM's "imagination". The goal is that
any weakness in Wilmer's categorization can be corrected by simply modifying the categorization workflow. And once
that category is chosen? It goes to another workflow.

Eventually Wilmer became more about workflows than routing, and an optional bypass was added to skip routing entirely.
Because of the small footprint, users can run multiple instances of Wilmer, some hitting a workflow
directly while others use categorization and routing.

While Wilmer may have been the first of its kind, many other semantic routers have since appeared, some of which are
likely faster and better.
But this project will continue to be maintained for a long time to come, as the maintainer
still uses it as his daily driver and has many more plans for it.

## Wilmer API Endpoints

### How Do You Connect To Wilmer?

Wilmer exposes several different APIs on the front end, allowing you to connect most applications in the LLM space
to it.

Wilmer exposes the following APIs that other apps can connect to:

- OpenAI Compatible v1/completions (*requires the [Wilmer Prompt Template](Public/Configs/PromptTemplates/wilmerai.json)*)
- OpenAI Compatible chat/completions
- Ollama Compatible api/generate (*requires the [Wilmer Prompt Template](Public/Configs/PromptTemplates/wilmerai.json)*)
- Ollama Compatible api/chat

### What Wilmer Can Connect To

On the backend, Wilmer is capable of connecting to various APIs, where it sends its prompts to LLMs. Wilmer
currently supports the following API types:

- OpenAI Compatible v1/completions
- OpenAI Compatible chat/completions
- Ollama Compatible api/generate
- Ollama Compatible api/chat
- KoboldCpp Compatible api/v1/generate (*non-streaming generate*)
- KoboldCpp Compatible /api/extra/generate/stream (*streaming generate*)

Wilmer supports both streaming and non-streaming connections, and has been tested with both SillyTavern
and Open WebUI.

## Maintainer's Note:

> This is a passion project that is being supported in my free time. I do not have the ability to contribute to this
> during standard business hours on weekdays due to work, so my only times to make code updates are weekends and some
> weekday late nights.
>
> If you find a bug or other issue, a fix may take a week or two to go out. I apologize in
> advance if that ends up being the case, but please don't take it as meaning I am not taking the
> issue seriously.
> In reality, I likely won't have the ability to even look at the issue until the following Friday or Saturday.
>
> -Socg

## IMPORTANT:

> Please keep in mind that workflows, by their very nature, could make many calls to an API endpoint based on how you
> set them up. WilmerAI does not track token usage, does not report accurate token usage via its API, and does not
> offer any viable way to monitor token usage. So if token usage tracking is important to you for cost reasons, please
> be sure to keep track of how many tokens you are using via any dashboard provided by your LLM APIs, especially early
> on as you get used to this software.
>
> Your LLM directly affects the quality of WilmerAI. This is an LLM-driven project, where the flows and outputs are
> almost entirely dependent on the connected LLMs and their responses. If you connect Wilmer to a model that produces
> lower quality outputs, or if your presets or prompt template have flaws, then Wilmer's overall quality will be much
> lower as well.
> It's not much different than agentic workflows in that way.

### YouTube Videos

[WilmerAI Setup Tutorial](https://www.youtube.com/watch?v=v2xYQCHZwJM)

This 40-minute video shows:

- A walkthrough of downloading and setting up Wilmer
- Running Wilmer and sending a cURL command to it
- A walkthrough of the Wikipedia workflow
- A brief talk about the new Socg users

[WilmerAI Tutorial YouTube Playlist](https://www.youtube.com/playlist?list=PLjIfeYFu5Pl7J7KGJqVmHM4HU56nByb4X)

This 3-hour video series shows:

- A more in-depth walkthrough of Wilmer and what it is
- An explanation of some of the workflows, as well as the custom python script module
- An explanation of Socg's personal setup
- Setting up and running an example user
- A run of a workflow on an RTX 4090 that utilizes Ollama's ability to hotswap multiple 14b models,
  allowing a 24GB video card to run as many models as individually fit on the card, limited only by hard
  drive space

### Connecting in SillyTavern

#### Text Completion

To connect as a Text Completion in SillyTavern, follow these steps (the below screenshots are from SillyTavern):

Connect as OpenAI Compatible v1/Completions:

![OpenAI Text Completion Settings](Docs/Examples/Images/ST_text_completion_settings.png)

OR

Connect as Ollama api/generate:

![Ollama Text Completion Settings](Docs/Examples/Images/ST_ollama_text_completion_settings.png)

When using text completions, you need to use a WilmerAI-specific prompt template format. An importable ST file can be
found within `Docs/SillyTavern/InstructTemplate`.
The context template is also included if you'd like to use that as well.

The instruction template looks like this:

```
[Beg_Sys]You are an intelligent AI Assistant.[Beg_User]SomeOddCodeGuy: Hey there![Beg_Assistant]Wilmer: Hello![Beg_User]SomeOddCodeGuy: This is a test[Beg_Assistant]Wilmer:  Nice.
```

From SillyTavern:

```
    "input_sequence": "[Beg_User]",
    "output_sequence": "[Beg_Assistant]",
    "first_output_sequence": "[Beg_Assistant]",
    "last_output_sequence": "",
    "system_sequence_prefix": "[Beg_Sys]",
    "system_sequence_suffix": "",
```

There are no expected newlines or characters between tags.

Please ensure that Context Template is "Enabled" (the checkbox above the dropdown).

#### Chat Completions (Not Recommended)

To connect as OpenAI Chat Completions in SillyTavern, follow these steps (the below screenshot is from SillyTavern):

![Chat Completion Settings](Docs/Examples/Images/ST_chat_completion_settings.png)

* Once connected, your presets are largely irrelevant and will be controlled by Wilmer: settings like temperature,
  top_k, etc. The only field you need to update is your truncate length. I recommend setting it to the maximum your
  front end will allow; in SillyTavern, that is around 200,000 tokens.
* If you connect via Chat Completions, please go to presets, expand "Character Names Behavior", and set it to "Message
  Content". If you do not do this, then go to your Wilmer user file and set `chatCompleteAddUserAssistant` to true. (I
  don't recommend setting both to true at the same time. Use either character names from SillyTavern, OR user/assistant
  from Wilmer.
  The AI might get confused otherwise.)

### Additional Recommendations

For either connection type, I recommend going to the "A" icon in SillyTavern and selecting "Include Names" and "Force
Groups and Personas" under instruct mode, then going to the far-left icon (where the samplers are), checking
"stream" at the top left, and then at the top right checking "unlock" under context and dragging it to 200,000+. Let
Wilmer worry about the context.

### Connecting in Open WebUI

When connecting to Wilmer from Open WebUI, simply connect to it as if it were an Ollama instance.

![Ollama Open WebUI Settings](Docs/Examples/Images/OW_ollama_settings.png)

## Quick-ish Setup

Wilmer currently has no user interface; everything is controlled via JSON configuration files located in the "Public"
folder. This folder contains all essential configurations. When updating or downloading a new copy of WilmerAI, you
should simply copy your "Public" folder to the new installation to retain your settings.

This section will walk you through setting up Wilmer. I have broken the setup into steps; I might recommend copying
each step, one by one, into an LLM and asking it to help you set that section up. That may make this go much easier.

**IMPORTANT NOTES**
> It is important to note three things about Wilmer setup.
> * A) Preset files are 100% customizable. What is in that file goes to the LLM API. This is because cloud
    APIs do not handle some of the various presets that local LLM APIs handle. As such, if you use the OpenAI API
    or other cloud services, the calls will probably fail if you use one of the regular local AI presets. Please
    see the preset "OpenAI-API" for an example of what OpenAI accepts.
>
> * B) I have recently replaced all prompts in Wilmer to go from using the second person to the third person.
    This has had pretty decent results for me, and I'm hoping it will for you as well.
>
> * C) By default, all the user files are set to turn on streaming responses. You either need to enable
    this in the front end that is calling Wilmer so that both match, or you need to go into Users/username.json
    and set Stream to "false". If you have a mismatch, where the front end does/does not expect streaming and
    Wilmer expects the opposite, nothing will likely show on the front end.

### Step 1: Installing the Program

Installing Wilmer is straightforward. Ensure you have Python installed; the author has been using the program with
Python 3.10 and 3.12, and both work well.

**Option 1: Using Provided Scripts**

For convenience, Wilmer includes a BAT file for Windows and a .sh file for macOS. These scripts will create a virtual
environment, install the required packages from `requirements.txt`, and then run Wilmer. You can use these scripts to
start Wilmer each time.

- **Windows**: Run the provided `.bat` file.
- **macOS**: Run the provided `.sh` file.
- **Linux**: The author doesn't have a Linux machine and can't test it, so none is provided.

> **IMPORTANT:** Never run a BAT or SH file without inspecting it first, as this can be risky. If you are unsure about
> the safety of such a file, open it in Notepad/TextEdit, copy the contents, and then ask your LLM to review it for any
> potential issues.

**Option 2: Manual Installation**

Alternatively, you can manually install the dependencies and run Wilmer with the following steps:

1. Install the required packages:
   ```bash
   pip install -r requirements.txt
   ```

2. Start the program:
   ```bash
   python server.py
   ```

The provided scripts are designed to streamline the process by setting up a virtual environment.
However, you can safely ignore them if you prefer manual installation.

#### Script arguments for .bat, .sh and .py files:

**NOTE**: When running either the bat file, the sh file, or the python file, all three now accept the following
OPTIONAL arguments:

* "--ConfigDirectory": Directory where your config files can be found. By default, this is the "Public" folder within
  the Wilmer root.
* "--LoggingDirectory": The directory where file logs, if enabled, are stored. By default, file logging is turned OFF,
  and in the event that it is enabled in the user json, logs default to the "logs" folder in the Wilmer root.
* "--User": The user that you want to run under.

So, for example, consider the following possible runs:

* `bash run_macos.sh` (will use the user specified in _current-user.json, configs in "Public", logs in "logs")
* `bash run_macos.sh --User "single-model-assistant"` (will default to "Public" for configs and "logs" for logs)
* `bash run_macos.sh --ConfigDirectory "/users/socg/Public/configs" --User "single-model-assistant"` (will just
  use the default for logs)
* `bash run_macos.sh --ConfigDirectory "/users/socg/Public/configs" --User "single-model-assistant" --LoggingDirectory
  "/users/socg/wilmerlogs"`

These optional arguments allow users to spin up multiple instances of WilmerAI, each instance using a different
user profile, logging to a different place, and specifying configs at a different location, if desired.

### Step 2 Fast Route: Use Pre-made Users

Within Public/Configs you will find a series of folders containing json files.
The two folders you are most interested in are the `Endpoints` folder and the `Users` folder.

**NOTE:** The Factual workflow nodes of the `assistant-single-model`, `assistant-multi-model`
and `group-chat-example` users will attempt to utilize the
[OfflineWikipediaTextApi](https://github.com/SomeOddCodeGuy/OfflineWikipediaTextApi)
project to pull full Wikipedia articles to RAG against. If you don't have this API, the workflow
should not have any issues, but I personally use this API to help improve the factual responses I get.
You can specify the IP address for your API in the user json of your choice.

First, choose which template user you'd like to use:

* **\_example\_simple\_router\_no\_memory**: This is a simple user that has routing to WIKI, CODING and GENERAL
  categories, each going to a special workflow. Best used with direct and productive front ends like Open WebUI.
  Requires the Offline Wikipedia API.

* **\_example\_general\_workflow**: This is a simple user that runs a single general-purpose workflow. Simple, to the
  point. Best used with direct and productive front ends like Open WebUI. Requires the Offline Wikipedia API.

* **\_example\_coding\_workflow**: This is a simple user that runs a single coding workflow. Simple, to the point. Best
  used with direct and productive front ends like Open WebUI. Requires the Offline Wikipedia API.

* **\_example\_wikipedia\_multi\_step\_workflow**: This is a Wikipedia search against the Offline Wikipedia API.
  Requires the Offline Wikipedia API.

* **\_example\_wikipedia\_multi\_step\_workflow**: This is a Wikipedia search against the Offline Wikipedia API, but
  instead of just one pass it does a total of four, attempting to build up extra info for the report. Still very
  experimental; not sure how I feel about the results yet.
  Requires the Offline Wikipedia API.

* **\_example\_assistant\_with\_vector\_memory**: This template is for a simple "assistant" that will diligently think
  through your message via a series of workflow nodes, and will attempt to track important facts in a simple vector
  memory implementation (*EXPERIMENTAL*).

  > This user thinks a LOT, so it's slow and chews up tokens. I recommend using a non-reasoning model with this. Use
  this with a local model or prepare for it to be expensive.

* **\_example\_game\_bot\_with\_file\_memory**: This is best used with a game front end, like a custom text game
  implementation or SillyTavern. This is an experimental user with the goal of trying to solve some of the common
  complaints or problems that have been voiced on various boards. Feedback is welcome.

  > Again, this is expensive and thinks a lot. It's very slow.

**IMPORTANT**: Each of the above users calls custom workflows pointing to workflows in the _common directory. You can
find other workflows to swap in as well.

Once you have selected the user that you want to use, there are a couple of steps to perform:

1) Update the endpoints for your user under Public/Configs/Endpoints. The example characters are sorted into folders
   for each user. The user's endpoint folder is specified at the bottom of their user.json file. You will want to fill
   in every endpoint appropriately for the LLMs you are using. You can find some example endpoints under the
   `_example-endpoints` folder.
    1) **NOTE**: Currently, there is best support for standard OpenAI chat completions and v1 completions endpoints,
       and recently KoboldCpp's generate endpoint was added to the mix, since that is the author's favorite to use. If
       you use KoboldCpp, I HIGHLY recommend turning off context shifting (--noshift). It will absolutely break Wilmer.

2) You will need to set your current user.
   You can do this when running the bat/sh/py file by using the --User argument,
   or you can do this in Public/Configs/Users/_current-user.json.
   Simply put the name of the user as the current user and save.

3) You will want to open your user json file and peek at the options. Here you can set whether you want streaming or
   not, set the IP address to your offline wiki API (if you're using it), specify where you want your memories/summary
   files to go during DiscussionId flows, and specify where you want the SQLite db to go if you use Workflow Locks.

That's it! Run Wilmer, connect to it, and you should be good to go.

### Step 2 Slow Route: Endpoints and Models (Learn How to Actually Use the Thing)

First, we'll set up the endpoints and models. Within the Public/Configs folder you should see the following
sub-folders. Let's walk through what you need.

### **Endpoints**

These configuration files represent the LLM API endpoints you are connected to.
For example, the following JSON file,
`SmallModelEndpoint.json`, defines an endpoint:

```json
{
  "modelNameForDisplayOnly": "Small model for all tasks",
  "endpoint": "http://127.0.0.1:5000",
  "apiTypeConfigFileName": "KoboldCpp",
  "maxContextTokenSize": 8192,
  "modelNameToSendToAPI": "",
  "trimBeginningAndEndLineBreaks": true,
  "dontIncludeModel": false,
  "removeThinking": true,
  "startThinkTag": "<think>",
  "endThinkTag": "</think>",
  "openingTagGracePeriod": 100,
  "expectOnlyClosingThinkTag": false,
  "addTextToStartOfSystem": true,
  "textToAddToStartOfSystem": "/no_think ",
  "addTextToStartOfPrompt": false,
  "textToAddToStartOfPrompt": "",
  "addTextToStartOfCompletion": false,
  "textToAddToStartOfCompletion": "",
  "ensureTextAddedToAssistantWhenChatCompletion": false,
  "removeCustomTextFromResponseStartEndpointWide": false,
  "responseStartTextToRemoveEndpointWide": []
}
```

- **endpoint**: The address of the LLM API that you are connecting to. Must be an OpenAI-compatible API of either Text
  Completions or Chat Completions type (if you're unsure: that's the vast majority of APIs, so this will probably work
  with whatever you're trying).
- **apiTypeConfigFileName**: The exact name of the JSON file from the `ApiTypes` folder that specifies what type of API
  this is, minus the ".json" extension. "Open-AI-API" will probably work for most cloud services.
- **maxContextTokenSize**: Specifies the max token size that your endpoint can accept. This is used to set the model's
  truncation length property.
- **modelNameToSendToAPI**: Specifies what model name to send to the API. For cloud services, this can be important. For
  example, OpenAI may expect "gpt-3.5-turbo" here. For local AI running in Kobold, text-generation-webui, etc., this is
  mostly unused.
(Ollama may use it, though).\n- **trimBeginningAndEndLineBreaks**: This boolean will run a trim at the start and end of the final response to remove\n  any spaces or linebreaks before or after the text. Some LLMs don't handle those extra spaces/lines well.\n- **dontIncludeModel**: This will NOT send the model name you specify in your endpoint config to the LLM API endpoint.\n  Generally, sending that model name will tell systems like MLX, Llama.cpp server, and Ollama to load the model with\n  that name. You may have a reason why you don't want it to do that and instead have the model you already loaded on\n  that port be used. Setting this to `true` will stop the model name from being sent.\n- **removeThinking**: This boolean is for reasoning models. By setting this to `true`, it will completely strip out the\n  thinking text from responses coming from LLMs, both for streaming and non-streaming. (NOTE: When streaming, this\n  buffers the response until thinking is done. That means it looks like the LLM isn't sending you anything, but in\n  actuality, it's thinking. The moment the thinking is done, this will remove the thinking block and start sending you\n  the LLM's response. So as a user, it just looks like the time to first token is far longer than it is.)\n- **startThinkTag** \u0026 **endThinkTag**: Allows you to set custom think tags. Some LLMs do things like `\u003creasoning\u003e` or\n  `\u003cthinking\u003e` as opposed to `\u003cthink\u003e`. With these, each endpoint can account for the specific start and end tags it\n  expects. Both must be defined for `removeThinking` to work.\n- **openingTagGracePeriod**: An integer defining the number of characters at the beginning of the LLM's response to scan\n  for a `startThinkTag`. 
If the tag is not found within this window, the system assumes there is no thinking block and\n  disables removal for the rest of the response.\n- **expectOnlyClosingThinkTag**: This is for models that sometimes don't send their opening think tag and instead just\n  start thinking. This will continue to buffer the response until the `endThinkTag` appears, at which point it removes\n  everything before that and sends the rest of the stream. If no closing tag appears, you may get a dump of the whole\n  response at once.\n- **addTextToStartOfSystem**: This will add whatever text you put in `textToAddToStartOfSystem` to the start of the\n  system prompt. Made specifically for models that accept commands like \"/no\\_think \". This will make every prompt run\n  by this specific endpoint add that text.\n- **textToAddToStartOfSystem**: The text to add if `addTextToStartOfSystem` is `true`.\n- **addTextToStartOfPrompt**: Same as the one for system, but this adds it to the beginning of the last user message in\n  a chat history, or the beginning of the whole user prompt in a `v1/Completion` context.\n- **textToAddToStartOfPrompt**: The text to add if `addTextToStartOfPrompt` is `true`.\n- **addTextToStartOfCompletion**: This is meant to seed the start of the AI's response. The intention was for reasoning\n  models, so that you can forcefully add opening and closing think tags. However, you can also use it to force the LLM\n  to respond in certain ways, like the old trick of having the LLM always start with \"Absolutely\\! Here is your\n  answer: \".\n- **textToAddToStartOfCompletion**: The text to add if `addTextToStartOfCompletion` is `true`.\n- **ensureTextAddedToAssistantWhenChatCompletion**: If `addTextToStartOfCompletion` is enabled for a Chat Completions\n  model, this setting will ensure the text is added inside a new assistant message if the conversation does not already\n  end with one. 
Some inference APIs may not appreciate this.\n- **removeCustomTextFromResponseStartEndpointWide**: A boolean that, if `true`, enables the removal of custom\n  boilerplate text from the beginning of an LLM's response.\n- **responseStartTextToRemoveEndpointWide**: A list of strings to check for and remove from the start of the LLM's\n  response if `removeCustomTextFromResponseStartEndpointWide` is `true`. For example,\n  `[\"Assistant:\", \"Okay, here's the answer:\"]`. The system will remove the first match it finds.\n\n##### Amusing Example of Completion Seeding\n\nAs a final quick test of addTextToStartOfCompletion before getting this commit ready, I ran the below test having my\nworkflow describe a picture of a cat that I sent it. The response amused me.\n\n```json\n  \"addTextToStartOfCompletion\": true,\n\"textToAddToStartOfCompletion\": \"Roses are red, violets are blue,  \",\n\"ensureTextAddedToAssistantWhenChatCompletion\": true\n```\n\n![Completion Seeding: Roses Are Red...](Docs/Examples/Images/Completion_Seed_Roses_Red.png)\n\n#### ApiTypes\n\nThese configuration files represent the different API types that you might be hitting when using Wilmer.\n\n```json\n{\n  \"nameForDisplayOnly\": \"KoboldCpp Example\",\n  \"type\": \"koboldCppGenerate\",\n  \"presetType\": \"KoboldCpp\",\n  \"truncateLengthPropertyName\": \"max_context_length\",\n  \"maxNewTokensPropertyName\": \"max_length\",\n  \"streamPropertyName\": \"stream\"\n}\n```\n\n- **type**: Can be either: `KoboldCpp`, `OllamaApiChat`, `OllamaApiChatImageSpecific`, `OllamaApiGenerate`,\n  `Open-AI-API`, `OpenAI-Compatible-Completions`, or `Text-Generation-WebUI`.\n- **presetType**: This specifies the name of the folder that houses the presets you want to use. If you peek in the\n  Presets\n  folder, you'll see what I mean. Kobold has the best support. I plan to add more support for others later. 
With that\n  said, there is absolutely nothing stopping you from making a new folder in Presets, putting your own json in with\n  whatever\n  your favorite LLM program accepts with the payload, making a new API type json, and using it. Very little about\n  presets are hardcoded. I suspect that when I try to add proper support for Ollama and text-generation-webui, I may not\n  need\n  any code changes at all; just some new jsons/folders.\n- **truncateLengthPropertyName**: This specifies what the API expects the max context size field to be called\n  when sending a request. Compare the Open-AI-API file to the KoboldCpp file; Open-AI-API doesn't support this\n  field at all, so we left it blank. Whereas KoboldCpp does support it, and it expects us to send the value\n  with the property name \"truncation_length\". If you are unsure what to do, for locally running APIs I recommend\n  trying KoboldCpp's settings, and for cloud I recommend trying Open-AI-API's settings. The actual value we send\n  here is in the Endpoints config.\n- **maxNewTokensPropertyName**: Similar to the truncate length, this is the API's expected property name\n  for the number of tokens you want the LLM to respond with. The actual value we send here is on each individual\n  node within workflows.\n- **streamPropertyName**: Same as max tokens and truncate length. This specifies the field name for whether to\n  stream the response to the front end or send the whole response as a text block once it is done.\n\n#### PromptTemplates\n\nThese files specify the prompt template for a model. 
Consider the following example, `llama3.json`:\n\n```json\n{\n  \"promptTemplateAssistantPrefix\": \"\u003c|start_header_id|\u003eassistant\u003c|end_header_id|\u003e\\n\\n\",\n  \"promptTemplateAssistantSuffix\": \"\u003c|eot_id|\u003e\",\n  \"promptTemplateEndToken\": \"\",\n  \"promptTemplateSystemPrefix\": \"\u003c|start_header_id|\u003esystem\u003c|end_header_id|\u003e\\n\\n\",\n  \"promptTemplateSystemSuffix\": \"\u003c|eot_id|\u003e\",\n  \"promptTemplateUserPrefix\": \"\u003c|start_header_id|\u003euser\u003c|end_header_id|\u003e\\n\\n\",\n  \"promptTemplateUserSuffix\": \"\u003c|eot_id|\u003e\"\n}\n```\n\nThese templates are applied to all v1/Completion endpoint calls. If you prefer not to use a template, there is a file\ncalled `_chatonly.json` that breaks up messages with newlines only.\n\n### Step 3: Creating a User\n\nCreating and activating a user involves four major steps. Follow the instructions below to set up a new user.\n\n#### Users Folder\n\nFirst, within the `Users` folder, create a JSON file for the new user. The easiest way to do this is to copy an existing\nuser JSON file, paste it as a duplicate, and then rename it. 
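On macOS/Linux, that copy step might look like the following. This is a runnable sketch that builds a scratch copy of the folder layout and uses a made-up `example-user.json`; in a real setup you would run the `cp` against an actual user file in your WilmerAI checkout.

```shell
# Use a scratch copy of the folder layout so this sketch runs anywhere.
demo="${TMPDIR:-/tmp}/wilmer-demo/Public/Configs/Users"
mkdir -p "$demo"
echo '{"port": 5006}' > "$demo/example-user.json"   # stand-in for an existing user file
cp "$demo/example-user.json" "$demo/socg.json"      # duplicate it under the new username
ls "$demo"
```

The new file's name (minus `.json`) is the username you will reference in the next steps.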
Here is an example of a user JSON file:

```json
{
  "port": 5006,
  "stream": true,
  "customWorkflowOverride": false,
  "customWorkflow": "CodingWorkflow-LargeModel-Centric",
  "routingConfig": "assistantSingleModelCategoriesConfig",
  "categorizationWorkflow": "CustomCategorizationWorkflow",
  "defaultParallelProcessWorkflow": "SlowButQualityRagParallelProcessor",
  "fileMemoryToolWorkflow": "MemoryFileToolWorkflow",
  "chatSummaryToolWorkflow": "GetChatSummaryToolWorkflow",
  "conversationMemoryToolWorkflow": "CustomConversationMemoryToolWorkflow",
  "recentMemoryToolWorkflow": "RecentMemoryToolWorkflow",
  "discussionIdMemoryFileWorkflowSettings": "_DiscussionId-MemoryFile-Workflow-Settings",
  "discussionDirectory": "D:\\Temp",
  "sqlLiteDirectory": "D:\\Temp",
  "chatPromptTemplateName": "_chatonly",
  "verboseLogging": true,
  "chatCompleteAddUserAssistant": true,
  "chatCompletionAddMissingAssistantGenerator": true,
  "useOfflineWikiApi": true,
  "offlineWikiApiHost": "127.0.0.1",
  "offlineWikiApiPort": 5728,
  "endpointConfigsSubDirectory": "assistant-single-model",
  "presetConfigsSubDirectoryOverride": "preset-folder-that-is-not-username",
  "useFileLogging": false
}
```

- **port**: Specifies the port Wilmer should run on. Choose a port that is not in use. By default, Wilmer hosts
  on `0.0.0.0`, making it visible on your network if run on another computer. Running multiple instances of Wilmer
  on different ports is supported.
- **stream**: Determines whether to stream the output of the LLM to the UI. This setting must match between Wilmer
  and the front end.
- **customWorkflowOverride**: When `true`, the router is disabled and all prompts go only to the specified
  workflow, making it a single-workflow instance of Wilmer.
- **customWorkflow**: The custom workflow to use when `customWorkflowOverride` is `true`.
- **routingConfig**: The name of a routing config file from the `Routing` folder, without the `.json` extension.
- **categorizationWorkflow**: Specifies the workflow used to categorize your prompt. Review and adjust this
  workflow to improve categorization results.
- **defaultParallelProcessWorkflow**: The workflow for parallel processing tasks. If you copied another user folder
  to make yours, you can likely leave this alone for now, other than changing the endpoints.
- **fileMemoryToolWorkflow**: The workflow for file memory tools. Same advice as above: leave it alone for now,
  other than changing the endpoints.
- **chatSummaryToolWorkflow**: The workflow for chat summary tools. Same advice as above.
- **conversationMemoryToolWorkflow**: The workflow for conversation memory tools. Same advice as above.
- **recentMemoryToolWorkflow**: The workflow for recent memory tools. Same advice as above.
- **discussionIdMemoryFileWorkflowSettings**: Settings for the memory file, including memory chunk size and summary
  prompts.
- **discussionDirectory**: Specifies where discussion files are stored. Ensure this directory exists to avoid
  crashes when using `DiscussionId`.
- **sqlLiteDirectory**: Specifies where the SQLite database will be created if you are using workflow locks.
- **chatPromptTemplateName**: Specifies the chat prompt template.
- **verboseLogging**: Currently unused but reserved for future use.
- **chatCompleteAddUserAssistant**: When Wilmer is connected to as a chat/Completions endpoint, sometimes the front
  end won't include names in the messages, which can cause issues for Wilmer. This setting adds "User:" and
  "Assistant:" to messages for better context understanding in that situation.
- **chatCompletionAddMissingAssistantGenerator**: Creates an empty "Assistant:" message as the last message, sort
  of like a prompt generator, when Wilmer is connected to as a chat/Completions endpoint. This is only used
  if `chatCompleteAddUserAssistant` is `true`.
- **useOfflineWikiApi**: Specifies whether to use
  the [OfflineWikipediaTextApi](https://github.com/SomeOddCodeGuy/OfflineWikipediaTextApi) for factual workflows
  or for the example group's `DataFinder` character.
- **offlineWikiApiHost**: IP of the computer running the OfflineWikipediaTextApi.
- **offlineWikiApiPort**: Port for your wiki API. Unless you specifically changed this, it's already correct in all
  the example user configs.
- **endpointConfigsSubDirectory**: Name of the subfolder in Endpoints where your endpoint JSONs will live.
- **presetConfigsSubDirectoryOverride**: An optional field to specify a preset subdirectory folder name other than
  the default, which is your username. For backwards compatibility, if the preset cannot be found in your username
  folder (or whatever custom folder name you give), Wilmer will look in the root of the API type you are using, as
  that's where presets used to live.
- **useFileLogging**: Specifies whether to log Wilmer's output to a file. Defaults to `false` (including when the
  value is absent from the config), in which case logs are printed to the console. NOTE: the optional
  `--LoggingDirectory` argument for the .bat, .sh, or .py files lets you override where the logs are written; by
  default they go to the root WilmerAI/logs directory.

#### Users Folder, _current-user.json File

Next, update the `_current-user.json` file to specify what user you want to use. Match the name of the new user
JSON file, without the `.json` extension.

**NOTE**: You can ignore this if you prefer to use the `--User` argument when running Wilmer instead.

#### Routing Folder

Create a routing JSON file in the `Routing` folder. This file can be named anything you want. Update the
`routingConfig` property in your user JSON file with this name, minus the `.json` extension. Here is an example of
a routing config file:

```json
{
  "CODING": {
    "description": "Any request which requires a code snippet as a response",
    "workflow": "CodingWorkflow"
  },
  "FACTUAL": {
    "description": "Requests that require factual information or data",
    "workflow": "FactualWorkflow"
  },
  "CONVERSATIONAL": {
    "description": "Casual conversation or non-specific inquiries",
    "workflow": "ConversationalWorkflow"
  }
}
```

- **Element Name**: The category, such as "CODING", "FACTUAL", or "CONVERSATIONAL".
- **description**: Sent to the categorizing LLM along with the category name to help with prompt categorization.
- **workflow**: The name of the workflow JSON file, without the `.json` extension, triggered if the category is
  chosen.

#### Workflow Folder

In the `Workflow` folder, create a new folder that matches the username from the `Users` folder. The quickest way
to do this is to copy an existing user's folder, duplicate it, and rename it.

If you choose to make no other changes, you will need to go through the workflows and update the endpoints to
point to the endpoint you want. If you are using an example workflow that ships with Wilmer, then you should
already be fine here.

## Quick Setup RECAP:

Within the "Public" folder you should have:

* created/edited an endpoint to point to your LLM and set up your model
* made a JSON file with your username in the Users folder
* updated _current-user with your new username, or an existing one if you are using a pre-included user
* made a routing JSON file with your categories in the Routing folder, or chosen a pre-existing one you want to use
* ensured your new user JSON file has the correct routing config specified
* a folder with your user's name in the Workflows folder
    * This folder should contain a JSON matching every workflow from your user folder
    * This folder should contain a JSON matching every workflow from your Routing config
        * If you're missing a workflow, Wilmer will crash.

## Understanding Workflows

### Setting up Workflows

Workflows in this project are modified and controlled in the `Public/Workflows` folder, within your user's specific
workflows folder. For example, if your user is named `socg` and you have a `socg.json` file in the `Users` folder,
then within workflows you should have a `Workflows/socg` folder.

### Workflow Structure

Workflows are JSON files made up of "nodes" that execute sequentially.
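As a mental model, that sequential execution can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not WilmerAI's actual engine; the node shape and `call_llm` function here are hypothetical stand-ins.

```python
# Simplified sketch of sequential node execution and {agent#Output}
# substitution. Illustrative only -- not WilmerAI's real implementation.
def run_workflow(nodes, call_llm):
    outputs = {}
    for i, node in enumerate(nodes, start=1):
        prompt = node.get("prompt", "")
        # Substitute outputs of earlier nodes, e.g. {agent1Output}
        for key, value in outputs.items():
            prompt = prompt.replace("{" + key + "}", value)
        outputs[f"agent{i}Output"] = call_llm(prompt)
    return outputs

# A stand-in "LLM" that just echoes its prompt, to show the data flow:
result = run_workflow(
    [
        {"prompt": "Summarize the user request."},
        {"prompt": "Review this response: {agent1Output}"},
    ],
    call_llm=lambda p: f"<reply to: {p}>",
)
print(result["agent2Output"])
```

The point is simply that each node runs in order, and later nodes can see earlier outputs through `{agent#Output}` placeholders.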
The system has been updated to support a more
powerful dictionary-based format that allows for top-level configuration and variables, making workflows much
cleaner and easier to manage.

#### New Format (Recommended)

This format allows you to define custom variables at the top level of the JSON. These variables can then be used in
any node throughout the workflow.

```json
{
  "persona": "You are a helpful and creative AI assistant.",
  "shared_endpoint": "OpenWebUI-NoRouting-Single-Model-Endpoint",
  "nodes": [
    {
      "title": "Gather Relevant Memories",
      "type": "VectorMemorySearch",
      "endpointName": "{shared_endpoint}"
    },
    {
      "title": "Respond to User",
      "type": "Standard",
      "systemPrompt": "{persona}\n\nHere are some relevant memories from our past conversations:\n[\n{agent1Output}\n]",
      "endpointName": "{shared_endpoint}",
      "preset": "Conversational_Preset",
      "returnToUser": true
    }
  ]
}
```

#### Old Format (Still Supported)

For 100% backward compatibility, the original format, which is just a list of nodes, is still fully supported and
will work without any changes.

```json
[
  {
    "title": "Coding Agent",
    "systemPrompt": "You are an exceptionally powerful and intelligent technical AI...",
    "prompt": "",
    "lastMessagesToSendInsteadOfPrompt": 6,
    "endpointName": "SocgMacStudioPort5002",
    "preset": "Coding"
  },
  {
    "title": "Reviewing Agent",
    "systemPrompt": "You are an exceptionally powerful and intelligent technical AI...",
    "prompt": "Please critically review the response: {agent1Output}",
    "endpointName": "SocgMacStudioPort5002",
    "preset": "Coding"
  }
]
```

### Node Properties

A workflow is made up of one or more nodes. Each node is a JSON object with properties that define its behavior.

- **`type`**: **(Required)** A string that determines the node's function. This is the most important property.
  Common types include `"Standard"` (for LLM calls), `"PythonModule"` (for running custom scripts),
  `"VectorMemorySearch"`, and `"CustomWorkflow"` (for running another workflow).
- **`title`**: A descriptive name for the node. This is for your use only and may show up in console logs to help
  with debugging.
- **`systemPrompt`**: The system prompt to send to the LLM API.
- **`prompt`**: The user prompt to send. If left blank, the conversation history will be sent instead, based on the
  `lastMessagesToSendInsteadOfPrompt` value.
- **`lastMessagesToSendInsteadOfPrompt`**: How many recent messages to send to the LLM if the `prompt` field is an
  empty string.
- **`endpointName`**: The name of the LLM API endpoint to use for this node. This must match a JSON file name from
  the `Endpoints` folder (without the `.json` extension).
- **`preset`**: The preset to use for the API call, controlling parameters like temperature and token limits. This
  must match a JSON file name from the `Presets` folder (without the `.json` extension).
- **`maxResponseSizeInTokens`**: Overrides the preset to specify the maximum number of tokens for the LLM's
  response for this specific node.
- **`addUserTurnTemplate`**: A boolean. Set to `false` if you are sending raw conversation history; set to `true`
  if you are sending a custom string via the `prompt` field that should be wrapped in the user turn template.
- **`returnToUser`**: A boolean. If set to `true` on a node that is not the final one, its output will be returned
  to the user immediately. This is useful for "fire and forget" tasks where later nodes perform background actions
  (like saving memories) without making the user wait.
- **`useRelativeTimestamps`**: A boolean. If set to `true`, timestamps will be prepended to messages in a
  human-readable, relative format (e.g., `[Sent 5 minutes ago]`). If omitted or `false`, absolute timestamps are
  used.
- **`workflowName`**: Used only in `"CustomWorkflow"` nodes. Specifies the file name of the sub-workflow to
  execute.
- **`scoped_variables`**: Used only in `"CustomWorkflow"` nodes. A list of variables from the current workflow
  (e.g., `["{agent1Output}"]`) to pass into the sub-workflow. These become available as `{agent#Input}` variables
  inside the sub-workflow.

### Variables in Prompts

You can use a rich set of dynamic variables within `systemPrompt` and `prompt` fields. These placeholders will be
replaced with real-time values when the workflow runs.

#### Inter-Node & Workflow Variables

- **`{agent#Output}`**: The result from a previously executed node within the *same* workflow. The `#` corresponds
  to the node's position (e.g., `{agent1Output}` for the first node, `{agent2Output}` for the second).
- **`{agent#Input}`**: A value passed from a parent workflow into a sub-workflow via `scoped_variables`. For
  example, `{agent1Input}` is the first variable passed from the parent.
- **`{custom_variable}`**: Any custom key defined at the top level of a workflow JSON (in the new dictionary
  format) is available as a variable. For example, if you define `"persona": "You are a pirate."`, you can use
  `{persona}` in any prompt within that workflow.

#### Conversation & Message Variables

- **`{chat_user_prompt_last_one}`**: The raw text content of the last message in the conversation. Also available
  for `two`, `three`, `four`, `five`, `ten`, and `twenty` messages.
- **`{templated_user_prompt_last_one}`**: The last message, but wrapped in the appropriate user/assistant prompt
  template tags. Also available for `two`, `three`, `four`, `five`, `ten`, and `twenty`.
- **`{chat_system_prompt}`**: The system prompt sent from the front-end client (e.g., a character card).
- **`{templated_system_prompt}`**: The front-end system prompt, wrapped in the appropriate system prompt template
  tag.
- **`{messages}`**: The raw, complete list of message objects. This is primarily useful for advanced templating
  with `jinja2` enabled on the node.

#### Date, Time & Context Variables

- **`{todays_date_pretty}`**: Today's date, e.g., "August 17, 2025".
- **`{todays_date_iso}`**: Today's date in ISO format, e.g., "2025-08-17".
- **`{current_time_12h}`**: The current time in 12-hour format, e.g., "8:48 PM".
- **`{current_time_24h}`**: The current time in 24-hour format, e.g., "20:48".
- **`{current_month_full}`**: The full name of the current month, e.g., "August".
- **`{current_day_of_week}`**: The full name of the current day, e.g., "Sunday".
- **`{current_day_of_month}`**: The day of the month as a number, e.g., "17".
- **`{time_context_summary}`**: A natural language summary of the conversation's timeline,
  e.g., "[Time Context: This conversation started 2 hours ago. The most recent message was sent 5 minutes ago.]".

#### Prompt Routing Variables

These variables are automatically available in categorization workflows and are populated from your routing
configuration file.

- **`{category_colon_descriptions}`**: A semicolon-separated list of categories and their descriptions. Example:
  "CODING: Any request which requires a code snippet...; FACTUAL: Requests that require factual information...".
- **`{category_colon_descriptions_newline_bulletpoint}`**: The same as above, but formatted as a bulleted list.
- **`{categoriesSeparatedByOr}`**: A simple list of just the category names. Example: "CODING or FACTUAL or
  CONVERSATIONAL".
- **`{categoryNameBulletpoints}`**: A bulleted list of just the category names.

#### Special Variables

- **`[TextChunk]`**: A special placeholder primarily used within memory-generation workflows (e.g., inside
  `fileMemoryWorkflowName` or `vectorMemoryWorkflowName`). It represents a specific block of conversation text
  that is being analyzed or summarized.

### Other Types of Nodes

#### The Memory System: Creators and Retrievers

The memory system has been fundamentally redesigned for performance and power. The core principle is a separation
of concerns between two types of nodes: **Creators** and **Retrievers**.

* **Memory Creators (Write Operations)**: These are computationally "heavy" nodes that analyze the conversation,
  generate new memories, and save them to files. This process is designed to run in the background, often after a
  workflow lock, so it doesn't slow down the user experience.
* **Memory Retrievers (Read Operations)**: These are "lightweight" nodes that perform fast, inexpensive lookups of
  existing memories to provide context for an AI's response.

This split allows you to build highly responsive workflows. You can retrieve existing context instantly at the
beginning of a workflow, get a fast reply to the user, and then trigger a memory creation node in the background to
update the memories with the latest turn of the conversation.

-----

### **Memory Creator Node**

This is the engine of the memory system. You only need one type of creator node, which handles all types of memory
generation.

#### QualityMemory

This is the primary and only node for **creating and updating** all persistent memories. When this node runs, it
checks the conversation history and, if enough new messages have been added, it will generate and save new
memories.
It can\ncreate classic file-based memories or the new, powerful searchable vector memories, depending on your configuration.\nThis node does **not** return any text to the workflow; its only job is to write memories to storage in the background.\n\nIt's best practice to place this node at the end of a workflow, after a workflow lock, to ensure memory generation\ndoesn't delay the AI's response to the user.\n\n```json\n{\n  \"id\": \"create_memories_node\",\n  \"type\": \"QualityMemory\",\n  \"name\": \"Create or Update All Memories\"\n}\n```\n\n-----\n\n### **Memory Retriever Nodes**\n\nThese nodes read from the memory files that the `QualityMemory` node creates. They are fast and provide different kinds\nof context to your AI.\n\n#### RecentMemorySummarizerTool\n\nThis node quickly **reads** the last few memory chunks from the long-term memory file (`\u003cid\u003e_memories.jsonl`). It's\nexcellent for providing the AI with immediate context on what was discussed recently. You can specify how many of the\nmost recent summarized chunks to retrieve.\n\nNote that if a `discussionId` is not active, this node falls back to simply pulling the last `N` turns directly from the\ncurrent chat history, acting as a stateless memory provider.\n\n```json\n{\n  \"id\": \"get_recent_memories_node\",\n  \"type\": \"RecentMemorySummarizerTool\",\n  \"name\": \"Get Recent Memories\",\n  \"maxTurnsToPull\": 0,\n  \"maxSummaryChunksFromFile\": 5,\n  \"customDelimiter\": \"\\n------------\\n\"\n}\n```\n\n* `maxSummaryChunksFromFile`: Specifies how many of the latest memory chunks to pull from the file.\n\n#### FullChatSummary\n\nThis node **reads** the single, continuously updated \"rolling summary\" of the entire conversation from the chat summary\nfile (`\u003cid\u003e_summary.jsonl`). Use this to give the AI a high-level, condensed overview of the entire chat history from\nstart to finish. 
This node does **not** generate or update the summary; it only retrieves the existing one.\n\n```json\n{\n  \"id\": \"get_full_summary_node\",\n  \"type\": \"FullChatSummary\",\n  \"name\": \"Get Full Chat Summary\"\n}\n```\n\n#### VectorMemorySearch\n\nThis is the new **smart search** node, designed for Retrieval-Augmented Generation (RAG). It performs a powerful,\nrelevance-based search against the dedicated vector memory database (`\u003cid\u003e_vector_memory.db`). Instead of just getting\nrecent context, this node allows you to look up specific facts, topics, or details from anywhere in the conversation\nhistory.\n\nThe search input **must be a string of keywords separated by semicolons (`;`)**. The node will find the memory chunks\nmost relevant to those keywords and return them, ranked by relevance.\n\n```json\n{\n  \"id\": \"smart_search_node\",\n  \"type\": \"VectorMemorySearch\",\n  \"name\": \"Search for Specific Details\",\n  \"input\": \"Project Stardust;mission parameters;Dr. Evelyn Reed\"\n}\n```\n\n-----\n\n### **Configuring Memory Generation (`_DiscussionId-MemoryFile-Workflow-Settings.json`)**\n\nThe behavior of the `QualityMemory` node is controlled by a dedicated configuration file for each `discussionId`. 
This\nis where you decide what kind of memories to create and how they should be generated.\n\nHere is a breakdown of the key configuration options:\n\n```json\n{\n  // This is the master switch for the new memory system.\n  // Set to true to create searchable vector memories.\n  // Set to false to use the classic file-based memory system.\n  \"useVectorForQualityMemory\": true,\n  // ====================================================================\n  // == Vector Memory Configuration (Only used if the above is true) ==\n  // ====================================================================\n\n  // For advanced users: specify a workflow to generate the structured JSON for a vector memory.\n  \"vectorMemoryWorkflowName\": \"my-vector-memory-workflow\",\n  // The LLM endpoint to use specifically for vector memory generation. Falls back to \"endpointName\".\n  \"vectorMemoryEndpointName\": \"gpt-4-turbo\",\n  // The preset for the specified endpoint. Falls back to \"preset\".\n  \"vectorMemoryPreset\": \"default_preset_for_json_output\",\n  // The max response size for the generated JSON. Falls back to \"maxResponseSizeInTokens\".\n  \"vectorMemoryMaxResponseSizeInTokens\": 1024,\n  // The target size in tokens for a chunk of conversation before it's processed.\n  \"vectorMemoryChunkEstimatedTokenSize\": 1000,\n  // The max number of new messages before forcing processing, even if token size isn't met.\n  \"vectorMemoryMaxMessagesBetweenChunks\": 5,\n  // How many of the most recent turns to ignore. 
This prevents summarizing an in-progress thought.\n  \"vectorMemoryLookBackTurns\": 3,\n  // ====================================================================\n  // == File-based Memory Configuration (Only used if the switch is false) ==\n  // ====================================================================\n\n  // For advanced users: specify a workflow to generate the summary text for a file-based memory.\n  \"fileMemoryWorkflowName\": \"my-file-memory-workflow\",\n  // The system prompt used for the summarization LLM call when not using a workflow.\n  \"systemPrompt\": \"You are an expert summarizer. Your task is to extract key facts...\",\n  // The user prompt used for the summarization LLM call. [TextChunk] is replaced automatically.\n  \"prompt\": \"Please summarize the following conversation chunk: [TextChunk]\",\n  // The target size in tokens for a chunk of conversation before it's summarized.\n  \"chunkEstimatedTokenSize\": 1000,\n  // The max number of new messages before forcing a summarization, even if token size isn't met.\n  \"maxMessagesBetweenChunks\": 5,\n  // How many of the most recent turns to ignore for file-based memory generation.\n  \"lookbackStartTurn\": 3,\n  // ====================================================================\n  // == General / Fallback LLM Settings                           ==\n  // ====================================================================\n\n  // The default LLM endpoint to use if a specific one (e.g., vectorMemoryEndpointName) isn't set.\n  \"endpointName\": \"default_endpoint\",\n  // The default preset to use.\n  \"preset\": \"default_preset\",\n  // The default max response size in tokens.\n  \"maxResponseSizeInTokens\": 400\n}\n```\n\n* **`useVectorForQualityMemory`**: This boolean is the most important setting. `true` enables the creation of a\n  searchable SQLite database for the discussion. 
`false` falls back to the classic `.jsonl` memory file.\n* **`vectorMemoryWorkflowName` / `fileMemoryWorkflowName`**: These keys allow you to specify the name of a sub-workflow\n  to handle memory generation. This gives you complete control over the summarization process, allowing for multi-step\n  logic (e.g., extracting topics, then summarizing each one). If a workflow name is not provided, the system falls back\n  to a direct LLM call using the `systemPrompt` and `prompt` fields.\n* **`chunkEstimatedTokenSize` / `maxMessagesBetweenChunks`**: These values control how often the `QualityMemory` node\n  decides to create a new memory chunk. A new memory is created if either the token count of new messages exceeds\n  `chunkEstimatedTokenSize` OR the number of new messages exceeds `maxMessagesBetweenChunks`.\n\n#### Parallel Processing Node\n\nThese nodes are currently used for memories and the chat summary. They break the memories up into chunks and use\nmultiple LLMs to iterate through them. Every endpoint specified here will be utilized.\n\n```json\n{\n  \"title\": \"\",\n  \"agentName\": \"\",\n  \"systemPrompt\": \"You are an intelligent and capable assistant. Please answer the following request completely\",\n  \"prompt\": \"\",\n  \"multiModelList\": [\n    {\n      \"endpointName\": \"SocgWindowsPort5003\"\n    },\n    {\n      \"endpointName\": \"SocgMacStudioPort5001\"\n    },\n    {\n      \"endpointName\": \"SocgMacbookPort5004\"\n    }\n  ],\n  \"preset\": \"Default\",\n  \"type\": \"SlowButQualityRAG\",\n  \"ragTarget\": \"\",\n  \"ragType\": \"RecentMemory\",\n  \"maxResponseSizeInTokens\": 400,\n  \"addUserTurnTemplate\": true\n}\n```\n\n* IMPORTANT: Don't fiddle with these too much. 
Right now they are used for specific purposes and are not very flexible.\n  You can change the following:\n    * multiModelList: add or remove as many endpoints as you want here\n    * preset: Change the preset to whatever you want\n    * prompt and system prompt: if they are filled within the workflow, feel free to change them. Otherwise leave them\n      alone.\n\n#### Python Module Caller Node\n\nThis node can call any `.py` file with the `Invoke(*args, **kwargs)` method that returns a string (even an empty\nstring). What you do within Invoke is entirely up to you. This can be used to indefinitely extend WilmerAI's abilities.\n\n```json\n{\n  \"title\": \"Python Module Caller\",\n  \"module_path\": \"D:/Temp/MyTestModule.py\",\n  \"args\": [\n    \"{agent1Output}\"\n  ],\n  \"kwargs\": {},\n  \"type\": \"PythonModule\"\n}\n```\n\n### Full Text Wikipedia Offline API Caller Node\n\nThis node will make a call to\nthe [OfflineWikipediaTextApi](https://github.com/SomeOddCodeGuy/OfflineWikipediaTextApi)\nand will pull back a response based on the promptToSearch that you pass in. You can use this text to pass\ninto other nodes for factual responses (see factual workflows in the sample users).\n\n`NOTE: The below node is deprecated and will be replaced. The new node is right below it. I'm not removing it yet\nin case someone is actively using it.`\n\n```json\n  {\n  \"title\": \"Querying the offline wikipedia api\",\n  \"agentName\": \"Wikipedia Search Api Agent Three\",\n  \"promptToSearch\": \"{agent1Output}\",\n  \"type\": \"OfflineWikiApiFullArticle\"\n}\n```\n\n`NOTE: This is the new node. 
This node will require you to be using the newest version of the OfflineWikipediaTextApi.\nIf you are using an older version, you will not have the required \"top_article\" endpoint and this will crash.`\n\n```json\n  {\n  \"title\": \"Querying the offline wikipedia api\",\n  \"agentName\": \"Wikipedia Search Api Agent Three\",\n  \"promptToSearch\": \"{agent1Output}\",\n  \"type\": \"OfflineWikiApiBestFullArticle\"\n}\n```\n\nIn addition, there is a similar node that returns the top N full articles, where the user can specify the total\nnumber of results to retrieve and then the top N of those to keep. If `percentile`, `num_results`, and `top_n_articles` are not specified, then\ndefaults of 0.5, 10, and 3 will be used, respectively. The output articles are given in order of score, with the\nhighest-scoring article first by default (descending). `top_n_articles` can also be negative; a negative number will give\nthe results in ascending rather than descending score order, which is useful when the LLM's context is truncated.\nNOTE: Since the output from the wikipedia articles for this can be quite long, you may need to pay attention to the\nModel Endpoint that this is output to and possibly increase the \"maxContextTokenSize\" to handle the larger output size.\nUsing ascending results might help with this.\n\n```json\n  {\n  \"title\": \"Querying the offline wikipedia api\",\n  \"agentName\": \"Wikipedia Search Api Agent Three\",\n  \"promptToSearch\": \"{agent1Output}\",\n  \"type\": \"OfflineWikiApiTopNFullArticles\",\n  \"percentile\": 0.4,\n  \"num_results\": 40,\n  \"top_n_articles\": 4\n}\n```\n\nThe configuration for these nodes can be found in the user json.\n\n```json\n{\n  \"useOfflineWikiApi\": false,\n  \"offlineWikiApiHost\": \"127.0.0.1\",\n  \"offlineWikiApiPort\": 5728\n}\n```\n\nWhen set to false, the node is hardcoded to respond that no additional information was found.\n\n### First Paragraph Text Wikipedia Offline API Caller Node\n\nThis is an alternative setting to the 
full-text node. By default, txtapi-wikipedia returns the first\nparagraph of the wiki article; if that is all you need, then this endpoint will return it.\n\nThe only difference from the previous node is the type.\n\n```json\n{\n  \"title\": \"Querying the offline wikipedia api\",\n  \"agentName\": \"Wikipedia Search Api Agent Three\",\n  \"promptToSearch\": \"{agent1Output}\",\n  \"type\": \"OfflineWikiApiPartialArticle\"\n}\n```\n\n### Get Custom File\n\nThis node allows you to load a custom text file in a workflow. The text file can contain anything you want,\nand its contents will be treated like any other node output. So if this node is the first in your workflow, then you can reference\nthe file using `{agent1Output}` in future nodes of that workflow, for example.\n\n- `filepath`: The specific file you want to load. This must be a single .txt file.\n- `delimiter`: The separator that you use in the file to separate different topics, if applicable.\n- `customReturnDelimiter`: What you'd like to replace the delimiter with when returning the text from the file\n  as an agent output.\n\nExample for delimiter: perhaps in your file you have 5 paragraphs, each separated by 2 newlines. You'd like to\nbreak those into individual items, separated by asterisks. 
Your delimiter might be \"\\n\\n\" while your custom\ndelimiter might be \"\\n\\n******\\n\\n\". This would cause the below\n\n```text\nparagraph 1\n\nparagraph 2\n```\n\nto become\n\n```text\nparagraph 1\n\n******\n\nparagraph 2\n```\n\nThis node can be added with the following syntax:\n\n```json\n{\n  \"title\": \"Custom File Grabber\",\n  \"type\": \"GetCustomFile\",\n  \"filepath\": \"D:\\\\Temp\\\\some_custom_file.txt\",\n  \"delimiter\": \"\\n\\n\",\n  \"customReturnDelimiter\": \"\\n\\n\"\n}\n```\n\n### Workflow Lock\n\nA workflow lock is a way to lock a workflow at a certain point during asynchronous operations, so that you don't\nencounter race conditions where two instances of a workflow crash into each other via consecutive calls.\n\nTo put it more simply, let's use an example workflow:\n\n```text\nNode 1: \"GetCurrentSummaryFromFile\" type node. This does NOT generate a chat summary;\nit only pulls the existing summary out of the file if one is there.\n\nNode 2: \"RecentMemorySummarizerTool\" type node. This does NOT generate memories; it\nonly pulls existing memories out of the file if they exist.\n\nNode 3: Responder node. Just a regular chat node with no \"type\" that would output text.\nHowever, since we want this one to respond to the user, we will be setting `returnToUser`\nto true. This will force this node to output its response to the user, even though it's\nonly halfway through the workflow.\n\nNode 4: \"WorkflowLock\" type node. This node will generate a workflow lock at this point.\nWe will give the WorkflowLock an ID, and until that workflow lock is released, any node\nof this type with that ID for this Wilmer instance will not be allowed to progress past\nthe lock.\n\nNode 5: \"FullChatSummary\" type node. This will generate memories and a chat\nsummary. This can be a very long and intensive process. 
\n```\n\nOk, so looking at the above, let's make a couple of assumptions to help make workflow locks make sense.\n\n- Let's assume that you own 2 computers, both serving a different LLM. Maybe Llama 3.1 8b on Computer A, and Qwen2.5 7b\n  on Computer B.\n- The LLM you use to respond to the user in Node 3 is on Computer A, using Llama 3.1.\n- The LLM you use to generate memories and chat summaries in Node 5 is on Computer B, using Qwen2.5.\n- For the below example, let's assume you have 200 messages in your chat, and have not yet generated\n  a memory or chat summary file.\n- You are using a streaming workflow; i.e., your front end has streaming enabled. If this is not true,\n  then Node 3 won't respond to you and the workflow lock is useless. Most people have this on.\n\nBased on these assumptions, let's step through the workflow.\n\n1) You send a message.\n2) Nodes 1 and 2 gather your memories and chat summary from their files. They don't GENERATE anything; they\n   only pull what exists from the files. If nothing exists, they pull nothing.\n3) Node 3 utilizes the outputs of Nodes 1 and 2, the memories and chat summary, to generate a response.\n   Because `returnToUser` is true, that response is sent to the front end UI for the user to read.\n4) Node 4, the workflow lock, engages a lock on whatever the ID is. Let's say the ID is \"MyMemoryLock\". So\n   now Wilmer has registered that there is a workflow lock called \"MyMemoryLock\", and it is turned on.\n5) The FullChatSummary node begins generating memories and a summary. Because you have 200 messages, this will\n   take around 5 minutes to complete.\n\nOk, so you likely got your response from Node 3 in about 10 seconds. But now your memories and summary are being\ngenerated, and that could take up to 5 minutes, so... no more robits for you for 5 minutes?\n\nNope. 
Thanks to the workflow lock, that isn't the case.\n\nLet's consider what happens if you immediately send another message after receiving the response in your UI from Node 3, meaning\nthat you start talking to the AI while there are still 4 minutes of work left for Node 5 on generating memories and\nthe chat summary. Also remember that Computer A was used to respond to you, while Computer B is the one generating\nthe memories.\n\n1) You send a message.\n2) Nodes 1 and 2 gather your memories and chat summary from their files. They don't GENERATE anything; they\n   only pull what exists from the files. If nothing exists, they pull nothing.\n3) Node 3 utilizes the outputs of Nodes 1 and 2, the memories and chat summary, to generate a response.\n   Because `returnToUser` is true, that response is sent to the front end.\n4) The workflow hits the workflow lock node. It sees that \"MyMemoryLock\" is engaged, and ends the workflow here,\n   not progressing past this point.\n\nSo what happened? You sent a message, the LLM on Computer A (your responder AI, which currently has nothing\nelse to do but respond to you) responds to you, and then the workflow lock stops the workflow immediately after.\nComputer B is still busy generating memories and summaries from your first message, so we don't want to send\nanother request to it yet. But Computer B being busy means nothing for Computer A, which is ready to go and will\nhappily keep talking to you.\n\nThis means that, using workflow locks, you can keep talking to your LLM while memories and summaries are being\ngenerated. In this example we used small models, but in reality we might use large ones. For example, Socg might\nuse a Mac Studio with Llama 3.1 70b running, and a Macbook with another instance of Llama 3.1 70b running. Both of\nthose, on a Mac, can take up to 2 minutes to respond on a lengthy prompt, so writing memories and summaries can take\nforever. 
Thanks to workflow locks, there is no need to wait for those memories/summaries to complete, as the\nconversation can continue using the Studio to respond while the Macbook works tirelessly in the background\nupdating memories/summaries.\n\n```json\n  {\n  \"title\": \"Workflow Lock\",\n  \"type\": \"WorkflowLock\",\n  \"workflowLockId\": \"FullCustomChatSummaryLock\"\n}\n```\n\n**IMPORTANT**: Workflow locks automatically unlock when a workflow has finished its task, and all locks are\nautomatically released when Wilmer is restarted. Each user gets their own workflow lock tracking,\nwhich is done in the user's SQLite database (the path to which can be configured in the user json). Workflow locks\nare tracked by a combination of ID, user, and API instance. So as long as you are in the same instance of Wilmer and\nthe same user, you can use the same workflow lock ID in as many workflows as you want, meaning one workflow can cause locks\nin other workflows, if that's what you desire.\n\nWorkflow locks work best in multi-computer setups.\n\n### Image Processor\n\nThe image processor node allows you to utilize Ollama to get information about any images sent to the backend via the\nstandard Ollama images API request for either the Wilmer exposed api/chat or api/generate endpoints.\n\nSo, essentially: if you connect Open WebUI to Wilmer, it will connect to an endpoint Wilmer exposes that is compatible\nwith Ollama's api/chat endpoint. If you send a picture in Open WebUI, that will be sent to Wilmer as if it were\ngoing to Ollama. Wilmer will see the image, and if you have an ImageProcessor node, that node will caption the image so\nthat you can send it to your main text LLMs later in the workflow. 
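To make that flow concrete, here is a minimal sketch of the Ollama-style `api/chat` body a front end sends when a picture is attached. The model name, prompt text, and image bytes below are placeholders; the payload shape follows Ollama's documented chat format, where images travel as base64-encoded strings in the message's `images` array.

```python
import base64
import json


def build_ollama_image_chat_request(model: str, text: str, image_bytes: bytes) -> str:
    """Build the JSON body of an Ollama-style api/chat request carrying one image."""
    payload = {
        "model": model,  # placeholder; Wilmer routes by workflow, not by model name
        "messages": [
            {
                "role": "user",
                "content": text,
                # Ollama's chat format: one base64 string per attached image
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
        "stream": False,
    }
    return json.dumps(payload)


# A front end would POST this body to http://<wilmer-host>:<port>/api/chat.
body = build_ollama_image_chat_request("wilmer", "What's in this image?", b"fake-image-bytes")
```

Front ends like Open WebUI build an equivalent body for you; the sketch only shows the shape of what Wilmer receives.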
The ImageProcessor node currently requires that the\nendpoint be of the `OllamaApiChatImageSpecific` ApiType, but support for KoboldCpp should be coming soon as well.\n\nIn the event that no image is sent into a workflow with the ImageProcessor node, the node will return a hardcoded\nstring of \"There were no images attached to the message\".\n\n```json\n  {\n  \"title\": \"Image Processor\",\n  \"agentName\": \"Image Processing Agent One\",\n  \"type\": \"ImageProcessor\",\n  \"systemPrompt\": \"There is currently a conversation underway between a user and an AI Assistant in an online chat program. The AI Assistant has no ability to see images, and must rely on a written description of an image to understand what image was sent.\\nWhen given an image from the user, please describe the image in vivid detail so that the AI assistant can know what image was sent and respond appropriately.\",\n  \"prompt\": \"The user has sent a new image in a chat. Please describe every aspect of the image in vivid detail. If the image appears to be a screenshot of a website or desktop application, describe not only the contents of the programs but also the general layout and UX. If it is a photo or artwork, please describe in detail the contents and any styling that can be identified. If it is a screenshot of a game that has menu options or a HUD or any sort of interactive UX, please be sure to summarize not only what is currently occurring in the screenshot but also what options appear to be available in the various UI elements. Spare no detail.\",\n  \"endpointName\": \"Socg-OpenWebUI-Image-Endpoint\",\n  \"preset\": \"_Socg_OpenWebUI_Image_Preset\",\n  \"maxResponseSizeInTokens\": 2000,\n  \"addUserTurnTemplate\": true,\n  \"addDiscussionIdTimestampsForLLM\": true,\n  \"addAsUserMessage\": true,\n  \"message\": \"[SYSTEM: The user recently added images to the conversation. The images have been analyzed by an advanced vision AI, which has described them in detail. 
The descriptions of the images can be found below:```\\n[IMAGE_BLOCK]]\\n```]\"\n}\n```\n\n- `addAsUserMessage`: If this is set to true, not only will the node put the output from the image model into an\n  agentOutput to be used later, but it will also add a new message to the conversation collection being processed\n  containing that output as well. So essentially: every LLM that is called after this node will see one more message added\n  to the conversation history, a message with a role of 'user' that will contain the output of the LLM in the particular\n  format specified in the next field, `message`. If this is false, the node will act like a normal node and only generate\n  an agentOutput.\n- `message`: This is used together with 'addAsUserMessage' being true. This is the message that will be added to the\n  chat history. There is a special variable for this called **`[IMAGE_BLOCK]`** that will be replaced with whatever\n  the image LLM output from this node; i.e., `[IMAGE_BLOCK]` will be replaced with whatever the agentOutput value of this\n  node will be. This field is optional; there is a hardcoded message that will be used as the default if you do not specify\n  one. The example message I put above is the hardcoded message it would use.\n\n**NOTE**- If addAsUserMessage is true, it will not affect the agentOutput. The node will still produce one as normal,\nand that output will be whatever the LLM responded with. 
The agentOutput will not contain the value of `message`.\n\n**IMPORTANT**: The ImageProcessor node currently does not support streaming; this only responds as non-streaming, and\nis meant to be used in the middle of a workflow as a captioner, not as the responder for a workflow.\n\n**IMPORTANT**: If you use this with Open WebUI it's fine out of the box, but if you use this in SillyTavern while\nconnected to Wilmer as Text Completion -\u003e Ollama, simply be sure to go to the 3 squares icon at the top right\n(Extensions) -\u003e Click \"Image Captioning\" section, and put the Wilmer prompt template user tag in front of whatever\ncaption prompt you have. So instead of the default `\"What’s in this image?\"` it needs to be `\"[Beg_User]What’s in this\nimage?\"` Captioning seems to work fine with this change. I will be adding screenshots and/or a quickguide\nfor this once I'm done with my testing.\n\n---\n\n### **Custom Workflow Node**\n\nThe **`CustomWorkflow` Node** allows you to execute an entire, separate workflow from within the current workflow. This\nis incredibly powerful for encapsulating reusable logic, breaking down complex processes into smaller, manageable parts,\nand orchestrating multi-step agentic tasks. The final result of the child workflow is captured and stored in the\nparent's state, accessible to subsequent nodes.\n\n#### **Properties**\n\n* `type` (string, required): Must be `\"CustomWorkflow\"`.\n* `workflowName` (string, required): The filename of the custom workflow to execute (e.g., `\"MySubWorkflow.json\"`).\n* `is_responder` (boolean, optional, default: `false`): Determines if this node provides the final, user-facing\n  response.\n    * If `true`, the sub-workflow's final output is returned to the user, and the parent workflow terminates. 
If the\n      initial request was for a streaming response, this sub-workflow will stream its output.\n    * If `false` (or omitted), the sub-workflow runs \"silently.\" Its final output is captured and stored in an\n      `agent#Output` variable for the parent workflow to use, but it is not sent to the user.\n* `scoped_variables` (array of strings, optional): **(Recommended)** A list of values to pass from the parent workflow\n  into the child workflow's global scope. These values become available to *all nodes* in the child workflow as\n  `{agent1Input}`, `{agent2Input}`, etc., based on their order in the array. This is the most flexible way to provide a\n  child workflow with the context it needs.\n* `firstNodeSystemPromptOverride` (string, optional): Overrides the `systemPrompt` for the very first node in the child\n  workflow. This is a legacy method for passing data.\n* `firstNodePromptOverride` (string, optional): Overrides the `prompt` for the very first node in the child workflow.\n  This is also a legacy method for passing data.\n\n#### **Syntax**\n\n```json\n{\n  \"title\": \"Call a Sub-Workflow to Summarize Text\",\n  \"type\": \"CustomWorkflow\",\n  \"workflowName\": \"SummarizerWorkflow.json\",\n  \"is_responder\": false,\n  \"scoped_variables\": [\n    \"{agent1Output}\",\n    \"A custom static string value\"\n  ],\n  \"firstNodeSystemPromptOverride\": \"You are a helpful summarization assistant. The user has provided the following text from a previous step: {agent1Output}\",\n  \"firstNodePromptOverride\": \"Please summarize the provided text.\"\n}\n```\n\n-----\n\n### **Conditional Custom Workflow Node**\n\nThe **`ConditionalCustomWorkflow` Node** extends the `CustomWorkflow` node with powerful branching logic. It dynamically\nselects and executes a specific sub-workflow based on the resolved value of a conditional variable (e.g., the output\nfrom a previous node). 
This allows you to create adaptive workflows that react differently based on runtime conditions.\n\nEach potential path, or \"route,\" can also have its own unique prompt overrides, giving you fine-grained control over how\neach selected sub-workflow is initiated.\n\n#### **Properties**\n\n* `type` (string, required): Must be `\"ConditionalCustomWorkflow\"`.\n* `conditionalKey` (string, required): A variable placeholder (e.g., `{agent1Output}`) whose resolved value determines\n  which workflow to execute.\n* `conditionalWorkflows` (object, required): A dictionary that maps the possible values of `conditionalKey` to workflow\n  filenames.\n    * **`Default`** (string, optional): A special key that specifies a fallback workflow to run if the `conditionalKey`'\n      s value does not match any other key in the map.\n* `is_responder` (boolean, optional, default: `false`): Functions identically to the `CustomWorkflow` node, determining\n  if the selected sub-workflow provides the final user-facing response.\n* `scoped_variables` (array of strings, optional): **(Recommended)** Functions identically to the `CustomWorkflow` node.\n  The provided variables are passed into whichever sub-workflow is chosen by the conditional logic.\n* `routeOverrides` (object, optional): A dictionary specifying prompt overrides for each potential route. The keys in\n  this object should correspond to the keys in `conditionalWorkflows`. 
Each route can define:\n    * `systemPromptOverride` (string, optional): Overrides the system prompt for the first node in the selected\n      workflow.\n    * `promptOverride` (string, optional): Overrides the user prompt for the first node in the selected workflow.\n\n#### **Syntax**\n\n```json\n{\n  \"title\": \"Route to a Specific Coding Model\",\n  \"type\": \"ConditionalCustomWorkflow\",\n  \"conditionalKey\": \"{agent1Output}\",\n  \"conditionalWorkflows\": {\n    \"Python\": \"PythonCodingWorkflow.json\",\n    \"JavaScript\": \"JavaScriptCodingWorkflow.json\",\n    \"Default\": \"GeneralCodingWorkflow.json\"\n  },\n  \"is_responder\": true,\n  \"scoped_variables\": [\n    \"{lastUserMessage}\"\n  ],\n  \"routeOverrides\": {\n    \"Python\": {\n      \"systemPromptOverride\": \"You are an expert Python programmer. The user's request is: {agent1Input}\"\n    },\n    \"JavaScript\": {\n      \"systemPromptOverride\": \"You are a master JavaScript developer. The user's request is: {agent1Input}\"\n    }\n  }\n}\n```\n\n#### **Behavior and Known Issue**\n\n1. **Conditional Execution**: The node resolves the value of `conditionalKey`. It then performs a **case-insensitive**\n   search for that value as a key within the `conditionalWorkflows` map. For example, if `{agent1Output}` resolves to\n   `\"python\"`, `\"Python\"`, or `\"PYTHON\"`, it will correctly match the `\"Python\"` key and select\n   `PythonCodingWorkflow.json`. If no match is found, it will use the workflow specified under the `Default` key.\n\n2. **⚠️ Known Issue: Route Override Key Casing**: When looking for overrides in the `routeOverrides` map, the logic is\n   different. The resolved `conditionalKey` is normalized to lowercase and then **capitalized** (e.g., `\"python\"`\n   becomes `\"Python\"`). 
This means the keys in your `routeOverrides` object **must be capitalized** to be found.\n\n    * ✅ **Correct**: `\"Python\"`, `\"JavaScript\"`\n    * ❌ **Incorrect**: `\"python\"`, `\"javascript\"`\n\n3. **Fallback Behavior**: If `routeOverrides` is not defined for a matching route, the selected sub-workflow will\n   execute using its own default prompts for its first node.\n\n-----\n\n### **Passing Data from a Parent Workflow to a Child Workflow**\n\nA critical concept to understand is that a child workflow runs in an isolated context. It **cannot** directly access the\noutputs of the parent workflow (e.g., `{agent1Output}`, `{agent2Output}`, etc., from the parent are unavailable inside\nthe child).\n\nThere are two primary mechanisms to pass data from the parent to the child.\n\n#### **Method 1: `scoped_variables` (Recommended Method)**\n\nThis is the most powerful and flexible method. The `scoped_variables` property lets you define a list of values from the\nparent's context that you want to make available globally within the child.\n\n* **How it Works**: The values you list in `scoped_variables` are bundled and passed to the child workflow. Inside the\n  child workflow, they can be accessed *at any node* using the special `{agent#Input}` syntax. The numbering corresponds\n  to the order in the array (0-indexed array, 1-indexed variable).\n\n    * The 1st item in `scoped_variables` becomes `{agent1Input}`.\n    * The 2nd item becomes `{agent2Input}`.\n    * ...and so on.\n\n* **Example**:\n\n    * Parent Node Config:\n      ```json\n      \"scoped_variables\": [\n        \"{agent1Output}\",\n        \"{lastUserMessage}\"\n      ]\n      ```\n    * Usage anywhere in the Child Workflow's JSON:\n      ```json\n      \"prompt\": \"The text to analyze is '{agent1Input}' and the user's original question was '{agent2Input}'.\"\n      ```\n\n#### **Method 2: Prompt Overrides (Legacy Method)**\n\nThis was the original method for passing data. 
You can embed parent variables directly into the\n`firstNodeSystemPromptOverride` or `firstNodePromptOverride` properties.\n\n* **How it Works**: The variable placeholders are resolved in the parent's context *before* the child workflow is\n  called. The resulting string is then forced upon the first node of the child workflow, replacing its original prompt.\n* **Limitation**: This method only makes the data available to the **first node** of the child workflow. If you need the\n  data in later nodes, you must have the first node explicitly output it so it can be accessed via that child's\n  `{agent1Output}`. This is why `scoped_variables` is now the recommended approach.\n\n### **Receiving Data from a Child Workflow**\n\nThe process of getting the final result back from a child workflow is simple. The entire final output of the\nsub-workflow (i.e., the result of its last or \"responder\" node) is treated as the output of the `CustomWorkflow` node\nitself.\n\n* **Example**: If a `CustomWorkflow` node is the **4th node** in your parent workflow, its final, resolved output will\n  be stored in the parent's `{agent4Output}`, ready to be used by node 5 and beyond.\n\n---\n\n## Understanding Memories\n\nWilmerAI's memory system has undergone a significant evolution, making it more powerful, flexible, and intelligent. It's\ndesigned to provide rich, searchable context for your conversations, ensuring the AI remains coherent and knowledgeable\nover long discussions. This guide will provide an exhaustive breakdown of how the new system works.\n\nThe core architecture is built on a few key principles:\n\n1. **Three Types of Memory:** The system supports three distinct memory types, each with a specific purpose:\n   chronological **Long-Term Memory** (file-based), a holistic **Rolling Chat Summary**, and a powerful, searchable\n   **Vector Memory** for RAG.\n2. 
**Separation of Concerns:** The system is split into **Creator** nodes, which perform the computationally expensive\n   work of writing memories, and **Retriever** nodes, which perform fast, inexpensive reads. This split ensures your\n   chat remains responsive even while memories are being updated in the background.\n3. **Workflow-Driven:** Memory operations are implemented as nodes within the workflow engine, giving you explicit\n   control over when and how memories are created and accessed.\n\n-----\n\n### How Memories are Enabled\n\nThe entire persistent memory system is activated by a single tag: `[DiscussionId]`. You must include this tag anywhere\nin your conversation (system prompt, user prompt, or messages) to enable the creation and retrieval of long-term\nmemories.\n\n`[DiscussionId]#######[/DiscussionId]` (where `#######` is any unique identifier).\n\nFor example, `[DiscussionId]project_alpha_123[/DiscussionId]`. Wilmer will automatically remove this tag before sending\nprompts to the LLM.\n\n\u003e **NOTE:** It's recommended not to put the `DiscussionId` in a character's main definition or system prompt if you want\n\u003e separate conversations with that character to have separate memories. Placing it in an author's note or the first\n\u003e message of a chat is often a better practice. Some front-ends support variables that can help create unique IDs, for\n\u003e example: `[DiscussionId]{{char}}_2025-08-17[/DiscussionId]`.\n\n-----\n\n### The Three Types of Memory\n\nWhen a `DiscussionId` is active, the system can maintain up to three distinct files in your `Public/` directory, each\nserving a specific purpose.\n\n#### 1\\. Long-Term Memory (File-Based)\n\nThis system provides a chronological, diary-like record of the conversation.\n\n* **Memory File (`\u003cid\u003e_memories.jsonl`)**: This is the classic memory file. Wilmer groups messages into chunks, uses an\n  LLM to summarize them, and saves these summaries sequentially. 
It's a detailed, append-only ledger of what's been\n  discussed, with each summary chunk linked via a hash to the last message it was based on.\n\n#### 2\\. Rolling Chat Summary\n\nThis provides a high-level narrative of the entire conversation, updated periodically.\n\n* **Chat Summary File (`\u003cid\u003e_summary.jsonl`)**: This file maintains a single, continuously updated story of the entire\n  conversation. It synthesizes the chunks from the Long-Term Memory file into a holistic overview, giving the AI a\n  bird's-eye view of everything that has happened so far.\n\n#### 3\\. Vector Memory (The Smart Search System) 🧠\n\nThis is the most powerful addition to the memory system. It creates a dedicated, intelligent database for each\ndiscussion, enabling highly relevant, keyword-based search for Retrieval-Augmented Generation (RAG).\n\n* **Vector Memory Database (`\u003cid\u003e_vector_memory.db`)**: Instead of just a text summary, vector memories are stored as\n  structured data in a dedicated **SQLite database**. When a memory is created, it's saved with rich metadata. The\n  system uses SQLite's FTS5 extension to perform powerful full-text searches across this metadata, allowing the AI to\n  perform a \"smart search\" to find the most relevant pieces of information about a specific topic.\n\n-----\n\n### Using Memories in a Workflow: Creators vs. Retrievers\n\nThis separation is crucial for performance. Writing and summarizing memories can take time. By splitting the process,\nyou can design workflows where the AI responds instantly using existing memories, while the creation of new memories\nhappens in the background.\n\n#### Memory Creation Nodes (The \"Heavy Lifting\")\n\nThese nodes perform the computationally expensive work of generating and saving memories.\n\n* **`QualityMemory`**: This is the main node for **creating and updating** your **Long-Term (File-Based)** and\n  **Vector** memories. 
You place this in your workflow where you want memory generation to happen (usually at the very end). It's the engine that powers the main memory system and will generate either file-based or vector memories based on your configuration.
* **`chatSummarySummarizer`**: This is a special-purpose creator node used exclusively for generating and updating the **Rolling Chat Summary** (`<id>_summary.jsonl`).

#### Memory Retrieval Nodes (The "Fast Readers")

These nodes are lightweight and designed to quickly read existing memories to provide context for an AI's response.

* **`RecentMemory` / `RecentMemorySummarizerTool`**: Reads the last few summary chunks from your Long-Term Memory File (`<id>_memories.jsonl`). It's great for giving an AI general context of what just happened.
* **`FullChatSummary`**: Reads the entire Rolling Chat Summary from `<id>_summary.jsonl`. Use this to give the AI the complete "story so far."
* **`VectorMemorySearch`**: The powerful **smart search** node. It performs a keyword search against the Vector Memory database (`<id>_vector_memory.db`) to find the most relevant information for RAG.

-----

### Configuration: The Master Settings File

All settings for how the `QualityMemory` node generates memories are controlled by a single configuration file: `_DiscussionId-MemoryFile-Workflow-Settings.json`.

The most important setting is the master switch that determines which memory system to use:

* **`useVectorForQualityMemory`**: If `false` (the default), the `QualityMemory` node will write to the classic Long-Term Memory (`.jsonl` files).
If `true`, it will create the powerful, searchable Vector Memories in the SQLite\n  database.\n\nBelow is an example of the expanded settings file:\n\n```json\n{\n  \"Display_Only_Description\": \"Settings for the QualityMemory node.\",\n  \"useVectorForQualityMemory\": true,\n  \"vectorMemoryWorkflowName\": \"my-vector-memory-workflow\",\n  \"vectorMemoryEndpointName\": \"Your-Endpoint\",\n  \"vectorMemoryPreset\": \"_Your_Preset\",\n  \"vectorMemoryMaxResponseSizeInTokens\": 2048,\n  \"vectorMemoryChunkEstimatedTokenSize\": 1000,\n  \"vectorMemoryMaxMessagesBetweenChunks\": 5,\n  \"vectorMemoryLookBackTurns\": 3,\n  \"fileMemoryWorkflowName\": \"my-file-memory-workflow\",\n  \"systemPrompt\": \"You are a summarizer. [Memory_file] [Chat_Summary]\",\n  \"prompt\": \"Summarize this chunk: [TextChunk]\",\n  \"endpointName\": \"Your-Endpoint\",\n  \"preset\": \"_Your_MemoryChatSummary_Preset\",\n  \"maxResponseSizeInTokens\": 250,\n  \"chunkEstimatedTokenSize\": 2500,\n  \"maxMessagesBetweenChunks\": 20,\n  \"lookbackStartTurn\": 7\n}\n```\n\n#### Breakdown of Configuration Fields\n\n* **`useVectorForQualityMemory`**: The master switch. `true` for Vector DB, `false` for file-based `.jsonl`.\n\n* **General Settings**:\n\n    * **`chunkEstimatedTokenSize` / `maxMessagesBetweenChunks`**: These work together. A memory is generated when the\n      conversation history since the last memory point reaches `chunkEstimatedTokenSize` **OR** when\n      `maxMessagesBetweenChunks` have passed, whichever comes first. This ensures memories are created regularly.\n    * **`lookbackStartTurn`**: Tells Wilmer to ignore the last N messages when creating memories. 
This is useful for\n      preventing static text or system messages from being included in memory summaries.\n\n* **Vector-Specific Settings**: The settings prefixed with `vectorMemory...` apply *only* when\n  `useVectorForQualityMemory` is `true`, allowing you to tune vector memory generation independently.\n\n* **File-Based Settings**: The classic settings (`systemPrompt`, `prompt`, `endpointName`, etc.) apply *only* when\n  `useVectorForQualityMemory` is `false`.\n\n-----\n\n### Implementation Guide: Putting It All Together\n\nThis section provides exhaustive, step-by-step instructions for creating and using each memory type.\n\n#### How to Create \u0026 Use Vector Memories (RAG)\n\nVector memory is the most powerful option for RAG. It involves creating a workflow that outputs structured JSON data.\n\n**Step 1: Configure for Vector Memory**\n\nIn `_DiscussionId-MemoryFile-Workflow-Settings.json`, enable vector memory and specify the name of the workflow that\nwill generate it.\n\n```json\n{\n  \"useVectorForQualityMemory\": true,\n  \"vectorMemoryWorkflowName\": \"my-vector-fact-extraction-workflow\"\n}\n```\n\n**Step 2: Build the Generation Workflow**\n\nYour vector memory workflow must have a final node that outputs a **JSON string**. This string can represent a single\nJSON object or, more commonly, an array of objects. Each object represents a single \"memory\" or \"fact\" to be stored in\nthe database.\n\nBased on the system's database schema, each JSON object **must** contain the following keys to be indexed for search:\n`title`, `summary`, `entities`, and `key_phrases`. 
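As a quick illustration, a minimal object carrying only the four required keys might look like the following (the values here are invented purely for this example):

```json
{
  "title": "Dog Adoption Plans",
  "summary": "The user is planning to adopt a rescue greyhound named Pepper next month.",
  "entities": ["Pepper"],
  "key_phrases": ["rescue greyhound", "adoption plans"]
}
```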
The value for `summary` is also used as the primary `memory_text`.\nYou can include other keys (like `sentiment` or `topics` from the example), but they will only be stored as metadata and\nnot used in the default search.\n\nHere is a complete, three-step example workflow (`my-vector-fact-extraction-workflow.json`):\n\n```json\n[\n  {\n    \"title\": \"LLM Determine New Memories\",\n    \"agentName\": \"Memory Finder Agent One\",\n    \"type\": \"Standard\",\n    \"systemPrompt\": \"You are a Fact Extraction Agent... Your role is crucial for maintaining the illusion of long-term memory... Focus on the Subject, Not the Conversation... Timelessness... Self-Contained...\",\n    \"prompt\": \"Analyze the following new messages in the chat and extract persistent facts for the Fact Database:\\n\\n\u003cnew_messages\u003e\\n{agent1Input}\\n\u003c/new_messages\u003e\\n\\nFormat the output as a list of bullet points...\",\n    \"endpointName\": \"Your-Endpoint\",\n    \"preset\": \"_Your_MemoryChatSummary_Preset\",\n    \"maxResponseSizeInTokens\": 2000,\n    \"returnToUser\": false\n  },\n  {\n    \"title\": \"LLM adding context to the memories\",\n    \"agentName\": \"Memory Finder Agent Two\",\n    \"type\": \"Standard\",\n    \"systemPrompt\": \"You are an AI assistant that structures factual data. 
When given a bullet point list of memories, and the messages they were pulled from, please add additional structured information as required about each memory.\",\n    \"prompt\": \"A new series of messages arrived in the chat...:\\n\\n\u003cnew_messages\u003e\\n{agent1Input}\\n\u003c/new_messages\u003e\\n\\nNew memories were generated...:\\n\\n\u003cnew_memories\u003e\\n{agent1Output}\\n\u003c/new_memories\u003e\\n\\nFor each memory, please do the following:\\n- Separate the memories with headers ('# Memory 1', etc.).\\n- Specify a `title`: A concise, 5-10 word headline.\\n- Write a `summary`: the exact text of the memory from the bullet point list.\\n- Specify `entities`: a list of important proper nouns.\\n- Specify `key_phrases`: A list of key conceptual phrases.\\n\\nPlease respond with the structured memory list now.\",\n    \"endpointName\": \"Your-Endpoint\",\n    \"preset\": \"_Your_MemoryChatSummary_Preset\",\n    \"maxResponseSizeInTokens\": 2000,\n    \"returnToUser\": false\n  },\n  {\n    \"title\": \"LLM Format Memories into JSON Array\",\n    \"agentName\": \"Memory Finder Agent Three\",\n    \"type\": \"Standard\",\n    \"systemPrompt\": \"You are a JSON formatting agent. When given a list of structured memories, format them into a valid JSON array of objects and respond ONLY with the formatted JSON. Do not include any other text, comments, or markdown.\",\n    \"prompt\": \"Below is a generated list of memories:\\n\\n\u003cmemories\u003e\\n{agent2Output}\\n\u003c/memories\u003e\\n\\nPlease take the above memories and reformat them into a single, valid JSON array of objects. 
The final output must be only the raw JSON text.",
    "endpointName": "Your-Endpoint",
    "preset": "_Your_MemoryChatSummary_Preset",
    "maxResponseSizeInTokens": 2000,
    "returnToUser": true
  }
]
```

* **Workflow Input**: The system automatically injects the raw conversation chunk into your workflow as `{agent1Input}`.
* **Final Workflow Output**: The final node (`returnToUser: true`) must produce a raw JSON string like this:

```json
[
  {
    "title": "Inspection Report for House in Florida",
    "summary": "The inspection for the prospective white house in Florida revealed some termite damage in the garage, but was otherwise positive. A repair quote is being obtained.",
    "entities": [
      "Florida"
    ],
    "key_phrases": [
      "inspection report",
      "termite damage",
      "repair quote"
    ]
  },
  {
    "title": "Bob's Annual Summer Resort",
    "summary": "Bob visits an annual summer resort that features waterfront access, a private beach, spa facilities, and is built into a former country club.",
    "entities": [
      "Bob"
    ],
    "key_phrases": [
      "annual summer resort",
      "private beach",
      "spa facilities"
    ]
  }
]
```

**Step 3: Retrieving Vector Memories**

Use the `VectorMemorySearch` node in your workflow. It takes a single string input containing your search terms. **Keywords must be separated by semicolons (`;`)**.
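Conceptually, the search input is split on semicolons, each term is sanitized, and the terms are OR-ed together into an FTS5 `MATCH` query. The sketch below illustrates that idea against a throwaway in-memory table; it is not Wilmer's actual implementation, and the table layout and exact sanitization rule are assumptions for the example.

```python
import re
import sqlite3

def build_fts_query(search_input: str, max_keywords: int = 60) -> str:
    """Split a semicolon-separated keyword string, sanitize each term,
    and join the terms with OR for an FTS5 MATCH clause."""
    terms = [t.strip() for t in search_input.split(";") if t.strip()]
    terms = terms[:max_keywords]  # cap keywords to avoid overly complex queries
    # Illustrative sanitization: keep only word characters and spaces so
    # FTS5 query operators cannot be injected through user input.
    sanitized = [re.sub(r"[^\w\s]", " ", t).strip() for t in terms]
    return " OR ".join('"{}"'.format(t) for t in sanitized if t)

# Demo against a hypothetical FTS5 table mirroring the indexed metadata keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE memories USING fts5(title, summary, entities, key_phrases)"
)
conn.execute(
    "INSERT INTO memories VALUES (?, ?, ?, ?)",
    ("Inspection Report", "Termite damage was found in the garage.",
     "Florida", "termite damage; repair quote"),
)
query = build_fts_query("Florida house inspection; termite damage; bob's vacation")
rows = conn.execute(
    "SELECT title FROM memories WHERE memories MATCH ? ORDER BY rank", (query,)
).fetchall()
# rows == [('Inspection Report',)] — matched via the "termite damage" phrase
```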
The system sanitizes each term and searches for any of them (using\n`OR` logic), ranking the results by relevance.\n\n* **Example Input for `VectorMemorySearch` node**: `\"Florida house inspection; termite damage; bob's vacation\"`\n* **Note**: The system will process a maximum of 60 keywords to prevent overly complex queries.\n\n#### How to Create \u0026 Use File-Based Memories\n\nThis is the classic, chronological memory system.\n\n**Step 1: Configure for File-Based Memory**\n\nIn `_DiscussionId-MemoryFile-Workflow-Settings.json`, ensure vector memory is disabled.\n\n```json\n{\n  \"useVectorForQualityMemory\": false\n}\n```\n\n**Step 2: Choose a Creation Method**\n\n* **A) Workflow-Based (Recommended):** Specify a workflow name in the settings file via `fileMemoryWorkflowName`. The\n  system will execute this workflow, and the final output **must be a single summarized text block**. The system injects\n  the following context, available in your prompts:\n    * `{agent1Input}`: The raw text chunk to be summarized.\n    * `{agent2Input}`: The most recent memory chunks.\n    * `{agent3Input}`: The full history of all memory chunks.\n    * `{agent4Input}`: The current rolling chat summary.\n* **B) Direct LLM Call (Legacy):** If `fileMemoryWorkflowName` is not set, the system falls back to a direct LLM call\n  using the `prompt` and `systemPrompt` from the settings file. You can use these special variables in your prompts:\n    * `[TextChunk]`: The chunk of messages to be summarized.\n    * `[Memory_file]`: The last 3 memories generated.\n    * `[Full_Memory_file]`: All currently generated memories.\n    * `[Chat_Summary]`: The current rolling chat summary.\n\n**Step 3: Retrieving File-Based Memories**\n\nUse the `RecentMemory` or `RecentMemorySummarizerTool` node in your workflow. 
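A retrieval node in a workflow definition might look roughly like this; apart from `type`, the field names shown are illustrative assumptions, so check the example workflows shipped with Wilmer (such as FullCustomWorkflow-WithRecent-ChatSummary) for the exact schema:

```json
{
  "title": "Recent memory gathering",
  "agentName": "Recent Memory Gathering Tool",
  "type": "RecentMemorySummarizerTool",
  "maxTurnsToPull": 0,
  "maxSummaryChunksFromFile": 3
}
```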
It will read the most recent summary\nchunks from the `\u003cid\u003e_memories.jsonl` file.\n\n#### How to Create \u0026 Use the Rolling Chat Summary\n\nThe rolling summary is generated by a separate, dedicated workflow node.\n\n**Step 1: The `chatSummarySummarizer` Node**\n\nTo create or update the rolling summary, you must include a node with `\"type\": \"chatSummarySummarizer\"` in your\nworkflow. This is a special, single-node workflow.\n\n**Step 2: Building the Summary Workflow**\n\nHere is a complete example of a chat summary workflow. Note the special properties `loopIfMemoriesExceed` and\n`minMemoriesPerSummary` which are required.\n\n```json\n[\n  {\n    \"title\": \"Chat Summarizer\",\n    \"agentName\": \"Chat Summarizer Agent\",\n    \"type\": \"chatSummarySummarizer\",\n    \"systemPrompt\": \"You are an expert summarizer. Condense the provided information into a coherent narrative summary.\",\n    \"prompt\": \"The current chat summary is:\\n[CHAT_SUMMARY]\\n\\nThe newest memories since the last summary are:\\n[LATEST_MEMORIES]\\n\\nPlease update the summary to incorporate the new information from the latest memories.\",\n    \"endpointName\": \"Your-Endpoint\",\n    \"preset\": \"_Your_MemoryChatSummary_Preset\",\n    \"maxResponseSizeInTokens\": 2000,\n    \"loopIfMemoriesExceed\": 3,\n    \"minMemoriesPerSummary\": 2\n  }\n]\n```\n\n* **Workflow Properties**:\n    * `loopIfMemoriesExceed`: When regenerating many memories at once, this tells the system to update the summary after\n      every `N` new memories are created. 
This improves quality by feeding the summary back into the next memory\n      generation cycle.\n    * `minMemoriesPerSummary`: During a normal conversation, this prevents the summary from updating until at least `N`\n      new memories have been generated since the last update.\n* **Prompt Variables**:\n    * `[CHAT_SUMMARY]`: The current rolling chat summary from the file.\n    * `[LATEST_MEMORIES]`: The new memory chunks created since the last summary was written.\n\n**Step 3: Retrieving the Summary**\n\nUse the `FullChatSummary` node to get the complete text from the `\u003cid\u003e_summary.jsonl` file.\n\n-----\n\n### Advanced Topics: Tracking and Regeneration\n\n#### How Memories are Stored and Tracked\n\nThe system has robust mechanisms to avoid reprocessing the same conversation history.\n\n* **File-Based Tracking**: Each summary in `\u003cid\u003e_memories.jsonl` is stored alongside a **hash** of the last message it\n  was based on. When the `QualityMemory` node runs, it finds the last hash and compares it to the chat history to see\n  where to resume processing. **Pitfall**: If you edit or delete a message that was hashed, the system may lose its\n  place and re-summarize a large portion of your chat, potentially creating duplicate memories.\n\n* **Vector-Based Tracking**: This system is more robust. Inside the `\u003cid\u003e_vector_memory.db` file, a dedicated\n  `vector_memory_tracker` table stores the hash of the last message processed. This prevents reprocessing messages even\n  if earlier parts of the conversation are edited.\n\n#### Redoing Memories\n\nTo regenerate memories from scratch (e.g., after improving your prompts), you must **delete all memory files** for that\n`DiscussionId`.\n\n\u003e **NOTE:** Always back up your memory files before modifying or deleting them, unless you are certain you want to\n\u003e rebuild them completely.\n\nDelete all associated files from the `Public/` directory:\n\n1. 
`<id>_memories.jsonl` (Long-Term Memory)
2. `<id>_summary.jsonl` (Rolling Summary)
3. `<id>_vector_memory.db` (Vector Memory)

The next time you run a workflow containing the creator nodes (`QualityMemory`, `chatSummarySummarizer`), the system will see the files are missing and regenerate everything from the complete chat history. This is also a useful way to **consolidate memories**, as regeneration prioritizes the `chunkEstimatedTokenSize` over `maxMessagesBetweenChunks`, often resulting in fewer, more comprehensive memory chunks.

### Parallel Processing

For handling extensive conversations, the app employs a parallel processing node for chat summaries and recent memories. This allows you to distribute the workload across multiple LLMs. For example, if you have a conversation with 200,000 tokens resulting in about 200 memory chunks, you can assign these chunks to different LLMs. In a setup with three 8B LLMs on separate computers, each LLM processes a chunk simultaneously, significantly reducing the processing time.

**Current Limitations:**

- Custom prompts are not yet supported for parallel processing but will be in the future. Currently, this feature is limited to processing memories and summaries.

---

### Presets

Presets in this project are highly customizable and not hardcoded. You can include any parameters you need in the JSON file, allowing for flexibility and adaptability. If a new preset type is introduced tomorrow, you can simply add it to the JSON file, and it will be sent over to the API without waiting for a new implementation.

#### Example Preset JSON

Here is an example of a preset JSON:

```json
{
  "truncation_length": 16384,
  "max_tokens": 3000,
  "temperature": 1,
  "top_p": 1
}
```

The current preset JSONs are a collection of parameters commonly used by SillyTavern and other front-end applications, extracted directly from their payloads.
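Some APIs reject parameters they don't recognize, so it can help to keep a minimal preset on hand for them. A pared-down preset for a stricter endpoint such as the OpenAI API might look like this (the values are only examples):

```json
{
  "max_tokens": 3000,
  "temperature": 0.7,
  "top_p": 1
}
```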
Note that while most API endpoints are tolerant of additional parameters, some\nlike the OpenAI API will throw an error if you send parameters they do not know. Therefore, it's essential to include\nonly what is needed for those particular endpoints.\n\n---\n\n## Quick Troubleshooting Tips\n\n### I don't see a memories file or summary file!\n\nA) Make sure that those nodes exist in your workflow. Take a look at one of the\nexample workflows called FullCustomWorkflow-WithRecent-ChatSummary for an example.\n\nB) Make sure the FOLDER exists. You can modify where these files are being\nwritten to in your Users/username.json file.\n\n### I'm not seeing a response coming in on the front end!\n\nIt could be a front end that doesn't work well with Wilmer, but the first\nthing I'd check is that \"streaming\" matches on both sides. Both the front end\nand Wilmer have to match for Stream being true or false. You can change this\nin Wilmer in your Users/username.json file, and on SillyTavern it's in the\nfar left icon, around where Temperature is set.\n\n### I'm getting an error that my LLM doesn't like some of the presets.\n\nSome LLMs, like ChatGPT, don't accept presets that they don't recognize\n(like dynamic temperature). You'll need to go through your workflows and\nswap out all the presets with one that only has fields the API accepts.\n\n### I want to update, but I don't want to lose all my stuff.\n\nThe public folder should be where all your settings are saved. I'd back\nthat folder up and move it between installations. This is still in heavy\ndevelopment, so ultimately that folder may get broken at some point\nbecause of changes, but I'll try my best not to.\n\n### My routing is terrible/the outputs are awful/the LLM is really confused.\n\nCheck your prompt templates, check your prompts, etc. 
Small LLMs may have a hard time with Wilmer, but a prompt template can make or break one, so definitely be careful there.

### I keep getting out of memory/truncate length errors!

Wilmer currently has no token length checks to ensure that you aren't going over the model's max length, so be careful there. If you have 200,000 tokens of messages, there's nothing in Wilmer to stop you from trying to send all 200,000 to the LLM. That, of course, would cause it to fail.

### Getting some error about None type can't do something...

More than likely the LLM API either broke, didn't send back a response, or sent back something Wilmer didn't know what to do with. Or something else broke within Wilmer. Look at the output and you may see the cause.

### It looks like Wilmer is sending a prompt to the LLM, but nothing is happening.

Make sure that your endpoint's address and port are correct, and make sure that you are using the right user. Everything may look fine, but you could have the wrong user set as your current user, in which case you're hitting a workflow with endpoints that aren't set up.
Wilmer just kind of stalls out if you try to hit a link that doesn't exist, since the timeout is set for a really long period of time due to some LLMs taking forever to respond.

---

## Contact

For feedback, requests, or just to say hi, you can reach me at:

WilmerAI.Project@gmail.com

---

## Third Party Libraries

WilmerAI imports six libraries listed in its requirements.txt, consuming them via import statements; it does not extend or modify the source of those libraries.

The libraries are:

* Flask: https://github.com/pallets/flask/
* requests: https://github.com/psf/requests/
* scikit-learn: https://github.com/scikit-learn/scikit-learn/
* urllib3: https://github.com/urllib3/urllib3/
* jinja2: https://github.com/pallets/jinja
* pillow: https://github.com/python-pillow/Pillow

Further information on their licensing can be found within the README of the ThirdParty-Licenses folder, as well as the full text of each license and their NOTICE files, if applicable, with relevant last updated dates for each.

## Wilmer License and Copyright

    WilmerAI
    Copyright (C) 2025 Christopher Smith

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.
    If not, see <https://www.gnu.org/licenses/>.