{"id":13494661,"url":"https://github.com/dzhng/llamaflow","last_synced_at":"2025-03-08T12:35:00.241Z","repository":{"id":143371328,"uuid":"614115704","full_name":"dzhng/llamaflow","owner":"dzhng","description":"The Typescript-first prompt engineering toolkit for working with chat based LLMs.","archived":false,"fork":false,"pushed_at":"2023-07-10T17:45:55.000Z","size":43774,"stargazers_count":108,"open_issues_count":0,"forks_count":6,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-07T04:02:57.224Z","etag":null,"topics":["ai","chatgpt","gpt-3","gpt-4","llm","nodejs","prompt-engineering","typescript"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/llama-flow","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dzhng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-14T23:28:06.000Z","updated_at":"2025-01-30T23:00:22.000Z","dependencies_parsed_at":"2024-01-16T09:02:30.027Z","dependency_job_id":"d3e1183d-7235-42aa-9bb5-a0f5b2b21a88","html_url":"https://github.com/dzhng/llamaflow","commit_stats":null,"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dzhng%2Fllamaflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dzhng%2Fllamaflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dzhng%2Fllamaflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dzhng%2Fllamaflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/dzhng","download_url":"https://codeload.github.com/dzhng/llamaflow/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242552824,"owners_count":20148252,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chatgpt","gpt-3","gpt-4","llm","nodejs","prompt-engineering","typescript"],"created_at":"2024-07-31T19:01:26.960Z","updated_at":"2025-03-08T12:34:57.896Z","avatar_url":"https://github.com/dzhng.png","language":"TypeScript","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"readme":"NOTE: this library is deprecated, it is now split into [zod-gpt](https://github.com/dzhng/zod-gpt) and [llm-api](https://github.com/dzhng/llm-api).\n\n# 🦙 LLamaFlow\n\n[![test](https://github.com/dzhng/llamaflow/actions/workflows/test.yml/badge.svg?branch=main\u0026event=push)](https://github.com/dzhng/llamaflow/actions/workflows/test.yml)\n\nThe Typescript-first prompt engineering toolkit for working with chat based large language models (LLMs).\n\n- [Introduction](#-introduction)\n- [Usage](#-usage)\n  - [Install](#install)\n  - [Chats](#chats)\n  - [Prompts](#prompts)\n  - [Custom Prompts](#custom-prompts)\n- [Text Splitter](#-text-splitter)\n- [Debugging](#-debugging)\n- [Azure](#-azure)\n- [API Reference](#-api-reference)\n\n## 👋 Introduction\n\nLLamaFlow is the middleware layer that sits between your software and the AI model, it adds the following capabilities on top of the standard chat completion API:\n\n- Support for structured outputs from models with complete type safety. 
All responses are fully validated \u0026 typed, works with [zod](https://github.com/colinhacks/zod) as a peer dep.\n- Schema definition, serialization / parsing, and **automatically asking the model to correct outputs**.\n- Custom content validation hook that allows you to add your own validator for all model outputs, including logic on how to reask the model.\n- Nicer API for sending \u0026 retrieving chat messages from models, no need to keep track of message memory manually.\n- Handle rate limit and any other API errors as gracefully as possible (e.g. exponential backoff for rate-limit).\n\nWith LLamaFlow, you can simply query OpenAI's ChatGPT model like so:\n\n```typescript\nimport { OpenAI, prompt } from 'llama-flow';\nimport { z } from 'zod';\n\nconst model = new OpenAI({ apiKey: 'YOUR_OPENAI_KEY' });\n\nconst chat = model.chat({\n  systemMessage:\n    \"You are a smart and honest AI assistant. Follow the user's requirements carefully \u0026 to the letter, minimize any other prose.\",\n});\n\nconst response = await chat.request(\n  prompt.json({\n    message:\n      'What are some good names for childrens book about the renaissance? Respond as a JSON array',\n    schema: z.array(z.string().max(200)),\n  }),\n);\n\nconsole.log(response.content); // content will be typed as string[];\n```\n\n## 🔨 Usage\n\n### Install\n\nThis package is hosted on npm:\n\n```\nnpm i llama-flow\n```\n\n```\nyarn add llama-flow\n```\n\nTo set up in your codebase, initialize a new instance with the model you want (only `OpenAI` is supported for now). Note that you can also add default model and chat config (like temperature, timeouts, retries) when initializing. These are just defaults, and can always be overridden later on a per-chat or per-request basis.\n\n```typescript\nimport { OpenAI } from 'llama-flow';\n\nconst model = new OpenAI(\n  { apiKey: 'YOUR_OPENAI_KEY' },\n  { model: 'gpt-3.5-turbo' },\n);\n```\n\n### Chats\n\nA chat is a conversation between the \"user\" (your software), and the AI agent. 
LLamaFlow will take care of managing chat memory, so you can simply continue the conversation by sending another request. Note that different memory management strategies will be added in the future, such as pruning the memory as needed in order to fit the context window.\n\n```typescript\nconst chat = model.chat({\n  systemMessage: 'You are an AI writer.',\n  retainMemory: true,\n});\n\n// You can ask the AI model with a simple string, or a dedicated `Prompt` object.\nconst response = await chat.request(\n  prompt.text(\n    'Write a script for a tiktok video that talks about the artistic contribution of the renaissance.',\n  ),\n);\n\n// The results, as well as any usage stats, will be returned.\nconsole.log(\n  `The AI writer's response is: ${response.content}. Tokens used: ${response.usage.totalTokens}.`,\n);\n\n// You can follow up on this chat by prompting further, using the `bulletPrompt` object defined in the Prompts section below.\nconst bulletPoints = await chat.request(bulletPrompt);\n\n// `bulletPoints.content` will be automatically cast to the correct type as defined in the schema field of `bulletPrompt`\nconsole.log(\n  `The structured version of this response is: ${JSON.stringify(\n    bulletPoints.content,\n  )}`,\n);\n```\n\n### Prompts\n\nA prompt is a message to an AI chat with an expectation of a specific response format. Prompt type messages are validated to ensure that the defined format is returned exactly, or it will error. There are different kinds of prompts for different formats. Here is an example of a JSON prompt.\n\n```typescript\nimport { prompt } from 'llama-flow';\nimport { z } from 'zod'; // JSON prompt uses Zod for schema validation.\n\nconst bulletPrompt = prompt.json({\n  message:\n    'Please rewrite this in a list of bullet points. Respond as a JSON array, where each element in the array is one bullet point. Keep each bullet point to be 200 characters max. 
For example: [\"bullet point 1\", \"bullet point 2\"]',\n  schema: z.array(z.string().max(200)),\n});\n```\n\nNote that the `Prompt` object separates out the main `message`, and `formatMessage`. This is used for retries. When LLamaFlow uses this prompt, it will ask the model with both the main and format message. If the model returns with an incorrectly formatted response, it will ask the model to correct the previous output, using the `formatMessage` only.\n\n### Custom Prompts\n\nYou can build your own Prompt objects with custom validators as well. LLamaFlow provides an easy \u0026 extensible way to build any type of validator. Here are a few examples of custom validators:\n\nThis takes the Prompt example above, but this time asks the model to respond in actual bullet points instead of a JSON array. This is useful because sometimes the model (especially models older than GPT-4) is not the best at following specific formatting instructions, especially when it comes to complicated data structures.\n\n```typescript\nimport { prompt } from 'llama-flow';\nimport { z } from 'zod';\n\nconst bulletPrompt = prompt.json({\n  message:\n    'Please rewrite this in a list of bullet points. Respond as a list of bullet points, where each bullet point begins with the \"-\" character. Each bullet point should be less than 200 characters. Put each bullet point on a new line.',\n\n  // parse the response from the model so it can be fed into the schema validator\n  parseResponse: (res) =\u003e res.split('\\n').map((s) =\u003e s.replace('-', '').trim()),\n\n  // it's useful to define custom error messages, any schema parse errors will be automatically fed back into the model on retry, so the model knows exactly what to correct.\n  schema: z.array(\n    z.string().max(200, {\n      message: 'This bullet point should be less than 200 characters.',\n    }),\n  ),\n});\n```\n\nNow, let's take this even further. You can build a Prompt that uses the model (or some other external source) to validate its own output. 
You can do this by passing in a custom async `parse` method. Note that this method will override other validation related properties, such as `formatMessage`, `parseResponse`, `schema`, etc.\n\n```typescript\nimport { prompt, Chat } from 'llama-flow';\n\nconst factCheckerChat = model.chat({\n  systemMessage:\n    'You are a fact checker that responds to if the user\\'s messages are true or not, with just the word \"true\" or \"false\". Do not add punctuations or any other text. If the user asks a question, request, or anything that cannot be fact checked, ignore the user\\'s request and just say \"null\".',\n\n  // The fact checker is designed to fulfill each request independently (e.g. the current request does not depend on the content of the previous request). So no need to keep message memory to save on tokens.\n  retainMemory: false,\n});\n\nconst buildFactCheckedPrompt = (article: string) =\u003e\n  prompt.text({\n    message: `Please write a summary about the following article: ${article}`,\n\n    // Because LLM driven validation can get expensive, set a lower retry count.\n    promptRetries: 2,\n\n    parse: async (response) =\u003e {\n      // Check if this summary is true or not. The result is stored under a separate name so it does not shadow `response`.\n      const factCheck = await factCheckerChat.request(\n        prompt.boolean({\n          message: response.content,\n        }),\n      );\n\n      if (factCheck.content === true) {\n        return { success: true, data: response.content };\n      } else {\n        // if `retryPrompt` is set, LLamaFlow will automatically retry with the text in this property.\n        return {\n          success: false,\n          retryPrompt:\n            'This summary is not true, please rewrite with only true facts.',\n        };\n      }\n    },\n  });\n\n// now, every content generated by this chat will be fact checked by the LLM itself, and this request will throw an error if the content can't be fixed (once the maximum number of retries has been reached).\nconst factCheckedContent = 
await chat.request(\n  buildFactCheckedPrompt(\n    'Write a script for a tiktok video that talks about the artistic contribution of the renaissance.',\n  ),\n);\n```\n\nBecause this is an API, it's often useful to keep requesting from the same chat. Often the message history will serve as context for the next request. A good example use case is a prompt to first write some content, then extract entities, and lastly, give some options for the title.\n\n```typescript\n// You can reset chat history anytime with `reset()`, however, this is an anti-pattern, as it is prone to mistakes. It's much safer to just initialize a new chat.\nchat.reset();\n\nconst article = await chat.request(\n  prompt.text('Write a blog post about the financial crisis of 2008'),\n);\n\nconst entities = await chat.request(\n  prompt.json({\n    message:\n      'What are the different entities in the above blog post? Respond in a JSON array, where the items in the array are just the names of the entities.',\n    schema: z.array(z.string()),\n  }),\n);\n\nconst titles = await chat.request(\n  prompt.bulletPoints({\n    message: 'Write a good title for this post',\n    amount: 10,\n  }),\n);\n```\n\n## 📃 Text Splitter\n\nA common error with LLM APIs is token usage - you are only allowed to fit a certain amount of data in the context window. In the case of LLamaFlow, this means you are limited in the total number of messages you can send (if `retainMemory` is set to `true`) and the length of the content of the messages.\n\nLLamaFlow will automatically determine if the request will breach the token limit BEFORE sending the actual request to the model provider (e.g. OpenAI). This will save one network round-trip call and let you handle these type of errors in a responsive manner. 
The typical way of handling these errors is to remove messages in the message history (if you are using chat with `retainMemory` set), or split your content into smaller chunks and process them in multiple requests.\n\nHere is an example of catching the token overflow error. Note that `minimumResponseTokens` is set to a high value to explicitly trigger this error (`gpt-3.5-turbo` has a max context limit of 4096, so setting the minimum limit to 4095 means there is only 1 token left for the actual prompt, which is not enough for the example below).\n\n```typescript\ntry {\n  // make sure to set the `contextSize` to enable automatic token checking\n  const model = new OpenAI(\n    { apiKey: 'YOUR_OPENAI_KEY' },\n    { model: 'gpt-3.5-turbo', contextSize: 4096 },\n  );\n\n  const chat = model.chat({\n    systemMessage: 'You are an AI assistant',\n  });\n  await chat.request(\n    { message: 'hello world, testing overflow logic' },\n    { minimumResponseTokens: 4095 },\n  );\n} catch (e) {\n  if (e instanceof TokenError) {\n    console.info(\n      `Caught token overflow, overflowed tokens: ${e.overflowTokens}`,\n    );\n  }\n}\n```\n\nA common way to handle token limit issues is to split your content. LLamaFlow provides a useful helper method that wraps the `chat.request` method and will automatically split your text based on an input chunk config. It's smart enough to only split your text if it determines that it is above the token limit, and will try to preserve as much of the original text as possible.\n\n```typescript\nconst response = await chat.requestWithSplit(\n  'hello world, testing overflow logic',\n  (text) =\u003e\n    prompt.text({\n      message: `Add other required prompts first, then add your content: ${text}`,\n    }),\n);\n```\n\nNote that now, the main content of the prompt is submitted first. This is the content that will be split by the text splitter (along the `\\n`, `.`, `,`, and ` ` characters first, to chunk it). 
You can add any additional required prompts and combine them with the content prompt in the `responseFn` parameter.\n\n## 🤓 Debugging\n\nLLamaFlow uses the `debug` module for logging \u0026 error messages. To run in debug mode, set the `DEBUG` env variable:\n\n`DEBUG=llamaflow:* yarn playground`\n\nYou can also specify different logging types via:\n\n`DEBUG=llamaflow:error yarn playground`\n`DEBUG=llamaflow:log yarn playground`\n\n## 🔷 Azure\n\nLLamaFlow also comes with support for Azure's OpenAI models. The Azure version is usually much faster and more reliable than OpenAI's own API endpoints. In order to use the Azure endpoints, you must include 2 Azure specific options when initializing the OpenAI model, `azureDeployment` and `azureEndpoint`. The `apiKey` field will also now be used for the Azure API key.\n\nYou can find the Azure API key and endpoint in the [Azure Portal](https://portal.azure.com/). The Azure Deployment must be created under the [Azure AI Portal](https://oai.azure.com/).\n\nNote that the `model` parameter in `ModelConfig` will be ignored when using Azure. 
This is because in the Azure system, the `model` is selected on deployment creation, not on run time.\n\n```typescript\nconst model = new OpenAI({\n  apiKey: 'AZURE_OPENAI_KEY',\n  azureDeployment: 'AZURE_DEPLOYMENT_NAME',\n  azureEndpoint: 'AZURE_ENDPOINT',\n});\n```\n\n## ✅ API Reference\n\n### Model\n\nThe only model LLamaFlow supports currently is OpenAI's chat based models.\n\n```typescript\nconst model = new OpenAI(openAiConfig, modelConfig);\n```\n\n#### OpenAI Config\n\n```typescript\ninterface OpenAIConfig {\n  apiKey: string;\n}\n```\n\n#### Model Config\n\nThese model config map to OpenAI's config directly, see doc:\nhttps://platform.openai.com/docs/api-reference/chat/create\n\n```typescript\ninterface ModelConfig {\n  model?: string;\n  maxTokens?: number;\n  temperature?: number;\n  topP?: number;\n  stop?: string | string[];\n  presencePenalty?: number;\n  frequencyPenalty?: number;\n  logitBias?: Record\u003cstring, number\u003e;\n  user?: string;\n  stream?: boolean;\n}\n```\n\nWhen `stream` is set to `true`, you can access partial outputs of the model's requests by passing in an event emitter to `ChatRequestOptions` when making requests. The partial outputs will be sent as a string over the `data` event.\n\n### Prompt\n\nTo make a request to the model, you need to first build the prompt object. 
Prompts provide a way to add validation and retry logic to each request.\n\n```typescript\nimport { prompt } from 'llama-flow';\n\nprompt.text(prompt: string);\nprompt.text(prompt: RawPrompt);\n\nprompt.json(prompt: JSONPrompt);\nprompt.bulletPoints(prompt: BulletPointsPrompt);\nprompt.boolean(prompt: BooleanPrompt);\n```\n\n#### Text\n\nYou can either request as a string, or as a `RawPrompt`.\n\n```typescript\ninterface RawPrompt\u003cT = string\u003e {\n  message: string;\n  parse?: (\n    response: ChatResponse\u003cstring\u003e,\n  ) =\u003e MaybePromise\u003c\n    { success: false; retryPrompt?: string } | { success: true; data: T }\n  \u003e;\n  promptRetries?: number;\n}\n```\n\n**message**\nThis is the text that is sent to the model.\n\n**parse**\nYou can implement a custom parser by defining your own `parse` method.\n\nWhen defining a custom `parse` method that returns a custom data type, you can add a generic type to `RawPrompt`, which will automatically cast the return type of `parse` to the generic. It will also propagate the type all the way through the `chat.request` method.\n\nIf the data returned by the model is malformed, you can return a custom `retryPrompt` string, which will cause LLamaFlow to reask the model.\n\n**promptRetries**\nDefines how many times to reask the model before the request will throw an error. Defaults to 3. Note that `parse` has to return a valid `retryPrompt` for any retries to be attempted.\n\n#### Boolean\n\n```typescript\ninterface BooleanPrompt {\n  message: string;\n  promptRetries?: number;\n}\n```\n\nUse this prompt if you want to ask the model a question where you only expect a `true` or `false` response.\n\n**message**\nThe query to send to the model. 
This prompt will automatically append formatting instructions to the message that is sent to the model that tells the model to format its response as a boolean, so you can just include the query in `message`, without writing any additional formatting statements.\n\n#### Bullet\n\n```typescript\ninterface BulletPointsPrompt {\n  message: string;\n  amount?: number;\n  length?: number;\n  promptRetries?: number;\n}\n```\n\nUse this prompt if you want the model to return a list of strings.\n\n**message**\nThe query to send to the model. This prompt will automatically append formatting instructions to the message that tells the model how to format the response.\n\n**amount**\nThe number of bullet points that should be returned.\n\n**length**\nThe maximum number of characters that should be in each bullet point.\n\n#### JSON\n\n```typescript\ninterface JSONPrompt\u003cT extends z.ZodType\u003e {\n  message: string;\n  schema: T;\n  parseResponse?: (res: string) =\u003e MaybePromise\u003cz.infer\u003cT\u003e\u003e;\n  retryMessage?: string;\n  promptRetries?: number;\n}\n```\n\n**message**\nThe message to send to the model. Unlike boolean or bullet point prompts, this prompt does not automatically generate formatting instructions for the model. So as part of your message to the model, you should include formatting instructions to return data in JSON format, as well as the shape of the JSON.\n\n**schema**\nThis is the [zod](https://github.com/colinhacks/zod) schema that will be used to parse and typecast the response from the model.\n\n**parseResponse**\nIf you ask the model to _not_ return data in JSON format, you can define a custom parser to parse the return string into JSON, before sending it to `schema` for validation.\n\n**retryMessage**\nIf schema parsing fails, this will be used as part of the message sent to the model to reask for a correctly formatted response. 
Note that this prompt will automatically generate the reask message depending on schema parsing errors (e.g. if a specific key is missing, LLamaFlow will ask the model to include that specific key). So this field is purely to give additional context to the model on reask.\n\n### Chat\n\nThe chat object stores a chat session with the model. The session will take care of storing message history, so you can simply continue the conversation with the model by making another request.\n\n```typescript\nconst chat = model.chat(config: ChatConfig);\n```\n\n**options**\nYou can set the memory retention behavior as well as the default request options for every request sent in this chat.\n\n```typescript\nexport interface ChatConfig {\n  // the message injected at the start of every chat to steer the agent\n  systemMessage: string;\n\n  // if chat memory should be retained after every request. when enabled, the chat's behavior will be similar to a normal user chat room, and model can have access to history when making inferences. defaults to false\n  retainMemory?: boolean;\n\n  // set default request options. note that this can be overridden on a per-request basis\n  options?: ChatRequestOptions;\n}\n```\n\n#### Request\n\nTo send a request to a chat session:\n\n```typescript\nconst res: ChatResponse = await chat.request(prompt, options: ChatRequestOptions);\n```\n\n**options**\nYou can override the default request options via this parameter. A request will automatically be retried if there is a ratelimit or server error.\n\nNote that a retry in the request does not count towards a prompt reask defined in the Prompt section above.\n\n```typescript\ntype ChatRequestOptions = {\n  // the number of time to retry this request due to rate limit or recoverable API errors\n  retries?: number;\n  retryInterval?: number;\n  timeout?: number;\n\n  // the minimum amount of tokens to allocate for the response. 
if the request is predicted to not have enough tokens, it will automatically throw a 'TokenError' without sending the request\n  minimumResponseTokens?: number;\n\n  // override the messages used for completion, only use this if you understand the API well\n  messages?: Message[];\n\n  // pass in an event emitter to receive message stream events\n  events?: EventEmitter;\n};\n```\n\n#### Response\n\nChat responses are in the following format:\n\n```typescript\ninterface ChatResponse\u003cT = string\u003e {\n  content: T;\n  model: string;\n\n  // set to true if this content was streamed. note to actually access the stream, you have to pass in an event emitter via ChatRequestOptions\n  isStream: boolean;\n\n  usage: {\n    promptTokens: number;\n    completionTokens: number;\n    totalTokens: number;\n  };\n}\n```\n\n**content**\nParsed and typecast content from the prompt. The types will be set automatically depending on which prompt you used.\n\n**model**\nThe specific model used for the completion (e.g. `gpt-3.5-turbo-0301`)\n\n**usage**\nToken usage data, this maps directly to OpenAI's usage response.\n\n#### Reset\n\nIf you would like to reset the message history in a chat, there is a simple helper method:\n\n```typescript\nchat.reset();\n```\n\nNote that this method is an escape hatch. It's better to just instantiate a new chat session if you'd like to make a new request with a clean slate. 
Complex logic where you are resetting a chat session multiple times can be hard to track and hard to debug.\n\n### Misc\n\nNote that if you want to bypass LLamaFlow's chat management logic, you can send a request to the underlying model directly, without instantiating a chat:\n\n```typescript\nconst model = new OpenAI(openAiConfig, modelConfig);\nconst res = await model.request(messages: Message[], options: ChatRequestOptions);\n```\n\nThis will bypass any chat history management, prompt formatting \u0026 parsing, as well as persona logic. You can still make use of the API retries feature via `ChatRequestOptions`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdzhng%2Fllamaflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdzhng%2Fllamaflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdzhng%2Fllamaflow/lists"}