{"id":13564383,"url":"https://github.com/microsoft/aici","last_synced_at":"2025-05-14T09:06:53.696Z","repository":{"id":220212275,"uuid":"697007681","full_name":"microsoft/aici","owner":"microsoft","description":"AICI: Prompts as (Wasm) Programs","archived":false,"fork":false,"pushed_at":"2025-01-22T21:14:57.000Z","size":10179,"stargazers_count":2017,"open_issues_count":40,"forks_count":83,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-04-19T09:21:02.905Z","etag":null,"topics":["ai","inference","language-model","llm","llm-framework","llm-inference","llm-serving","llmops","model-serving","rust","transformer","wasm","wasmtime"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-26T21:33:16.000Z","updated_at":"2025-04-18T13:36:50.000Z","dependencies_parsed_at":"2024-07-31T01:14:02.857Z","dependency_job_id":"0023a5eb-2aab-430e-a5c8-1f06fa555cf5","html_url":"https://github.com/microsoft/aici","commit_stats":null,"previous_names":["microsoft/aici"],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Faici","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Faici/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2Faici/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/microsoft%2Faici/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/aici/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254110374,"owners_count":22016391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","inference","language-model","llm","llm-framework","llm-inference","llm-serving","llmops","model-serving","rust","transformer","wasm","wasmtime"],"created_at":"2024-08-01T13:01:30.453Z","updated_at":"2025-05-14T09:06:53.646Z","avatar_url":"https://github.com/microsoft.png","language":"Rust","readme":"# Artificial Intelligence Controller Interface (AICI)\n\n**[LLGuidance library](https://github.com/guidance-ai/llguidance) is an actively maintained evolution and specialization of AICI, recommended if all you want is constrained decoding.**\n\nThe Artificial Intelligence Controller Interface (AICI) lets you build Controllers that constrain and direct output of a Large Language Model (LLM) in real time.\nControllers are flexible programs capable of implementing constrained decoding, dynamic editing of prompts and generated text, and coordinating execution across multiple, parallel generations.\nControllers incorporate custom logic during the token-by-token decoding and maintain state during an LLM request. 
This allows diverse Controller strategies, from programmatic or query-based decoding to multi-agent conversations, to execute efficiently in tight integration with the LLM itself.\n\n**The purpose of AICI is to make it easy to build and experiment with both existing and entirely new Controller strategies for improving LLM generations.**\nBy abstracting away implementation details of the underlying LLM inference and serving engine, AICI aims to simplify the development of Controllers, make it easier to \nwrite fast Controllers, and ease compatibility across LLM inference and serving engines.\n\nAICI is designed for both local and cloud execution, including (eventually) multi-tenant LLM deployments.\nControllers are implemented as light-weight WebAssembly (Wasm) modules which run on the same machine as the LLM inference engine, utilizing the CPU while the GPU is busy with token generation.\nAICI is one layer in the inference stack, and is designed to allow control libraries such as Guidance, LMQL, and others to run on top of it and gain both efficiency and performance improvements, as well as portability across LLM inference and serving engines.\n\nAICI currently integrates with llama.cpp, HuggingFace Transformers, and rLLM (a custom tch-based LLM inference engine), with vLLM in the works.\n\nAICI is:\n\n- [Flexible](#flexibility): Controllers can be written in any language that can compile to Wasm (Rust, C, C++, ...),\n  or be interpreted inside Wasm (Python, JavaScript, ...)\n- [Secure](#security): Controllers are sandboxed and cannot access the filesystem, network, or any other resources\n- [Fast](#performance): Wasm modules are compiled to native code and run in parallel with the LLM inference engine, inducing only a\n  minimal overhead to the generation process\n\nAICI is a prototype, designed and built at [Microsoft Research](https://www.microsoft.com/en-us/research/).\n\n# Table of Contents\n\n- [Artificial Intelligence Controller Interface 
(AICI)](#artificial-intelligence-controller-interface-aici)\n- [QuickStart: Example Walkthrough](#quickstart-example-walkthrough)\n  - [Development Environment Setup](#development-environment-setup)\n  - [Build and start rLLM server and AICI Runtime](#build-and-start-rllm-server-and-aici-runtime)\n  - [Control AI output using AICI controllers](#control-ai-output-using-aici-controllers)\n- [Comprehensive Guide: Exploring Further](#comprehensive-guide-exploring-further)\n- [Architecture](#architecture)\n- [Security](#security)\n- [Performance](#performance)\n- [Flexibility](#flexibility)\n- [Acknowledgements](#acknowledgements)\n- [Contributing](#contributing)\n- [Trademarks](#trademarks)\n\n# QuickStart: Example Walkthrough\n\nIn this quickstart, we'll guide you through the following steps:\n\n* Set up **rLLM Server** and **AICI Runtime**.\n* Build and deploy a **Controller**.\n* Use AICI to control LLM output, so you can **customize an LLM to follow specific rules** when generating text.\n\n## Development Environment Setup\n\nTo compile AICI components, you need to set up your development environment for Rust. For this quickstart, you also need Python 3.11 or later to create a controller.\n\n### Windows WSL / Linux / macOS\n\n\u003e [!NOTE]\n\u003e **Windows users**: please use WSL2 or the included [devcontainer](https://containers.dev). Adding native Windows support [is tracked here](https://github.com/microsoft/aici/issues/42).\n\u003e \n\u003e **macOS users**: please make sure you have Xcode command line tools installed by running `xcode-select -p` and, if not installed, run `xcode-select --install`.\n\u003e\n\u003e **CUDA**: the CUDA build relies on a specific libtorch installation. 
It's highly recommended you use the included devcontainer.\n\nIf you're using the devcontainer, you can skip to the [next section](#build-and-start-rllm-server-and-aici-runtime).\n\nUsing the system package manager, install the necessary tools for building code in the repository, including `git`, `cmake`, and `ccache`.\n\nFor instance, in WSL / Ubuntu using `apt`:\n\n    sudo apt-get install --assume-yes --no-install-recommends \\\n        build-essential cmake ccache pkg-config libssl-dev libclang-dev clang llvm-dev git-lfs\n\nor using Homebrew on macOS:\n\n    brew install git cmake ccache\n\nThen install **Rust, Rustup and Cargo**, following the instructions provided [here](https://doc.rust-lang.org/cargo/getting-started/installation.html) and [here](https://www.rust-lang.org/learn/get-started):\n\n    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\n\nAfter installation, verify that the `rustup --version` command is accessible by running it from the terminal. If the command isn't recognized, try opening a new terminal session.\n\nNext, install the wasm32-wasi Rust target:\n\n    rustup target add wasm32-wasi\n\nIf you already had Rust installed, or are getting complaints from Cargo about outdated versions, run:\n\n    rustup update\n\nLastly, to work with **Python** controllers and scripts (like this tutorial), run this command to install the required packages:\n\n    pip install pytest pytest-forked ujson posix_ipc numpy requests\n\n\n## Build and start rLLM server and AICI Runtime\n\nThe rLLM server has two backends, one based on `libtorch` and CUDA\n(`rllm-cuda`), and the other based on `llama.cpp` (`rllm-llamacpp`).\n\nThe `rllm-cuda` backend only works with NVIDIA GPUs with compute capability 8.0 or later\n(A100 and later; RTX 30x0 and later) and requires a fiddly setup of libtorch\n-- it's strongly recommended to use the included devcontainer.\nWhile this guide focuses on the `rllm-llamacpp` backend,\nthe build steps are the same for 
`rllm-cuda`, modulo the folder name.\n\nAfter [dev env setup](#development-environment-setup) above,\nclone the AICI repository and proceed with the next steps outlined below.\n\nUse the following command to build and run `aicirt` and `rllm-llamacpp`:\n\n    cd rllm/rllm-llamacpp\n    ./server.sh phi2\n\nYou can pass other model names as an argument (run `./server.sh` without arguments to see available models).\nYou can also use a HuggingFace URL to a `.gguf` file or a local path to a `.gguf` file.\n(For `rllm-cuda`, use a HuggingFace model id or a path to a folder.)\n\n    ./server.sh orca\n\nYou can find more details about `rllm-llamacpp` [here](rllm/rllm-llamacpp/README.md).\n\nThe rLLM server provides an HTTP interface, used for configuration tasks and processing requests. You can also use this interface to quickly verify the server status. For instance, if you open http://127.0.0.1:4242/v1/models, you should see:\n\n```json\n{\n  \"object\": \"list\",\n  \"data\": [\n    {\n      \"object\": \"model\",\n      \"id\": \"TheBloke/phi-2-GGUF\",\n      \"created\": 946810800,\n      \"owned_by\": \"owner\"\n    }\n  ]\n}\n```\n\nconfirming that the selected model is loaded.\n\n## Control AI output using AICI controllers\n\nAICI allows hosting custom logic, called **Controllers**, that initiates, terminates, and interacts with the LLM's token generation. Controllers take input arguments, process them, and return a result with logs, LLM tokens, and variables.\n\nThe repository includes some examples, in particular:\n\n* **jsctrl**: a controller that accepts JavaScript code as input for execution. This code can interact with the model to generate text and tokens.\n* **pyctrl**: a controller that accepts Python code as input for execution. 
This code can also interact with the model to generate text and tokens.\n\nIn this example, we'll utilize **pyctrl** to manage token generation using a simple **Python script**.\nIf you want, you can [build and upload pyctrl](./controllers/pyctrl/README.md);\nhowever, by default the server will automatically\ndownload the [latest release](https://github.com/microsoft/aici/releases/latest) of pyctrl from GitHub.\n\nIn general, controllers require building and deployment, while scripts (Python or JavaScript) are sent with each request.\n\nThe following illustrates the relationship between the rLLM server, the AICI runtime, and the controller:\n\n```mermaid\nerDiagram\n    Host    ||--|{ CPU : \"\"\n    Host    ||--|{ GPU : \"\"\n    \n    CPU     ||--|| \"rLLM Server\" : execute\n    CPU     ||--|{ \"AICI Runtime\" : execute\n\n    \"AICI Runtime\" ||--|| \"Controller\" : instantiate\n\n    GPU     ||--|{ \"LLM token generation\" : execute\n```\n\n### Controlling the LLM token generation\n\nSuppose we aim for a model to generate a list, adhering to a specific format and containing only five items.\n\nTypically, achieving this involves prompt engineering, crafting the prompt precisely with clear instructions, such as:\n\n    What are the five most popular types of vehicles?\n    Return the result as a numbered list.\n    Do not add explanations, only the list.\n\nThe prompt would also vary depending on the model in use, given that each model tends to add explanations and understands instructions in different ways.\n\nWith AICI, we shift control back to code, and we can simplify the prompt to:\n\n    What are the most popular types of vehicles?\n\nusing code to:\n\n1. Limit the list to 5 items\n2. Prevent the model from adding an initial explanation\n3. Format the output as a numbered list\n4. 
Stop the model from adding some text after the list.\n\nLet's create a `list-of-five.py` python file with the following content:\n\n```python\nimport pyaici.server as aici\n\n# Force the model to generate a well formatted list of 5 items, e.g.\n#   1. name 1\n#   2. name 2\n#   3. name 3\n#   4. name 4\n#   5. name 5\nasync def main():\n    \n    # This is the prompt we want to run.\n    # Note how the prompt doesn't mention a number of vehicles or how to format the result.\n    prompt = \"What are the most popular types of vehicles?\\n\"\n\n    # Tell the model to generate the prompt string, ie. let's start with the prompt \"to complete\"\n    await aici.FixedTokens(prompt)\n\n    # Store the current position in the token generation process\n    marker = aici.Label()\n\n    for i in range(1,6):\n      # Tell the model to generate the list number\n      await aici.FixedTokens(f\"{i}.\")\n\n      # Wait for the model to generate a vehicle name and end with a new line\n      await aici.gen_text(stop_at = \"\\n\")\n\n    await aici.FixedTokens(\"\\n\")\n\n    # Store the tokens generated in a result variable\n    aici.set_var(\"result\", marker.text_since())\n\naici.start(main())\n```\n\nRunning the script is not too different from sending a prompt. In this case, we're sending control logic and instructions all together.\n\nTo see the final result, execute the following command:\n\n    ./aici.sh run list-of-five.py\n\nResult:\n```\nRunning with tagged AICI Controller: gh:microsoft/aici/pyctrl\n[0]: FIXED 'What are the most popular types of vehicles?\\n'\n[0]: FIXED '1.'\n[0]: GEN ' Cars\\n'\n[0]: FIXED '2.'\n[0]: GEN ' Motorcycles\\n'\n[0]: FIXED '3.'\n[0]: GEN ' Bicycles\\n'\n[0]: FIXED '4.'\n[0]: GEN ' Trucks\\n'\n[0]: FIXED '5.'\n[0]: GEN ' Boats\\n'\n[0]: FIXED '\\n'\n[DONE]\n[Response] What are the most popular types of vehicles?\n1. Cars\n2. Motorcycles\n3. Bicycles\n4. Trucks\n5. 
Boats\n\nresponse saved to tmp/response.json\nUsage: {'sampled_tokens': 16, 'ff_tokens': 37, 'cost': 69}\nTiming: {'http_response': 0.05193686485290527, 'data0': 0.05199289321899414, 'first_token': 0.0658726692199707, 'last_token': 0.1784682273864746}\nTokens/sec: {'prompt': 861.0913072488067, 'sampling': 89.65181217019571}\nStorage: {'result': '1. Cars\\n2. Motorcycles\\n3. Bicycles\\n4. Trucks\\n5. Boats\\n\\n'}\n```\n\n# Comprehensive Guide: Exploring Further\n\nThis repository contains a number of components, and which ones you need depends on your use case.\n\nYou can **use an existing controller module**.\nWe provide [PyCtrl](./controllers/pyctrl) and [JsCtrl](./controllers/jsctrl)\nthat let you script controllers using server-side Python and JavaScript, respectively.\nThe [pyaici](./py/pyaici) package contains the `aici` command-line tool that lets you\n[upload and run scripts](./docs/proxy.md) with any controller\n(we also provide a [REST API definition](./docs/REST.md) for the curious).\n\u003e 🧑‍💻[Python code samples for scripting PyCtrl](./controllers/pyctrl) and a [JavaScript Hello World for JSCtrl](./controllers/jsctrl/samples/hello.js)\n\nWe anticipate [libraries](#architecture) will be built on top of controllers.\nWe provide an example in [promptlib](./py/promptlib) - a client-side Python library\nthat interacts with [DeclCtrl](./controllers/declctrl) via the pyaici package.\n\u003e 🧑‍💻 [Example notebook that uses PromptLib to interact with DeclCtrl](./py/promptlib/notebooks/basics_tutorial.ipynb).\n\nThe controllers can be run in a cloud or local AICI-enabled LLM inference engine.\nYou can **run the provided reference engine (rLLM) locally** with either\n[libtorch+CUDA](./rllm/rllm-cuda) or [llama.cpp backend](./rllm/rllm-llamacpp).\n\nTo **develop a new controller**, use a Rust [starter project](./controllers/uppercase) that shows usage of the [aici_abi](./controllers/aici_abi)\nlibrary, which simplifies implementing the [low-level AICI 
interface](controllers/aici_abi/README.md#low-level-interface).\n\u003e 🧑‍💻[Sample code for a minimal new controller](./controllers/uppercase) to get you started\n\nTo **add AICI support to a new LLM inference engine**,\nyou will need to implement the LLM side of the [protocol](docs/aicirt-proto.md)\nthat talks to the [AICI runtime](aicirt).\n\nFinally, you may want to modify any of the provided components - PRs are most welcome!\n\n# Architecture\n\nAICI abstracts the LLM inference engine from the controller and vice versa, as in the picture below.\nThe rounded nodes are aspirational.\nAdditional layers can be built on top - we provide [promptlib](py/promptlib),\nbut we strongly believe that\n[Guidance](https://github.com/guidance-ai/guidance),\n[LMQL](https://lmql.ai/),\n[SGLang](https://github.com/sgl-project/sglang),\n[Outlines](https://github.com/outlines-dev/outlines),\n[jsonformer](https://github.com/1rgs/jsonformer),\n[LMFE](https://github.com/noamgat/lm-format-enforcer),\netc.\ncan also run on top of AICI (either with custom controllers or utilizing PyCtrl or JsCtrl).\n\n```mermaid\ngraph TD\n    PyCtrl -- AICI --\u003e aicirt[AICI-runtime]\n    JsCtrl -- AICI --\u003e aicirt\n    guidance([GuidanceCtrl]) -- AICI --\u003e aicirt\n    lmql([LMQL Ctrl]) -- AICI --\u003e aicirt\n    aicirt -- POSIX SHM --\u003e rLLM\n    aicirt -- POSIX SHM --\u003e llama[llama.cpp]\n    aicirt -- POSIX SHM --\u003e pyaici\n    pyaici -- Python --\u003e vLLM(vLLM)\n    pyaici -- Python --\u003e hf[HF Transformers]\n```\n\nThe [pyaici](py/pyaici) package makes it easier to integrate AICI with Python-based LLM inference engines.\nTake a look at the integration with [HuggingFace Transformers](scripts/py/run_hf.py),\nthough note that it doesn't support forking (generation of multiple sequences in parallel).\nThe [vLLM REST server](scripts/py/vllm_server.py) is currently out of date.\nPlease use the [rLLM-cuda](rllm/rllm-cuda) or [rLLM-llama.cpp](rllm/rllm-llamacpp) backends for now.\n\n# Security\n\n- 
`aicirt` runs in a separate process, and can run under a different user than the LLM engine\n- Wasm modules are [sandboxed by Wasmtime](https://docs.wasmtime.dev/security.html)\n- Wasm modules only have access to [`aici_host_*` functions](controllers/aici_abi/src/host.rs),\n  implemented in [hostimpl.rs](aicirt/src/hostimpl.rs)\n- `aicirt` also exposes a partial WASI interface; however, almost all the functions are no-ops, except\n  for `fd_write`, which shims file descriptors 1 and 2 (stdout and stderr) to print debug messages\n- each Wasm module runs in a separate process, helping with Spectre/Meltdown mitigation\n  and allowing limits on CPU usage\n\nIn particular, Wasm modules cannot access the filesystem, network, or any other resources.\nThey also cannot spawn threads or access any timers (this is relevant for Spectre/Meltdown attacks).\n\n# Performance\n\nMost of the computation in AICI Controllers occurs on the CPU, in parallel with the logit generation on the GPU.\nThe generation occurs in steps, where logits are generated in parallel for a new token for each sequence in a batch\n(typically between 1 and 50).\nThis involves reading the whole model and KV caches for sequences in the batch from the GPU memory.\nFor optimal batch throughput, the model and KV caches should utilize a major fraction of the GPU memory,\nand reading the whole memory takes about 40ms on an A100 GPU (80GB).\n\nThus, each step of generation takes on the order of 20-50ms.\nWith careful engineering,\nthis is more than enough time to compute the set of allowed tokens in Rust compiled to Wasm.\nThese constraints can be combined either natively in Rust, or via the Python or JavaScript interpreters\nwe provide.\n\nFor example, computing the allowed token set over the 32,000-token vocabulary of the Llama model takes:\n\n- about 2.0ms for a Yacc grammar of the C programming language\n- about 0.3ms for a regular expression\n- about 0.2ms for a substring constraint, from a 4kB string\n\nThe above numbers are for a single sequence; however, each sequence is processed in a separate process, and thus if there are more cores than sequences (which is typical), the numbers do not change.\nThey also include the overhead of calling into the Python interpreter implemented in Wasm, and then back into\nRust-generated Wasm code for the constraint itself.\nThey are all well within the 20-50ms budget, and so do not affect the generation time at all.\n\nThere is also some overhead in the critical path of sampling. It comes down to about 0.3ms per generation step\nwhen executing 10 sequences in parallel (this is irrespective of the constraint used).\nThe overhead goes up to around 0.7ms for 40 sequences (though it has not been fully optimized yet).\n\nWebAssembly is designed to have minimal overhead compared to native code.\nIn our experience, [highly optimized](controllers/aici_abi/implementation.md#token-trie)\nRust code is less than 2x slower when run in\n[Wasmtime](https://wasmtime.dev/) than when run natively.\nThis is 10-100x better than JavaScript or Python.\n\nAll measurements were done on an AMD EPYC 7V13 with an NVIDIA A100 GPU with 80GB of VRAM.\n\n# Flexibility\n\nThe low-level interface that the AICI runtime provides allows for:\n\n- interaction with the LLM inference engine before, during, and after every generated token\n- constraining decoding to a set of tokens\n- backtracking the KV-cache to a previous state\n- fast-forwarding several tokens at a time (if they are known)\n- forking generation into multiple branches\n- communication between forks via shared variables\n- utility functions for converting between tokens and byte strings\n\nIt can be utilized from any language that compiles to Wasm.\n\nThis repository provides a Rust library that makes it easy to implement controllers in Rust,\nand provides [efficient implementations](controllers/aici_abi/implementation.md)\nof specific constraints ([regular expressions](controllers/aici_abi/README.md#regular-expressions),\n[yacc grammars](controllers/aici_abi/README.md#lr1-grammars), substrings).\nWe also 
provide [Python](controllers/pyctrl) and [JavaScript](controllers/jsctrl) interpreters\nthat allow you to glue these constraints together.\nAll of these can be easily extended.\n\n# Acknowledgements\n\n- [Flash Attention kernels](rllm/tch-cuda/kernels/flash_attn/) are copied from\n  [flash-attention repo](https://github.com/Dao-AILab/flash-attention);\n  see [BSD LICENSE](rllm/tch-cuda/kernels/flash_attn/LICENSE)\n- [Paged Attention kernels](rllm/tch-cuda/kernels/vllm/) are copied from\n  [vLLM repo](https://github.com/vllm-project/vllm);\n  see [Apache LICENSE](rllm/tch-cuda/kernels/vllm/LICENSE)\n- [OpenAI API definitions](rllm/rllm-base/src/server/openai/) are copied and modified from\n  [candle-vllm](https://github.com/EricLBuehler/candle-vllm);\n  see [MIT LICENSE](rllm/rllm-base/src/server/openai/LICENSE)\n- [cache_engine.rs](rllm/rllm-cuda/src/llm/paged/cache_engine.rs),\n  [config.rs](rllm/rllm-base/src/config.rs),\n  and [scheduler.rs](rllm/rllm-base/src/scheduler.rs)\n  are loosely based on [vLLM](https://github.com/vllm-project/vllm)\n- [llama.rs](rllm/rllm-cuda/src/llm/llama.rs), [phi.rs](rllm/rllm-cuda/src/llm/phi.rs)\n  and [logits.rs](rllm/rllm-base/src/logits.rs) are based on\n  [candle-transformers](https://github.com/huggingface/candle/tree/main/candle-transformers)\n- specific [Python library](./controllers/pyctrl/Lib/) files are copied from\n  [RustPython](https://github.com/RustPython/RustPython)\n  (as we only use a subset of them)\n- the [example ANSI C grammar](controllers/aici_abi/grammars/c.y) is based on\n  https://www.lysator.liu.se/c/ANSI-C-grammar-y.html by Jeff Lee (from 1985)\n\n# Citing this package\n\nIf you find the AI Controller Interface and its ideas for defining a new layer in the LLM inference stack useful, please cite the package using the following reference:\n\n* Michal Moskal, Madan Musuvathi, Emre Kıcıman. AI Controller Interface, (2024), GitHub repository. 
https://github.com/microsoft/aici\n\nBibtex:\n\n```bibtex\n@misc{Moskal2024,\n  author = {Moskal, Michal and Musuvathi, Madan and {K\\i c\\i man}, Emre},\n  title = {{AI Controller Interface}},\n  year = {2024},\n  publisher = {{GitHub}},\n  journal = {{GitHub} repository},\n  howpublished = {\\url{https://github.com/microsoft/aici/}}\n}\n```\n\n# Contributing\n\nThis project welcomes contributions and suggestions. Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n# Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. 
Authorized use of Microsoft\ntrademarks or logos is subject to and must follow\n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos is subject to those third parties' policies.\n","funding_links":[],"categories":["ai","Rust","A01_文本生成_文本对话","LLM Application Frameworks \u0026 Prompting Libraries","Repos","rust","LLM Frameworks","Libraries"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Faici","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Faici","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Faici/lists"}