{"id":14608230,"url":"https://github.com/Dan-wanna-M/formatron","last_synced_at":"2025-09-06T02:31:22.257Z","repository":{"id":250795117,"uuid":"729290121","full_name":"Dan-wanna-M/formatron","owner":"Dan-wanna-M","description":"Formatron empowers everyone to control the format of language models' output with minimal overhead.","archived":false,"fork":false,"pushed_at":"2024-12-21T05:27:39.000Z","size":6849,"stargazers_count":169,"open_issues_count":9,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-25T04:36:50.695Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dan-wanna-M.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-08T20:24:41.000Z","updated_at":"2024-12-21T05:22:47.000Z","dependencies_parsed_at":"2024-09-05T23:53:16.704Z","dependency_job_id":"71761f72-581b-4d57-8385-15e2a2e45a96","html_url":"https://github.com/Dan-wanna-M/formatron","commit_stats":null,"previous_names":["dan-wanna-m/formatron"],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dan-wanna-M%2Fformatron","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dan-wanna-M%2Fformatron/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dan-wanna-M%2Fformatron/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dan-wanna-M%2Fformatron/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/Dan-wanna-M","download_url":"https://codeload.github.com/Dan-wanna-M/formatron/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232079084,"owners_count":18469654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-09T01:00:45.823Z","updated_at":"2025-09-06T02:31:22.232Z","avatar_url":"https://github.com/Dan-wanna-M.png","language":"Python","funding_links":[],"categories":["Python Libraries","structured generation","Libraries"],"sub_categories":[],"readme":"\u003cp align='center'\u003e\n\u003cimage src=\"logo.svg\"\u003e\n\u003c/p\u003e\n\n[![PyPI](https://img.shields.io/pypi/v/formatron.svg)](https://pypi.python.org/pypi/formatron)\n![PyPI Downloads](https://static.pepy.tech/badge/formatron)\n\n**[Formatron's technical report](https://arxiv.org/abs/2506.01151) is now available!**\n\nFormatron allows users to control the output format of language models\nwith minimal overhead. 
It is lightweight, user-friendly,\nand seamlessly integrates into existing codebases and frameworks.\n\n## Installation\n\n`pip install formatron`\n\n## Features\n\n- **🔗 Popular Library Integrations**: Supports transformers, exllamav2, vllm and RWKV.\n- **🔌 Plugins, not wrappers**:\nInstead of wrapping third-party libraries in large, cumbersome classes,\nFormatron offers convenient, clean plugins for different libraries.\n- **💡 Library, not framework**:\nInstead of unifying everything into a bulky framework,\nFormatron is a flexible library that can be embedded anywhere.\n- **✍️ Fluent Formatting**: Describe your format as easily as writing natural language.\n- **📜 Regex and CFG Support**:\nEffortlessly interleave regular expressions and context-free grammars (CFG) in formats.\n- **⚙️ Efficient JSON Generation**: Feature-complete JSON generation based on Pydantic models or json schemas.\n- **📤 Batched Inference**:\nFreely specify different formats for each sequence in one batch!\n- **🚀 Minimal Runtime Overhead**:\nWith Leo optimization, a specialized compacting algorithm,\nand CFG caches across generations, the Earley algorithm implemented in Rust is\nasymptotically and practically the fastest algorithm.\n- **🔧 Customizable**: Everything is configurable, including schema generation,\ngrammar generation, and post-generation processing (such as function calls).\n\n## Comparison to other libraries\n\n| Capability                                   | Formatron                          | [LM Format Enforcer](https://github.com/noamgat/lm-format-enforcer)                           | [Guidance](https://github.com/guidance-ai/guidance) | [Outlines](https://github.com/outlines-dev/outlines)                                    | [LMQL](https://github.com/eth-sri/lmql)                                                         
|\n|:---------------------------------------------|------------------------------------|:----------------------------------------------------------------------------------------------|:----------------------------------------------------|:----------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|\n| Regular Expressions                          | ✅                                  | ✅                                                                                             | ✅                                                   | ✅                                                                                       | 🟡([preview feature](https://lmql.ai/docs/language/constraints.html#regex-constraints-preview)) |\n| Efficient Regex-constrained Generation       | ✅                                  | 🟡([performance issues still exist](https://github.com/noamgat/lm-format-enforcer/issues/36)) | ❌                                                   | 🟡([scalability currently suffers](https://github.com/outlines-dev/outlines/issues/680)) | ❌                                                                                               |\n| Context Free Grammars(CFG)                   | ✅                                  | ❌                                                                                             | ✅                                                   | 🟡([some bugs exist](https://github.com/outlines-dev/outlines/issues/959))              | ❌                                                                                               |\n| Efficient CFG-constrained Generation         | ✅                                  | ❌                                                                                             | ❌                                                   | ❌                                                                
                       | ❌                                                                                               |\n| Custom Format Extractor                      | 🟡([some limitations exist](#ast)) | ❌                                                                                             | ✅                                                   | ✅                                                                                       | ✅                                                                                               |\n| JSON Schema                                  | ✅([indirectly](#json-schema))      | ✅                                                                                             | ✅                                                   | ✅                                                                                       | ❌                                                                                               |\n| Function Call From Callable                  | ✅                                  | ❌                                                                                             | ✅                                                   | ✅                                                                                       | ✅                                                                                               |\n| Interleave Python control flow in generation | ❌                                  | ❌                                                                                             | ✅                                                   | ❌                                                                                       | ✅                                                                                               |\n| Batched Generation                           | ✅                                  | ✅                                                                                          
   | ❌                                                   | ✅                                                                                       | ❌                                                                                               |\n| Beam Search                                  | ❌                                  | ✅                                                                                             | ❌                                                   | ✅                                                                                       | ✅                                                                                               |\n| Integrates into existing pipelines           | ✅                                  | ✅                                                                                             | ❌                                                   | 🟡([some integrations crash](https://github.com/outlines-dev/outlines/issues/1115))     | ❌                                                                                               |\n| Optional JSON Fields                         | ✅                                  | ✅                                                                                             | ❌                                                   | ❌                                                                                       | ❌                                                                                               |\n| LLM Controls JSON field whitespaces          | ✅                                  | ✅                                                                                             | ❌                                                   | ✅                                                                                       | ❌                                                                                               |\n| LLM Controls JSON field orderings            | ❌   
                               | ✅                                                                                             | ❌                                                   | ❌                                                                                       | ❌                                                                                               |\n| JSON Schema with recursive classes           | ✅                                  | ✅                                                                                             | ❌                                                   | ❌                                                                                       | ❌                                                                                               |\n|Extractive generation(substringOf)           | ✅                                  | ❌                                                                                             | ✅                                                   | ❌                                                                                       | ❌                                                                                               |\n\nFeel free to open up an [issue](https://github.com/Dan-wanna-M/formatron/issues) if something is missing or incorrect!\n\n## Examples\n\n### Regex-constrained Generation\n\n```python\nimport torch\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\ntorch.manual_seed(514)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = 
FormatterBuilder()\ndigit = f.regex('([1-9][0-9]*)', capture_name='digit')\nf.append_line(f\"My favorite integer is {digit}.\")\nf.append_str(f\"I think integer {digit} is also very interesting.\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eWhich integer is your favourite?\u003c|end|\u003e\n\u003c|assistant|\u003e\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'digit': [\u003cre.Match object; span=(0, 2), match='42'\u003e, \u003cre.Match object; span=(0, 2), match='42'\u003e]}]\n```\n\nNote that only\n[Rust regex's syntax](https://docs.rs/regex/latest/regex/#syntax) is supported, which notably\ndoes not include arbitrary lookaheads.\n\n### Json Generation\n\n#### Json Example\n\n```python\nimport torch\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nfrom formatron.schemas.dict_inference import infer_mapping\ntorch.manual_seed(520)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nschema = infer_mapping({\"name\": \"foo\", \"age\": 28})\nf.append_line(f\"{f.json(schema, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = 
tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eI am 周明瑞. My age is 24. Extract information from this sentence into json.\u003c|end|\u003e\n\u003c|assistant|\u003e\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': {'name': '周明瑞', 'age': 24}}]\n```\n\n#### Pydantic Model\n\n```python\nfrom formatron.schemas.pydantic import ClassSchema\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nimport torch\n\nclass Goods(ClassSchema):\n    name: str\n    price: float\n    remaining: int\n\ntorch.manual_seed(520)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nschema = Goods\nf.append_line(f\"{f.json(schema, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eWe have 14 apples left with each price 14.4$. 
Extract information from this sentence into json.\u003c|end|\u003e\n\u003c|assistant|\u003e\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': Goods(name='apples', price=14.4, remaining=14)}]\n```\n\n### Batched Inference\n\n```python\nimport transformers\nfrom transformers import GPT2LMHeadModel\n\nfrom formatron.formatter import FormatterBuilder\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nf = FormatterBuilder()\nf.append_line(f\"Hello, Huggingface!\")\nf3 = FormatterBuilder()\nf3.append_line(\"Hello, Huggingface! Hello, Huggingface!\")\nmodel = GPT2LMHeadModel.from_pretrained(\"openai-community/gpt2\")\ntokenizer = transformers.AutoTokenizer.from_pretrained(\"openai-community/gpt2\",\n                                                       padding_side='left')\ntokenizer.pad_token = tokenizer.eos_token  # Needed for padding\nmodel.generation_config.pad_token_id = tokenizer.pad_token_id\nlogits_processor = create_formatter_logits_processor_list(tokenizer, [f, f, f3])\ninputs = tokenizer([\"I am GPT2. \", \"I am another GPT2. \", \"I am yet another GPT2. 
\"], return_tensors=\"pt\",\n                   padding=True)\nprint(tokenizer.batch_decode(model.generate(**inputs,\n                                            max_new_tokens=100,\n                                            logits_processor=logits_processor),\n                             skip_special_tokens=True))\n```\n\n### Function Calls\n\n```python\nimport torch\nfrom formatron import schemas\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\n\n@schemas.pydantic.callable_schema\ndef add(a: int, b: int, /, *, c: int):\n    return a + b + c\n\nmodel = AutoModelForCausalLM.from_pretrained(\"NurtureAI/Meta-Llama-3-8B-Instruct-32k\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"NurtureAI/Meta-Llama-3-8B-Instruct-32k\")\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003ea is 1, b is 6 and c is 7. 
Generate a json containing them.\u003c|end|\u003e\n\u003c|assistant|\u003e\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nf = FormatterBuilder()\nf.append_line(f\"{f.json(add, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': 14}]\n```\n\n### CFG-Constrained generation\n\nContext free grammars use [kbnf's syntax](https://docs.rs/kbnf/latest/kbnf/#kbnf-grammar) which is a variant of EBNF.\nSince formatron uses [kbnf](https://github.com/Dan-wanna-M/kbnf?tab=readme-ov-file#features) under the hood, all kbnf's claims on performance hold.\n\n```python\nimport torch\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.extractor import NonterminalExtractor\nimport typing\n\nclass ArithmeticExpressionExtractor(NonterminalExtractor):\n    def __init__(self, nonterminal: str, capture_name: typing.Optional[str] = None):\n        super().__init__(nonterminal, capture_name)\n\n    def extract(self, input_str: str) -\u003e typing.Optional[tuple[str, typing.Any]]:\n        i = 0\n        left_bracket = 0\n        while i \u003c len(input_str):\n            if input_str[i].isdigit() or input_str[i] in \"+-*/.\":\n                i += 1\n                continue\n            if input_str[i] == \"(\":\n                i += 1\n                left_bracket += 1\n                continue\n            if input_str[i] == \")\":\n                i += 1\n                left_bracket -= 1\n                continue\n            else:\n                break\n        if left_bracket != 0:\n            return None\n        
return input_str[i:], input_str[:i]\n\n    @property\n    def kbnf_definition(self) -\u003e str:\n        return  \"\"\"\nexpression ::=  term { (\"+\" | \"-\") term };\nterm       ::= factor { (\"*\" | \"/\") factor };\nfactor     ::= number | \"(\" expression \")\";\nnumber     ::= #\"[0-9]+(\\\\\\\\.[0-9]+)?\";\n\"\"\".replace(\"expression\", self.nonterminal)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"NurtureAI/Meta-Llama-3-8B-Instruct-32k\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"NurtureAI/Meta-Llama-3-8B-Instruct-32k\")\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\n    You are a helpful assistant.\u003c|end|\u003e\n    \u003c|user|\u003eRepeat it: ((32+43)*114)\u003c|end|\u003e\n    \u003c|assistant|\u003e((32+43)*114)\u003c|end|\u003e\n    \u003c|user|\u003eRepeat it: ((32+43)*(114-514))\u003c|end|\u003e\n    \u003c|assistant|\u003e\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nf = FormatterBuilder()\nf.append_line(\n    f\"{f.extractor(lambda nonterminal: ArithmeticExpressionExtractor(nonterminal, 'json'))}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output: [{'json': '(((32+43)*(114-514)))*1.5'}]\n```\n\n### Json Schema\n\nFormatron supports a subset of json schemas that cover most useful features natively.\n\n```python\nfrom formatron.schemas import json_schema\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nimport torch\n\nschema = {\n    \"$id\": 
\"https://example.com/person.json\",\n    \"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n    \"type\": \"object\",\n    \"properties\": {\n        \"name\": {\n            \"type\": \"string\"\n        },\n        \"age\": {\n            \"type\": \"integer\"\n        }\n    },\n    \"required\": [\"name\", \"age\"]\n}\nschema = json_schema.create_schema(schema)\ntorch.manual_seed(520)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nf.append_line(f\"{f.json(schema, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eExtract information from this sentence into json: my name is Genov and I am 28 years old.\u003c|end|\u003e\n\u003c|assistant|\u003e```\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': {'name': 'Genov', 'age': 28}}]\n```\n\n### Extractive generation\n\nStarting from `v0.4.7`, extractive generation is supported with suffix automata. 
This means that you can constrain the output to be a substring of a given input.\n\n```python\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nimport torch\n\ntorch.manual_seed(520)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nf.append_line(f\"{f.substr('The quick brown fox jumps over the lazy dog.', capture_name='animal')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eWhat animal is mentioned in the phrase \"The quick brown fox jumps over the lazy dog\"?\u003c|end|\u003e\n\u003c|assistant|\u003eThe animal mentioned in the phrase is the \"\"\"], return_tensors=\"pt\").to(\"cuda\")\noutput = tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                                max_new_tokens=100, logits_processor=logits_processor))\nprint(output)\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'animal': 'fox'}]\n```\n\nWhat's more, you can embed fields that need extractive generation into pydantic models or json schemas.\n\n```python\nfrom formatron.schemas.pydantic import ClassSchema\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.schemas.schema import SubstringOf\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nimport torch\nimport typing\nfrom pydantic import Field\n\nclass 
Person(ClassSchema):\n    name: typing.Annotated[str, Field(..., substring_of=\"Alice Bob Charlie David Eve\"), SubstringOf(\"Alice Bob Charlie David Eve\")]\n    age: int\n\ntorch.manual_seed(520)\nmodel = AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nf.append_line(f\"{f.json(Person, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eExtract information from this sentence into json: Bob is 32 years old.\u003c|end|\u003e\n\u003c|assistant|\u003e```\"\"\"], return_tensors=\"pt\").to(\"cuda\")\nprint(tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                            max_new_tokens=100, logits_processor=logits_processor)))\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': {'name': 'Bob', 'age': 32}}]\n```\n\n```python\nfrom formatron.schemas import json_schema\nfrom formatron.integrations.transformers import create_formatter_logits_processor_list\nfrom formatron.formatter import FormatterBuilder\nfrom transformers import AutoModelForCausalLM\nimport transformers\nimport torch\n\nschema = {\n    \"$id\": \"https://example.com/animal.json\",\n    \"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n    \"type\": \"object\",\n    \"properties\": {\n        \"animal\": {\n            \"type\": \"string\",\n            \"substring_of\": \"The quick brown fox jumps over the lazy dog.\"\n        }\n    },\n    \"required\": [\"animal\"]\n}\nschema = json_schema.create_schema(schema)\n\ntorch.manual_seed(520)\nmodel = 
AutoModelForCausalLM.from_pretrained(\"microsoft/Phi-3-mini-128k-instruct\",\n                                                device_map=\"cuda\",\n                                                torch_dtype=torch.float16)\ntokenizer = transformers.AutoTokenizer.from_pretrained(\n    \"microsoft/Phi-3-mini-128k-instruct\")\n\nf = FormatterBuilder()\nf.append_line(f\"{f.json(schema, capture_name='json')}\")\nlogits_processor = create_formatter_logits_processor_list(tokenizer, f)\ninputs = tokenizer([\"\"\"\u003c|system|\u003e\nYou are a helpful assistant.\u003c|end|\u003e\n\u003c|user|\u003eWhat animal is mentioned in the phrase \"The quick brown fox jumps over the lazy dog\"?\u003c|end|\u003e\n\u003c|assistant|\u003eThe animal mentioned in the phrase is the \"\"\"], return_tensors=\"pt\").to(\"cuda\")\noutput = tokenizer.batch_decode(model.generate(**inputs, top_p=0.5, temperature=1,\n                                                max_new_tokens=100, logits_processor=logits_processor))\nprint(output)\nprint(logits_processor[0].formatters_captures)\n# possible output:\n# [{'json': {'animal': 'fox'}}]\n```\n\n### Integrations\n\nCheck out integration examples in the [tests](https://github.com/Dan-wanna-M/formatron/tree/master/tests) directory.\nYou may also want to check the minimum compatible version in [pyproject.toml](https://github.com/Dan-wanna-M/formatron/blob/master/pyproject.toml).\n\n## API Reference\n\nCheck out the API reference [here](https://dan-wanna-m.github.io/formatron/).\n\n## Benchmark\n\nCheck out the benchmark [here](benchmarks/readme.md).\n\n## What Formatron Won't Do\n\n### Implement an End-to-End Inference Pipeline\n\nEvery library related to large language models (LLMs) must consider that LLMs\nare rapidly evolving. 
Many libraries, such as Guidance, Outlines, and LMQL,\naddress this by offering their own end-to-end inference pipelines,\nwhich are constantly updated to incorporate the latest techniques.\n\nFormatron, however, takes a different approach.\nRather than providing a full-fledged inference pipeline,\nFormatron focuses on being modular and easily embeddable into existing\nand future pipelines.\nWhile this may require users to write a bit more code initially,\nit makes maintaining and updating the pipeline painless in the long run.\n\n## What Formatron Can't Do Now\n\n### Support OpenAI or in general API-based LLM solutions\n\nThese APIs don't support efficient per-token logits masking, which nullifies most benefits\nof constrained decoding.\n\n### Semantic Validation\n\nAlthough constrained decoding can enforce certain formats\nin generated text, it cannot guarantee that the output aligns\nwith the users' intentions. In other words, if the model is inadequate\nor the prompt is poorly written, it's possible to generate well-formatted\nbut meaningless output.\n\n### Context-Sensitive Validation\n\nUnfortunately, many formats require context-sensitive validation.\nFor example, two keys in a JSON object must not be equal to each other.\nUnlike for CFGs, there is no efficient, generic algorithm to validate\nsuch constraints. However, for a specific format, it is possible to validate\nthem efficiently with a specialized algorithm. 
In a future release,\nFormatron will support context-sensitive validation for popular formats like JSON.\n\n### Abstract Syntax Tree (AST) Construction\u003ca id='ast'\u003e\u003c/a\u003e\n\nFormatron uses an Earley recognizer rather than a parser under the hood.\nThis approach allows for more efficient generation and validation\nbut also means that the AST of a given format is not available.\nIn most cases, this is not a problem,\nas it is usually possible to extract the format from the generated string\nusing simple algorithms and then parse it with an existing parser.\nHowever, in some cases, obtaining the AST might be necessary.\nIn a future release, Formatron will support AST construction.\n\n### Process batch logits in parallel\n\nWhile it is *technically possible* to process batch logits in parallel CPU threads\nsince Formatron uses Rust internally, most frameworks sequentially\ncall Formatron's plugin for each logits in a batch. Altering\nthis behaviour requires a breaking change to the frameworks' API or letting\nFormatron take over the control flow. Both options imply\nsubstantial work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDan-wanna-M%2Fformatron","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDan-wanna-M%2Fformatron","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDan-wanna-M%2Fformatron/lists"}