Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/olivierduchenne/llm_json_schema
Guarantee that the output of an LLM follows a JSON schema.
JSON representation
- Host: GitHub
- URL: https://github.com/olivierduchenne/llm_json_schema
- Owner: olivierDuchenne
- License: mit
- Created: 2023-11-19T15:53:25.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-06T16:00:01.000Z (about 1 year ago)
- Last Synced: 2024-11-07T10:03:35.468Z (about 2 months ago)
- Topics: ai, generative-ai, jsonschema, large-language-models, llamacpp, llm
- Language: Python
- Homepage:
- Size: 30.3 KB
- Stars: 22
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# What is LLM_json_schema?
LLM_json_schema forces the output of an LLM to follow a given JSON schema. The following types are supported: string, number, boolean, array, object.
The output is guaranteed to be valid against the schema.
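To illustrate what that guarantee means, here is a minimal pure-Python check (not part of this project; the type names follow the list above) that verifies a produced value against one of the five supported schema types:

```python
def matches_schema(value, schema):
    """Minimal recursive check for the five supported schema types."""
    t = schema.get("type")
    if t == "string":
        return isinstance(value, str)
    if t == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if t == "boolean":
        return isinstance(value, bool)
    if t == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(
            matches_schema(v, item_schema) for v in value)
    if t == "object":
        props = schema.get("properties", {})
        return isinstance(value, dict) and all(
            k in value and matches_schema(value[k], s) for k, s in props.items())
    return False

schema = {"type": "object", "properties": {"country": {"type": "string"},
                                           "capital": {"type": "string"}}}
print(matches_schema({"country": "France", "capital": "Paris"}, schema))  # True
```

Every output produced under the schema constraint should pass such a check by construction.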
# Examples
```bash
python3 LLM_json_schema.py \
--model-path models/Mistral-7B-Instruct-v0.1.gguf \
--json-schema '{"type":"object", "properties":{"country":{"type":"string"}, "capital":{"type":"string"}}}' \
--prompt "What is the capital of France?\n\n"
```
Output:
```json
{"country":"France", "capital":"Paris"}
```

```bash
python3 LLM_json_schema.py \
--model-path models/Mistral-7B-Instruct-v0.1.gguf \
--json-schema '{"type":"array", "items":{"type":"number"}}' \
--prompt "Count until 20.\n\n"
```
Output:
```json
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```
# How does it work?
It adds biases to the logits produced by the LLM so that, at each decoding step, only tokens consistent with the JSON schema can be chosen.
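The biasing step can be sketched as follows (a hypothetical illustration, not the project's actual code): tokens that would break the schema at the current position receive an infinitely negative bias, so after softmax they have zero probability and can never be sampled.

```python
import math

def bias_logits(logits, allowed_token_ids):
    # Tokens outside the schema-valid set get -inf, so softmax assigns
    # them zero probability and neither greedy nor sampled decoding
    # can ever pick them.
    return [logit if i in allowed_token_ids else -math.inf
            for i, logit in enumerate(logits)]

# Example: suppose only token ids 1 and 3 are valid at this decoding step.
print(bias_logits([0.5, 2.0, 1.0, 0.1], {1, 3}))  # [-inf, 2.0, -inf, 0.1]
```

The set of allowed token ids is recomputed at every step from the schema and the tokens generated so far.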
# Installation
## Install LLM_json_schema
```bash
git clone https://github.com/olivierDuchenne/LLM_json_schema
cd LLM_json_schema
pip3 install -r requirements.txt
```
## Download and convert an LLM model
Download an LLM model, and convert it to the gguf format.
Example:
```bash
mkdir models
cd models
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
git clone https://github.com/ggerganov/llama.cpp.git
pip install -r llama.cpp/requirements.txt
python3 llama.cpp/convert.py Mistral-7B-Instruct-v0.1 \
--outfile Mistral-7B-Instruct-v0.1.gguf \
--outtype q8_0
cd ..
```
# Usage from CLI
```
usage: LLM_json_schema.py [-h] --model-path MODEL_PATH --prompt PROMPT [--json-schema JSON_SCHEMA]

options:
-h, --help show this help message and exit
--model-path MODEL_PATH
Path to the LLM model in gguf format
--prompt PROMPT Input prompt
--json-schema JSON_SCHEMA
JSON schema to enforce
```

```bash
python3 LLM_json_schema.py --model-path models/Mistral-7B-Instruct-v0.1.gguf --json-schema '{"type":"object", "properties":{"country":{"type":"string"}, "capital":{"type":"string"}}}' --prompt "What is the capital of France?\n\n"
```
# Usage from Python
```python
from LLM_json_schema import run_inference_constrained_by_json_schema
import os
script_path = os.path.dirname(os.path.realpath(__file__))
model_path = os.environ.get('MODEL_PATH', os.path.join(script_path, "models/Mistral-7B-Instruct-v0.1.gguf"))
prompt = "\n\n### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
json_schema = {"type":"object", "properties":{"country":{"type":"string"}, "capital":{"type":"string"}}}
for chunk in run_inference_constrained_by_json_schema(model_path=model_path, json_schema=json_schema, prompt=prompt):
print(chunk, end="", flush=True)
print("")
```
# Citation
If you use this work, please cite the following:
```
@article{duchenne2023llm_json_schema,
title={LLM Json Schema},
author={Olivier Duchenne},
journal={Github},
url={https://github.com/olivierDuchenne/LLM_json_schema},
year={2023}
}
```