https://github.com/aastroza/structured-generation-benchmark

Structured Generation Evals

generative-ai instructor llms modal outlines structured-generation transformers


Host: GitHub
URL: https://github.com/aastroza/structured-generation-benchmark
Owner: aastroza
License: apache-2.0
Created: 2024-04-05T04:34:43.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-09-25T17:50:38.000Z (about 1 year ago)
Last Synced: 2025-06-03T14:51:56.098Z (4 months ago)
Topics: generative-ai, instructor, llms, modal, outlines, structured-generation, transformers
Language: Jupyter Notebook
Homepage:
Size: 19.3 MB
Stars: 12
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# structured-generation-benchmark

To use Large Language Models (LLMs) effectively and reliably, it's essential to include structured generation techniques. Being able to get outputs that match a regular expression, parse as valid JSON, or populate a Pydantic data model is key for building useful software.

But what's the real effect of using libraries like [Outlines](https://github.com/outlines-dev/outlines) or [Instructor](https://github.com/jxnl/instructor/) to achieve that goal?

This repository collects evaluations that try to answer this question.
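To make "structured output" concrete, here is a minimal sketch of the property these libraries aim to provide: the text either matches a regular expression or parses into an object with the expected fields. It uses only the Python standard library and is not the Outlines or Instructor API; the sample strings, the date pattern, and the `name`/`arguments` layout are assumptions made up for illustration.

```python
import json
import re
from typing import Optional

# Hypothetical raw model outputs, invented for this example.
RAW_DATE = "2024-04-16"
RAW_JSON = '{"name": "get_weather", "arguments": {"city": "Santiago"}}'

# Assumed target format: an ISO-style date.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def matches_pattern(text: str, pattern: re.Pattern) -> bool:
    """Return True only when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


def parse_call(text: str) -> Optional[dict]:
    """Return the parsed object if it is JSON with the assumed keys and types, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not isinstance(data.get("name"), str):
        return None
    if not isinstance(data.get("arguments"), dict):
        return None
    return data


if __name__ == "__main__":
    print(matches_pattern(RAW_DATE, DATE_PATTERN))
    print(parse_call(RAW_JSON))
    print(parse_call("Sure! Here is the JSON you asked for."))
```

A check like this runs after generation and can only reject bad output. The evaluations in this repository look at what happens when the structure is enforced or validated by a library instead.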

## Function Calling

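This evaluation measures an LLM's ability to call functions: given a user request and a function definition, the model must produce a call with the correct function name and arguments.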

### Datasets

- [Berkeley Function Calling Leaderboard](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard/tree/64d44ccd13f3351d17c33951af5ef1bd6e10153c) [April 16, 2024 update]

### Evaluation

- We deployed a [modal function](modal/transformers_outlines.py) to run open-source models using [Transformers](https://github.com/huggingface/transformers) + [Outlines](https://github.com/outlines-dev/outlines).

- We created different [model handlers](evals/bfcl/scripts) to run the [Gorilla BFCL scripts](https://github.com/ShishirPatil/gorilla/tree/c6221060a9d50d0c7e7705f1ac95b9e5c4a95252) [April 6, 2024 version] for the `AST simple` evaluation category.

- We [evaluated](evals/bfcl/score) and reported the [results](evals/bfcl/result) comparing them with the [Leaderboard Website](https://github.com/ShishirPatil/gorilla/tree/46e959b73be6a40c233e36c71c268ce3a9eabe36) [April 26, 2024 version].
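As a rough illustration of what an AST-style check involves, the sketch below parses a generated call with Python's `ast` module and compares the function name and keyword arguments against a reference. It is not the Gorilla BFCL checker: the helper names, the keyword-only restriction, and the "list of acceptable values per parameter" layout are assumptions for this example. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Any, Dict, List, Tuple


def parse_call(source: str) -> Tuple[str, Dict[str, Any]]:
    """Parse a string such as 'f(a=1, b="x")' into ('f', {'a': 1, 'b': 'x'})."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    if node.args:
        raise ValueError("positional arguments are not handled in this sketch")
    kwargs: Dict[str, Any] = {}
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError("**kwargs unpacking is not handled in this sketch")
        # literal_eval accepts only literals, so arbitrary code is never executed.
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    # unparse keeps dotted names such as 'math.sqrt' intact.
    return ast.unparse(node.func), kwargs


def matches_expected(
    source: str, expected_name: str, allowed: Dict[str, List[Any]]
) -> bool:
    """Return True if the call uses the expected name and each argument has an allowed value."""
    try:
        name, kwargs = parse_call(source)
    except (SyntaxError, ValueError, TypeError):
        return False
    if name != expected_name or set(kwargs) != set(allowed):
        return False
    return all(kwargs[param] in allowed[param] for param in allowed)


if __name__ == "__main__":
    # Hypothetical model output and reference answer.
    generated = "calculate_area(base=10, height=5)"
    reference = {"base": [10], "height": [5]}
    print(matches_expected(generated, "calculate_area", reference))
```

Comparing parsed structure rather than raw strings means that differences in whitespace or argument order do not count as failures, while a wrong name, a missing argument, or a malformed call does.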

### Reports

- [Outlines Function Calling Evaluation](reports/bfcl_outlines.md)

- [Instructor Function Calling Evaluation](reports/bfcl_instructor.md)

## Synthetic Data Generation

Using an LLM to create artificial data.

### Reports

- [Outlines Synthetic Data Generation](reports/sdg_outlines.md)
