https://github.com/sap-samples/llm-round-trip-correctness

This repo provides code for evaluating LLM round-trip correctness on text-to-process-model generation and vice versa.

benchmarking business evaluation-framework genai processes round-trip-correctness

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/sap-samples/llm-round-trip-correctness
Owner: SAP-samples
License: apache-2.0
Created: 2024-05-29T07:41:19.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-03-07T13:59:49.000Z (about 1 year ago)
Last Synced: 2025-03-27T05:41:36.461Z (about 1 year ago)
Topics: benchmarking, business, evaluation-framework, genai, processes, round-trip-correctness
Language: Jupyter Notebook
Homepage:
Size: 4.45 MB
Stars: 3
Watchers: 6
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

[![REUSE status](https://api.reuse.software/badge/github.com/SAP-samples/llm-round-trip-correctness)](https://api.reuse.software/info/github.com/SAP-samples/llm-round-trip-correctness)

# Round-trip-correctness to evaluate BPMN generation

## Description
This repository tests a proxy evaluation method for text-to-BPMN model pipelines.
The proxy evaluation runs a round-trip pipeline, either "text to BPMN to text" or "BPMN to text to BPMN", and computes an average similarity metric, which we call RTC (round-trip correctness), for cases where no ground-truth BPMN or text is available.
To show whether the proxy method is effective, we must first investigate how the existing BPMN-to-BPMN evaluation from the [model_evaluation](./model_evaluation) module correlates with the text-to-text similarity methods from [text_evaluation](./text_evaluation) by running the pipelines in this repository.
This work is inspired by [this publication](https://arxiv.org/abs/2402.08699) on text-to-code round-tripping.
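
The round-trip idea can be illustrated with a minimal sketch. This is not the code from this repository: `round_trip_correctness`, `toy_text_to_model`, and `toy_model_to_text` are hypothetical names, the two toy functions stand in for the LLM calls, and `difflib.SequenceMatcher` is only a stand-in for the similarity methods in [text_evaluation](./text_evaluation).

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Iterable


def similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0, 1]; not the repository's metric."""
    return SequenceMatcher(None, a, b).ratio()


def round_trip_correctness(
    texts: Iterable[str],
    forward: Callable[[str], str],   # hypothetical: text -> serialized BPMN
    backward: Callable[[str], str],  # hypothetical: serialized BPMN -> text
) -> float:
    """Average similarity between each input and its round-tripped version."""
    scores = [similarity(t, backward(forward(t))) for t in texts]
    return mean(scores)


# Toy stand-ins for the two LLM calls, for illustration only.
def toy_text_to_model(text: str) -> str:
    return "|".join(text.split(". "))


def toy_model_to_text(model: str) -> str:
    return ". ".join(model.split("|"))


descriptions = [
    "Receive order. Check stock. Ship goods.",
    "Open ticket. Resolve issue.",
]
rtc = round_trip_correctness(descriptions, toy_text_to_model, toy_model_to_text)
print(f"RTC (text -> BPMN -> text): {rtc:.2f}")
```

Because the toy round trip is lossless, the score here is at its maximum; with real LLM calls the regenerated text usually differs from the input, and the average drops accordingly. Swapping the roles of `forward` and `backward` gives the "BPMN to text to BPMN" direction, which would need a model-to-model similarity instead of a text one.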

## Requirements and set up

The requirements are listed in [pyproject.toml](./pyproject.toml). After cloning the repository, install them with Poetry:

```shell
poetry install
```

## Getting started

To run the LLM-specific pipeline, use a command similar to this:
```shell
screen -d -m python genai_gpt_pipeline.py --model-path ./data/pet/ground_json --text-path ./data/pet/process_descriptions --example pet --direction t2t
```
or run the LLM-agnostic pipeline with:
```shell
screen -d -m python universal_pipeline.py --llm gemini --model-path ./data/pet/ground_json --text-path ./data/pet/process_descriptions --example pet --direction t2t
```
The CSV files are written to the results directory. The Jupyter notebooks visualize the results.
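
As a rough sketch of the kind of analysis the result files allow, the snippet below correlates a model-to-model score with a text-to-text score. Everything specific in it is an assumption: the column names `model_score` and `text_score` are hypothetical, the three rows are toy values rather than results from this repository, and Pearson correlation is just one possible choice of coefficient.

```python
import csv
import io
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def load_scores(csv_file, model_col: str, text_col: str):
    """Read two numeric columns from an open CSV file object."""
    rows = list(csv.DictReader(csv_file))
    return (
        [float(r[model_col]) for r in rows],
        [float(r[text_col]) for r in rows],
    )


# Toy data with hypothetical column names; a real run would open a file
# from the results directory instead of this in-memory string.
sample = io.StringIO("model_score,text_score\n0.2,0.3\n0.5,0.6\n0.9,0.8\n")
model_scores, text_scores = load_scores(sample, "model_score", "text_score")
r = pearson(model_scores, text_scores)
print(f"Pearson r = {r:.3f}")
```

A high coefficient on real results would support using the text-to-text score as a proxy for the model-to-model score; a rank-based coefficient may be preferable if the relationship is monotonic but not linear.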

## Known Issues
No known issues.

## How to obtain support
[Create an issue](https://github.com/SAP-samples/model-to-model-evaluation-code/issues) in this repository if you find a bug or have questions about the content.

## Contributing
If you wish to contribute code, offer fixes or improvements, please send a pull request. Due to legal reasons, contributors will be asked to accept a DCO when they create the first pull request to this project. This happens in an automated fashion during the submission process. SAP uses [the standard DCO text of the Linux Foundation](https://developercertificate.org/).
