https://github.com/viadee/process-document-coherence-checker

research thesis

Last synced: 12 months ago
JSON representation

Host: GitHub
URL: https://github.com/viadee/process-document-coherence-checker
Owner: viadee
License: bsd-3-clause
Created: 2025-04-29T12:23:59.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-07-01T10:06:45.000Z (12 months ago)
Last Synced: 2025-07-01T10:44:22.847Z (12 months ago)
Topics: research, thesis
Language: Python
Homepage:
Size: 3.99 MB
Stars: 1
Watchers: 6
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Digital Appendix: LLM-Enabled Business Process Coherence Checking

## Description

This repository serves as the digital appendix for the paper:

**Title:** 'LLM-Enabled Business Process Coherence Checking Based on Multi-Level Process Documentation'
**Authors:** Schulte, M.; Franzoi\*, S.; Köhne, F.; vom Brocke, J.
**Submitted to:** _Process Science_ (Springer) - [Journal Link](https://link.springer.com/journal/44311)

The files provide supplementary information referenced in the paper.

## Contents

This appendix contains the following documents:

* `Section A - Methodology - aProCheCk Appendix.pdf`
* `Section B - Proof of Concept - aProCheCk Appendix.pdf`
* `Section C - Interviews and Framework - aProCheCk Appendix.pdf`
* `Section D - Development - aProCheCk Appendix.pdf`
* `Section E - Dataset and Focus Group - aProCheCk Appendix.pdf`
* `Section F - Notification Examples - aProCheCk Appendix.pdf`

## Related Dataset

The dataset used for the empirical validation in the paper can be found here:

[Business Process Coherence Checking Dataset](https://github.com/viadee/Thesis-Business-Process-Coherence-Checking)

## Citation

If you use this appendix or the concepts from the paper, please cite the main publication.

_Preliminary Citation:_

Schulte, M., Franzoi, S., Köhne, F., & vom Brocke, J. (2025). LLM-Enabled Business Process Coherence Checking Based on Multi-Level Process Documentation. _Submitted to Process Science_.

# Software Prototype

This project leverages advanced AI models to check for incoherencies in business process documentation. The project directory is structured as follows:

Link to open-source dataset: https://github.com/viadee/Thesis-Business-Process-Coherence-Checking

## Project Structure

```
ProjectRoot/
│
├── main.py
│ └── (Entry point for running experiments or single process checks)
│
├── experiment_runner.py
│ └── run_experiment_for_directory(directory, num_runs)
│ └── log_results(directory, results)
│ └── (Calls functions in coherence_checker.py, utils.py, and result_comparison.py)
│
├── single_process_checker.py
│ └── run_single_process_check(directory)
│ └── (Handles single coherence check for files within a directory and generates a management summary)
│
├── coherence_checker.py
│ └── run_coherence_check(directory)
│ └── (Processes BPMN and text files with preprocessing & utils, then calls Azure OpenAI via gpt_interaction.py)
│
├── gpt_interaction.py
│ └── chat_completion(task)
│ └── calculate_api_costs(model, input_tokens, output_tokens)
│ └── llm_coherence_check(filename1, content1, filename2, content2, txt_filename, txt_content)
│ └── (Manages Azure OpenAI API calls, processes tasks and responses)
│
├── bpmn_preprocessing.py
│ └── BPMNPreprocessing
│ └── preprocess_files(file_paths)
│ └── remove_visual_elements(root)
│ └── save_modified_file(tree, filename)
│
├── result_comparison.py
│ └── compare_results_with_config(config, result)
│ └── (Compares AI results to expected config)
│
├── utils.py
│ └── Utility functions:
│ └── parse_xml_to_string(file_path)
│ └── find_recent_files_in_directory(directory, extension, file_count)
│ └── extract_outermost_json(input_str)
│ └── calculate_api_costs(model, input_tokens, output_tokens)
│ └── convert_to_markdown(text)
│
├── llm_prompt_generator.py
│ └── Prompt generators for AI tasks:
│ └── task_bpmn_comparison_generator(filename1, content1, filename2, content2)
│ └── task_txt_coherence_generator(filename1, content1, txt_filename, txt_content, stripped_bpmn_comparison_data)
│ └── management_summary_generator(filename1, content1, txt_filename, txt_content, txt_coherence_result)
│
├── .env
│ └── (Environment variables like AZURE_API_KEY, AZURE_API_URL)
│
├── Data/
│ └── (Contains folders with BPMN and text files for experiments)
```

## How to Run

**Setup Environment Variables:**

* Create a `.env` file in the project root.
* Add the necessary environment variables, including `AZURE_API_KEY` and `AZURE_API_URL`.

**Run the Program:**

* Execute `main.py` to run experiments on all eligible subfolders within the `Data/` directory or a single process check if there are no eligible subfolders.

**Experiment Run:**

* If there are subfolders in the `Data/` directory (excluding those named "modified"), the program will run experiments.
* Each subfolder should contain:
* At least two BPMN files.
* At least one text file.
* `solution_config.json` specifying the expected changes.
* You can set the number of runs per folder by modifying the `num_runs` variable in `main.py`:

**Single Process Check:**

* If there are no eligible subfolders (excluding "modified") in the `Data/` directory, the program will run a single process check.
* The `Data` directory should contain:
* At least two BPMN files.
* At least one text file.
* The management summary and total API costs will be printed to the console.

**Review Results:**

* For experiments: Results are logged in each folder within the `Data/` directory, under an `ExperimentLog` subdirectory. The logs include CSV files with detailed accuracy and cost metrics.
* For a single process check: The management summary and total API costs are printed to the console.

This setup ensures a clear, modular approach to checking coherence in business process documentation, leveraging pre-built utilities and advanced AI capabilities.

## File Descriptions

**main.py**

Entry point for running multiple experiments or a single process check. It iterates over each folder in the `Data` directory, executing the coherence-checking experiments specified in each folder's `solution_config.json`. If no eligible subfolders are found, it triggers a single process check for the files directly in the `Data` directory.

**experiment\_runner.py**

Contains functions to run coherence checking experiments and log results.

* `run_experiment_for_directory(directory, num_runs)`: Runs the coherence checking experiment for a specific directory.
* `log_results(directory, results)`: Logs the results of the experiments into CSV and JSON files and calculates the summary.

**single\_process\_checker.py**

Handles a single coherence check for files directly within a directory and generates a management summary.

* `run_single_process_check(directory)`: Orchestrates the coherence checking process for a single set of files, including preprocessing, AI interaction, and generating a markdown summary.

**coherence\_checker.py**

The core module that processes BPMN and text files, performs preprocessing, and calls the Azure OpenAI model via `gpt_interaction.py`.

* `run_coherence_check(directory)`: Orchestrates the coherence checking process, including preprocessing and interaction with the AI model.

**gpt\_interaction.py**

Handles interactions with the Azure OpenAI API. It includes detailed functions to calculate API costs and manage different tasks.

* `chat_completion(task)`: Manages chat completions with the AI model, handling retries and exceptions.
* `llm_coherence_check(...)`: Conducts coherence checking for BPMN and text files, using AI for verification and producing management summaries.

**bpmn\_preprocessing.py**

Contains functionality to preprocess BPMN files by removing visual elements and saving the modified files.

* `BPMNPreprocessing`: A class handling BPMN file preprocessing.
* `preprocess_files(file_paths)`: Preprocesses BPMN files.
* `remove_visual_elements(root)`: Removes visual elements from BPMN XML.
* `save_modified_file(tree, filename)`: Saves the modified XML tree to a new file.

**result\_comparison.py**

Compares results from the AI model with the expected changes defined in the configuration files.

* `compare_results_with_config(config, result)`: Compares AI results with expected configurations and calculates accuracy and coherence indicators.

**utils.py**

Utility functions for various operations such as reading files, parsing XML, extracting JSON, and calculating API costs.

* `parse_xml_to_string(file_path)`: Parses an XML file to a string representation.
* `find_recent_files_in_directory(directory, extension, file_count)`: Finds the most recent files in a directory.
* `extract_outermost_json(input_str)`: Extracts the outermost JSON object from a string.
* `calculate_api_costs(model, input_tokens, output_tokens)`: Calculates the costs of API usage.
* `convert_to_markdown(text)`: Converts plain text to markdown format.

**llm\_prompt\_generator.py**

Stores all prompt templates used for generating tasks for the AI model.

* `task_bpmn_comparison_generator(filename1, content1, filename2, content2)`: Generates a comparison task for BPMN files.
* `task_txt_coherence_generator(filename1, content1, txt_filename, txt_content, stripped_bpmn_comparison_data)`: Generates a coherence check task for BPMN and text files.
* `management_summary_generator(filename1, content1, txt_filename, txt_content, txt_coherence_result)`: Generates a management summary for the detected incoherency.

**.env**

Contains environment variables required for the project, such as `AZURE_API_KEY` and `AZURE_API_URL`.

**Data/**

Directory containing folders with BPMN and text files for running experiments. Each folder should have a `solution_config.json` file specifying the expected changes.

```python
num_runs = 5 # Adjust the number of runs as needed
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/viadee/process-document-coherence-checker

Awesome Lists containing this project

README