{"id":29188035,"url":"https://github.com/viadee/process-document-coherence-checker","last_synced_at":"2025-07-01T22:08:56.756Z","repository":{"id":290901644,"uuid":"974864015","full_name":"viadee/process-document-coherence-checker","owner":"viadee","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-01T10:06:45.000Z","size":4189,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-01T10:44:22.847Z","etag":null,"topics":["research","thesis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viadee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-29T12:23:59.000Z","updated_at":"2025-07-01T10:06:49.000Z","dependencies_parsed_at":"2025-06-18T13:37:16.450Z","dependency_job_id":"b8371ec1-ab63-40b3-9bdf-7d5931a2e29c","html_url":"https://github.com/viadee/process-document-coherence-checker","commit_stats":null,"previous_names":["viadee/process-document-coherence-checker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/viadee/process-document-coherence-checker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viadee%2Fprocess-document-coherence-checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viadee%2Fprocess-document-coherence-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viadee%2Fprocess-document-coherence-checker/releases","mani
fests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viadee%2Fprocess-document-coherence-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viadee","download_url":"https://codeload.github.com/viadee/process-document-coherence-checker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viadee%2Fprocess-document-coherence-checker/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263042353,"owners_count":23404459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["research","thesis"],"created_at":"2025-07-01T22:08:54.995Z","updated_at":"2025-07-01T22:08:56.740Z","avatar_url":"https://github.com/viadee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Digital Appendix: LLM-Enabled Business Process Coherence Checking\n\n## Description\n\nThis repository serves as the digital appendix for the paper:\n\n**Title:** 'LLM-Enabled Business Process Coherence Checking Based on Multi-Level Process Documentation'  \n**Authors:** Schulte, M.; Franzoi\\*, S.; Köhne, F.; vom Brocke, J.  
\n**Submitted to:** _Process Science_ (Springer) - [Journal Link](https://link.springer.com/journal/44311)\n\nThe files provide supplementary information referenced in the paper.\n\n## Contents\n\nThis appendix contains the following documents:\n\n*   `Section A - Methodology - aProCheCk Appendix.pdf`\n*   `Section B - Proof of Concept - aProCheCk Appendix.pdf`\n*   `Section C - Interviews and Framework - aProCheCk Appendix.pdf`\n*   `Section D - Development - aProCheCk Appendix.pdf`\n*   `Section E - Dataset and Focus Group - aProCheCk Appendix.pdf`\n*   `Section F - Notification Examples - aProCheCk Appendix.pdf`\n\n## Related Dataset\n\nThe dataset used for the empirical validation in the paper can be found here:\n\n[Business Process Coherence Checking Dataset](https://github.com/viadee/Thesis-Business-Process-Coherence-Checking)\n\n## Citation\n\nIf you use this appendix or the concepts from the paper, please cite the main publication.\n\n_Preliminary Citation:_\n\nSchulte, M., Franzoi, S., Köhne, F., \u0026 vom Brocke, J. (2025). LLM-Enabled Business Process Coherence Checking Based on Multi-Level Process Documentation. _Submitted to Process Science_.\n\n# Software Prototype\n\nThis prototype uses a large language model, accessed through Azure OpenAI, to detect incoherencies in business process documentation.\n\n
The open-source dataset is available at: https://github.com/viadee/Thesis-Business-Process-Coherence-Checking\n\n## Project Structure\n\nThe project directory is structured as follows:\n\n```\nProjectRoot/\n│\n├── main.py\n│   └── (Entry point for running experiments or single process checks)\n│\n├── experiment_runner.py\n│   └── run_experiment_for_directory(directory, num_runs)\n│       └── log_results(directory, results)\n│       └── (Calls functions in coherence_checker.py, utils.py, and result_comparison.py)\n│\n├── single_process_checker.py\n│   └── run_single_process_check(directory)\n│       └── (Handles single coherence check for files within a directory and generates a management summary)\n│\n├── coherence_checker.py\n│   └── run_coherence_check(directory)\n│       └── (Processes BPMN and text files with preprocessing \u0026 utils, then calls Azure OpenAI via gpt_interaction.py)\n│\n├── gpt_interaction.py\n│   └── chat_completion(task)\n│       └── calculate_api_costs(model, input_tokens, output_tokens)\n│       └── llm_coherence_check(filename1, content1, filename2, content2, txt_filename, txt_content)\n│       └── (Manages Azure OpenAI API calls, processes tasks and responses)\n│\n├── bpmn_preprocessing.py\n│   └── BPMNPreprocessing\n│       └── preprocess_files(file_paths)\n│       └── remove_visual_elements(root)\n│       └── save_modified_file(tree, filename)\n│\n├── result_comparison.py\n│   └── compare_results_with_config(config, result)\n│       └── (Compares AI results to expected config)\n│\n├── utils.py\n│   └── Utility functions:\n│       └── parse_xml_to_string(file_path)\n│       └── find_recent_files_in_directory(directory, extension, file_count)\n│       └── extract_outermost_json(input_str)\n│       └── calculate_api_costs(model, input_tokens, output_tokens)\n│       └── convert_to_markdown(text)\n│\n├── llm_prompt_generator.py\n│   └── Prompt generators for AI tasks:\n│       └── task_bpmn_comparison_generator(filename1, content1, filename2, 
content2)\n│       └── task_txt_coherence_generator(filename1, content1, txt_filename, txt_content, stripped_bpmn_comparison_data)\n│       └── management_summary_generator(filename1, content1, txt_filename, txt_content, txt_coherence_result)\n│\n├── .env\n│   └── (Environment variables like AZURE_API_KEY, AZURE_API_URL)\n│\n├── Data/\n│   └── (Contains folders with BPMN and text files for experiments)\n```\n\n## How to Run\n\n**Set Up Environment Variables:**\n\n*   Create a `.env` file in the project root.\n*   Add the necessary environment variables, including `AZURE_API_KEY` and `AZURE_API_URL`.\n\n**Run the Program:**\n\n*   Execute `main.py` to run experiments on all eligible subfolders within the `Data/` directory, or a single process check if there are no eligible subfolders.\n\n**Experiment Run:**\n\n*   If there are subfolders in the `Data/` directory (excluding those named \"modified\"), the program will run experiments.\n*   Each subfolder should contain:\n    *   At least two BPMN files.\n    *   At least one text file.\n    *   `solution_config.json` specifying the expected changes.\n*   You can set the number of runs per folder by modifying the `num_runs` variable in `main.py` (the snippet at the end of this README shows the variable).\n\n**Single Process Check:**\n\n*   If there are no eligible subfolders (excluding \"modified\") in the `Data/` directory, the program will run a single process check.\n*   The `Data` directory should contain:\n    *   At least two BPMN files.\n    *   At least one text file.\n*   The management summary and total API costs will be printed to the console.\n\n**Review Results:**\n\n*   For experiments: Results are logged in each folder within the `Data/` directory, under an `ExperimentLog` subdirectory. 
The logs include CSV files with detailed accuracy and cost metrics.\n*   For a single process check: The management summary and total API costs are printed to the console.\n\nThe prototype is split into separate modules for preprocessing, prompt generation, LLM interaction, and result comparison, which are described below.\n\n## File Descriptions\n\n**main.py**\n\nEntry point for running multiple experiments or a single process check. It iterates over each eligible subfolder in the `Data` directory, running the coherence-checking experiment and comparing the results against the expected changes in that folder's `solution_config.json`. If no eligible subfolders are found, it triggers a single process check for the files directly in the `Data` directory.\n\n**experiment\\_runner.py**\n\nContains functions to run coherence checking experiments and log results.\n\n*   `run_experiment_for_directory(directory, num_runs)`: Runs the coherence checking experiment for a specific directory, repeated for the given number of runs.\n*   `log_results(directory, results)`: Logs the results of the experiments into CSV and JSON files and calculates the summary.\n\n**single\\_process\\_checker.py**\n\nHandles a single coherence check for files directly within a directory and generates a management summary.\n\n*   `run_single_process_check(directory)`: Orchestrates the coherence checking process for a single set of files, including preprocessing, AI interaction, and generating a markdown summary.\n\n**coherence\\_checker.py**\n\nThe core module that processes BPMN and text files, performs preprocessing, and calls the Azure OpenAI model via `gpt_interaction.py`.\n\n*   `run_coherence_check(directory)`: Orchestrates the coherence checking process, including preprocessing and interaction with the AI model.\n\n**gpt\\_interaction.py**\n\nHandles interactions with the Azure OpenAI API. 
It includes detailed functions to calculate API costs and manage different tasks.\n\n*   `chat_completion(task)`: Manages chat completions with the AI model, handling retries and exceptions.\n*   `llm_coherence_check(...)`: Conducts coherence checking for BPMN and text files, using AI for verification and producing management summaries.\n\n**bpmn\\_preprocessing.py**\n\nContains functionality to preprocess BPMN files by removing visual elements and saving the modified files.\n\n*   `BPMNPreprocessing`: A class handling BPMN file preprocessing.\n    *   `preprocess_files(file_paths)`: Preprocesses BPMN files.\n    *   `remove_visual_elements(root)`: Removes visual elements from BPMN XML.\n    *   `save_modified_file(tree, filename)`: Saves the modified XML tree to a new file.\n\n**result\\_comparison.py**\n\nCompares results from the AI model with the expected changes defined in the configuration files.\n\n*   `compare_results_with_config(config, result)`: Compares AI results with expected configurations and calculates accuracy and coherence indicators.\n\n**utils.py**\n\nUtility functions for various operations such as reading files, parsing XML, extracting JSON, and calculating API costs.\n\n*   `parse_xml_to_string(file_path)`: Parses an XML file to a string representation.\n*   `find_recent_files_in_directory(directory, extension, file_count)`: Finds the most recent files in a directory.\n*   `extract_outermost_json(input_str)`: Extracts the outermost JSON object from a string.\n*   `calculate_api_costs(model, input_tokens, output_tokens)`: Calculates the costs of API usage.\n*   `convert_to_markdown(text)`: Converts plain text to markdown format.\n\n**llm\\_prompt\\_generator.py**\n\nStores all prompt templates used for generating tasks for the AI model.\n\n*   `task_bpmn_comparison_generator(filename1, content1, filename2, content2)`: Generates a comparison task for BPMN files.\n*   `task_txt_coherence_generator(filename1, content1, txt_filename, txt_content, 
stripped_bpmn_comparison_data)`: Generates a coherence check task for BPMN and text files.\n*   `management_summary_generator(filename1, content1, txt_filename, txt_content, txt_coherence_result)`: Generates a management summary for the detected incoherencies.\n\n**.env**\n\nContains environment variables required for the project, such as `AZURE_API_KEY` and `AZURE_API_URL`.\n\n**Data/**\n\nDirectory containing folders with BPMN and text files for running experiments. Each folder should have a `solution_config.json` file specifying the expected changes.\n\nThe number of runs per folder is set through the `num_runs` variable in `main.py`:\n\n```python\nnum_runs = 5  # Adjust the number of runs as needed\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviadee%2Fprocess-document-coherence-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviadee%2Fprocess-document-coherence-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviadee%2Fprocess-document-coherence-checker/lists"}