# LLM Reasoning Checking in Semantic Relation Extraction
![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)

> **Update November 01 2024:** ⭐ Implemented a separate [bulk-chain](https://github.com/nicolay-r/bulk-chain) project for handling massive amounts of prompts with CoT. That concept was used in this study.

This project aims to assess advances of generative large language models in the domain of Information Retrieval, and Relation **Extraction in particular**.

Please refer to the related findings and presentation (in Russian) for more details:

[![](https://markdown-videos-api.jorgenkh.no/youtube/zeKg1MAQrYA)](https://youtu.be/zeKg1MAQrYA)

**Presentation**: https://nicolay-r.github.io/website/data/report_llm2023-nerel.pdf

## Usage

This project shares three main scripts:
1. `eval.py` -- evaluates the accuracy of binary classification over the manually annotated relations. The output accuracy shows the alignment with the annotation provided by experts ✍️
2. `graph.py` -- adopts the D3.js library to visualize radial graphs that highlight alignment 🟢 and misalignment 🔴.
3. `launch.py` -- applies the LLM to the composed input prompts to perform **binary semantic relation classification** of known semantic relations (present / absent, i.e. agree or disagree with the experts' annotation).
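The agreement metric reported by `eval.py` can be illustrated with a minimal sketch; the function name and data below are hypothetical and stand in for the actual implementation, which compares the LLM's binary present/absent answers against the expert annotation:

```python
def agreement_accuracy(llm_answers, expert_labels):
    """Fraction of relations where the LLM answer matches the expert label."""
    assert len(llm_answers) == len(expert_labels)
    matches = sum(a == e for a, e in zip(llm_answers, expert_labels))
    return matches / len(expert_labels)

# Example: the LLM agrees with the experts on 3 of 4 relations.
print(agreement_accuracy([1, 1, 0, 0], [1, 1, 0, 1]))  # 0.75
```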

> **Update 08/08/2024:** ⚠️ This project adopts the `RevGPT` API, which is obsolete at present. Please use another project such as [QuickCoT](https://github.com/nicolay-r/quick_cot) for inferring other LLM models using the official API.

## Experiment

We refer to the [NEREL collection](https://github.com/nerel-ds/NEREL), which is used as a source for the semantic relations.

We experiment with the `OpenAI/ChatGPT-3.5-0613` model.

### Results

Results are briefly highlighted in [the related presentation](https://nicolay-r.github.io/website/data/report_llm2023-nerel.pdf).

The top 3 relation types are shown in bold.

High: 70-100% agreement

|Relation Type |Accuracy | Number of Relations |
|-|-|-|
|**CAUSE_OF_DEATH** |0.93|41|
|**DATE_OF_BIRTH** |0.92|114|
|**DATE_OF_DEATH** |0.86|87|
|PLACE_OF_DEATH |0.86|63|
|END_TIME |0.86|22|
|START_TIME |0.82|38|
|PLACE_OF_BIRTH |0.76|97|
|DATE_OF_CREATION |0.75|117|
|SCHOOLS_ATTENDED |0.74|84|
|PART_OF |0.73|45|
|MEMBER_OF |0.72|218|
|WORKS_AS |0.71|3053|
|LOCATED_IN |0.71|611|

Medium: 50-70% agreement

|Relation Type |Accuracy | Number of Relations |
|-|-|-|
|TAKES_PLACE_IN |0.7|1222|
|SUBEVENT_OF |0.68|212|
|DATE_FOUNDED_IN |0.68|44|
|DATE_DEFUNCT_IN |0.67|6|
|AWARDED_WITH |0.64|401|
|ORIGINS_FROM |0.61|956|
|RELIGION_OF |0.61|31|
|RELATIVE |0.57|30|
|MEDICAL_CONDITION |0.57|196|
|OWNER_OF |0.54|94|
|PENALIZED_AS |0.54|123|
|WORKPLACE |0.53|804|
|PARENT_OF |0.5|200|

Low: Less than 50% agreement

|Relation Type |Accuracy | Number of Relations |
|-|-|-|
|SIBLING |0.48|86|
|SPOUSE |0.47|119|
|PARTICIPANT_IN |0.42|2764|
|HAS_CAUSE |0.42|481|
|KNOWS |0.41|264|
|EXPENDITURE |0.41|29|
|ALTERNATIVE_NAME |0.4|942|
|FOUNDED_BY |0.37|86|
|ORGANIZES |0.3|123|
|CONVICTED_OF |0.22|286|
|SUBORDINATE_OF |0.15|105|
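The three tables above partition relation types into agreement bands. A small sketch of that banding logic follows; the thresholds (0.7 and 0.5) are inferred from the section headers, not taken from project code:

```python
def band(accuracy):
    """Assign an agreement band as used in the result tables above."""
    if accuracy > 0.7:
        return "High"
    if accuracy >= 0.5:
        return "Medium"
    return "Low"

# A few relation types with their reported agreement accuracies.
results = {"CAUSE_OF_DEATH": 0.93, "TAKES_PLACE_IN": 0.70, "SUBORDINATE_OF": 0.15}
for relation, acc in results.items():
    print(relation, band(acc))  # High, Medium, Low respectively
```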

## References

The visualization has been taken from the related [ARElight](https://github.com/nicolay-r/ARElight) project.