Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/drewthomasson/auto-scalable-speaker-attribution-dataset
This is my attempt at implementing a method for scaling speaker attribution datasets for literature by using GPT
Last synced: about 9 hours ago
- Host: GitHub
- URL: https://github.com/drewthomasson/auto-scalable-speaker-attribution-dataset
- Owner: DrewThomasson
- Created: 2023-12-04T19:24:29.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-11-04T21:16:32.000Z (4 days ago)
- Last Synced: 2024-11-04T22:19:03.981Z (4 days ago)
- Language: Python
- Size: 83.4 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Auto-Scalable Speaker Attribution Dataset
This is my attempt at implementing a method for scaling speaker attribution datasets for literature using GPT or any other capable Large Language Model (LLM).
The goal is to use this methodology to create a speaker attribution dataset for nearly all books in Project Gutenberg.
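To illustrate the general idea of LLM-based speaker attribution, the sketch below pulls quoted passages out of a text along with some surrounding context and wraps each one in a prompt asking the model to name the speaker. This is not the repo's actual implementation: the regex, the `context_chars` window, the prompt wording, and the function names `extract_quotes` and `build_prompt` are all hypothetical, and the actual call to GPT or another LLM is left out.

```python
import re

# Assumption: dialogue is wrapped in straight or curly double quotes.
# Single-quoted or unmarked speech is not detected by this pattern.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|\u201c([^\u201d]+)\u201d')


def extract_quotes(text: str, context_chars: int = 300) -> list[dict[str, str]]:
    """Return each quoted passage plus up to `context_chars` characters on either side."""
    results = []
    for match in QUOTE_PATTERN.finditer(text):
        start, end = match.span()
        results.append({
            # Exactly one of the two groups matched, depending on the quote style.
            "quote": match.group(1) or match.group(2),
            "context": text[max(0, start - context_chars):end + context_chars],
        })
    return results


def build_prompt(quote: str, context: str) -> str:
    """Wrap one quote and its context in a hypothetical speaker-attribution prompt."""
    return (
        "Read the passage and name the character who speaks the quoted line.\n"
        "If nobody speaks it, answer NONE.\n\n"
        f"Passage:\n{context}\n\n"
        f'Quote: "{quote}"\n'
        "Speaker:"
    )


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the ebooks folder.
    sample = 'Anna looked up. "Where are we?" she asked. "Almost home," said Ben.'
    for item in extract_quotes(sample, context_chars=20):
        print(build_prompt(item["quote"], item["context"]))
        print("---")
```

A larger `context_chars` value gives the model more surrounding narration to work from, at the cost of longer prompts.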
## How to run
- `python run_gui.py`
## manual_results_checker.py
- This Python script provides an easy-to-use GUI for manually checking the speaker attributions produced by the LLM.
- Give it the reference `.txt` file and the `quotes.csv` file generated by running `speaker_find_attribute.py`.
- Once you click the "save changes" button, it shows the accuracy rating in a popup.
## Test Results
For the first test, I used a snippet from "Guardians of Ga'Hoole", found in the `ebooks` folder. The results:
- On the first run, GPT-4 achieved 98.33% speaker attribution accuracy (59 of 60 quotes).
- On a second run over the same text, GPT achieved 96.67% accuracy (58 of 60 quotes).
- Accuracy therefore stays high, although it varies slightly from run to run. It could likely be improved through better prompting or by giving the LLM a longer context.
- The second-run results are in `quotes_updated.csv` in the `ebooks` folder of this repo.
## Speaker Attribution Results
I meticulously reviewed each speaker attribution assigned by GPT-4 and categorized the results for all 60 quotes found in the snippet from "Guardians of Ga'Hoole" as follows:
- **True:** Correct attribution
- **False:** Incorrect attribution
- **True/Incorrect Quotation:** The passage may not actually be spoken by a character, but if it is treated as a quote, the assigned speaker is correct.
### Attribution Breakdown
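The per-quote labels below can be tallied into a single accuracy score. Here is a minimal sketch; the function name `attribution_accuracy` and its `lenient` flag are hypothetical and not part of the repo. By default it assumes a "True/Incorrect Quotation" label counts as a correct attribution, since that category means the assigned speaker would be right if the passage is a quote. With 60 quotes, 59 correct gives 98.33% and 58 correct gives 96.67%, the two figures reported above.

```python
from collections import Counter


def attribution_accuracy(labels: list[str], lenient: bool = True) -> float:
    """Return the percentage of quotes whose speaker attribution is judged correct.

    Assumption: with lenient=True, "True/Incorrect Quotation" counts as correct;
    with lenient=False, only plain "True" does. "False" is always incorrect.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    correct = counts["True"]
    if lenient:
        correct += counts["True/Incorrect Quotation"]
    return 100 * correct / len(labels)


if __name__ == "__main__":
    # Hypothetical label list: 60 quotes, one of them "False".
    labels = ["True"] * 57 + ["True/Incorrect Quotation"] * 2 + ["False"]
    print(f"{attribution_accuracy(labels):.2f}%")                 # 98.33%
    print(f"{attribution_accuracy(labels, lenient=False):.2f}%")  # 95.00%
```

Scoring with `lenient=False` shows how much the headline number depends on how the borderline "True/Incorrect Quotation" cases are treated.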
- True
- True/Incorrect Quotation
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- False
- True/Incorrect Quotation
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True