https://github.com/drewthomasson/auto-scalable-speaker-attribution-dataset

This is my attempt at implementing a method for scaling speaker attribution datasets for literature by using GPT

Host: GitHub
URL: https://github.com/drewthomasson/auto-scalable-speaker-attribution-dataset
Owner: DrewThomasson
Created: 2023-12-04T19:24:29.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-11-04T21:16:32.000Z (4 days ago)
Last Synced: 2024-11-04T22:19:03.981Z (4 days ago)
Language: Python
Size: 83.4 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Auto-Scalable Speaker Attribution Dataset

This is my attempt at implementing a method for scaling speaker attribution datasets for literature using GPT or any other capable Large Language Model (LLM).

The hope is to use this methodology to create a speaker attribution dataset for nearly all books found in Project Gutenberg.

## How to run
- `python run_gui.py`


## manual_results_checker.py
- This Python script provides an easy-to-use GUI for manually checking the speaker attribution results produced by the LLM.
- Give it the reference txt file, then the quotes.csv file that was generated by running speaker_find_attribute.py.
- Once you click the "save changes" button, a popup shows the accuracy rating.

## Test Results

For the first test, I used a snippet from "Guardians of Ga'Hoole" in the `ebooks` folder. The results are impressive:

- On the first run, GPT-4 achieved a 98.33% accuracy rate for speaker attribution.
- On a second run over the same piece of text, the model achieved a 96.67% accuracy rate.
- Accuracy therefore stays high across runs, and it could likely be improved further through better prompting or by increasing the context length given to the LLM.
- The results of the second run can be found in the file "quotes_updated.csv" in the `ebooks` folder of this repo.
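
As a quick sanity check on the percentages above, the sketch below assumes accuracy is simply the number of correct attributions divided by the total number of quotes. Under that assumption, the two reported figures are consistent with 59 and 58 correct attributions out of 60 quotes. The README does not state how the checker computes its rating, and the function name `accuracy_percent` is hypothetical.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to two decimals.

    Assumption: accuracy = correct attributions / total quotes.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of quotes")
    return round(100 * correct / total, 2)


# 60 quotes in the snippet; 59 and 58 correct are inferred from the
# reported percentages, not taken from the repository's code.
first_run = accuracy_percent(59, 60)
second_run = accuracy_percent(58, 60)
print(first_run, second_run)
```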

## Speaker Attribution Results

I meticulously reviewed each speaker attribution assigned by GPT-4 and categorized the results for all 60 quotes found in the snippet from "Guardians of Ga'Hoole" as follows:

- **True:** Correct attribution
- **False:** Incorrect attribution
- **True/Incorrect Quotation:** The extracted text may not actually be something a character said, but if it were assigned a speaker, the assigned speaker would be the correct answer.
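
A list of labels in these three categories can be tallied with a few lines of Python. This is only a minimal sketch: it assumes that "True/Incorrect Quotation" counts as a correct attribution, which the README does not state explicitly, and the names `CORRECT_LABELS` and `summarize` are hypothetical. The sample list is made up for illustration and is not the project's data.

```python
from collections import Counter

# Assumption: both of these labels count as a correct attribution.
CORRECT_LABELS = {"True", "True/Incorrect Quotation"}


def summarize(labels):
    """Count each label and return (counts, accuracy in percent)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    correct = sum(n for label, n in counts.items() if label in CORRECT_LABELS)
    return counts, round(100 * correct / len(labels), 2)


# Illustrative sample only, not the real review results.
sample = ["True", "True/Incorrect Quotation", "True", "False"]
counts, pct = summarize(sample)
print(dict(counts), pct)
```

If "True/Incorrect Quotation" should instead be treated as an error, removing it from `CORRECT_LABELS` is the only change needed.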

### Attribution Breakdown

- True
- True/Incorrect Quotation
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- False
- True/Incorrect Quotation
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True
- True