https://github.com/animesh-chourey/character-classification-distributional-semantics

Vector Space Semantics for Similarity between Eastenders Characters
n-grams nlp pos-tagging tf-idf transformer

Host: GitHub
URL: https://github.com/animesh-chourey/character-classification-distributional-semantics
Owner: Animesh-Chourey
Created: 2022-08-31T13:23:47.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2022-08-31T13:28:49.000Z (over 2 years ago)
Last Synced: 2025-01-11T19:45:38.299Z (4 months ago)
Topics: n-grams, nlp, pos-tagging, tf-idf, transformer
Language: Jupyter Notebook
Homepage:
Size: 4.64 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Vector Space Semantics for Similarity between Eastenders Characters

A vector representation is built for each character from a document of that character's lines (Eastenders script data). The representation is then refined so that each character vector is maximally distinguished from the vectors of the other characters. This distinction is measured by how well a simple information-retrieval classification method assigns documents from the validation and test data to the correct class, i.e. it decides which character spoke the lines by measuring the similarity of those document vectors to the ones built in training.
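The retrieval-style classification described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the notebook's code: it assumes bag-of-words count vectors and cosine similarity as the similarity measure, and the function names, character labels, and example lines are all made up for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_character_vectors(training_lines):
    """Merge each character's training lines into one bag-of-words vector."""
    vectors = {}
    for character, line in training_lines:
        vectors.setdefault(character, Counter()).update(line.lower().split())
    return vectors


def predict_character(document, character_vectors):
    """Return the character whose training vector is closest to the document."""
    doc_vector = Counter(document.lower().split())
    return max(
        character_vectors,
        key=lambda name: cosine(doc_vector, character_vectors[name]),
    )


# Invented toy data with hypothetical labels, not taken from the script corpus.
training_lines = [
    ("CHARACTER_A", "get out of my pub"),
    ("CHARACTER_A", "i said get out"),
    ("CHARACTER_B", "would you like a cup of tea"),
    ("CHARACTER_B", "i will put the kettle on for tea"),
]
character_vectors = build_character_vectors(training_lines)
prediction = predict_character("put the kettle on", character_vectors)
print(prediction)
```

Better feature engineering helps only insofar as it moves a held-out document closer to the right character vector than to any other, which is why retrieval accuracy serves as the evaluation measure.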

The following tasks have been performed here:
* Pre-processing is performed by converting the tokens to lowercase, then lemmatizing and stemming them in turn. Finally, stopwords are removed.
* Features are extracted as n-grams of different lengths, together with their POS tags.
* Dialogue context data and features are added so that each line incorporates its context, i.e. the lines spoken by other characters in the same scene immediately before and after it.
* A matrix transformation technique is applied, here a TF-IDF transformer.
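The steps listed above can be sketched as a small standard-library pipeline. This is a simplified illustration under stated assumptions, not the notebook's implementation: lemmatizing, stemming, and POS tagging are left out because they need an external NLP library, the stopword list is a tiny hypothetical one, the `PREV:`/`NEXT:` prefixes are an assumed way of marking context features, and neighbouring lines are used without checking who spoke them.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "to", "and", "you", "i"}


def preprocess(line):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in line.lower().split() if tok not in STOPWORDS]


def ngrams(tokens, n):
    """All contiguous n-grams of the token list, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def line_features(lines, index, max_n=2):
    """Count n-gram features for one line plus prefixed features from the
    lines immediately before and after it in the same scene."""
    features = []
    for offset, prefix in ((0, ""), (-1, "PREV:"), (1, "NEXT:")):
        j = index + offset
        if 0 <= j < len(lines):
            tokens = preprocess(lines[j])
            for n in range(1, max_n + 1):
                features.extend(prefix + gram for gram in ngrams(tokens, n))
    return Counter(features)


def tfidf(count_vectors):
    """Reweight raw counts by a smoothed inverse document frequency."""
    n_docs = len(count_vectors)
    doc_freq = Counter(feature for vec in count_vectors for feature in vec)
    idf = {f: math.log((1 + n_docs) / (1 + df)) + 1.0 for f, df in doc_freq.items()}
    return [{f: count * idf[f] for f, count in vec.items()} for vec in count_vectors]


# Invented three-line scene, not taken from the script corpus.
scene = ["Where is the money", "I spent the money", "You did what"]
counts = [line_features(scene, i) for i in range(len(scene))]
weighted = tfidf(counts)
print(sorted(weighted[0].items()))
```

Prefixing the context features keeps a word spoken by a neighbouring line distinct from the same word spoken in the line itself, so the TF-IDF step weights the two separately.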
