https://github.com/inzva/Turkish-GloVe

Türkçe GloVe - Repository for Turkish GloVe Word Embeddings

turkish-nlp



Host: GitHub
URL: https://github.com/inzva/Turkish-GloVe
Owner: inzva
License: mit
Created: 2020-12-27T12:58:25.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-04-12T10:49:46.000Z (about 2 years ago)
Last Synced: 2024-10-11T12:46:05.826Z (8 months ago)
Topics: turkish-nlp
Language: Jupyter Notebook
Homepage:
Size: 227 KB
Stars: 66
Watchers: 17
Forks: 6
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

turkish-nlp-resources - TurkishGloVe

README

# TurkishGloVe

Türkçe GloVe - Repository for Turkish GloVe Word Embeddings

## Training

We used the official GloVe repository both to train the word embeddings and to evaluate them.

GloVe GitHub Repository

## Download pre-trained word vectors

1. 570K vocab, cased, 300d vectors, 1.6 GB text, 2.6 GB binary: link

2. 253K vocab, uncased, 300d vectors, 720 MB text, 1.2 GB binary: link
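
To make the text download concrete, here is a minimal sketch of a loader. It assumes the text files follow the layout written by the official GloVe tools (one token per line, followed by space-separated floats, with no header line); the README does not state the layout explicitly. The function name `load_glove_text` and the in-memory sample are hypothetical and only illustrate the parsing.

```python
import io
from typing import Dict, Iterable, List


def load_glove_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines into a {word: vector} dict.

    Assumption: each line is a token followed by space-separated floats.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a downloaded file (3d instead of 300d).
sample = io.StringIO("kedi 0.1 0.2 0.3\nköpek 0.2 0.1 0.4\n")
vectors = load_glove_text(sample)
print(len(vectors), len(vectors["kedi"]))
```

For a real file, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the sample. If you prefer gensim, version 4.x can read the same header-less layout with `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)`, which yields an object with the `most_similar` method used in the examples further down.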

## Corpus

The corpus was collected from Common Crawl data covering January-December 2018.

This corpus has 2.736B tokens.

Corpus size: 5.4GB

Corpus Link

Paper Link

## Intrinsic Evaluation

This benchmark dataset is used for intrinsic evaluation on the analogy task.

The analogy questions cover three categories: synonyms, capitals, and antonyms.

Benchmark Dataset Link
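
As an illustration of how an analogy question is scored, the sketch below answers "a is to b as c is to ?" by picking the word whose vector is closest (by cosine similarity) to `b - a + c`, a common approach often called 3CosAdd. It is not the evaluation script itself: the helper names and the toy 3-dimensional vectors are made up for the example, and the real benchmark uses the 300d embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def solve_analogy(vectors: Dict[str, List[float]], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    ua, ub, uc = unit(vectors[a]), unit(vectors[b]), unit(vectors[c])
    target = unit([y - x + z for x, y, z in zip(ua, ub, uc)])
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    # Cosine similarity is a plain dot product once both sides are unit length.
    return max(
        candidates,
        key=lambda w: sum(p * q for p, q in zip(unit(vectors[w]), target)),
    )


def analogy_accuracy(
    vectors: Dict[str, List[float]],
    questions: List[Tuple[str, str, str, str]],
) -> float:
    """Percentage of questions (a, b, c, d) whose predicted answer equals d."""
    correct = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return 100.0 * correct / len(questions)


# Hypothetical toy vectors, chosen so the capital-city relation is a fixed offset.
toy = {
    "almanya": [1.0, 0.0, 1.0],
    "berlin": [1.0, 1.0, 1.0],
    "fransa": [1.0, 0.0, -1.0],
    "paris": [1.0, 1.0, -1.0],
    "kedi": [-1.0, 0.2, 0.1],
}
questions = [("almanya", "berlin", "fransa", "paris")]
print(analogy_accuracy(toy, questions))
```

The per-category figures in the results table would correspond to running this kind of scoring over each category's question list separately.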

### Results

| Semantic Evaluation | Antonyms Analogy Task | Capitals Analogy Task | Synonyms Analogy Task | Total Accuracy |
|:-------------------:|:---------------------:|:---------------------:|:---------------------:|:--------------:|
| GloVe Uncased       |         21.70         |         47.74         |         19.48         |     27.88      |

## Extrinsic Evaluation

Extrinsic evaluation uses a text categorization dataset (linked at the end of this section).

The dataset has 7 different classes.

### Accuracy

| Model         |   SVC   | Logistic Regression |
|:-------------:|:-------:|:-------------------:|
| GloVe Cased   | 0.89306 |       0.89959       |
| GloVe Uncased | 0.89956 |       0.90530       |

### Precision

| Model         |   SVC   | Logistic Regression |
|:-------------:|:-------:|:-------------------:|
| GloVe Cased   | 0.89388 |       0.89864       |
| GloVe Uncased | 0.90015 |       0.90619       |

### Recall

| Model         |   SVC   | Logistic Regression |
|:-------------:|:-------:|:-------------------:|
| GloVe Cased   | 0.89306 |       0.89796       |
| GloVe Uncased | 0.89959 |       0.90531       |

Both classifiers (SVC and Logistic Regression) were trained with the default hyperparameters in scikit-learn.

Text Categorization Dataset Link
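
The README does not say how documents were turned into classifier inputs. One common option, shown here purely as an assumption, is to average the word vectors of a document's tokens into a single fixed-length feature vector. The function name `document_vector` and the 2-dimensional toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def document_vector(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of in-vocabulary tokens; all zeros if none are known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors column by column, one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical 2d vectors; the released embeddings are 300d.
toy_vectors = {"futbol": [1.0, 0.0], "maç": [0.0, 1.0]}
features = document_vector(["futbol", "maç", "bilinmeyen"], toy_vectors, dim=2)
print(features)
```

Feature vectors built this way, one per document, could then be passed to `sklearn.svm.SVC()` and `sklearn.linear_model.LogisticRegression()` constructed without arguments, which matches the "default hyperparameters" setup described above.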

## Examples

```python
# Analogy query: almanya (Germany) is to berlin as fransa (France) is to ?
model.most_similar(positive=['fransa', 'berlin'], negative=['almanya'])
```

![city](/image/city.png)

```python
# Analogy query: gelmek (to come) is to geliyor (is coming) as gitmek (to go) is to ?
model.most_similar(positive=['geliyor', 'gitmek'], negative=['gelmek'])
```

![verb](/image/verb.png)

```python
# Nearest neighbours of "kedi" (cat)
model.most_similar("kedi")
```

![animal](/image/animal.png)

## References

https://cs224d.stanford.edu/lecture_notes/notes2.pdf

https://nlp.stanford.edu/pubs/glove.pdf
