https://github.com/simonepri/wn16s

WordNet dataset with semantic relations only
https://github.com/simonepri/wn16s

dataset semantic synsets wn wn16s wordnet

Last synced: 7 months ago
JSON representation

WordNet dataset with semantic relations only

Host: GitHub
URL: https://github.com/simonepri/wn16s
Owner: simonepri
License: mit
Created: 2019-05-29T12:45:47.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-10-31T15:34:23.000Z (almost 6 years ago)
Last Synced: 2025-02-04T18:23:17.976Z (8 months ago)
Topics: dataset, semantic, synsets, wn, wn16s, wordnet
Language: Python
Size: 10.2 MB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: readme.md
- License: license

Awesome Lists containing this project

README

          


  WN16S





  

  

  

  

  








  WordNet dataset with semantic relations only



## Motivation

In [WordNet][wn] two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms (lemmas); semantic relations hold between word meanings (synsets).

I wanted to have a dataset with the lexical relations filtered out to build synset embeddings based only on the semantic relations of the WN graph.

## Structure

In the [dataset folder][dataset], you can find many `tsv` and `txt` files the meaning of which is explained hereafter.

| file name | purpose | notes |

| --------- | ------- | ----- |

| `count_synsets.txt` | File that contains the number of synsets. | |

| `count_relations.txt`   | Files that contain the number of relations. | |

| `count_edges_all.txt` | File that contains the number of total edges. | |

| `count_edges_*.tsv`   | Files that contain the number of edges of type *. | |

| `synset_name_to_id.tsv`   | File that maps each synset's name to a numeric id starting from 0. | The file is sorted on the first column. |

| `synset_id_to_name.tsv`   | File that maps each synset id to a synset's name. | The file is sorted on the first column. |

| `relation_name_to_id.tsv` | File that maps each relation to a numeric id starting from 0. | The file is sorted on the first column. |

| `relation_id_to_name.tsv` | File that maps each relation id to a relation's name. | The file is sorted on the first column. |

| `edges_as_id_all.tsv` | File that contains all the edges of the WordNet's semantic subgraph as triples of ids (id synset 1, id relation, id synset 2). | The file is sorted on the second column. |

| `edges_as_id_*.tsv`   | Files that contain only the edges of type *. | The file is sorted on the second column. |

| `edges_as_name_all.tsv` | File that contains all the edges of the WordNet's semantic subgraph as triples of names (name synset 1, name relation, name synset 2). | The file is sorted on the second column. |

| `edges_as_name_*.tsv`   | Files that contain only the edges of type *. | The file is sorted on the second column. |

## Download

A compressed version of the dataset can be downloaded from the [release page][releases] or by clicking [here][download].

## Source

The dataset is generated using [nltk][nltk] and is a subset of the [WordNet][wn] dataset.

## License

All source code of this project is licensed under the MIT License - see the [license][license] file for details.

[dataset]: https://github.com/simonepri/WN16S/tree/master/dataset

[releases]: https://github.com/simonepri/WN16S/releases/latest

[download]: https://github.com/simonepri/WN16S/releases/latest/download/WN16S.tgz

[license]: https://github.com/simonepri/WN16S/tree/master/license

[wn]: https://wordnet.princeton.edu

[nltk]: https://github.com/nltk/nltk

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/simonepri/wn16s

Awesome Lists containing this project

README

WN16S