# WN16S








WordNet dataset with semantic relations only

## Motivation
In [WordNet][wn] two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms (lemmas); semantic relations hold between word meanings (synsets).

I wanted a dataset with the lexical relations filtered out, so that synset embeddings can be built from the semantic relations of the WordNet graph alone.

## Structure
The [dataset folder][dataset] contains several `tsv` and `txt` files. The table below explains the purpose of each one.

| file name | purpose | notes |
| --------- | ------- | ----- |
| `count_synsets.txt` | File that contains the number of synsets. | |
| `count_relations.txt` | File that contains the number of relations. | |
| `count_edges_all.txt` | File that contains the number of total edges. | |
| `count_edges_*.tsv` | Files that contain the number of edges of a single type, where `*` stands for the relation type. | |
| `synset_name_to_id.tsv` | File that maps each synset's name to a numeric id starting from 0. | The file is sorted on the first column. |
| `synset_id_to_name.tsv` | File that maps each synset id to a synset's name. | The file is sorted on the first column. |
| `relation_name_to_id.tsv` | File that maps each relation to a numeric id starting from 0. | The file is sorted on the first column. |
| `relation_id_to_name.tsv` | File that maps each relation id to a relation's name. | The file is sorted on the first column. |
| `edges_as_id_all.tsv` | File that contains all the edges of the WordNet's semantic subgraph as triples of ids (id synset 1, id relation, id synset 2). | The file is sorted on the second column. |
| `edges_as_id_*.tsv` | Files that contain only the edges of a single type, as triples of ids, where `*` stands for the relation type. | Each file is sorted on the second column. |
| `edges_as_name_all.tsv` | File that contains all the edges of the WordNet's semantic subgraph as triples of names (name synset 1, name relation, name synset 2). | The file is sorted on the second column. |
| `edges_as_name_*.tsv` | Files that contain only the edges of a single type, as triples of names, where `*` stands for the relation type. | Each file is sorted on the second column. |
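
The files described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself. It assumes that the `tsv` files are tab-separated, have no header row, and hold two columns (mapping files) or three columns (edge files). The function names `load_mapping`, `load_edges`, and `group_by_relation` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict


def _read_rows(path):
    """Yield each non-empty line of a TSV file as a list of column values."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line.split("\t")


def load_mapping(path):
    """Read a two-column file into a dict (first column -> second column).

    Assumption: every row has exactly two tab-separated columns.
    Values are kept as strings; convert ids with int() if needed.
    """
    return {first: second for first, second in _read_rows(path)}


def load_edges(path):
    """Read (synset 1, relation, synset 2) triples from a three-column file."""
    return [tuple(row) for row in _read_rows(path)]


def group_by_relation(edges):
    """Group (synset 1, synset 2) pairs under the relation that links them."""
    grouped = defaultdict(list)
    for head, relation, tail in edges:
        grouped[relation].append((head, tail))
    return dict(grouped)
```

For instance, `group_by_relation(load_edges("dataset/edges_as_id_all.tsv"))` would give one list of synset pairs per relation id, and `load_mapping("dataset/relation_id_to_name.tsv")` could then translate those ids back to relation names. The `dataset/` path assumes the archive or repository has been unpacked in the current directory.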

## Download
A compressed version of the dataset can be downloaded from the [release page][releases] or directly as [`WN16S.tgz`][download].
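
The download can also be scripted. The sketch below uses only the Python standard library and the direct download URL listed in this README. It assumes the archive is a gzip-compressed tarball, as the `.tgz` extension suggests. `download_archive` and `extract_archive` are hypothetical helper names.

```python
import tarfile
import urllib.request
from pathlib import Path

# Direct download link taken from this README.
DATASET_URL = "https://github.com/simonepri/WN16S/releases/latest/download/WN16S.tgz"


def download_archive(url=DATASET_URL, target="WN16S.tgz"):
    """Download the archive to `target` and return its path."""
    path, _headers = urllib.request.urlretrieve(url, target)
    return Path(path)


def extract_archive(archive, destination="."):
    """Extract a gzip-compressed tar archive and return its member names."""
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where the running Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(destination, **kwargs)
    return names
```

Calling `extract_archive(download_archive())` would fetch the latest release and unpack it into the current directory.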

## Source
The dataset is generated using [nltk][nltk] and is a subset of the [WordNet][wn] dataset.
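
To give an idea of how synset-level edges can be read from nltk, here is a rough sketch. It is not the script that produced this dataset, and the three relations it uses by default are only an illustrative subset, not the list of relations the dataset contains. `semantic_edges` and `wordnet_synsets` are hypothetical helper names. The sketch relies on the relation methods that nltk exposes on synsets, such as `hypernyms()`, while lexical relations such as antonymy are exposed on lemmas and are therefore not touched.

```python
def semantic_edges(synsets, relations=("hypernyms", "hyponyms", "part_meronyms")):
    """Yield (source name, relation, target name) for synset-level relations.

    `relations` lists method names available on nltk synsets; the default
    is an illustrative subset chosen for this sketch.
    """
    for synset in synsets:
        for relation in relations:
            for target in getattr(synset, relation)():
                yield synset.name(), relation, target.name()


def wordnet_synsets():
    """Return an iterator over all WordNet synsets known to nltk.

    Requires nltk and its WordNet corpus, e.g. after nltk.download("wordnet").
    """
    from nltk.corpus import wordnet

    return wordnet.all_synsets()
```

With nltk and the WordNet corpus installed, `semantic_edges(wordnet_synsets())` would stream name triples in the same (synset 1, relation, synset 2) shape as the edge files described above.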

## License
The source code of this project is licensed under the MIT License; see the [license][license] file for details.

[dataset]: https://github.com/simonepri/WN16S/tree/master/dataset
[releases]: https://github.com/simonepri/WN16S/releases/latest
[download]: https://github.com/simonepri/WN16S/releases/latest/download/WN16S.tgz
[license]: https://github.com/simonepri/WN16S/tree/master/license

[wn]: https://wordnet.princeton.edu
[nltk]: https://github.com/nltk/nltk