https://github.com/izuna385/attucker
Toy Experiments @ Nov, 2019. https://speakerdeck.com/izuna385/entity-representation-with-relational-attention
- Host: GitHub
- URL: https://github.com/izuna385/attucker
- Owner: izuna385
- Created: 2020-05-10T06:19:01.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-24T06:19:32.000Z (over 5 years ago)
- Language: Python
- Size: 28.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AtTuckER
Knowledge Graph Embedding with relation-definition and relation-entity attention.
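The name suggests the model builds on TuckER, which scores a triple as the core tensor contracted with the head, relation, and tail embeddings: score(h, r, t) = W ×₁ e_h ×₂ w_r ×₃ e_t. As a reference point, here is a minimal PyTorch sketch of the plain TuckER scorer, not this repo's attention variant; all names and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class TuckERScorer(nn.Module):
    """Plain TuckER scoring: score(h, r, t) = W x1 e_h x2 w_r x3 e_t."""

    def __init__(self, n_ent, n_rel, d_e=200, d_r=30):
        super().__init__()
        self.ent = nn.Embedding(n_ent, d_e)
        self.rel = nn.Embedding(n_rel, d_r)
        # Core tensor W, shape (d_r, d_e, d_e).
        self.W = nn.Parameter(torch.randn(d_r, d_e, d_e) * 0.1)

    def forward(self, h_idx, r_idx):
        e_h = self.ent(h_idx)                           # (B, d_e)
        w_r = self.rel(r_idx)                           # (B, d_r)
        W_r = torch.einsum('bk,kij->bij', w_r, self.W)  # relation-specific map
        x = torch.einsum('bi,bij->bj', e_h, W_r)        # transformed head
        return x @ self.ent.weight.t()                  # scores vs. all tails
```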
## Preprocess:
* Check that the specific dataset exists in `./data/`.

### Preprocessing FB**
```
data
├── FB15k
└── FB15k-237
    ├── train.txt
    ├── valid.txt
    ├── test.txt
    ├── entity2multisentdesc.txt
    ├── entity2type.txt
    └── entity_symbol2type.txt
```

### NOTE: when preprocessing, you first need the following files
```
train.txt : head_symbol \t relation \t tail_symbol, one triple per line.
valid.txt : head_symbol \t relation \t tail_symbol, one triple per line.
test.txt : head_symbol \t relation \t tail_symbol, one triple per line.
entity_symbol2cano.txt : entity_symbol \t canonical entity name, one entry per line.
entity_symbol2type.txt : entity_symbol \t type1 \t type2 \t type3 ..., one entry per line.
entity_symbol2multisentdesc.txt : entity_symbol \t multi-sentence description, one entry per line.
```
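A minimal sketch of reading these formats (the repo's own loaders live in its preprocessing scripts and may differ; the function names here are illustrative):

```python
def read_triples(path):
    """Yield (head_symbol, relation, tail_symbol) from train/valid/test files."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            head, rel, tail = line.rstrip('\n').split('\t')
            yield head, rel, tail

def read_entity_types(path):
    """Map entity_symbol -> [type1, type2, ...] from entity_symbol2type.txt."""
    types = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            symbol, *ent_types = line.rstrip('\n').split('\t')
            types[symbol] = ent_types
    return types
```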
Preprocess run command example:
`python3 preprocess_ent2type_desc_cano_reladj.py -entity_symbol2cano_type_desc_alreadydumped False -reverse_rel_data_alreadydumped False -KBdataset FB15k-237 -spacy_model_str en_core_web_lg -multiprocess True`

### Preprocessing WN**
* First, download `WordNet-3.0.tar.gz` to `./misc_data/` and run `tar -xzvf WordNet-3.0.tar.gz`.
* Run `python3 preprocess_wordnet.py -KBdataset WN18` (and `python3 preprocess_wordnet.py -KBdataset WN18RR`).
* If train/dev/test files exist in `./data/WN**/`, you can use this preprocessor for other WN** datasets.
* Run `python3 preprocess_ent2type_desc_cano_reladj.py -entity_symbol2cano_type_desc_alreadydumped False -reverse_rel_data_alreadydumped False -KBdataset WN18 -spacy_model_str en_core_web_lg -multiprocess True`
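The `-reverse_rel_data_alreadydumped` and `-spacy_model_str` flags in the commands above hint at two steps: reverse-relation augmentation (standard in TuckER-style training) and spaCy-based handling of entity descriptions. A sketch of what those steps plausibly look like; these are assumptions, not code taken from the scripts:

```python
import spacy

def add_reverse_triples(triples, suffix='_reverse'):
    """Augment (h, r, t) triples with reverse triples (t, r_reverse, h)."""
    out = list(triples)
    for h, r, t in triples:
        out.append((t, r + suffix, h))
    return out

# Matches the -spacy_model_str argument above.
nlp = spacy.load('en_core_web_lg')

def split_description(text):
    """Split a multi-sentence entity description into sentences."""
    return [sent.text.strip() for sent in nlp(text).sents]
```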
### Preprocessing DBpedia
* First, download
```
wget http://downloads.dbpedia.org/2016-10/core-i18n/en/nif_context_en.ttl.bz2
wget http://downloads.dbpedia.org/2016-10/core-i18n/en/infobox_properties_en.ttl.bz2
wget http://downloads.dbpedia.org/2016-10/core-i18n/en/labels_en.ttl.bz2
```
to `./misc_data/dbpedia2016/`, then run:
```
cd misc_data/dbpedia2016/
python dbpedia_ents_prep.py infobox_properties_en.ttl.bz2 labels_en.ttl.bz2 nif_context_en.ttl.bz2 > dbpedia_ents.text.jsonl
```
Next, run the context preprocessor:
```
python3 preprocess_dbpedia.py -jsonl2json_alreadydumped False -entitycheckalreadydone False -parsed_json_squeezing_by_entities_on_db50_and_db500_already False
nohup sh preprocess_dbpedia.sh > 191123_preprocess_dbpedia.log &
```
This will take 6-10 hours on a 72-core CPU machine.
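If you want to peek at one of the dumps before committing to the full run, the `.ttl.bz2` files can be streamed without unpacking. A minimal sketch (the real extraction lives in `dbpedia_ents_prep.py`; the regex below is an assumption about the N-Triples label layout, not taken from that script):

```python
import bz2
import re

# rdfs:label triples in labels_en.ttl look roughly like:
# <http://dbpedia.org/resource/X> <http://www.w3.org/2000/01/rdf-schema#label> "X"@en .
LABEL = re.compile(
    r'<([^>]+)> <http://www\.w3\.org/2000/01/rdf-schema#label> "(.+)"@en'
)

def iter_labels(path='misc_data/dbpedia2016/labels_en.ttl.bz2'):
    """Stream (resource_uri, english_label) pairs from the compressed dump."""
    with bz2.open(path, mode='rt', encoding='utf-8', errors='ignore') as f:
        for line in f:
            m = LABEL.match(line)
            if m:
                yield m.group(1), m.group(2)
```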