Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/losyer/compact_reconstruction
This repository accompanies the paper 'Subword-based Compact Reconstruction of Word Embeddings' (Sasaki et al., NAACL 2019).
- Host: GitHub
- URL: https://github.com/losyer/compact_reconstruction
- Owner: losyer
- Created: 2019-04-04T04:39:18.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-06-09T05:05:48.000Z (over 1 year ago)
- Last Synced: 2024-04-20T00:36:15.594Z (7 months ago)
- Language: Python
- Homepage:
- Size: 102 KB
- Stars: 9
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Compact Reconstruction
- This repository accompanies *Subword-based Compact Reconstruction of Word Embeddings* (Sasaki et al., NAACL 2019).

## Table of contents
- [Usage](#usage)
- [Requirements](#requirements)
- [How to train](#how-to-train)
- [How to estimate (OOV) word vectors](#how-to-estimate-oov-word-vectors)
- [Preprocessing of setting files](#preprocessing-of-setting-files)
- [Resources](#resources)

## Usage
### Requirements
- Python version >= 3.7
- chainer
- numpy
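
Chainer and NumPy can be installed with pip. A minimal setup sketch (the repository pins no versions, so any mutually compatible Chainer/NumPy release is an assumption):

```
$ pip install chainer numpy
```

### How to train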
```
$ python src/train.py \
--gpu 0 \
--ref_vec_path crawl-300d-2M-subword.vec \
--freq_path resources/freq_count.crawl-300d-2M-subword.txt \
--multi_hash two \
--maxlen 200 \
--codecs_path resources/ngram_dic.max30.min3 \
--network_type 2 \
--subword_type 4 \
--limit_size 1000000 \
--bucket_size 100000 \
--result_dir ./result \
--hashed_idx \
--unique_false
```
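
The command above corresponds to the SUM-FH configuration. The flag combinations for the model variants are: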
|Model |network_type |subword_type |hashed_idx |
|---|---|---|---|
|SUM-F |2 |0 |✘ |
|SUM-H |2 |0 |✓ |
|KVQ-H |3 |0 |✓ |
|SUM-FH |2 |4 |✓ |
|KVQ-FH |3 |4 |✓ |
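
For example, to train the plain SUM-F variant, set `--subword_type 0` and omit `--hashed_idx`. A sketch reusing the paths from the command above (dropping the hashing-related flags `--multi_hash` and `--bucket_size` is an assumption, since they should only matter for the hashed variants):

```
$ python src/train.py \
--gpu 0 \
--ref_vec_path crawl-300d-2M-subword.vec \
--freq_path resources/freq_count.crawl-300d-2M-subword.txt \
--maxlen 200 \
--codecs_path resources/ngram_dic.max30.min3 \
--network_type 2 \
--subword_type 0 \
--limit_size 1000000 \
--result_dir ./result \
--unique_false
```

### How to estimate (OOV) word vectors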
For estimating OOV word vectors:
```
$ python src/inference.py \
--gpu 0 \
--model_path result/sum/20190625_00_57_18/model_epoch_300 \
--codecs_path resources/ngram_dic.max30.min3 \
--oov_word_path resources/oov_words.txt
```
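
The file passed to `--oov_word_path` is presumably a plain text file listing one out-of-vocabulary word per line (an assumption about the format; the file name and words below are purely illustrative):

```
$ printf 'uncopyrightable\ntransfluent\n' > my_oov_words.txt
```

A file like this can then be supplied via `--oov_word_path` in the command above.

For reconstructing original word embeddings: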
```
$ python src/save_embedding.py \
--gpu 0 \
--inference \
--model_path result/sum/20190625_00_57_18/model_epoch_300
```
## Preprocessing of setting files
- See [preprocessing page](https://github.com/losyer/compact_reconstruction/tree/master/src/preprocess)

## Resources
- See [resource page](https://github.com/losyer/compact_reconstruction/tree/master/resources)