# Research Thesis :mortar_board:

Repository with research artifacts for a Master's thesis in Informatics at PPGI (Programa de Pós-Graduação em Informática), Universidade Federal da Paraíba.

## Topics :scroll:

## Papers :books:

|Year|Title|Authors|Source|Link|
|---|---|---|---|---|
|2020|Exploring Benefits of Transfer Learning in Neural Machine Translation|Tom Kocmi|arXiv|[`PDF`](https://arxiv.org/pdf/2001.01622.pdf)|
|2020|Benchmark and Survey of Automated Machine Learning Frameworks|Marc-Andre Zoller, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1904.12054.pdf)|
|2019|Transfer Learning across Languages from Someone Else’s NMT Model|Tom Kocmi, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1909.10955.pdf)|
|2019|AutoML: A Survey of the State-of-the-Art|Xin He, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1908.00709.pdf)|
|2019|Pay Less Attention With Lightweight And Dynamic Convolutions|Felix Wu, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1901.10430.pdf)|
|2019|Multi-Round Transfer Learning for Low-Resource NMT Using Multiple High-Resource Languages|Yang Liu, et al.|ACM|[`PDF`](https://dl.acm.org/doi/abs/10.1145/3314945)|
|2019|Hierarchical Transfer Learning Architecture for Low-Resource Neural Machine Translation|Gongxu Luo, et al.|IEEE|[`PDF`](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8805098)|
|2018|Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary|Surafel M. Lakew, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1811.01137.pdf)|
|2018|Neural Machine Translation with Dynamic Selection Network|Fei Han, et al.|IEEE|[`PDF`](https://ieeexplore.ieee.org/document/8781050)|
|2018|Twitter Sentiment Analysis using Dynamic Vocabulary|Hrithik Katiyar, et al.|IEEE|[`PDF`](https://ieeexplore.ieee.org/document/8722407)|
|2018|Incorporating Statistical Machine Translation Word Knowledge into Neural Machine Translation|Xing Wang, et al.|IEEE|[`PDF`](https://ieeexplore.ieee.org/document/8421063)|
|2018|Neural Machine Translation Advised by Statistical Machine Translation: The Case of Farsi–Spanish Bilingually Low–Resource Scenario|Benyamin Ahmadnia, et al.|IEEE|[`PDF`](https://ieeexplore.ieee.org/document/8614221)|
|2018|Trivial Transfer Learning for Low-Resource Neural Machine Translation|Tom Kocmi, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1809.00357.pdf)|
|2017|Dynamic Data Selection for Neural Machine Translation|Marlies van der Wees, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1708.00712.pdf)|
|2017|Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation|Melvin Johnson, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1611.04558.pdf)|
|2017|Translating Low-Resource Languages by Vocabulary Adaptation from Close Counterparts|Qun Liu, et al.|ACM|[`PDF`](https://dl.acm.org/doi/abs/10.1145/3099556)|
|2017|Convolutional Sequence to Sequence Learning|Jonas Gehring, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1705.03122.pdf)|
|2017|Neural Response Generation with Dynamic Vocabularies|Yu Wu, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1711.11191.pdf)|
|2017|A Comparative Study of Word Embeddings for Reading Comprehension|Bhuwan Dhingra, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1703.00993.pdf)|
|2017|Attention Is All You Need|Ashish Vaswani, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1706.03762.pdf)|
|2016|Text Understanding from Scratch|Xiang Zhang, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1502.01710.pdf)|
|2016|Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation|Yonghui Wu, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1609.08144.pdf)|
|2015|How to Generate a Good Word Embedding?|Siwei Lai, et al.|arXiv|[`PDF`](https://arxiv.org/pdf/1507.05523.pdf)|
|2015|Transfer Learning for Bilingual Content Classification|Qian Sun, et al.|ACM|[`PDF`](https://dl.acm.org/doi/abs/10.1145/2783258.2788575)|
|2015|Incorporation of Syntactic-Semantic Aspects in a LIBRAS Machine Translation Service to Multimedia Platforms|Tiago, et al.|ACM|[`PDF`](https://dl.acm.org/doi/pdf/10.1145/2820426.2820434)|

## Repositories :octocat:

|Year|Repository|Owner|Stars|Link|
|---|---|---|---|---|
|2020|Awesome AutoML Papers|@hibayesian|2.3k|[`GitHub`](https://github.com/hibayesian/awesome-automl-papers)|
|2020|Auptimizer|@LGE-ARC-AdvancedAI|133|[`GitHub`](https://github.com/LGE-ARC-AdvancedAI/auptimizer)|
|2020|TPOT|@EpistasisLab|6.9k|[`GitHub`](https://github.com/EpistasisLab/tpot)|

## Analysis Papers :nerd_face:

This document covers four papers related to the research topic. The keywords used in the search were *Dynamic Vocabulary*, *Neural Machine Translation*, *Transfer Learning*, and *Statistical Machine Translation*.

Each paper is broken down into four subtopics: **Paper Goals**, **Approach**, **Experiments**, and **Results**.

The selected papers are listed below, ordered by year of publication.

|Year|Title|Authors|Link|
|---|---|---|---|
|2018|Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary|Surafel M. Lakew, et al.|[`PDF`](https://arxiv.org/pdf/1811.01137.pdf)|
|2018|Trivial Transfer Learning for Low-Resource Neural Machine Translation|Tom Kocmi, et al.|[`PDF`](https://arxiv.org/pdf/1809.00357.pdf)|
|2019|Multi-Round Transfer Learning for Low-Resource NMT Using Multiple High-Resource Languages|Yang Liu, et al.|[`PDF`](https://dl.acm.org/doi/abs/10.1145/3314945)|
|2019|Transfer Learning across Languages from Someone Else’s NMT Model|Tom Kocmi, et al.|[`PDF`](https://arxiv.org/pdf/1909.10955.pdf)|

### 1. Knowledge

### 2. Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

#### Authors

Surafel M. Lakew, Aliia Erofeeva, Matteo Negri, Marcello Federico and Marco Turchi

#### Abstract

We propose a method to **transfer knowledge** across neural machine translation (NMT) models by means of a shared **dynamic vocabulary**. Our approach allows to extend an initial model for a given language pair to cover new languages by **adapting its vocabulary as long as new data become available** (i.e., introducing new vocabulary items if they are not included in the initial model). The parameter transfer mechanism is evaluated in two scenarios: i) to adapt a trained single language NMT system to work with a new language pair and ii) to continuously add new language pairs to grow to a multilingual NMT system. In both the scenarios our goal is to improve the translation performance, while minimizing the training convergence time. Preliminary experiments spanning five languages with different training data sizes (i.e., 5k and 50k parallel sentences) show a significant performance **gain ranging from +3.85 up to +13.63 BLEU** in different language directions. Moreover, when compared with training an NMT model from scratch, **our transfer-learning approach** allows us to reach higher performance after training up to 4% of the total training steps.

#### 2.1. Paper Goals

Explore a *transfer learning* technique for the **Multilingual Neural Machine Translation** problem using a dynamic vocabulary (e.g., German to English, Italian to English).



#### 2.2. Approach

![Image](resources/Approach.png)

The paper's authors present two training strategies, called *progAdapt* and *progGrow*.

1. **progAdapt** - Trains a sequential chain of networks, transferring the parameters of an initial model L1 to a new language pair L2 and so on up to Ln (source ⇔ target for each L).
2. **progGrow** - Progressively introduces a new language pair into the initial model (source → target for each L).

For the **Dynamic Vocabulary**, the approach simply keeps the intersection (identical entries) between the new vocabulary and the one from the previous training. At training time, the new entries are initialized randomly, while items that were already in the vocabulary keep their weights (*word embeddings*).

The example below is written in **Python** using the **PyTorch** framework.

As a first, base case we have an initial vocabulary with only two words.

```python
import torch.nn as nn  # PyTorch

word2index = {"hello": 0, "world": 1}
embeds = nn.Embedding(2, 5)  # 2 words in vocab, 5-dimensional embeddings
```
Suppose we want to add the word `keyboard` to our vocabulary. Following the approach described above, we keep the weights of `hello` and `world` and initialize `keyboard` randomly.

```python
import torch

word2index = {"hello": 0, "world": 1, "keyboard": 2}  # updated vocabulary
hello_embed = embeds(torch.LongTensor([0])).detach()  # old embedding row for "hello"
world_embed = embeds(torch.LongTensor([1])).detach()  # old embedding row for "world"

concat_embeds = torch.cat([
    hello_embed,       # old embedding, kept
    world_embed,       # old embedding, kept
    torch.rand(1, 5),  # new entry, initialized randomly
])
# freeze=False so the embeddings keep being updated during the new training round
embeds = nn.Embedding.from_pretrained(concat_embeds, freeze=False)  # 3 words in vocab, 5-dimensional embeddings
```

#### 2.3. Experiments

To evaluate the two proposed strategies, the authors implemented two baseline models. The first, **Bi-NMT**, is trained from scratch for each language pair L (source ⇔ target). The second, **M-NMT**, concatenates the data of all language pairs L1 ... Ln and is also trained from scratch.

The image below shows the set of language pairs used for training.



#### 2.4. Results

![Image](resources/ResultGrowAdapted.png)

### 3. Trivial Transfer Learning for Low-Resource Neural Machine Translation

#### Authors
Tom Kocmi and Ondrej Bojar

#### Abstract

**Transfer learning** has been proven as an effective technique for neural machine translation under **low-resource conditions**. Existing methods require a common target language, language relatedness, or specific training tricks and regimes. We present a **simple transfer learning method**, where we **first train a “parent” model for a high-resource language pair and then continue the training on a low-resource pair only by replacing the training corpus**. This “child” model performs significantly better than the baseline trained for low-resource pair only. We are the first to show this for targeting different languages, and we observe the improvements even for unrelated languages with different alphabets.

#### 3.1. Paper Goals

Explore a **transfer learning** technique to improve translation quality for *low-resource* languages.

#### 3.2. Approach

As the authors themselves put it, the method is extremely simple: start training on a parent language pair L1 (high-resource) and, after some *epochs*, switch the corpus to a child language pair L2 (low-resource) without resetting any parameters or hyperparameters.
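
A minimal sketch of this recipe is shown below, assuming a generic encoder-decoder `model`, an `optimizer`, and a `train_epoch` helper; these names are illustrative and not taken from the authors' code.

```python
# Hypothetical sketch of "trivial" transfer learning: the child model simply
# continues training from the parent's state after a corpus swap.
for epoch in range(parent_epochs):
    train_epoch(model, optimizer, parent_corpus)   # high-resource pair (parent)

# Swap only the training corpus; keep the parameters, optimizer state,
# vocabulary, and every hyperparameter untouched.
for epoch in range(child_epochs):
    train_epoch(model, optimizer, child_corpus)    # low-resource pair (child)
```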

#### 3.3. Experiments



The authors used different language pairs and trained the models in three ways:

1. **Parent (only)** - Standard training using only the language pair L1.
2. **Child (only)** - Standard training using only the language pair L2.
3. **Transfer** - The approach proposed in the paper.

#### 3.4. Results



### 4. Multi-Round Transfer Learning for Low-Resource NMT Using Multiple High-Resource Languages

#### Authors

Mieradilijiang Maimaiti, Yang Liu, Huanbo Luan and Maosong Sun

#### Abstract

**Neural machine translation (NMT)** has made remarkable progress in recent years, but the performance of NMT suffers from a data sparsity problem since large-scale parallel corpora are only readily available for **high-resource languages (HRLs)**. In recent days, **transfer learning (TL)** has been used widely in **low-resource languages (LRLs)** machine translation, while TL is becoming one of the vital directions for addressing the data sparsity problem in low-resource NMT. As a solution, a transfer learning method in NMT is generally obtained via initializing the **low-resource model (child)** with the **high-resource model (parent)**. However, leveraging the original TL to low-resource models is neither able to make full use of highly related multiple HRLs nor to receive different parameters from the same parents. **In order to exploit multiple HRLs effectively**, we present a language-independent and **straightforward multi-round transfer learning (MRTL)** approach to low-resource NMT. Besides, with the **intention of reducing the differences between high-resource and low-resource languages at the character level**, we introduce a unified transliteration method for various language families, which are both semantically and syntactically highly analogous with each other. Experiments on low-resource datasets show that our approaches are effective, significantly outperform the state-of-the-art methods, and yield improvements of **up to 5.63 BLEU points**.

#### 4.1. Paper Goals

Apply **transfer learning** with multiple **High-Resource Languages (HRLs)** to **Low-Resource Languages (LRLs)**, aiming to maximize translation quality and training efficiency.

#### 4.2. Approach

The authors present two complementary approaches aimed at transferring the "knowledge" learned from a parent **HRL** to a child **LRL**.



1. **Unified Transliteration** - Looks at the similarities between the words of the language pairs L3 → L2 (parent language) and L1 → L2 (child language) to initialize θL1 → L2 (the parameters associated with L1 → L2).

2. **Multi-Round Transfer Learning (MRTL)** - Given the language pair L1 → L2, the parameters θL1 → L2 can be initialized from a parent pair L3 → L2, which in turn can be initialized from yet another pair Lk+1 → L2; in other words, a chain of *transfer learning* (see the sketch below).

![Image](resources/a.png)
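
The chained initialization can be pictured with the following minimal sketch, under the simplifying assumption that each round just continues training from the parameters left by the previous round; `train_on` and the corpus list are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the MRTL chain: each round is seeded with the
# parameters learned in the previous round, ending with the low-resource pair.
def multi_round_transfer(model, corpora):
    """corpora: training data ordered from the highest-resource parent
    (Lk+1 -> L2) down to the low-resource child pair (L1 -> L2)."""
    for corpus in corpora:        # round r starts from the parameters of round r-1
        train_on(model, corpus)   # hypothetical training routine, updates `model` in place
    return model                  # final model for the low-resource pair L1 -> L2
```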



#### 4.3. Experiments

For the experiments, the authors use the **Transformer** architecture as implemented in the **PyTorch** framework.

The experiments consider different numbers of *transfer learning* rounds, denoted **R = N**, where **N** is the number of times the transfer process is applied.

#### 4.4. Results



### 5. Transfer Learning across Languages from Someone Else’s NMT Model

#### Authors

Tom Kocmi and Ondrej Bojar

#### Abstract
**Neural machine translation** is demanding in terms of training time, hardware resources, size, and quantity of parallel sentences. We propose a simple **transfer learning** method to recycle already trained models for different language pairs with no need for modifications in model architecture, hyper-parameters, or vocabulary. We achieve better translation quality and **shorter convergence times than when training from random initialization**. To show the applicability of our method, we recycle a Transformer model trained by different researchers for translating English-to-Czech and used it to seed models for seven language pairs. Our translation models are **significantly better** even when the **re-used model’s language pair is not linguistically related to the child language pair**, especially for **low-resource languages**. Our approach needs **only one pretrained model** for all transferring to all various languages pairs. Additionally, we improve this approach with a simple **vocabulary transformation**. We analyze the behavior of transfer learning to understand the gains from unrelated languages.

#### 5.1. Paper Goals

Explore a **transfer learning** method that reuses the weights of a network already trained on a completely different language pair, with no changes to the architecture, hyperparameters, or vocabulary.

#### 5.2. Approach

The authors use a method called **Direct Transfer**, in which the parent model's vocabulary and hyperparameters are kept while training the child model. In addition, Tom Kocmi and Ondrej Bojar present a **vocabulary transformation** algorithm that aims to preserve the words shared between the parent and child vocabularies.
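
A rough sketch of one way such a vocabulary transformation could be implemented is given below, assuming subword vocabularies stored as `token -> index` dictionaries and a parent embedding matrix. It illustrates the idea (shared tokens keep the parent's embedding row, remaining child tokens recycle unused parent rows) and is not the authors' published algorithm.

```python
def transform_vocab(parent_vocab, child_vocab, parent_embedding):
    """Map the child vocabulary onto the parent's embedding matrix.

    parent_vocab / child_vocab: dicts mapping token -> row index.
    parent_embedding: tensor or array of shape (len(parent_vocab), dim).
    Assumes the child vocabulary is not larger than the parent one.
    """
    # Rows of the parent embedding whose tokens do not occur in the child vocabulary.
    free_rows = [idx for tok, idx in parent_vocab.items() if tok not in child_vocab]
    child_index = {}
    for token in child_vocab:
        if token in parent_vocab:
            child_index[token] = parent_vocab[token]   # shared token keeps its parent row
        else:
            child_index[token] = free_rows.pop()       # child-only token takes a leftover row
    return child_index, parent_embedding               # the embedding rows themselves are reused as-is
```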



#### 5.3. Experiments

The authors compare three different setups:

1. **Baseline** - Standard training, without *transfer learning*.
2. **Direct transfer** - Training that reuses the parent model's vocabulary and hyperparameters.
3. **Transformed vocab** - Training that reuses the parent's hyperparameters, but with the vocabulary adapted to the child model.

#### 5.4. Results

